Negnevitsky, Pearson Education, 2002
Chapter 6: Artificial neural networks
• Introduction, or how the brain works
• The neuron as a simple computing element
• The perceptron
• Multilayer neural networks
Neural Networks and the Brain
• A neural network is a model of reasoning inspired by the human brain.
• The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.
• The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them.
• By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.
• Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.
• A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.
Biological neural network

[Figure: two connected nerve cells, each with a soma, dendrites, and an axon; synapses join the axon of one neuron to the dendrites of the next]
Architecture of a typical artificial neural network

[Figure: input signals enter the Input Layer, pass through the Middle Layer, and leave the Output Layer as output signals]
Analogy between biological and artificial neural networks

[Table not preserved in the transcript; in the source it maps soma ↔ neuron, dendrite ↔ input, axon ↔ output, synapse ↔ weight]
The neuron as a simple computing element

Diagram of a neuron

[Figure: input signals x1, x2, …, xn pass through weights w1, w2, …, wn into neuron Y, which sends the same output signal Y along each outgoing branch]
A Simple Activation Function – Sign Function
• The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ.
• If the net input is less than the threshold, the neuron output is −1.
• If the net input is greater than or equal to the threshold, the neuron becomes activated and its output is +1.
• The neuron uses the following transfer, or activation, function:

X = \sum_{i=1}^{n} x_i w_i

Y = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases}

• This type of activation function is called a sign function (McCulloch and Pitts, 1943).
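To make the arithmetic concrete, here is a minimal sketch of such a neuron in Python (the function name and sample values are illustrative, not from the slides):

```python
def mcculloch_pitts_neuron(inputs, weights, theta):
    """Sign-function neuron: weighted sum of inputs compared against threshold theta."""
    x = sum(x_i * w_i for x_i, w_i in zip(inputs, weights))  # X = sum of x_i * w_i
    return +1 if x >= theta else -1                          # sign activation

# Example: two inputs with weights 0.5 and 0.3, threshold 0.2
print(mcculloch_pitts_neuron([1, 0], [0.5, 0.3], 0.2))  # weighted sum 0.5 >= 0.2 -> +1
print(mcculloch_pitts_neuron([0, 0], [0.5, 0.3], 0.2))  # weighted sum 0.0 <  0.2 -> -1
```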
4 Common Activation Functions of a Neuron

[Figure: four panels plotting Y against X — step function, sign function, sigmoid function, and linear function, each drawn over the output range −1 to +1]

Y^{step} = \begin{cases} 1 & \text{if } X \ge 0 \\ 0 & \text{if } X < 0 \end{cases}

Y^{sign} = \begin{cases} +1 & \text{if } X \ge 0 \\ -1 & \text{if } X < 0 \end{cases}

Y^{sigmoid} = \frac{1}{1 + e^{-X}}

Y^{linear} = X

Most common? (In back-propagation networks the sigmoid is the usual choice, as noted later in these slides.)
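A compact sketch of these four functions in Python (names are mine):

```python
import math

def step(x):     # binary: 1 at or above zero, else 0
    return 1 if x >= 0 else 0

def sign(x):     # bipolar: +1 at or above zero, else -1
    return 1 if x >= 0 else -1

def sigmoid(x):  # smooth squashing of any X into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):   # identity: output equals net input
    return x

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-0.5), f(0.5))
```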
Can a single neuron learn a task?
• Start off with the earliest and simplest: in 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron.
• The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.
Single-layer two-input perceptron

[Figure: inputs x1 and x2, with weights w1 and w2, feed a linear combiner with threshold θ; a hard limiter then produces output Y]
The Perceptron
• The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.
• The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.
• The aim of the perceptron is to classify inputs x1, x2, …, xn into one of two classes, say A1 and A2.
• In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

\sum_{i=1}^{n} x_i w_i - \theta = 0

• See next slide.
Linear separability in the perceptrons

[Figure: (a) Two-input perceptron — classes A1 and A2 separated by the line x1 w1 + x2 w2 − θ = 0; (b) Three-input perceptron — classes separated by the plane x1 w1 + x2 w2 + x3 w3 − θ = 0]

Changing θ shifts the boundary.
How does the perceptron learn its classification tasks?
• By making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron.
• It learns weights such that the output is consistent with the training examples.
• The initial weights are randomly assigned, usually in the range [−0.5, 0.5].
• If at iteration p the actual output is Y(p) and the desired output is Y_d(p), then the error is given by:

e(p) = Y_d(p) - Y(p), \quad p = 1, 2, 3, \ldots

• Iteration p here refers to the p-th training example presented to the perceptron.
• If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).
The perceptron learning rule

w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p)

where p is the iteration number (1, 2, 3, …) and α is the learning rate, a positive constant less than unity (1).

Intuition:
• The weight at the next iteration is based on an adjustment to the current weight.
• The adjustment amount is influenced by the amount of the error, the size of the input, and the learning rate.
• The learning rate is a free parameter that must be "tuned".

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
Perceptron's training algorithm

Step 1: Initialisation
Set initial weights w1, w2, …, wn and threshold θ to random numbers in the range [−0.5, 0.5].
(During training, if the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).)
Perceptron's training algorithm (continued)

Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Y_d(p). Calculate the actual output at iteration p = 1:

Y(p) = step\!\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]

where n is the number of the perceptron inputs, and step is a step activation function.
Perceptron's training algorithm (continued)

Step 3: Weight training
Update the weights of the perceptron:

w_i(p+1) = w_i(p) + \Delta w_i(p)

where Δw_i(p) is the weight correction for weight i at iteration p. The weight correction is computed by the delta rule:

\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
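Putting Steps 1–4 together, a minimal perceptron trainer might look like the following Python sketch (function and variable names are mine; the step activation yields 0/1 outputs and the threshold stays fixed, as in the AND example on the next slide):

```python
import random

def train_perceptron(examples, n_inputs, alpha=0.1, theta=0.2, max_epochs=100):
    """Train a single perceptron. `examples` is a list of (inputs, desired) pairs."""
    # Step 1: Initialisation - random weights in [-0.5, 0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for epoch in range(max_epochs):
        converged = True
        for x, y_d in examples:
            # Step 2: Activation - step function of the weighted sum minus threshold
            net = sum(x_i * w_i for x_i, w_i in zip(x, w)) - theta
            y = 1 if net >= 0 else 0
            # Step 3: Weight training - delta rule
            e = y_d - y
            if e != 0:
                converged = False
                for i in range(n_inputs):
                    w[i] += alpha * x[i] * e
        # Step 4: Iteration - stop once an epoch produces no errors
        if converged:
            break
    return w
```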
Example of perceptron learning: the logical operation AND

Threshold: θ = 0.2; learning rate: α = 0.1

Epoch | Inputs x1 x2 | Desired Yd | Initial w1, w2 | Actual Y | Error e | Final w1, w2
------+--------------+------------+----------------+----------+---------+--------------
  1   | 0  0         | 0          | 0.3, -0.1      | 0        |  0      | 0.3, -0.1
      | 0  1         | 0          | 0.3, -0.1      | 0        |  0      | 0.3, -0.1
      | 1  0         | 0          | 0.3, -0.1      | 1        | -1      | 0.2, -0.1
      | 1  1         | 1          | 0.2, -0.1      | 0        |  1      | 0.3,  0.0
  2   | 0  0         | 0          | 0.3,  0.0      | 0        |  0      | 0.3,  0.0
      | 0  1         | 0          | 0.3,  0.0      | 0        |  0      | 0.3,  0.0
      | 1  0         | 0          | 0.3,  0.0      | 1        | -1      | 0.2,  0.0
      | 1  1         | 1          | 0.2,  0.0      | 1        |  0      | 0.2,  0.0
  3   | 0  0         | 0          | 0.2,  0.0      | 0        |  0      | 0.2,  0.0
      | 0  1         | 0          | 0.2,  0.0      | 0        |  0      | 0.2,  0.0
      | 1  0         | 0          | 0.2,  0.0      | 1        | -1      | 0.1,  0.0
      | 1  1         | 1          | 0.1,  0.0      | 0        |  1      | 0.2,  0.1
  4   | 0  0         | 0          | 0.2,  0.1      | 0        |  0      | 0.2,  0.1
      | 0  1         | 0          | 0.2,  0.1      | 0        |  0      | 0.2,  0.1
      | 1  0         | 0          | 0.2,  0.1      | 1        | -1      | 0.1,  0.1
      | 1  1         | 1          | 0.1,  0.1      | 1        |  0      | 0.1,  0.1
  5   | 0  0         | 0          | 0.1,  0.1      | 0        |  0      | 0.1,  0.1
      | 0  1         | 0          | 0.1,  0.1      | 0        |  0      | 0.1,  0.1
      | 1  0         | 0          | 0.1,  0.1      | 0        |  0      | 0.1,  0.1
      | 1  1         | 1          | 0.1,  0.1      | 1        |  0      | 0.1,  0.1

Epoch 5 produces no errors, so training has converged.
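Reusing the train_perceptron sketch from above, the AND example can be reproduced along these lines (with random initial weights the exact path differs from the table, but the fixed point is a valid AND separator):

```python
AND_EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(AND_EXAMPLES, n_inputs=2, alpha=0.1, theta=0.2)
print(w)  # converges to weights such as [0.1, 0.1], matching epoch 5 of the table
```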
Two-dimensional plots of basic logical operations

[Figure: unit squares in the (x1, x2) plane for (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2); for AND and OR a single straight line separates the two output classes, but for Exclusive-OR no such line exists]

• A perceptron can learn the operations AND and OR, but not Exclusive-OR.
• Exclusive-OR is NOT linearly separable.
• This limitation stalled neural network research for more than a decade.
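To see the limitation concretely, one can run the earlier train_perceptron sketch on XOR data; with a bounded epoch count it simply stops without converging, and some example is always misclassified (values here are illustrative):

```python
XOR_EXAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
w = train_perceptron(XOR_EXAMPLES, n_inputs=2, alpha=0.1, theta=0.2, max_epochs=1000)
# Whatever weights come back, at least one XOR example is still wrong:
for x, y_d in XOR_EXAMPLES:
    y = 1 if sum(xi * wi for xi, wi in zip(x, w)) - 0.2 >= 0 else 0
    print(x, "desired", y_d, "got", y)
```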
Multilayer neural networks
• A multilayer perceptron is a feedforward neural network with one or more hidden layers.
• The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.
Multilayer perceptron with two hidden layers

[Figure: input signals flow from the input layer through a first and a second hidden layer to the output layer, emerging as output signals]
Hidden Layer
• Detects features in the inputs – hidden patterns.
• With one hidden layer, a network can represent any continuous function of the inputs.
• With two hidden layers, even discontinuous functions can be represented.
Back-propagation neural network
• Most popular of 100+ ANN learning algorithms.
• Learning in a multilayer network proceeds the same way as for a perceptron:
  – A training set of input patterns is presented to the network.
  – The network computes its output pattern, and if there is an error (in other words, a difference between actual and desired output patterns), the weights are adjusted to reduce this error.
• The difference is in the number of weights and the architecture …
Back-propagation neural network
In a back-propagation neural network, the learning algorithm has two phases:
• A training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. (Activation function: generally sigmoid.)
• If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
See next slide for picture …
Three-layer back-propagation neural network

[Figure: input signals x1, x2, …, xi, …, xn enter input-layer neurons 1…n; weights wij connect the input layer to hidden-layer neurons 1…m; weights wjk connect the hidden layer to output-layer neurons 1…l, which produce outputs y1, y2, …, yk, …, yl; error signals propagate in the reverse direction]
The back-propagation training algorithm

Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

\left( -\frac{2.4}{F_i}, \; +\frac{2.4}{F_i} \right)

where F_i is the total number of inputs of neuron i in the network. The weight initialisation is done on a neuron-by-neuron basis.
Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), …, xn(p) and desired outputs y_{d,1}(p), y_{d,2}(p), …, y_{d,n}(p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

y_j(p) = sigmoid\!\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:

y_k(p) = sigmoid\!\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right]

where m is the number of inputs of neuron k in the output layer.
Step 3: Weight training
Update the weights in the back-propagation network propagating backward the errors associated with output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

\delta_k(p) = y_k(p)\,[1 - y_k(p)]\, e_k(p)

where e_k(p) = y_{d,k}(p) - y_k(p)  (error at output unit k)

Calculate the weight corrections:

\Delta w_{jk}(p) = \alpha \, y_j(p) \, \delta_k(p)  (weight change for the j-to-k link)

Update the weights at the output neurons:

w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)
Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:

\delta_j(p) = y_j(p)\,[1 - y_j(p)] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)

Calculate the weight corrections:

\Delta w_{ij}(p) = \alpha \, x_i(p) \, \delta_j(p)

Update the weights at the hidden neurons:

w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.
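For concreteness, here is a compact Python sketch of Steps 1–4 for a network with one hidden layer and a single output neuron (structure and naming are mine; the update formulas follow the slides, with each threshold treated as a weight on a fixed −1 input, as the XOR example below describes):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(examples, n_in, n_hid, alpha=0.1, epochs=5000):
    """examples: list of (inputs, desired_output) pairs with one output neuron."""
    rng = 2.4 / n_in  # Step 1: weights uniform in (-2.4/Fi, +2.4/Fi)
    w_ij = [[random.uniform(-rng, rng) for _ in range(n_hid)] for _ in range(n_in)]
    th_j = [random.uniform(-rng, rng) for _ in range(n_hid)]
    w_jk = [random.uniform(-2.4 / n_hid, 2.4 / n_hid) for _ in range(n_hid)]
    th_k = random.uniform(-2.4 / n_hid, 2.4 / n_hid)

    for _ in range(epochs):  # Step 4: iterate (here, a fixed epoch budget)
        for x, y_d in examples:
            # Step 2: forward pass through hidden and output layers
            y_j = [sigmoid(sum(x[i] * w_ij[i][j] for i in range(n_in)) - th_j[j])
                   for j in range(n_hid)]
            y_k = sigmoid(sum(y_j[j] * w_jk[j] for j in range(n_hid)) - th_k)
            # Step 3(a): output-layer error gradient
            e_k = y_d - y_k
            d_k = y_k * (1 - y_k) * e_k
            # Step 3(b): hidden-layer gradients (using the pre-update w_jk)
            d_j = [y_j[j] * (1 - y_j[j]) * d_k * w_jk[j] for j in range(n_hid)]
            # Weight and threshold corrections (threshold = weight on fixed -1 input)
            for j in range(n_hid):
                w_jk[j] += alpha * y_j[j] * d_k
            th_k += alpha * (-1) * d_k
            for i in range(n_in):
                for j in range(n_hid):
                    w_ij[i][j] += alpha * x[i] * d_j[j]
            for j in range(n_hid):
                th_j[j] += alpha * (-1) * d_j[j]
    return w_ij, th_j, w_jk, th_k
```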
Example
• The network is required to perform the logical operation Exclusive-OR.
• Recall that a single-layer perceptron could not do this operation.
• Now we will apply the three-layer back-propagation network.
• See BackPropLearningXor.xls.
Three-layer network for solving the Exclusive-OR operation

[Figure: inputs x1 and x2 (input-layer neurons 1 and 2) feed hidden neurons 3 and 4 through weights w13, w23, w14, w24; hidden neurons 3 and 4 feed output neuron 5 through weights w35 and w45, producing y5; neurons 3, 4 and 5 each receive a threshold connection θ from a fixed input of −1]
Example (continued)
• The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to −1.
• The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = −1.2, w45 = 1.1, θ3 = 0.8, θ4 = −0.1 and θ5 = 0.3.
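As a check, consider the first training example, x1 = x2 = 1 with desired output y_{d,5} = 0; the forward pass with these initial weights works out as follows (rounded values):

y_3 = sigmoid(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8) = sigmoid(0.1) \approx 0.5250

y_4 = sigmoid(1 \cdot 0.9 + 1 \cdot 1.0 - 1 \cdot (-0.1)) = sigmoid(2.0) \approx 0.8808

y_5 = sigmoid(0.5250 \cdot (-1.2) + 0.8808 \cdot 1.1 - 1 \cdot 0.3) = sigmoid(0.0389) \approx 0.5097

e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097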
Learning curve for operation Exclusive-OR

[Figure: "Sum-Squared Network Error for 224 Epochs" — sum-squared error per epoch on a log scale, falling from about 10^1 to below 10^-3]
Final results of three-layer network learning

Inputs (x1, x2) | Desired output yd | Actual output y5 | Error e
(1, 1)          | 0                 | 0.0155           | -0.0155
(0, 1)          | 1                 | 0.9849           |  0.0151
(1, 0)          | 1                 | 0.9849           |  0.0151
(0, 0)          | 0                 | 0.0175           | -0.0175

Sum of squared errors: 0.0010
Network represented by McCulloch-Pitts model for solving the Exclusive-OR operation

[Figure: inputs x1 and x2 connect with weights +1.0 to both hidden neurons; hidden neuron 3 has threshold +1.5 and hidden neuron 4 has threshold +0.5; neuron 3 feeds output neuron 5 with weight −1.0 and neuron 4 with weight +1.0, and neuron 5 has threshold +0.5; each threshold is drawn as a weight on a fixed −1 input]
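A quick sketch, reusing a step neuron, confirming that these weights really compute XOR (function names are mine):

```python
def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    """McCulloch-Pitts style network with the weights from the figure."""
    y3 = step(1.0 * x1 + 1.0 * x2 - 1.5)      # fires only for (1, 1): an AND unit
    y4 = step(1.0 * x1 + 1.0 * x2 - 0.5)      # fires for any 1 input: an OR unit
    return step(-1.0 * y3 + 1.0 * y4 - 0.5)   # OR but not AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints 0, 1, 1, 0
```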
Decision boundaries

(a) Decision boundary constructed by hidden neuron 3; (b) decision boundary constructed by hidden neuron 4; (c) decision boundaries constructed by the complete three-layer network.

[Figure: in the (x1, x2) plane, panel (a) shows the line x1 + x2 − 1.5 = 0, panel (b) the line x1 + x2 − 0.5 = 0, and panel (c) both lines together]
Neural Nets in Weka
• Xor – with default hidden layer
• Xor – with two hidden nodes
• Basketball Class
• Broadway Stratified – default
• Broadway Stratified – 10 hidden nodes
Accelerated learning in multilayer neural networks
• A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent:

Y^{tanh} = \frac{2a}{1 + e^{-bX}} - a

• where a and b are constants.
• Suitable values for a and b are: a = 1.716 and b = 0.667.
Accelerated learning in multilayer neural networks
• We can also accelerate training by including a momentum term in the delta rule:

\Delta w_{jk}(p) = \beta \, \Delta w_{jk}(p-1) + \alpha \, y_j(p) \, \delta_k(p)

• where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
• This iteration's change in weight is influenced by last iteration's change in weight!
• This equation is called the generalised delta rule; the basic version, Δw_jk(p) = α y_j(p) δ_k(p), was given earlier.
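As a sketch, the generalised delta rule for one output neuron's incoming weights might be packaged like this in Python (the helper name and the prev_dw bookkeeping are mine):

```python
def momentum_update(w, prev_dw, y_hidden, delta_k, alpha=0.1, beta=0.95):
    """Apply the generalised delta rule to one output neuron's incoming weights.

    w, prev_dw: current weights and last iteration's weight changes, per link.
    Returns the updated weights and the new weight changes.
    """
    new_dw = [beta * pd + alpha * y * delta_k for pd, y in zip(prev_dw, y_hidden)]
    new_w = [wi + dwi for wi, dwi in zip(w, new_dw)]
    return new_w, new_dw

# Example call with made-up values:
w, dw = momentum_update(w=[0.5, -0.3], prev_dw=[0.0, 0.0],
                        y_hidden=[0.52, 0.88], delta_k=-0.12)
```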
Learning with momentum for operation Exclusive-OR

[Figure: "Training for 126 Epochs" — sum-squared error per epoch on a log scale (about 10^2 down to 10^-4), plus a second panel tracking the learning rate per epoch between −1 and 1.5]
Learning with adaptive learning rate

To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:

Heuristic 1: If the change of the sum of squared errors has the same algebraic sign for several consequent epochs, then the learning rate parameter, α, should be increased.

Heuristic 2: If the algebraic sign of the change of the sum of squared errors alternates for several consequent epochs, then the learning rate parameter, α, should be decreased.
Learning with adaptive learning rate (continued)
• If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
• If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
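A minimal sketch of that rule as a standalone helper (the thresholds and factors follow the slide's typical values; the function name is mine):

```python
def adapt_learning_rate(alpha, sse, prev_sse, ratio=1.04, down=0.7, up=1.05):
    """Adjust the learning rate from this epoch's and last epoch's sum-squared error."""
    if sse > prev_sse * ratio:
        return alpha * down   # error grew too much: decrease the rate (and redo the step)
    if sse < prev_sse:
        return alpha * up     # error fell: speed up
    return alpha

alpha = 0.1
alpha = adapt_learning_rate(alpha, sse=0.8, prev_sse=0.9)  # error fell -> alpha = 0.105
```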
Learning with adaptive learning rate

[Figure: "Training for 103 Epochs" — sum-squared error per epoch on a log scale (about 10^2 down to 10^-4), plus the learning rate per epoch varying between 0 and 1]
Learning with momentum and adaptive learning rate

[Figure: "Training for 85 Epochs" — sum-squared error per epoch on a log scale (about 10^2 down to 10^-4), plus the learning rate per epoch varying between 0 and 2.5]
End Neural Networks