Negnevitsky, Pearson Education, 2002
Chapter 6: Artificial neural networks
• Introduction, or how the brain works
• The neuron as a simple computing element
• The perceptron
• Multilayer neural networks
Neural Networks and the Brain
• A neural network is a model of reasoning inspired by the human brain.
• The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.
• The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them.
• By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.
• Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.
• A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.
Biological neural network

[Figure: two connected nerve cells, each with a soma, dendrites, and an axon; synapses join the axon of one neuron to the dendrites of the next]
Architecture of a typical artificial neural network

[Figure: input signals enter the Input Layer, pass through the Middle Layer, and leave the Output Layer as output signals]
Analogy between biological and artificial neural networks

[Table not preserved in the transcript; in the source it maps soma ↔ neuron, dendrite ↔ input, axon ↔ output, synapse ↔ weight]
The neuron as a simple computing element

Diagram of a neuron

[Figure: input signals x1, x2, …, xn pass through weights w1, w2, …, wn into neuron Y, which sends the same output signal Y along each outgoing branch]
A Simple Activation Function – Sign Function
• The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ.
• If the net input is less than the threshold, the neuron output is −1.
• If the net input is greater than or equal to the threshold, the neuron becomes activated and its output is +1.
• The neuron uses the following transfer, or activation, function:

X = \sum_{i=1}^{n} x_i w_i

Y = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases}

• This type of activation function is called a sign function (McCulloch and Pitts, 1943).
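To make the arithmetic concrete, here is a minimal sketch of such a neuron in Python (the function name and sample values are illustrative, not from the slides):

```python
def mcculloch_pitts_neuron(inputs, weights, theta):
    """Sign-function neuron: weighted sum of inputs compared against threshold theta."""
    x = sum(x_i * w_i for x_i, w_i in zip(inputs, weights))  # X = sum of x_i * w_i
    return +1 if x >= theta else -1                          # sign activation

# Example: two inputs with weights 0.5 and 0.3, threshold 0.2
print(mcculloch_pitts_neuron([1, 0], [0.5, 0.3], 0.2))  # weighted sum 0.5 >= 0.2 -> +1
print(mcculloch_pitts_neuron([0, 0], [0.5, 0.3], 0.2))  # weighted sum 0.0 <  0.2 -> -1
```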
4 Common Activation Functions of a Neuron

[Figure: four panels plotting Y against X — step function, sign function, sigmoid function, and linear function, each drawn over the output range −1 to +1]

Y^{step} = \begin{cases} 1 & \text{if } X \ge 0 \\ 0 & \text{if } X < 0 \end{cases}

Y^{sign} = \begin{cases} +1 & \text{if } X \ge 0 \\ -1 & \text{if } X < 0 \end{cases}

Y^{sigmoid} = \frac{1}{1 + e^{-X}}

Y^{linear} = X

Most common? (In back-propagation networks the sigmoid is the usual choice, as noted later in these slides.)
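A compact sketch of these four functions in Python (names are mine):

```python
import math

def step(x):     # binary: 1 at or above zero, else 0
    return 1 if x >= 0 else 0

def sign(x):     # bipolar: +1 at or above zero, else -1
    return 1 if x >= 0 else -1

def sigmoid(x):  # smooth squashing of any X into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):   # identity: output equals net input
    return x

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-0.5), f(0.5))
```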
Can a single neuron learn a task?
• Start off with the earliest and simplest: in 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron.
• The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.
Single-layer two-input perceptron

[Figure: inputs x1 and x2, with weights w1 and w2, feed a linear combiner with threshold θ; a hard limiter then produces output Y]
The Perceptron
• The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.
• The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.
• The aim of the perceptron is to classify inputs x1, x2, …, xn into one of two classes, say A1 and A2.
• In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

\sum_{i=1}^{n} x_i w_i - \theta = 0

• See next slide.
Linear separability in the perceptrons

[Figure: (a) Two-input perceptron — classes A1 and A2 separated by the line x1 w1 + x2 w2 − θ = 0; (b) Three-input perceptron — classes separated by the plane x1 w1 + x2 w2 + x3 w3 − θ = 0]

Changing θ shifts the boundary.
How does the perceptron learn its classification tasks?
• By making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron.
• It learns weights such that the output is consistent with the training examples.
• The initial weights are randomly assigned, usually in the range [−0.5, 0.5].
• If at iteration p the actual output is Y(p) and the desired output is Y_d(p), then the error is given by:

e(p) = Y_d(p) - Y(p), \quad p = 1, 2, 3, \ldots

• Iteration p here refers to the p-th training example presented to the perceptron.
• If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).
The perceptron learning rule

w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p)

where p is the iteration number (1, 2, 3, …) and α is the learning rate, a positive constant less than unity (1).

Intuition:
• The weight at the next iteration is based on an adjustment to the current weight.
• The adjustment amount is influenced by the amount of the error, the size of the input, and the learning rate.
• The learning rate is a free parameter that must be "tuned".

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
Perceptron's training algorithm

Step 1: Initialisation
Set initial weights w1, w2, …, wn and threshold θ to random numbers in the range [−0.5, 0.5].
(During training, if the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).)
Perceptron's training algorithm (continued)

Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Y_d(p). Calculate the actual output at iteration p = 1:

Y(p) = step\!\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]

where n is the number of the perceptron inputs, and step is a step activation function.
Perceptron's training algorithm (continued)

Step 3: Weight training
Update the weights of the perceptron:

w_i(p+1) = w_i(p) + \Delta w_i(p)

where Δw_i(p) is the weight correction for weight i at iteration p. The weight correction is computed by the delta rule:

\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
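Putting Steps 1–4 together, a minimal perceptron trainer might look like the following Python sketch (function and variable names are mine; the step activation yields 0/1 outputs and the threshold stays fixed, as in the AND example on the next slide):

```python
import random

def train_perceptron(examples, n_inputs, alpha=0.1, theta=0.2, max_epochs=100):
    """Train a single perceptron. `examples` is a list of (inputs, desired) pairs."""
    # Step 1: Initialisation - random weights in [-0.5, 0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for epoch in range(max_epochs):
        converged = True
        for x, y_d in examples:
            # Step 2: Activation - step function of the weighted sum minus threshold
            net = sum(x_i * w_i for x_i, w_i in zip(x, w)) - theta
            y = 1 if net >= 0 else 0
            # Step 3: Weight training - delta rule
            e = y_d - y
            if e != 0:
                converged = False
                for i in range(n_inputs):
                    w[i] += alpha * x[i] * e
        # Step 4: Iteration - stop once an epoch produces no errors
        if converged:
            break
    return w
```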
Example of perceptron learning: the logical operation AND

Threshold: θ = 0.2; learning rate: α = 0.1

Epoch | Inputs x1 x2 | Desired Yd | Initial w1, w2 | Actual Y | Error e | Final w1, w2
------+--------------+------------+----------------+----------+---------+--------------
  1   | 0  0         | 0          | 0.3, -0.1      | 0        |  0      | 0.3, -0.1
      | 0  1         | 0          | 0.3, -0.1      | 0        |  0      | 0.3, -0.1
      | 1  0         | 0          | 0.3, -0.1      | 1        | -1      | 0.2, -0.1
      | 1  1         | 1          | 0.2, -0.1      | 0        |  1      | 0.3,  0.0
  2   | 0  0         | 0          | 0.3,  0.0      | 0        |  0      | 0.3,  0.0
      | 0  1         | 0          | 0.3,  0.0      | 0        |  0      | 0.3,  0.0
      | 1  0         | 0          | 0.3,  0.0      | 1        | -1      | 0.2,  0.0
      | 1  1         | 1          | 0.2,  0.0      | 1        |  0      | 0.2,  0.0
  3   | 0  0         | 0          | 0.2,  0.0      | 0        |  0      | 0.2,  0.0
      | 0  1         | 0          | 0.2,  0.0      | 0        |  0      | 0.2,  0.0
      | 1  0         | 0          | 0.2,  0.0      | 1        | -1      | 0.1,  0.0
      | 1  1         | 1          | 0.1,  0.0      | 0        |  1      | 0.2,  0.1
  4   | 0  0         | 0          | 0.2,  0.1      | 0        |  0      | 0.2,  0.1
      | 0  1         | 0          | 0.2,  0.1      | 0        |  0      | 0.2,  0.1
      | 1  0         | 0          | 0.2,  0.1      | 1        | -1      | 0.1,  0.1
      | 1  1         | 1          | 0.1,  0.1      | 1        |  0      | 0.1,  0.1
  5   | 0  0         | 0          | 0.1,  0.1      | 0        |  0      | 0.1,  0.1
      | 0  1         | 0          | 0.1,  0.1      | 0        |  0      | 0.1,  0.1
      | 1  0         | 0          | 0.1,  0.1      | 0        |  0      | 0.1,  0.1
      | 1  1         | 1          | 0.1,  0.1      | 1        |  0      | 0.1,  0.1

Epoch 5 produces no errors, so training has converged.
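Reusing the train_perceptron sketch from above, the AND example can be reproduced along these lines (with random initial weights the exact path differs from the table, but the fixed point is a valid AND separator):

```python
AND_EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(AND_EXAMPLES, n_inputs=2, alpha=0.1, theta=0.2)
print(w)  # converges to weights such as [0.1, 0.1], matching epoch 5 of the table
```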
Two-dimensional plots of basic logical operations

[Figure: unit squares in the (x1, x2) plane for (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2); for AND and OR a single straight line separates the two output classes, but for Exclusive-OR no such line exists]

• A perceptron can learn the operations AND and OR, but not Exclusive-OR.
• Exclusive-OR is NOT linearly separable.
• This limitation stalled neural network research for more than a decade.
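To see the limitation concretely, one can run the earlier train_perceptron sketch on XOR data; with a bounded epoch count it simply stops without converging, and some example is always misclassified (values here are illustrative):

```python
XOR_EXAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
w = train_perceptron(XOR_EXAMPLES, n_inputs=2, alpha=0.1, theta=0.2, max_epochs=1000)
# Whatever weights come back, at least one XOR example is still wrong:
for x, y_d in XOR_EXAMPLES:
    y = 1 if sum(xi * wi for xi, wi in zip(x, w)) - 0.2 >= 0 else 0
    print(x, "desired", y_d, "got", y)
```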
Multilayer neural networks
• A multilayer perceptron is a feedforward neural network with one or more hidden layers.
• The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.
Multilayer perceptron with two hidden layers

[Figure: input signals flow from the input layer through a first and a second hidden layer to the output layer, emerging as output signals]
Hidden Layer
• Detects features in the inputs – hidden patterns.
• With one hidden layer, a network can represent any continuous function of the inputs.
• With two hidden layers, even discontinuous functions can be represented.
Back-propagation neural network
• Most popular of 100+ ANN learning algorithms.
• Learning in a multilayer network proceeds the same way as for a perceptron:
  – A training set of input patterns is presented to the network.
  – The network computes its output pattern, and if there is an error (in other words, a difference between actual and desired output patterns), the weights are adjusted to reduce this error.
• The difference is in the number of weights and the architecture …
Back-propagation neural network
In a back-propagation neural network, the learning algorithm has two phases:
• A training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. (Activation function: generally sigmoid.)
• If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
See next slide for picture …
Three-layer back-propagation neural network

[Figure: input signals x1, x2, …, xi, …, xn enter input-layer neurons 1…n; weights wij connect the input layer to hidden-layer neurons 1…m; weights wjk connect the hidden layer to output-layer neurons 1…l, which produce outputs y1, y2, …, yk, …, yl; error signals propagate in the reverse direction]
The back-propagation training algorithm

Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

\left( -\frac{2.4}{F_i}, \; +\frac{2.4}{F_i} \right)

where F_i is the total number of inputs of neuron i in the network. The weight initialisation is done on a neuron-by-neuron basis.
Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), …, xn(p) and desired outputs y_{d,1}(p), y_{d,2}(p), …, y_{d,n}(p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

y_j(p) = sigmoid\!\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:

y_k(p) = sigmoid\!\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right]

where m is the number of inputs of neuron k in the output layer.
Step 3: Weight training
Update the weights in the back-propagation network propagating backward the errors associated with output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

\delta_k(p) = y_k(p)\,[1 - y_k(p)]\, e_k(p)

where e_k(p) = y_{d,k}(p) - y_k(p)  (error at output unit k)

Calculate the weight corrections:

\Delta w_{jk}(p) = \alpha \, y_j(p) \, \delta_k(p)  (weight change for the j-to-k link)

Update the weights at the output neurons:

w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)
Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:

\delta_j(p) = y_j(p)\,[1 - y_j(p)] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)

Calculate the weight corrections:

\Delta w_{ij}(p) = \alpha \, x_i(p) \, \delta_j(p)

Update the weights at the hidden neurons:

w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.
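For concreteness, here is a compact Python sketch of Steps 1–4 for a network with one hidden layer and a single output neuron (structure and naming are mine; the update formulas follow the slides, with each threshold treated as a weight on a fixed −1 input, as the XOR example below describes):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(examples, n_in, n_hid, alpha=0.1, epochs=5000):
    """examples: list of (inputs, desired_output) pairs with one output neuron."""
    rng = 2.4 / n_in  # Step 1: weights uniform in (-2.4/Fi, +2.4/Fi)
    w_ij = [[random.uniform(-rng, rng) for _ in range(n_hid)] for _ in range(n_in)]
    th_j = [random.uniform(-rng, rng) for _ in range(n_hid)]
    w_jk = [random.uniform(-2.4 / n_hid, 2.4 / n_hid) for _ in range(n_hid)]
    th_k = random.uniform(-2.4 / n_hid, 2.4 / n_hid)

    for _ in range(epochs):  # Step 4: iterate (here, a fixed epoch budget)
        for x, y_d in examples:
            # Step 2: forward pass through hidden and output layers
            y_j = [sigmoid(sum(x[i] * w_ij[i][j] for i in range(n_in)) - th_j[j])
                   for j in range(n_hid)]
            y_k = sigmoid(sum(y_j[j] * w_jk[j] for j in range(n_hid)) - th_k)
            # Step 3(a): output-layer error gradient
            e_k = y_d - y_k
            d_k = y_k * (1 - y_k) * e_k
            # Step 3(b): hidden-layer gradients (using the pre-update w_jk)
            d_j = [y_j[j] * (1 - y_j[j]) * d_k * w_jk[j] for j in range(n_hid)]
            # Weight and threshold corrections (threshold = weight on fixed -1 input)
            for j in range(n_hid):
                w_jk[j] += alpha * y_j[j] * d_k
            th_k += alpha * (-1) * d_k
            for i in range(n_in):
                for j in range(n_hid):
                    w_ij[i][j] += alpha * x[i] * d_j[j]
            for j in range(n_hid):
                th_j[j] += alpha * (-1) * d_j[j]
    return w_ij, th_j, w_jk, th_k
```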
Example
• The network is required to perform the logical operation Exclusive-OR.
• Recall that a single-layer perceptron could not do this operation.
• Now we will apply the three-layer back-propagation network.
• See BackPropLearningXor.xls.
Three-layer network for solving the Exclusive-OR operation

[Figure: inputs x1 and x2 (input-layer neurons 1 and 2) feed hidden neurons 3 and 4 through weights w13, w23, w14, w24; hidden neurons 3 and 4 feed output neuron 5 through weights w35 and w45, producing y5; neurons 3, 4 and 5 each receive a threshold connection θ from a fixed input of −1]
Example (continued)
• The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to −1.
• The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = −1.2, w45 = 1.1, θ3 = 0.8, θ4 = −0.1 and θ5 = 0.3.
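As a check, consider the first training example, x1 = x2 = 1 with desired output y_{d,5} = 0; the forward pass with these initial weights works out as follows (rounded values):

y_3 = sigmoid(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8) = sigmoid(0.1) \approx 0.5250

y_4 = sigmoid(1 \cdot 0.9 + 1 \cdot 1.0 - 1 \cdot (-0.1)) = sigmoid(2.0) \approx 0.8808

y_5 = sigmoid(0.5250 \cdot (-1.2) + 0.8808 \cdot 1.1 - 1 \cdot 0.3) = sigmoid(0.0389) \approx 0.5097

e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097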
Learning curve for operation Exclusive-OR

[Figure: "Sum-Squared Network Error for 224 Epochs" — sum-squared error per epoch on a log scale, falling from about 10^1 to below 10^-3]
Final results of three-layer network learning

Inputs (x1, x2) | Desired output yd | Actual output y5 | Error e
(1, 1)          | 0                 | 0.0155           | -0.0155
(0, 1)          | 1                 | 0.9849           |  0.0151
(1, 0)          | 1                 | 0.9849           |  0.0151
(0, 0)          | 0                 | 0.0175           | -0.0175

Sum of squared errors: 0.0010
Network represented by McCulloch-Pitts model for solving the Exclusive-OR operation

[Figure: inputs x1 and x2 connect with weights +1.0 to both hidden neurons; hidden neuron 3 has threshold +1.5 and hidden neuron 4 has threshold +0.5; neuron 3 feeds output neuron 5 with weight −1.0 and neuron 4 with weight +1.0, and neuron 5 has threshold +0.5; each threshold is drawn as a weight on a fixed −1 input]
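A quick sketch, reusing a step neuron, confirming that these weights really compute XOR (function names are mine):

```python
def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    """McCulloch-Pitts style network with the weights from the figure."""
    y3 = step(1.0 * x1 + 1.0 * x2 - 1.5)      # fires only for (1, 1): an AND unit
    y4 = step(1.0 * x1 + 1.0 * x2 - 0.5)      # fires for any 1 input: an OR unit
    return step(-1.0 * y3 + 1.0 * y4 - 0.5)   # OR but not AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints 0, 1, 1, 0
```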
Decision boundaries

(a) Decision boundary constructed by hidden neuron 3; (b) decision boundary constructed by hidden neuron 4; (c) decision boundaries constructed by the complete three-layer network.

[Figure: in the (x1, x2) plane, panel (a) shows the line x1 + x2 − 1.5 = 0, panel (b) the line x1 + x2 − 0.5 = 0, and panel (c) both lines together]
Neural Nets in Weka
• Xor – with default hidden layer
• Xor – with two hidden nodes
• Basketball Class
• Broadway Stratified – default
• Broadway Stratified – 10 hidden nodes
Accelerated learning in multilayer neural networks
• A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent:

Y^{tanh} = \frac{2a}{1 + e^{-bX}} - a

• where a and b are constants.
• Suitable values for a and b are: a = 1.716 and b = 0.667.
Accelerated learning in multilayer neural networks
• We can also accelerate training by including a momentum term in the delta rule:

\Delta w_{jk}(p) = \beta \, \Delta w_{jk}(p-1) + \alpha \, y_j(p) \, \delta_k(p)

• where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
• This iteration's change in weight is influenced by last iteration's change in weight!
• This equation is called the generalised delta rule; the basic version, Δw_jk(p) = α y_j(p) δ_k(p), was given earlier.
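As a sketch, the generalised delta rule for one output neuron's incoming weights might be packaged like this in Python (the helper name and the prev_dw bookkeeping are mine):

```python
def momentum_update(w, prev_dw, y_hidden, delta_k, alpha=0.1, beta=0.95):
    """Apply the generalised delta rule to one output neuron's incoming weights.

    w, prev_dw: current weights and last iteration's weight changes, per link.
    Returns the updated weights and the new weight changes.
    """
    new_dw = [beta * pd + alpha * y * delta_k for pd, y in zip(prev_dw, y_hidden)]
    new_w = [wi + dwi for wi, dwi in zip(w, new_dw)]
    return new_w, new_dw

# Example call with made-up values:
w, dw = momentum_update(w=[0.5, -0.3], prev_dw=[0.0, 0.0],
                        y_hidden=[0.52, 0.88], delta_k=-0.12)
```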
Learning with momentum for operation Exclusive-OR

[Figure: "Training for 126 Epochs" — sum-squared error per epoch on a log scale (about 10^2 down to 10^-4), plus a second panel tracking the learning rate per epoch between −1 and 1.5]
Learning with adaptive learning rate

To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:

Heuristic 1: If the change of the sum of squared errors has the same algebraic sign for several consequent epochs, then the learning rate parameter, α, should be increased.

Heuristic 2: If the algebraic sign of the change of the sum of squared errors alternates for several consequent epochs, then the learning rate parameter, α, should be decreased.
Learning with adaptive learning rate (continued)
• If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
• If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
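A minimal sketch of that rule as a standalone helper (the thresholds and factors follow the slide's typical values; the function name is mine):

```python
def adapt_learning_rate(alpha, sse, prev_sse, ratio=1.04, down=0.7, up=1.05):
    """Adjust the learning rate from this epoch's and last epoch's sum-squared error."""
    if sse > prev_sse * ratio:
        return alpha * down   # error grew too much: decrease the rate (and redo the step)
    if sse < prev_sse:
        return alpha * up     # error fell: speed up
    return alpha

alpha = 0.1
alpha = adapt_learning_rate(alpha, sse=0.8, prev_sse=0.9)  # error fell -> alpha = 0.105
```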
Learning with adaptive learning rate

[Figure: "Training for 103 Epochs" — sum-squared error per epoch on a log scale (about 10^2 down to 10^-4), plus the learning rate per epoch varying between 0 and 1]
Learning with momentum and adaptive learning rate

[Figure: "Training for 85 Epochs" — sum-squared error per epoch on a log scale (about 10^2 down to 10^-4), plus the learning rate per epoch varying between 0 and 2.5]
End Neural Networks