Back Propagation: Variations
TRANSCRIPT
-
Back Propagation: Variations
[Figure: a fully connected feed-forward network. Inputs X1 ... Xn enter the input layer (Layer 0); a hidden Layer m contains Neuron #1 ... Neuron #k, whose outputs x'1, x'2, ..., x'i feed the next layer; the output layer (Layer m+1) contains Neuron #1 ... Neuron #p and produces Y1(m+1) ... Yp(m+1).]
-
BP Improvements
Second order derivatives (Parker, 1982)
Dynamic range modification (Stornetta and Huberman, 1987), using the shifted sigmoid
F(x) = -1/2 + 1/(1 + e^(-x))   (see the sketch after this list)
Meta Learning (Jacobs, 1987; Hagiwara, 1990)
Selective updates (Huang and Huang, 1990)
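The dynamic range modification above shifts the usual sigmoid so that outputs lie in (-1/2, +1/2). A minimal C sketch; the function name and test inputs are illustrative, not from the source:

#include <math.h>
#include <stdio.h>

/* Shifted sigmoid for dynamic range modification:
   F(x) = -1/2 + 1/(1 + e^(-x)), outputs in (-0.5, +0.5). */
static double range_modified_sigmoid(double x)
{
    return -0.5 + 1.0 / (1.0 + exp(-x));
}

int main(void)
{
    /* Illustrative inputs only: show that outputs are centered on zero. */
    for (double x = -4.0; x <= 4.0; x += 2.0)
        printf("F(%+.1f) = %+.4f\n", x, range_modified_sigmoid(x));
    return 0;
}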
-
BP Improvements (Cont.)
Use of momentum weight change (Rumelhart, 1986)
Δw_kmi(t+1) = η · δ_km · x_i(t) + α · Δw_kmi(t)
Exponential smoothing (Sejnowski and Rosenberg, 1987)
Δw_kmi(t+1) = (1 - α) · δ_km · x_i(t) + α · Δw_kmi(t)
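A minimal C sketch of the two weight-change rules above, for a single weight; the variable names (eta, alpha, delta) and the values in main are illustrative assumptions, not taken from the source:

#include <stdio.h>

/* Momentum (Rumelhart, 1986): dw(t+1) = eta*delta*x + alpha*dw(t) */
static double momentum_update(double eta, double alpha, double delta,
                              double x, double prev_dw)
{
    return eta * delta * x + alpha * prev_dw;
}

/* Exponential smoothing (Sejnowski and Rosenberg, 1987):
   dw(t+1) = (1 - alpha)*delta*x + alpha*dw(t) */
static double smoothed_update(double alpha, double delta,
                              double x, double prev_dw)
{
    return (1.0 - alpha) * delta * x + alpha * prev_dw;
}

int main(void)
{
    double dw = 0.0;                               /* previous weight change */
    double eta = 0.25, alpha = 0.9, delta = 0.1, x = 1.0;
    dw = momentum_update(eta, alpha, delta, x, dw);
    printf("momentum  dw = %f\n", dw);
    dw = smoothed_update(alpha, delta, x, dw);
    printf("smoothed  dw = %f\n", dw);
    return 0;                                      /* new weight: w(t+1) = w(t) + dw */
}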
-
BP Improvements (Cont.)
Accelerating the BP algorithm (Kothari, Klinkhachorn, and Nutter, 1991)
Gradual increase in learning accuracy, without incurring the disadvantages of increased network size or more complex neurons, and without otherwise violating the parallel structure of computation
-
Gradual increase in learning accuracy
Temporal instability
Absence of a true direction of descent
void Acc_BackProp(struct Network *N, struct Train_Set *T)
{
    Assume_coarse_error();                  /* start with a coarse error tolerance epsilon */
    while (epsilon > Eventual_Accuracy) {   /* loop until the final (tight) tolerance is reached */
        while (not_all_trained) {
            Present_Next_Pattern;
            while (!Trained)
                Train_Pattern;
        }
        Increase_Accuracy(epsilon -= Step); /* tighten the tolerance for the next pass */
    }
}
-
Training with gradual increase in accuracy
[Figure: the direction of steepest descent compared with the individual directions of descent suggested by exemplar 1, exemplar 2, exemplar 3, ..., exemplar M]
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a 4-bit 1's complementer (graph has been curtailed to show detail)
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a 3-to-8 decoder
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for the XOR problem
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a simple shape recognizer
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a 3-bit rotate register
-
Error vs. Training Passes
Each entry gives the convergence cost relative to BPGIA+Mom. (taken as 1.0), with the number of training passes in parentheses.

Problem (network size)                            BP             BPGIA          BP+Mom.         BPGIA+Mom.
1's complement (4x8x4)                            9.7 (134922)   6.6 (92567)    2.2 (25574)     1.0 (11863)
3-to-8 decoder (3x8x8)                            5.4 (347634)   4.2 (268833)   1.1 (61366)     1.0 (53796)
XOR (2x2x1)                                       4.5 (211093)   1.8 (88207)    2.5 (107337)    1.0 (45916)
Rotate register (3x6x3)                           4.3 (72477)    2.0 (33909)    1.1 (15929)     1.0 (14987)
Square, circle and triangle differentiation (16x20x1)   2.3 (71253)   1.3 (33909)   6.11 (145363)   1.0 (25163)
-
Training with gradual increase in accuracy
On average, doubles the convergence rate of back propagation, or of back propagation with a momentum weight change, without requiring additional or more complex neurons
-
Nonsaturating Activation Functions
For some applications, where saturation is not especially beneficial, a nonsaturating activation function may be used. One suitable example is
F(x) =  log(1 + x)   for x >= 0
     = -log(1 - x)   for x < 0
with derivative
F'(x) = 1/(1 + x)    for x >= 0
      = 1/(1 - x)    for x < 0
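A minimal C sketch of this nonsaturating activation and its derivative for use in the backward pass; the function names are illustrative:

#include <math.h>

/* Nonsaturating logarithmic activation:
   F(x) = log(1+x) for x >= 0, -log(1-x) for x < 0. */
double log_activation(double x)
{
    return (x >= 0.0) ? log(1.0 + x) : -log(1.0 - x);
}

/* Its derivative: F'(x) = 1/(1+x) for x >= 0, 1/(1-x) for x < 0. */
double log_activation_deriv(double x)
{
    return (x >= 0.0) ? 1.0 / (1.0 + x) : 1.0 / (1.0 - x);
}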
-
Nonsaturating Activation Functions
Example: BP for the XOR problem

Problem                               Logarithmic   Standard bipolar sigmoid
Bipolar XOR                           144 epochs    387 epochs
Modified bipolar XOR (+0.8 or -0.8)   77 epochs     264 epochs
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
Nonsaturating Activation Functions
Example: Product of sine functions
(continuous single output)
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
y = sin(2·x1) · sin(2·x2); trained for 5000 epochs to a mean squared error of 0.024 (learning rate = 0.05)
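A minimal C sketch of how (x1, x2, y) training pairs for this target might be generated; the grid spacing and input range are illustrative assumptions, not taken from the source:

#include <math.h>
#include <stdio.h>

/* Target function for the continuous single-output example:
   y = sin(2*x1) * sin(2*x2). */
static double target(double x1, double x2)
{
    return sin(2.0 * x1) * sin(2.0 * x2);
}

int main(void)
{
    /* Sample the target on a small grid to form (x1, x2, y) training triples. */
    for (double x1 = 0.0; x1 <= 1.0; x1 += 0.25)
        for (double x2 = 0.0; x2 <= 1.0; x2 += 0.25)
            printf("%.2f %.2f %+.4f\n", x1, x2, target(x1, x2));
    return 0;
}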
-
Strictly Local Backpropagation
Standard BP
Requires sharing of information among processors (a violation of accepted theories on the functioning of biological neurons), and so lacks biological plausibility
Strictly Local BP (Fausett, 1990)
Alleviates this deficiency of standard BP
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
Strictly Local BP Architecture
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
Strictly Local BP Architecture
Cortical unit: sums its inputs and sends the resulting value as a signal to the next unit above it
Synaptic units: receive a single input signal, apply an activation function to the input, multiply the result by a weight, and send the result to a single unit above
Thalamic unit: compares the computed output with the target value; if they do not match, it sends an error signal to the output synaptic unit below it
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
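A minimal C sketch of one forward step through these three unit types, assuming a single cortical unit fed by a few synaptic units; the choice of tanh as the activation is an illustrative assumption:

#include <math.h>

/* Synaptic unit: receives one input, applies an activation function
   locally, multiplies the result by its weight, and passes it upward. */
double synaptic_unit(double input, double weight)
{
    return weight * tanh(input);
}

/* Cortical unit: sums the signals arriving from the units below it
   and sends the sum to the next unit above. */
double cortical_unit(const double signals[], int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += signals[i];
    return sum;
}

/* Thalamic unit: compares the computed output with the target and
   returns an error signal (zero when they match). */
double thalamic_unit(double output, double target)
{
    return target - output;
}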
-
BP vs. Strictly Local BP
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
BP vs. Strictly Local BP
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall