Back Propagation: Variations
TRANSCRIPT
-
Back Propagation: Variations
[Figure: a fully connected feed-forward network. Inputs X1 ... Xn enter the input layer (Layer 0); a hidden Layer m contains Neuron #1 ... Neuron #k, whose outputs x'1, x'2, ..., x'i feed the next layer; the output layer (Layer m+1) contains Neuron #1 ... Neuron #p and produces Y1(m+1) ... Yp(m+1).]
-
BP Improvements
Second order derivatives (Parker, 1982)
Dynamic range modification (Stornetta and Huberman, 1987), using the shifted sigmoid
F(x) = -1/2 + 1/(1 + e^(-x))   (see the sketch after this list)
Meta Learning (Jacobs, 1987; Hagiwara, 1990)
Selective updates (Huang and Huang, 1990)
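The dynamic range modification above shifts the usual sigmoid so that outputs lie in (-1/2, +1/2). A minimal C sketch; the function name and test inputs are illustrative, not from the source:

#include <math.h>
#include <stdio.h>

/* Shifted sigmoid for dynamic range modification:
   F(x) = -1/2 + 1/(1 + e^(-x)), outputs in (-0.5, +0.5). */
static double range_modified_sigmoid(double x)
{
    return -0.5 + 1.0 / (1.0 + exp(-x));
}

int main(void)
{
    /* Illustrative inputs only: show that outputs are centered on zero. */
    for (double x = -4.0; x <= 4.0; x += 2.0)
        printf("F(%+.1f) = %+.4f\n", x, range_modified_sigmoid(x));
    return 0;
}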
-
BP Improvements (Cont.)
Use of momentum weight change (Rumelhart, 1986)
Δw_kmi(t+1) = η · δ_km · x_i(t) + α · Δw_kmi(t)
Exponential smoothing (Sejnowski and Rosenberg, 1987)
Δw_kmi(t+1) = (1 - α) · δ_km · x_i(t) + α · Δw_kmi(t)
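A minimal C sketch of the two weight-change rules above, for a single weight; the variable names (eta, alpha, delta) and the values in main are illustrative assumptions, not taken from the source:

#include <stdio.h>

/* Momentum (Rumelhart, 1986): dw(t+1) = eta*delta*x + alpha*dw(t) */
static double momentum_update(double eta, double alpha, double delta,
                              double x, double prev_dw)
{
    return eta * delta * x + alpha * prev_dw;
}

/* Exponential smoothing (Sejnowski and Rosenberg, 1987):
   dw(t+1) = (1 - alpha)*delta*x + alpha*dw(t) */
static double smoothed_update(double alpha, double delta,
                              double x, double prev_dw)
{
    return (1.0 - alpha) * delta * x + alpha * prev_dw;
}

int main(void)
{
    double dw = 0.0;                               /* previous weight change */
    double eta = 0.25, alpha = 0.9, delta = 0.1, x = 1.0;
    dw = momentum_update(eta, alpha, delta, x, dw);
    printf("momentum  dw = %f\n", dw);
    dw = smoothed_update(alpha, delta, x, dw);
    printf("smoothed  dw = %f\n", dw);
    return 0;                                      /* new weight: w(t+1) = w(t) + dw */
}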
-
BP Improvements (Cont.)
Accelerating the BP algorithm (Kothari, Klinkhachorn, and Nutter, 1991)
Gradual increase in learning accuracy, without incurring the disadvantages of increased network size or more complex neurons, and without otherwise violating the parallel structure of computation
-
Gradual increase in learning accuracy
Temporal instability
Absence of a true direction of descent
void Acc_BackProp(struct Network *N, struct Train_Set *T)
{
    Assume_coarse_error();                  /* start with a coarse error tolerance epsilon */
    while (epsilon > Eventual_Accuracy) {   /* loop until the final (tight) tolerance is reached */
        while (not_all_trained) {
            Present_Next_Pattern;
            while (!Trained)
                Train_Pattern;
        }
        Increase_Accuracy(epsilon -= Step); /* tighten the tolerance for the next pass */
    }
}
-
Training with gradual increase in accuracy
[Figure: the direction of steepest descent compared with the individual directions of descent suggested by exemplar 1, exemplar 2, exemplar 3, ..., exemplar M]
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a 4-bit 1's complementer (graph has been curtailed to show detail)
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a 3-to-8 decoder
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for the XOR problem
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a simple shape recognizer
-
Error vs. Training Passes
[Plot: overall error vs. training passes for BP, BPGIA, BP+Mom, and BPGIA+Mom]
Minimization of the error for a 3-bit rotate register
-
Error vs. Training Passes
Each entry gives the convergence cost relative to BPGIA+Mom. (taken as 1.0), with the number of training passes in parentheses.

Problem (network size)                            BP             BPGIA          BP+Mom.         BPGIA+Mom.
1's complement (4x8x4)                            9.7 (134922)   6.6 (92567)    2.2 (25574)     1.0 (11863)
3-to-8 decoder (3x8x8)                            5.4 (347634)   4.2 (268833)   1.1 (61366)     1.0 (53796)
XOR (2x2x1)                                       4.5 (211093)   1.8 (88207)    2.5 (107337)    1.0 (45916)
Rotate register (3x6x3)                           4.3 (72477)    2.0 (33909)    1.1 (15929)     1.0 (14987)
Square, circle and triangle differentiation (16x20x1)   2.3 (71253)   1.3 (33909)   6.11 (145363)   1.0 (25163)
-
Training with gradual increase in accuracy
On average, doubles the convergence rate of back propagation, or of back propagation with a momentum weight change, without requiring additional or more complex neurons
-
Nonsaturating Activation Functions
For some applications, where saturation is not especially beneficial, a nonsaturating activation function may be used. One suitable example is
F(x) =  log(1 + x)   for x >= 0
     = -log(1 - x)   for x < 0
with derivative
F'(x) = 1/(1 + x)    for x >= 0
      = 1/(1 - x)    for x < 0
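A minimal C sketch of this nonsaturating activation and its derivative for use in the backward pass; the function names are illustrative:

#include <math.h>

/* Nonsaturating logarithmic activation:
   F(x) = log(1+x) for x >= 0, -log(1-x) for x < 0. */
double log_activation(double x)
{
    return (x >= 0.0) ? log(1.0 + x) : -log(1.0 - x);
}

/* Its derivative: F'(x) = 1/(1+x) for x >= 0, 1/(1-x) for x < 0. */
double log_activation_deriv(double x)
{
    return (x >= 0.0) ? 1.0 / (1.0 + x) : 1.0 / (1.0 - x);
}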
-
Nonsaturating Activation Functions
Example: BP for the XOR problem

Problem                               Logarithmic   Standard bipolar sigmoid
Bipolar XOR                           144 epochs    387 epochs
Modified bipolar XOR (+0.8 or -0.8)   77 epochs     264 epochs
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
Nonsaturating Activation Functions
Example: Product of sine functions
(continuous single output)
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
y = sin(2·x1) · sin(2·x2); trained for 5000 epochs to a mean squared error of 0.024 (learning rate = 0.05)
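A minimal C sketch of how (x1, x2, y) training pairs for this target might be generated; the grid spacing and input range are illustrative assumptions, not taken from the source:

#include <math.h>
#include <stdio.h>

/* Target function for the continuous single-output example:
   y = sin(2*x1) * sin(2*x2). */
static double target(double x1, double x2)
{
    return sin(2.0 * x1) * sin(2.0 * x2);
}

int main(void)
{
    /* Sample the target on a small grid to form (x1, x2, y) training triples. */
    for (double x1 = 0.0; x1 <= 1.0; x1 += 0.25)
        for (double x2 = 0.0; x2 <= 1.0; x2 += 0.25)
            printf("%.2f %.2f %+.4f\n", x1, x2, target(x1, x2));
    return 0;
}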
-
Strictly Local Backpropagation
Standard BP
Requires sharing of information among processors (a violation of accepted theories on the functioning of biological neurons), and so lacks biological plausibility
Strictly Local BP (Fausett, 1990)
Alleviates this deficiency of standard BP
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
Strictly Local BP Architecture
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
Strictly Local BP Architecture
Cortical unit: sums its inputs and sends the resulting value as a signal to the next unit above it
Synaptic units: receive a single input signal, apply an activation function to the input, multiply the result by a weight, and send the result to a single unit above
Thalamic unit: compares the computed output with the target value; if they do not match, it sends an error signal to the output synaptic unit below it
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
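A minimal C sketch of one forward step through these three unit types, assuming a single cortical unit fed by a few synaptic units; the choice of tanh as the activation is an illustrative assumption:

#include <math.h>

/* Synaptic unit: receives one input, applies an activation function
   locally, multiplies the result by its weight, and passes it upward. */
double synaptic_unit(double input, double weight)
{
    return weight * tanh(input);
}

/* Cortical unit: sums the signals arriving from the units below it
   and sends the sum to the next unit above. */
double cortical_unit(const double signals[], int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += signals[i];
    return sum;
}

/* Thalamic unit: compares the computed output with the target and
   returns an error signal (zero when they match). */
double thalamic_unit(double output, double target)
{
    return target - output;
}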
-
BP vs. Strictly Local BP
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
-
BP vs. Strictly Local BP
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall