1
Financial Informatics – XVI: Supervised Backpropagation Learning
Khurshid Ahmad, Professor of Computer Science,
Department of Computer Science
Trinity College, Dublin-2, IRELAND
November 19th, 2008.
https://www.cs.tcd.ie/Khurshid.Ahmad/Teaching.html
2
Preamble
Neural Networks 'learn' by adapting in accordance with a
training regimen: Five key algorithms.
ERROR-CORRECTION OR PERFORMANCE LEARNING
HEBBIAN OR COINCIDENCE LEARNING
BOLTZMANN LEARNING (STOCHASTIC NET LEARNING)
COMPETITIVE LEARNING
FILTER LEARNING (GROSSBERG'S NETS)
3
ANN Learning Algorithms
[Figure: error-correction (supervised) learning. A vector describing the environment goes both to a TEACHER, which produces the desired response, and to the LEARNING SYSTEM, which produces the actual response; a summing junction Σ (+ desired response, − actual response) yields the error signal fed back to the learning system.]
4
ANN Learning Algorithms
[Figure: unsupervised learning. The ENVIRONMENT supplies a vector describing the state of the environment directly to the LEARNING SYSTEM; there is no teacher.]
5
ANN Learning Algorithms
[Figure: reinforcement learning. The ENVIRONMENT provides a state-vector input to a CRITIC, which converts primary reinforcement into heuristic reinforcement for the LEARNING SYSTEM; the learning system acts back on the environment through actions.]
6
Back-propagation Algorithm:
Supervised Learning
Backpropagation (BP) is amongst the 'most popular algorithms for ANNs': it has been estimated by Paul Werbos, the person who first worked on the algorithm in the 1970s, that between 40% and 90% of real-world ANN applications use the BP algorithm. Werbos traces the algorithm to the psychologist Sigmund Freud's theory of psychodynamics. Werbos applied the algorithm in political forecasting.
• David Rumelhart, Geoffrey Hinton and others applied the BP algorithm in the 1980s to problems related to supervised learning, particularly pattern recognition.
• The most useful example of the BP algorithm has been in dealing with problems related to prediction and control.
7
Back-propagation Algorithm:
Architecture of a BP system
8
Back-propagation Algorithm
BASIC DEFINITIONS
1. Backpropagation is a procedure for efficiently calculating the derivatives of some output quantity of a non-linear system, with respect to all inputs and parameters of that system, through calculations proceeding backwards from outputs to inputs.
2. Backpropagation is any technique for adapting the weights or parameters of a nonlinear system by somehow using such derivatives or the equivalent.
According to Paul Werbos there is no such thing as a "backpropagation network"; he used an ANN design called a multilayer perceptron.
9
Back-propagation Algorithm
Paul Werbos provided a rule for updating the weights of a multi-layered network undergoing supervised learning. It is this weight adaptation rule which is called backpropagation.
Typically, a fully connected feedforward network is trained using the BP algorithm: activation in such networks travels in a direction from the input to the output layer, and the units in one layer are connected to every other unit in the next layer.
There are two sweeps of the fully connected network: forward sweep and backward sweep.
10
Back-propagation Algorithm
There are two sweeps of the fully connected network: forward sweep and backward sweep.
Forward Sweep: This sweep is similar to any other feedforward ANN – the input stimulus is given to the network, the network computes the weighted sum from all the input units and then passes the sum through a squashing function. The ANN subsequently generates an output. The ANN may have a number of hidden layers, for example, the multi-net perceptron, and the output from each hidden layer becomes the input to the next layer forward.
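A minimal sketch of such a forward sweep (added for illustration; the layer sizes and weight values below are assumptions, and the squashing function is taken to be the logistic sigmoid):

```python
import math

def logistic(v):
    """Squashing function: the logistic sigmoid 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def forward_sweep(inputs, layers):
    """Propagate `inputs` through `layers` (a list of weight matrices).

    Each unit computes the weighted sum of the previous layer's outputs
    and squashes it; the output of each layer becomes the input to the
    next layer forward.
    """
    activations = inputs
    for weight_matrix in layers:
        activations = [
            logistic(sum(w * a for w, a in zip(unit_weights, activations)))
            for unit_weights in weight_matrix
        ]
    return activations

# Hypothetical 2-2-1 network: two hidden units, then one output unit.
hidden_weights = [[0.3, -0.1], [-0.2, 0.3]]  # assumed values
output_weights = [[0.2, 0.1]]                # assumed values
print(forward_sweep([0.1, 0.9], [hidden_weights, output_weights]))
```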
11
Back-propagation Algorithm
There are two sweeps of the fully connected network: forward sweep and backward sweep.
Backwards Sweep: This sweep is similar to the forward sweep, except that what is 'swept' are the error values. These values essentially are the differences between the actual output and a desired output:
$e_j = (d_j - o_j)$
The ANN may have a number of hidden layers, for example, the multi-net perceptron, and the output from each hidden layer becomes the input to the next layer forward. In the backward sweep the output unit sends errors back to the first proximate hidden layer, which in turn passes them on to the next hidden layer. No error signal is sent to the input units.
12
Back-propagation Algorithm
For each input vector associate a target output vector
while not STOP
    STOP = TRUE
    for each input vector
        • perform a forward sweep to find the actual output
        • obtain an error vector by comparing the actual and target output
        • if the actual output is not within tolerance, set STOP = FALSE
        • use the backward sweep to determine weight changes
        • update weights
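A minimal Python sketch of this training loop for a network with one hidden layer of logistic units (the network shape, learning rate, tolerance and random initialisation are assumptions; biases are omitted to stay close to the pseudocode above):

```python
import math
import random

ETA = 0.8         # learning rate (assumed; matches the worked example below)
TOLERANCE = 0.05  # assumed stopping tolerance

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def train(patterns, n_in, n_hidden):
    """Train an n_in - n_hidden - 1 logistic network with the loop above."""
    w_hid = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_hidden)]
    w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    stop = False
    while not stop:
        stop = True
        for x, d in patterns:
            # Forward sweep: find the actual output.
            h = [logistic(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hid]
            y = logistic(sum(w * hi for w, hi in zip(w_out, h)))
            # Error vector (scalar here) between target and actual output.
            e = d - y
            if abs(e) > TOLERANCE:
                stop = False
            # Backward sweep: local gradients for logistic units.
            delta_out = e * y * (1.0 - y)
            delta_hid = [hi * (1.0 - hi) * delta_out * w
                         for hi, w in zip(h, w_out)]
            # Update weights.
            w_out = [w + ETA * delta_out * hi for w, hi in zip(w_out, h)]
            w_hid = [[w + ETA * dh * xi for w, xi in zip(ws, x)]
                     for ws, dh in zip(w_hid, delta_hid)]
    return w_hid, w_out

# Single training pattern from the example on the next slides:
# input [0.1, 0.9], target output 0.9.
print(train([([0.1, 0.9], 0.9)], n_in=2, n_hidden=2))
```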
13
Back-propagation Algorithm
Example: Perform a complete forward and backward sweep of a 2-2-1 (2 input units, 2 hidden layer units and 1 output unit) with the following architecture. The target output d = 0.9. The input is [0.1 0.9].
[Figure: network diagram with numbered units and the initial connection weights (values such as −0.2, 0.3, 0.3, −0.1, 0.2, 0.1 marked on the connections).]
14
Back-propagation Algorithm
Example of a forward pass and a backward pass through a 2-2-2-1 feedforward network. Inputs, outputs and errors are shown in boxes.
[Figure: forward pass – the net input to and output from each unit (the first hidden layer emits 0.993 and 0.500, the second 0.122 and 0.728, and the output unit receives −0.906 and emits 0.288); backward pass – the error values propagated back (0.125 at the output unit; 0.040, 0.025, −0.007 and −0.001 at the hidden units).]
Target value is 0.9, so the error for the output unit is:
(0.900 – 0.288) x 0.288 x (1 – 0.288) = 0.125
15
Back-propagation Algorithm
New weights calculated following the errors derived above:
-2 + (0.8 x 0.125) x 1 = -1.900
1.073 = 1 + (0.8 x 0.125) x 0.728
3 + (0.8 x 0.04) x 1 = 3.032
-1.98 = -2 + (0.8 x 0.025) x 1
3 + (0.8 x 0.125) x 0.122 = 3.012
-2 + (0.8 x 0.04) x 0.5 = -1.984
2.019 = 2 + (0.8 x 0.025) x 0.993
2 + (0.8 x -0.007) x 1 = 1.994
1.999 = 2 + (0.8 x -0.001) x 1
2.999 = 3 + (0.8 x -0.001) x 0.9
-2 + (0.8 x -0.007) x 0.1 = -2.001
-2.005 = -2 + (0.8 x -0.007) x 0.9
3 + (0.8 x -0.001) x 0.1 = 2.999
-3.968 = -4 + (0.8 x 0.04) x 0.993
2 + (0.8 x 0.025) x 0.5 = 2.010
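The arithmetic above can be spot-checked in a few lines (a sketch; the learning rate 0.8 and the activations 0.728 and 0.122 are taken from the worked example):

```python
# Output-unit error (local gradient) from the previous slide:
# (d - y) * y * (1 - y) for a logistic output unit.
d, y = 0.900, 0.288
delta = (d - y) * y * (1 - y)
print(round(delta, 3))                      # 0.125

# The updates above follow w_new = w_old + (eta * delta) * input_activation:
eta = 0.8
print(round(-2 + (eta * delta) * 1, 3))     # -1.9 (bias-like input of 1)
print(round(1 + (eta * delta) * 0.728, 3))  # 1.073
print(round(3 + (eta * delta) * 0.122, 3))  # 3.012
```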
16
Back-propagation Algorithm
A derivation of the BP algorithm
The error signal at the output of neuron j at the nth training cycle is given as:
$e_j(n) = d_j(n) - y_j(n)$
The instantaneous value of the error energy for neuron j is $\frac{1}{2} e_j^2(n)$.
The total error energy E(n) can be computed by summing up the instantaneous energy over all the neurons in the output layer:
$E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$
where the set C includes all the output layer neurons.
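These definitions map directly into a few lines of Python (a minimal sketch; the example error vector is invented for illustration):

```python
def error_energy(errors):
    """Instantaneous total error energy E(n) = 1/2 * sum of e_j(n)^2
    over the output-layer neurons (the set C)."""
    return 0.5 * sum(e * e for e in errors)

# Hypothetical output-layer error signals e_j(n) = d_j(n) - y_j(n):
print(error_energy([0.612, -0.1]))  # 0.192272
```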
17
Back-propagation Algorithm
Derivative or differential coefficient
For a function f(x) at the argument x, the derivative is the limit of the differential coefficient
$\frac{f(x + \Delta x) - f(x)}{\Delta x}$
as $\Delta x \to 0$.
[Figure: the curve y = f(x) with points P = (x, y) and Q = (x + Δx, y + Δy); Q approaches P as Δx → 0.]
$\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$
18
Back-propagation Algorithm
Derivative or differential coefficient
Typically defined for a function of a single variable: if the left and right hand limits exist and are equal, it is the gradient of the curve at x, and is the limit of the gradient of the chord adjoining the points (x, f(x)) and (x + Δx, f(x + Δx)). The function of x defined as this limit for each argument x is the first derivative of y = f(x).
19
Back-propagation Algorithm
Partial derivative or partial differential coefficient
The derivative of a function of two or more variables with respect to one of these variables, the others being regarded as constant; written as: $\frac{\partial f}{\partial x}$
20
Back-propagation Algorithm
Total Derivative
The derivative of a function of two or more variables with regard to a single parameter in terms of which these variables are expressed:
if z = f(x, y) with parametric equations x = U(t), y = V(t),
then under appropriate conditions the total derivative is:
$\frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt}$
21
Back-propagation Algorithm
Total or Exact Differential
The differential of a function of two or more variables with regard to a single parameter in terms of which these variables are expressed, equal to the sum of the products of each partial derivative of the function with the corresponding increment. If z = f(x, y), x = U(t), y = V(t), then under appropriate conditions, the total differential is:
$dz = \frac{\partial z}{\partial x} dx + \frac{\partial z}{\partial y} dy$
22
Back-propagation Algorithm
Chain rule of calculus
A theorem that may be used in the differentiation of a function of a function, where y is a differentiable function of t, and t is a differentiable function of x. This enables a function y = f(x) to be differentiated by finding a suitable intermediate function t such that y is a differentiable function of t and t is a differentiable function of x:
$\frac{dy}{dx} = \frac{dy}{dt} \cdot \frac{dt}{dx}$
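As a one-line worked example of the rule (added for illustration, not from the slides), take y = sin t with t = x²:

```latex
% Worked example (invented): y = \sin t with t = x^2
\[
  \frac{dy}{dx} \;=\; \frac{dy}{dt}\cdot\frac{dt}{dx}
                \;=\; \cos(t)\cdot 2x
                \;=\; 2x\cos\!\left(x^{2}\right)
\]
```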
23
Back-propagation Algorithm
Chain rule of calculus
Similarly for partial differentiation, where f is a function of u and u is a function of x:
$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial x}$
24
Back-propagation Algorithm
Chain rule of calculus
Now consider the error signal at the output of a neuron j at iteration n, i.e. the presentation of the nth training example:
$e_j(n) = d_j(n) - y_j(n)$
where $d_j$ is the desired output and $y_j$ is the actual output, and the total error energy over all the neurons in the output layer is:
$E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$
where C is the set of all the neurons in the output layer.
25
Back-propagation Algorithm
Chain rule of calculus
If N is the total number of patterns, the averaged squared error energy is:
$E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n)$
Note that $e_j$ is a function of $y_j$ and $w_{ij}$ (the weights of connections between neurons in two adjacent layers 'i' and 'j').
26
Back-propagation Algorithm
A derivation of the BP algorithm
The total error energy E(n) can be computed by summing up the instantaneous energy over all the neurons in the output layer:
$E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$
where the set C includes all the neurons in the output layer.
$E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n)$
The instantaneous error energy E(n), and therefore the average energy $E_{av}$, is a function of the free parameters, including synaptic weights and bias levels.
27
Back-propagation Algorithm
The originators of the BP algorithm suggest that
$\Delta w_{ij}(n) = -\eta \frac{\partial E}{\partial w_{ij}}$
where η is the learning rate parameter of the BP algorithm. The minus sign indicates the use of gradient descent in the weights: seeking a direction for weight change that reduces the value of E(n).
$\therefore \Delta w_{ij}(n) = \eta\, \delta_j(n)\, y_i(n)$
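As a sketch in code, one gradient-descent step per weight looks like this (the learning rate and gradient value are invented):

```python
ETA = 0.8  # learning rate (assumed value)

def update_weight(w, dE_dw):
    """One gradient-descent step: move against the gradient of E."""
    return w - ETA * dE_dw

# Invented weight and gradient: a negative gradient increases the weight.
print(update_weight(0.3, -0.05))  # 0.34
```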
28
Back-propagation Algorithm
A derivation of the BP algorithm
E(n) is a function of $e_j(n)$; $e_j(n)$ is a function of $y_j(n)$; $y_j(n)$ is a function of $v_j(n)$; $v_j(n)$ is a function of $w_{ij}(n)$.
E(n) is a function of a function of a function of a function of $w_{ji}(n)$.
29
Back-propagation Algorithm
A derivation of the BP algorithm
The back-propagation algorithm trains a multilayer perceptron by propagating back some measure of responsibility to a hidden (non-output) unit.
Back-propagation:
• Is a local rule for synaptic adjustment;
• Takes into account the position of a neuron in a network to indicate how a neuron's weights are to change
30
Back-propagation Algorithm
A derivation of the BP algorithm
Layers in a back-propagating multi-layer perceptron
1. First layer – comprises fixed input units;
2. Second (and possibly several subsequent) layers – comprise trainable 'hidden' units carrying an internal representation;
3. Last layer – comprises the trainable output units of the multi-layer perceptron.
31
Back-propagation Algorithm
A derivation of the BP algorithm
Modern back-propagation algorithms are based on a formalism for propagating back the changes in the error energy E, with respect to all the weights $w_{ij}$ from a unit (i) to its inputs (j).
More precisely, what is being computed during the backwards propagation is the rate of change of the error energy E with respect to the network's weights. This computation is also called the computation of the gradient:
$\frac{\partial E}{\partial w_{ij}}$
32
Back-propagation Algorithm
A derivation of the BP algorithm
More precisely, what is being computed during the backwards propagation is the rate of change of the error energy E with respect to the network's weights. This computation is also called the computation of the gradient:
$\frac{\partial E}{\partial w_{ij}} = \frac{\partial \left(\sum_j e_j^2 / 2\right)}{\partial w_{ij}} = \frac{\partial \left(\sum_j (d_j - y_j)^2 / 2\right)}{\partial w_{ij}} = \frac{1}{2} \cdot 2 \cdot \sum_j (d_j - y_j) \cdot \frac{\partial (d_j - y_j)}{\partial w_{ij}}$
33
Back-propagation Algorithm
A derivation of the BP algorithm
Continuing the computation of the gradient, and noting that the desired outputs $d_j$ do not depend on the weights (so $\partial d_j / \partial w_{ij} = 0$):
$\frac{\partial E}{\partial w_{ij}} = \sum_j e_j \frac{\partial (d_j - y_j)}{\partial w_{ij}} = \sum_j e_j \left(0 - \frac{\partial y_j}{\partial w_{ij}}\right) = -\sum_j e_j \frac{\partial y_j}{\partial w_{ij}}$
34
Back-propagation Algorithm
A derivation of the BP algorithm
The back-propagation learning rule is formulated to change the connection weights $w_{ij}$ so as to reduce the error energy E by gradient descent:
$\Delta w_{ij} = w_{ij}^{new} - w_{ij}^{old} = -\eta \frac{\partial E}{\partial w_{ij}} = \eta \sum_j e_j \frac{\partial y_j}{\partial w_{ij}} = \eta \sum_j (d_j - y_j) \frac{\partial y_j}{\partial w_{ij}}$
35
Back-propagation Algorithm
A derivation of the BP algorithm
The error energy E is a function of the error e, the output y, the weighted sum of all the inputs v, and of the weights $w_{ij}$:
$E \equiv E(e_j, y_j, v_j, w_{ji})$
According to the chain rule, then:
$\frac{\partial E(n)}{\partial w_{ji}} = \frac{\partial E(n)}{\partial e_j} \cdot \frac{\partial e_j}{\partial w_{ji}}$
36
Back-propagation Algorithm
A derivation of the BP algorithm
Further applications of the chain rule suggest that:
$\frac{\partial E(n)}{\partial w_{ji}} = \frac{\partial E(n)}{\partial e_j} \cdot \frac{\partial e_j}{\partial y_j} \cdot \frac{\partial y_j}{\partial w_{ji}}$
$\frac{\partial E(n)}{\partial w_{ji}} = \frac{\partial E(n)}{\partial e_j} \cdot \frac{\partial e_j}{\partial y_j} \cdot \frac{\partial y_j}{\partial v_j} \cdot \frac{\partial v_j}{\partial w_{ji}}$
37
Back-propagation Algorithm
A derivation of the BP algorithm
Each of the partial derivatives can be simplified as:
$\frac{\partial E(n)}{\partial e_j} = \frac{\partial}{\partial e_j} \left(\frac{1}{2} \sum_{j \in C} e_j^2(n)\right) = e_j(n)$
$\frac{\partial e_j}{\partial y_j} = \frac{\partial (d_j - y_j)}{\partial y_j} = -1$
$\frac{\partial y_j}{\partial v_j} = \frac{\partial \varphi_j(v_j)}{\partial v_j} = \varphi'_j(v_j(n))$
$\frac{\partial v_j}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \left(\sum_i w_{ij}\, y_i(n)\right) = y_i(n)$
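Multiplying the four factors gives $\partial E(n)/\partial w_{ji} = -e_j(n)\,\varphi'_j(v_j(n))\,y_i(n)$, as the next slide states. A quick numerical cross-check of that product against a finite difference, for a single logistic unit (all values invented):

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def energy(w, y_in, d):
    """E = 1/2 e^2 for one logistic unit: v = w * y_in, y = phi(v), e = d - y."""
    y = logistic(w * y_in)
    return 0.5 * (d - y) ** 2

w, y_in, d = 0.4, 0.7, 0.9               # invented weight, input activation, target
y = logistic(w * y_in)
e = d - y
analytic = -e * (y * (1.0 - y)) * y_in   # -e_j * phi'(v_j) * y_i
h = 1e-6
numeric = (energy(w + h, y_in, d) - energy(w - h, y_in, d)) / (2 * h)
print(analytic, numeric)                 # the two agree very closely
```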
38
Back-propagation Algorithm
A derivation of the BP algorithm
The rate of change of the error energy E with respect to changes in the synaptic weights is:
$\frac{\partial E(n)}{\partial w_{ji}} = -e_j(n)\, \varphi'_j(v_j(n))\, y_i(n)$
39
Back-propagation Algorithm
A derivation of the BP algorithm
The so-called delta rule suggests that
$\Delta w_{ji}(n) = -\eta \frac{\partial E(n)}{\partial w_{ji}}$  or  $\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$
δ is called the local gradient.
40
Back-propagation Algorithm
A derivation of the BP algorithm
The local gradient δ is given by
$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)}$  or  $\delta_j(n) = e_j(n)\, \varphi'_j(v_j(n))$
41
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the output neuron j:
$\Delta w_{ji}(n) = \eta\, e_j(n)\, \varphi'_j(v_j(n))\, y_i(n)$
The weight adjustment requires the computation of the error signal.
42
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the output neuron j: the so-called delta rule suggests that
$\Delta w_{ji}(n) = \eta\, e_j(n)\, \varphi'_j(v_j(n))\, y_i(n)$
Logistic function:
$\varphi_j(v_j(n)) = \frac{1}{1 + \exp(-v_j(n))}$
Derivative of $\varphi_j(v_j(n))$ with respect to $v_j(n)$:
$\varphi'_j(v_j(n)) = \frac{\exp(-v_j(n))}{\left(1 + \exp(-v_j(n))\right)^2} = \frac{1}{1 + \exp(-v_j(n))} \left(1 - \frac{1}{1 + \exp(-v_j(n))}\right)$
43
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the output neuron j: the so-called delta rule suggests that
$\Delta w_{ji}(n) = \eta\, e_j(n)\, \varphi'_j(v_j(n))\, y_i(n)$  and  $\varphi'_j(v_j(n)) = y_j(n)\left(1 - y_j(n)\right)$
$\therefore \Delta w_{ji}(n) = \eta\, e_j(n)\, y_j(n)\left(1 - y_j(n)\right) y_i(n)$
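The identity $\varphi'(v) = \varphi(v)(1 - \varphi(v))$ used here is easy to verify numerically (a small sketch):

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

v, h = 0.5, 1e-6
numeric = (logistic(v + h) - logistic(v - h)) / (2 * h)
identity = logistic(v) * (1.0 - logistic(v))
print(numeric, identity)  # both approximately 0.235004
```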
44
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neuron (j) – the delta rule suggests that
$\Delta w_{ji}(n) = \eta\, e_j(n)\, \varphi'_j(v_j(n))\, y_i(n)$
The weight adjustment requires the computation of the error signal, but the situation is complicated in that we do not have a (set of) desired output(s) for the hidden neurons, and consequently we will have difficulty in computing the error signal $e_j$ for the hidden neuron j.
45
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neurons (j):
Recall that the local gradient of the hidden neuron j is given as $\delta_j$ and that $y_j$ is the output and equals $\varphi_j(v_j(n))$:
$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial y_j(n)} \cdot \frac{\partial y_j(n)}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial y_j(n)}\, \varphi'_j(v_j(n))$
We have used the chain rule of calculus here.
46
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neurons (j): The 'error' energy related to the hidden neurons is given as
$E(n) = \frac{1}{2} \sum_k e_k^2(n)$
The rate of change of the error (energy) with respect to the input during the backward pass, $y_j(n)$, is given as:
$\frac{\partial E(n)}{\partial y_j(n)} = \frac{\partial}{\partial y_j(n)} \left(\frac{1}{2} \sum_k e_k^2(n)\right) = \sum_k e_k(n)\, \frac{\partial e_k(n)}{\partial y_j(n)}$
47
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neurons (j): The rate of change of the error (energy) with respect to the input during the backward pass is given as:
$\frac{\partial E(n)}{\partial y_j(n)} = \sum_k e_k(n)\, \frac{\partial e_k(n)}{\partial v_k(n)} \cdot \frac{\partial v_k(n)}{\partial y_j(n)}$
48
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neurons (k) – the delta rule suggests that
$\Delta w_{ki}(n) = \eta\, e_k(n)\, \varphi'_k(v_k(n))\, y_j(n)$
The error signal for a hidden neuron has, thus, to be computed recursively in terms of the error signals of ALL the neurons to which the hidden neuron j is directly connected. Therefore, the back-propagation algorithm becomes complicated.
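In code the recursion is compact: a hidden neuron's local gradient is its own $\varphi'(v_j)$ times the weighted sum of the local gradients of all the neurons k it feeds into (a sketch with invented values, assuming logistic units):

```python
def hidden_delta(y_j, deltas_k, w_kj):
    """delta_j = phi'(v_j) * sum_k delta_k * w_kj, using
    phi'(v_j) = y_j * (1 - y_j) for logistic units. deltas_k are the
    local gradients of ALL the neurons k fed by hidden neuron j."""
    return y_j * (1.0 - y_j) * sum(d * w for d, w in zip(deltas_k, w_kj))

# Invented values: output of hidden neuron j, local gradients of two
# downstream neurons, and the weights from j to those neurons.
print(hidden_delta(0.728, deltas_k=[0.125, -0.04], w_kj=[3.0, -2.0]))
```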
49
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neurons (j)
The local gradient δ is given by
$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)}$  or  $\delta_j(n) = e_j(n)\, \varphi'_j(v_j(n))$
50
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neurons (j)
The local gradient δ is given by
$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)}$
which can be redefined as
$\delta_j(n) = -\frac{\partial E(n)}{\partial y_j} \cdot \frac{\partial y_j}{\partial v_j} = -\frac{\partial E(n)}{\partial y_j}\, \varphi'_j(v_j(n))$
51
Back-propagation Algorithm
A derivation of the BP algorithm
The case of the hidden neurons (j)
The error energy for the hidden neuron can be given as
$E(n) = \frac{1}{2} \sum_{k \in C} e_k^2(n)$
Note that we have used the index k instead of the index j in order to avoid confusion.