
EE543 - ANN - CHAPTER 7
Ugur HALICI - METU EEE - ANKARA

CHAPTER VII: Learning in Recurrent Networks

    Introduction

    We have examined the dynamics of recurrent neural networks in detail in Chapter 2.

    Then in Chapter 3, we used them as associative memory with fixed weights.

In this chapter, the backpropagation learning algorithm that we have considered for feedforward networks in Chapter 6 will be extended to recurrent neural networks [Almeida 87, 88].

Therefore, the weights of the recurrent network will be adapted in order to use it as an associative memory.

Such a network is expected to converge to the desired output pattern when the associated pattern is applied at the network inputs.


    7.1. Recurrent Backpropagation

Consider the recurrent system shown in Figure 7.1, in which there are N neurons, some of them being input units and some others outputs.

In such a network, the units that are neither input nor output are called hidden neurons.

[Figure 7.1: Recurrent network architecture, with external inputs $u_1, \dots, u_N$, neuron outputs $x_1, \dots, x_M$, and the neurons grouped into input, hidden, and output neurons.]


We will assume network dynamics defined as

$$\frac{dx_i}{dt} = -x_i + f\Big(\sum_j w_{ji} x_j + u_i\Big) \qquad (7.1.1)$$

This may be written equivalently as

$$\frac{da_i}{dt} = -a_i + \sum_j w_{ji} f(a_j) + u_i \qquad (7.1.2)$$

through the linear transformation $a_i = \sum_j w_{ji} x_j + u_i$.
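As an illustration, the forward relaxation of Eq. (7.1.1) can be simulated by simple Euler integration. The sketch below is a minimal NumPy version; the function name, the tanh activation, the step size and the convergence tolerance are illustrative assumptions rather than part of the text.

```python
import numpy as np

def relax_forward(W, u, dt=0.05, tol=1e-6, max_steps=10000):
    """Integrate dx_i/dt = -x_i + f(sum_j w_ji x_j + u_i) until a fixed point is reached."""
    x = np.zeros_like(u, dtype=float)     # initial state x(0)
    for _ in range(max_steps):
        a = W.T @ x + u                   # a_i = sum_j w_ji x_j + u_i
        dx = -x + np.tanh(a)              # right-hand side of Eq. (7.1.1), with f = tanh
        x = x + dt * dx                   # Euler step
        if np.max(np.abs(dx)) < tol:      # treat a small dx/dt as convergence
            break
    a = W.T @ x + u                       # activations at (approximately) the fixed point
    return x, a
```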


Our goal is to update the weights of the network so that it will be able to remember predefined associations $(u^k, y^k)$, $u^k \in \mathbb{R}^N$, $y^k \in \mathbb{R}^N$, $k = 1..K$.

With no loss of generality, we extend here the input vector $u$ such that $u_i = 0$ if neuron $i$ is not an input neuron. Furthermore, we will simply ignore the outputs of the unrelated neurons.

We apply an input $u^k$ to the network by setting

$$u_i = u_i^k, \quad i = 1..N \qquad (7.1.3)$$

Therefore, we desire the network, starting from an initial state $x(0) = x^{k0}$, to converge to

$$x^k(\infty) = x^k = y^k \qquad (7.1.4)$$

whenever $u^k$ is applied as input to the network.


The recurrent backpropagation algorithm updates the connection weights aiming to minimize the error

$$e^k = \tfrac{1}{2}\sum_i (\varepsilon_i^k)^2 \qquad (7.1.5)$$

so that the mean error

$$e = \langle e^k \rangle_k \qquad (7.1.6)$$

is also minimized.


Notice that $e^k$ and $e$ are scalar values, while $\varepsilon^k$ is a vector defined as

$$\varepsilon^k = y^k - x^k \qquad (7.1.7)$$

whose ith component $\varepsilon_i^k$, $i = 1..N$, is

$$\varepsilon_i^k = \gamma_i\,(y_i^k - x_i^k) \qquad (7.1.8)$$

In equation (7.1.8) the coefficient $\gamma_i$ is used to discriminate between the output neurons and the others, by setting its value as

$$\gamma_i = \begin{cases} 1 & \text{if } i \text{ is an output neuron} \\ 0 & \text{otherwise} \end{cases} \qquad (7.1.9)$$

Therefore, the neurons which are not output will have no effect on the error.
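The error terms of Eqs. (7.1.5), (7.1.8) and (7.1.9) can be sketched directly; in the minimal sketch below, the boolean output mask plays the role of $\gamma_i$, and the function and argument names are illustrative assumptions.

```python
import numpy as np

def pattern_error(x_fixed, y, output_mask):
    """Return eps^k of Eqs. (7.1.7)-(7.1.9) and the scalar error e^k of Eq. (7.1.5)."""
    gamma = output_mask.astype(float)    # gamma_i = 1 for output neurons, 0 otherwise
    eps = gamma * (y - x_fixed)          # eps_i^k = gamma_i * (y_i^k - x_i^k)
    e_k = 0.5 * np.sum(eps ** 2)         # e^k = 1/2 * sum_i (eps_i^k)^2
    return eps, e_k
```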


Notice that, if an input $u^k$ is applied to the network and it is allowed to converge to a fixed point $x^k$, the error depends on the weight matrix through these fixed points. The learning algorithm should modify the connection weights so that the fixed points satisfy

$$x_i^k = y_i^k \qquad (7.1.10)$$

For this purpose, we let the system evolve in the weight space along trajectories in the opposite direction of the gradient, that is,

$$\frac{dw}{dt} = -\eta\,\nabla_w e^k \qquad (7.1.11)$$

In particular, $w_{ij}$ should satisfy

$$\frac{dw_{ij}}{dt} = -\eta\,\frac{\partial e^k}{\partial w_{ij}} \qquad (7.1.12)$$

Here $\eta$ is a positive constant named the learning rate, which should be chosen sufficiently small.


Since

$$\frac{\partial \varepsilon_i^k}{\partial w_{sr}} = -\gamma_i\,\frac{\partial x_i^k}{\partial w_{sr}} \qquad (7.1.13)$$

and $\gamma_i\,\varepsilon_i^k = \varepsilon_i^k$, the partial derivative of $e^k$ given in Eq. (7.1.5) with respect to $w_{sr}$ becomes

$$\frac{\partial e^k}{\partial w_{sr}} = -\sum_i \varepsilon_i^k\,\frac{\partial x_i^k}{\partial w_{sr}} \qquad (7.1.14)$$


On the other hand, since $x^k$ is a fixed point, it should satisfy

$$\frac{dx_i^k}{dt} = 0 \qquad (7.1.15)$$

for which Eq. (7.1.1) becomes

$$x_i^k = f\Big(\sum_j w_{ji} x_j^k + u_i^k\Big) \qquad (7.1.16)$$

Therefore,

$$\frac{\partial x_i^k}{\partial w_{sr}} = f'(a_i^k)\sum_j \Big(\frac{\partial w_{ji}}{\partial w_{sr}}\,x_j^k + w_{ji}\,\frac{\partial x_j^k}{\partial w_{sr}}\Big) \qquad (7.1.17)$$

where

$$a_i^k = \sum_j w_{ji} x_j^k + u_i^k \quad \text{and} \quad f'(a_i^k) = \frac{df(a)}{da}\Big|_{a=a_i^k} \qquad (7.1.18)$$


Notice that

$$\frac{\partial w_{ji}}{\partial w_{sr}} = \delta_{js}\,\delta_{ir} \qquad (7.1.19)$$

where $\delta_{ij}$ is the Kronecker delta, which has value 1 if $i=j$ and 0 otherwise, resulting in

$$\sum_j \delta_{js}\,\delta_{ir}\,x_j^k = \delta_{ir}\,x_s^k \qquad (7.1.20)$$

Hence,

$$\frac{\partial x_i^k}{\partial w_{sr}} = f'(a_i^k)\Big(\delta_{ir}\,x_s^k + \sum_j w_{ji}\,\frac{\partial x_j^k}{\partial w_{sr}}\Big) \qquad (7.1.21)$$

By reorganizing the above equation, we obtain

$$\frac{\partial x_i^k}{\partial w_{sr}} - f'(a_i^k)\sum_j w_{ji}\,\frac{\partial x_j^k}{\partial w_{sr}} = \delta_{ir}\,f'(a_i^k)\,x_s^k \qquad (7.1.22)$$


Remember

$$\frac{\partial x_i^k}{\partial w_{sr}} - f'(a_i^k)\sum_j w_{ji}\,\frac{\partial x_j^k}{\partial w_{sr}} = \delta_{ir}\,f'(a_i^k)\,x_s^k \qquad (7.1.22)$$

Notice that

$$\frac{\partial x_i^k}{\partial w_{sr}} = \sum_j \delta_{ij}\,\frac{\partial x_j^k}{\partial w_{sr}} \qquad (7.1.23)$$

Therefore, Eq. (7.1.22) can be written equivalently as

$$\sum_j \delta_{ij}\,\frac{\partial x_j^k}{\partial w_{sr}} - f'(a_i^k)\sum_j w_{ji}\,\frac{\partial x_j^k}{\partial w_{sr}} = \delta_{ir}\,f'(a_i^k)\,x_s^k \qquad (7.1.24)$$

or,

$$\sum_j \big(\delta_{ij} - f'(a_i^k)\,w_{ji}\big)\frac{\partial x_j^k}{\partial w_{sr}} = \delta_{ir}\,f'(a_i^k)\,x_s^k \qquad (7.1.25)$$


Remember

$$\sum_j \big(\delta_{ij} - f'(a_i^k)\,w_{ji}\big)\frac{\partial x_j^k}{\partial w_{sr}} = \delta_{ir}\,f'(a_i^k)\,x_s^k \qquad (7.1.25)$$

If we define the matrix $L^k$ and the vector $R^k$ such that

$$L_{ij}^k = \delta_{ij} - f'(a_i^k)\,w_{ji} \qquad (7.1.26)$$

and

$$R_i^k = \delta_{ir}\,f'(a_i^k) \qquad (7.1.27)$$

the equation (7.1.25) results in

$$L^k\,\frac{\partial x^k}{\partial w_{sr}} = R^k\,x_s^k \qquad (7.1.28)$$
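For reference, the matrix $L^k$ of Eq. (7.1.26) is easy to form numerically. The short sketch below assumes a tanh activation (so $f'(a) = 1 - \tanh^2(a)$); the activation choice and the function name are assumptions, not fixed by the text.

```python
import numpy as np

def build_L(W, a):
    """L^k_ij = delta_ij - f'(a_i^k) * w_ji, with f = tanh assumed."""
    fprime = 1.0 - np.tanh(a) ** 2                 # f'(a_i^k) for the assumed activation
    return np.eye(len(a)) - fprime[:, None] * W.T  # row i scaled by f'(a_i^k), entries w_ji
```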


Hence, we obtain

$$\frac{\partial x^k}{\partial w_{sr}} = (L^k)^{-1} R^k\,x_s^k \qquad (7.1.29)$$

In particular, if we consider the ith row, we observe that

$$\frac{\partial x_i^k}{\partial w_{sr}} = \sum_j \big((L^k)^{-1}\big)_{ij}\,R_j^k\,x_s^k \qquad (7.1.30)$$

Since

$$\sum_j \big((L^k)^{-1}\big)_{ij}\,R_j^k = \sum_j \big((L^k)^{-1}\big)_{ij}\,\delta_{jr}\,f'(a_j^k) = \big((L^k)^{-1}\big)_{ir}\,f'(a_r^k) \qquad (7.1.31)$$

we obtain

$$\frac{\partial x_i^k}{\partial w_{sr}} = \big((L^k)^{-1}\big)_{ir}\,f'(a_r^k)\,x_s^k \qquad (7.1.32)$$


Remember

$$\frac{dw_{ij}}{dt} = -\eta\,\frac{\partial e^k}{\partial w_{ij}} \qquad (7.1.12)$$

$$\frac{\partial e^k}{\partial w_{sr}} = -\sum_i \varepsilon_i^k\,\frac{\partial x_i^k}{\partial w_{sr}} \qquad (7.1.14)$$

$$\frac{\partial x_i^k}{\partial w_{sr}} = \big((L^k)^{-1}\big)_{ir}\,f'(a_r^k)\,x_s^k \qquad (7.1.32)$$

Insertion of (7.1.32) in equation (7.1.14) and then in (7.1.12) results in

$$\frac{dw_{sr}}{dt} = \eta \sum_i \varepsilon_i^k\,\big((L^k)^{-1}\big)_{ir}\,f'(a_r^k)\,x_s^k \qquad (7.1.33)$$


When the network with input $u^k$ has converged to $x^k$, the local gradient for recurrent backpropagation at the output of the rth neuron may be defined, in analogy with standard backpropagation, as

$$\delta_r^k = f'(a_r^k)\sum_i \varepsilon_i^k\,\big((L^k)^{-1}\big)_{ir} \qquad (7.1.34)$$

So, Eq. (7.1.33) becomes simply

$$\frac{dw_{sr}}{dt} = \eta\,\delta_r^k\,x_s^k \qquad (7.1.35)$$
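Since the sum in Eq. (7.1.34) is the rth component of $\big((L^k)^{-1}\big)^{\mathsf T}\varepsilon^k$, the local gradients can be obtained from a single linear solve instead of an explicit inverse. A minimal sketch, again assuming a tanh activation and illustrative function names:

```python
import numpy as np

def local_gradients(L, a, eps):
    """delta_r^k = f'(a_r^k) * sum_i eps_i^k * (L^k)^{-1}_{ir}, Eq. (7.1.34)."""
    fprime = 1.0 - np.tanh(a) ** 2      # f'(a_r^k), tanh assumed
    z = np.linalg.solve(L.T, eps)       # z_r = sum_i eps_i^k * (L^k)^{-1}_{ir}
    return fprime * z
```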


In order to reach the minimum of the error $e^k$, instead of solving the above differential equation we apply the delta rule, as explained for the steepest descent algorithm:

$$w(k+1) = w(k) - \eta\,\nabla_w e^k \qquad (7.1.36)$$

in which

$$w_{sr}(k+1) = w_{sr}(k) + \eta\,\delta_r^k\,x_s^k \qquad (7.1.37)$$

for $s = 1..N$, $r = 1..N$.

The recurrent backpropagation algorithm for recurrent neural networks is summarized in the following.


Step 0. Initialize weights: set the weights to small random values.

Step 1. Apply a sample: apply to the input a sample vector $u^k$ having desired output vector $y^k$.

Step 2. Forward phase: let the network relax according to the state transition equation

$$\frac{dx_i^k}{dt} = -x_i^k + f\Big(\sum_j w_{ji} x_j^k + u_i^k\Big)$$

to a fixed point $x^k$.


Step 3. Local gradients: compute the local gradient for each unit as

$$\delta_r^k = f'(a_r^k)\sum_i \varepsilon_i^k\,\big((L^k)^{-1}\big)_{ir}$$

Step 4. Update weights according to the equation

$$w_{sr}(k+1) = w_{sr}(k) + \eta\,\delta_r^k\,x_s^k$$

Step 5. Repeat steps 1-4 for k+1, until the mean error $e = \langle e^k \rangle_k$ is sufficiently small.

A compact code sketch combining Steps 1-4 is given below.
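The following sketch puts Steps 1-4 together for a single pattern, reusing the relax_forward, pattern_error, build_L and local_gradients sketches given earlier; the learning rate value and the tanh activation remain illustrative assumptions.

```python
import numpy as np

def train_step(W, u, y, output_mask, eta=0.1):
    """One pass of Steps 1-4 of recurrent backpropagation for a single pattern (u, y)."""
    x_fixed, a = relax_forward(W, u)                   # Step 2: relax to the fixed point x^k
    eps, e_k = pattern_error(x_fixed, y, output_mask)  # error vector eps^k and scalar e^k
    L = build_L(W, a)                                  # L^k of Eq. (7.1.26)
    delta = local_gradients(L, a, eps)                 # Step 3: delta_r^k of Eq. (7.1.34)
    W += eta * np.outer(x_fixed, delta)                # Step 4: w_sr += eta * delta_r^k * x_s^k
    return W, e_k
```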


    7.2 Backward Phase

Remember

$$\delta_r^k = f'(a_r^k)\sum_i \varepsilon_i^k\,\big((L^k)^{-1}\big)_{ir} \qquad (7.1.34)$$

The local gradient given in Eq. (7.1.34) can be redefined as

$$\delta_r^k = f'(a_r^{k*})\,v_r^k \qquad (7.2.1)$$

by introducing into the system a new vector variable $v$, whose rth component is defined by the equation

$$v_r^k = \sum_i \big((L^{k*})^{-1}\big)_{ir}\,\varepsilon_i^{k*} \qquad (7.2.2)$$

in which * is used on the right-hand side to denote the fixed values of the recurrent network, in order to prevent confusion with the fixed points of the adjoint network.

They have constant values in the derivations related to the fixed point $v^k$ of the adjoint dynamical system.


The equation (7.2.2) may be written in matrix form as

$$v^k = \big((L^{k*})^{-1}\big)^{\mathsf T}\,\varepsilon^{k*} \qquad (7.2.3)$$

or equivalently

$$(L^{k*})^{\mathsf T}\,v^k = \varepsilon^{k*} \qquad (7.2.4)$$

which implies

$$\sum_j L_{jr}^{k*}\,v_j^k = \varepsilon_r^{k*} \qquad (7.2.5)$$


Remember

$$L_{ij}^k = \delta_{ij} - f'(a_i^k)\,w_{ji} \qquad (7.1.26)$$

By using the definition of $L_{ij}$ given in Eq. (7.1.26), we obtain

$$\sum_j \big(\delta_{jr} - f'(a_j^{k*})\,w_{rj}\big)\,v_j^k = \varepsilon_r^{k*} \qquad (7.2.6)$$

that is,

$$0 = -v_r^k + \sum_j f'(a_j^{k*})\,w_{rj}\,v_j^k + \varepsilon_r^{k*} \qquad (7.2.7)$$

Such a set of equations may be regarded as the fixed-point solution of the dynamical system defined by the equation

$$\frac{dv_r}{dt} = -v_r + \sum_j f'(a_j^{k*})\,w_{rj}\,v_j + \varepsilon_r^{k*} \qquad (7.2.8)$$
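A minimal sketch of relaxing the adjoint system of Eq. (7.2.8) by Euler integration, mirroring the forward-phase sketch; the step size, tolerance, function name and tanh derivative are assumptions carried over from the earlier sketches.

```python
import numpy as np

def relax_backward(W, a_star, eps_star, dt=0.05, tol=1e-6, max_steps=10000):
    """Integrate dv_r/dt = -v_r + sum_j f'(a_j*) w_rj v_j + eps_r* to a fixed point."""
    fprime = 1.0 - np.tanh(a_star) ** 2          # f'(a_j^k*), tanh assumed
    v = np.zeros_like(eps_star, dtype=float)
    for _ in range(max_steps):
        dv = -v + W @ (fprime * v) + eps_star    # right-hand side of Eq. (7.2.8)
        v = v + dt * dv                          # Euler step
        if np.max(np.abs(dv)) < tol:
            break
    return v
```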


Remember

$$\delta_r^k = f'(a_r^{k*})\,v_r^k \qquad (7.2.1)$$

Therefore $v^k$, and then $\delta^k$ in equation (7.2.1), can be obtained by relaxation of the adjoint dynamical system instead of computing $L^{-1}$.

Hence, a backward phase is introduced into recurrent backpropagation, as summarized in the following:


Recurrent BP with a backward phase

Step 0. Initialize weights: set the weights to small random values.

Step 1. Apply a sample: apply to the input a sample vector $u^k$ having desired output vector $y^k$.

Step 2. Forward phase: let the network relax according to the state transition equation

$$\frac{dx_i^k(t)}{dt} = -x_i^k + f\Big(\sum_j w_{ji} x_j^k + u_i^k\Big)$$

to a fixed point $x^k$.

Step 3. Compute:

$$a_i^{k*} = \sum_j w_{ji} x_j^{k*} + u_i^k, \qquad f'(a_i^{k*}) = \frac{df(a)}{da}\Big|_{a=a_i^{k*}}, \qquad \varepsilon_i^{k*} = \gamma_i\,(y_i^k - x_i^{k*}), \quad i = 1..N$$


Step 4. Backward phase for local gradients: compute the local gradient for each unit as

$$\delta_r^k = f'(a_r^{k*})\,v_r^k$$

where $v_r^k$ is the fixed-point solution of the dynamical system defined by the equation

$$\frac{dv_r}{dt} = -v_r(t) + \sum_j f'(a_j^{k*})\,w_{rj}\,v_j + \varepsilon_r^{k*}$$

Step 5. Weight update: update the weights according to the equation

$$w_{sr}(k+1) = w_{sr}(k) + \eta\,\delta_r^k\,x_s^k$$

Step 6. Repeat steps 1-5 for k+1, until the mean error $e = \langle e^k \rangle_k$ is sufficiently small.

A code sketch of one pass of this procedure is given below.
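One pass of the backward-phase variant can be sketched by reusing the relax_forward, pattern_error and relax_backward sketches above; the local gradient follows Eq. (7.2.1) and the update Eq. (7.1.37), with the learning rate and the tanh activation still assumed.

```python
import numpy as np

def train_step_backward(W, u, y, output_mask, eta=0.1):
    """One pass of Steps 1-5 of recurrent BP with a backward phase, for one pattern (u, y)."""
    x_fixed, a_star = relax_forward(W, u)                     # forward phase
    eps_star, e_k = pattern_error(x_fixed, y, output_mask)
    v = relax_backward(W, a_star, eps_star)                   # backward phase, Eq. (7.2.8)
    delta = (1.0 - np.tanh(a_star) ** 2) * v                  # delta_r^k = f'(a_r^k*) v_r^k
    W += eta * np.outer(x_fixed, delta)                       # w_sr += eta * delta_r^k * x_s^k
    return W, e_k
```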


    7.3. Stability of Recurrent Backpropagation

Remember

$$\frac{dx_i}{dt} = -x_i + f\Big(\sum_j w_{ji} x_j + u_i\Big) \qquad (7.1.1)$$

$$\frac{dv_r}{dt} = -v_r + \sum_j f'(a_j^{k*})\,w_{rj}\,v_j + \varepsilon_r^{k*} \qquad (7.2.8)$$

Due to the difficulty of constructing a Lyapunov function for recurrent backpropagation, a local stability analysis [Almeida 87] is provided in the following. In recurrent backpropagation, we have the two adjoint dynamical systems defined by Eqs. (7.1.1) and (7.2.8).

Let $x^*$ and $v^*$ be stable attractors of these systems.

Now we will introduce small disturbances $\Delta x$ and $\Delta v$ at these stable attractors and observe the behavior of the systems.


First, consider the dynamical system defined by Eq. (7.1.1) for the forward phase and insert $x^* + \Delta x$ instead of $x$, which results in

$$\frac{d}{dt}(x_i^* + \Delta x_i) = -(x_i^* + \Delta x_i) + f\Big(\sum_j w_{ji}(x_j^* + \Delta x_j) + u_i\Big) \qquad (7.3.1)$$

satisfying

$$x_i^* = f\Big(\sum_j w_{ji} x_j^* + u_i\Big) \qquad (7.3.2)$$

If the disturbance $\Delta x$ is small enough, then a function $g(\cdot)$ at $x^* + \Delta x$ can be linearized approximately by using the first two terms of the Taylor expansion of the function around $x^*$, which is

$$g(x^* + \Delta x) \approx g(x^*) + \nabla g(x^*)^{\mathsf T}\,\Delta x \qquad (7.3.3)$$

where $\nabla g(x^*)$ is the gradient of $g(\cdot)$ evaluated at $x^*$.


Therefore, $f(\cdot)$ in Eq. (7.3.1) can be approximated as

$$f\Big(\sum_j w_{ji}(x_j^* + \Delta x_j) + u_i\Big) \approx f\Big(\sum_j w_{ji} x_j^* + u_i\Big) + f'\Big(\sum_j w_{ji} x_j^* + u_i\Big)\sum_j w_{ji}\,\Delta x_j \qquad (7.3.4)$$

where $f'(\cdot)$ is the derivative of $f(\cdot)$.

Notice that

$$a_i^* = \sum_j w_{ji} x_j^* + u_i \qquad (7.3.5)$$

Therefore, insertion of Eqs. (7.3.2) and (7.3.5) in equation (7.3.4) results in

$$f\Big(\sum_j w_{ji}(x_j^* + \Delta x_j) + u_i\Big) \approx x_i^* + f'(a_i^*)\sum_j w_{ji}\,\Delta x_j \qquad (7.3.6)$$


Furthermore, notice that

$$\frac{d}{dt}(x_i^* + \Delta x_i) = \frac{d\,\Delta x_i}{dt} \qquad (7.3.7)$$

Therefore, by inserting equations (7.3.6) and (7.3.7) in equation (7.3.1), it becomes

$$\frac{d\,\Delta x_i}{dt} = -\Delta x_i + f'(a_i^*)\sum_j w_{ji}\,\Delta x_j \qquad (7.3.8)$$

This may be written equivalently as

$$\frac{d\,\Delta x_i}{dt} = -\sum_j \big(\delta_{ij} - f'(a_i^*)\,w_{ji}\big)\,\Delta x_j \qquad (7.3.9)$$

Referring to the definition of $L_{ij}$ given by Eq. (7.1.26), it becomes

$$\frac{d\,\Delta x_i}{dt} = -\sum_j L_{ij}^*\,\Delta x_j \qquad (7.3.10)$$


In a similar manner, the dynamical system defined for the backward phase by Eq. (7.2.8) at $v^* + \Delta v$ becomes

$$\frac{d}{dt}(v_i^* + \Delta v_i) = -(v_i^* + \Delta v_i) + \sum_j f'(a_j^*)\,w_{ij}\,(v_j^* + \Delta v_j) + \varepsilon_i^* \qquad (7.3.11)$$

satisfying

$$v_i^* = \sum_j f'(a_j^*)\,w_{ij}\,v_j^* + \varepsilon_i^* \qquad (7.3.12)$$

When the disturbance $\Delta v$ is small enough, then linearization in Eq. (7.3.11) results in

$$\frac{d\,\Delta v_i}{dt} = -\sum_j \big(\delta_{ij} - f'(a_j^*)\,w_{ij}\big)\,\Delta v_j \qquad (7.3.13)$$

This can be written shortly as

$$\frac{d\,\Delta v_i}{dt} = -\sum_j L_{ji}^*\,\Delta v_j \qquad (7.3.14)$$


In matrix notation, the equation (7.3.10) may be written as

$$\frac{d}{dt}\Delta x = -L^*\,\Delta x \qquad (7.3.15)$$

In addition, the equation (7.3.14) is

$$\frac{d}{dt}\Delta v = -(L^*)^{\mathsf T}\,\Delta v \qquad (7.3.16)$$

If the matrix $L^*$ has distinct eigenvalues, then the complete solution of the system of homogeneous linear differential equations given by (7.3.15) is of the form

$$\Delta x(t) = \sum_j c_j\,z_j\,e^{-\lambda_j t} \qquad (7.3.17)$$

where $z_j$ is the eigenvector corresponding to the eigenvalue $\lambda_j$ of $L^*$ and $c_j$ is any real constant to be determined by the initial condition.


On the other hand, since $(L^*)^{\mathsf T}$ has the same eigenvalues as $L^*$, the solution of (7.3.16) has the same form as that given in Eq. (7.3.17), except for the coefficients and the eigenvectors, that is,

$$\Delta v(t) = \sum_j \tilde{c}_j\,\tilde{z}_j\,e^{-\lambda_j t} \qquad (7.3.18)$$

where $\tilde{z}_j$ is the eigenvector of $(L^*)^{\mathsf T}$ corresponding to $\lambda_j$.

If it is true that each $\lambda_j$ has a positive real part, then the convergence of both $\Delta x(t)$ and $\Delta v(t)$ to the vector $\mathbf{0}$ is guaranteed.

It should be noticed that, if the weight matrix $W$ is symmetric, it has real eigenvalues. Since $L$ can be written as

$$L = D(1) - D\big(f'(a_i)\big)\,W^{\mathsf T} \qquad (7.3.19)$$

where $D(c_i)$ represents the diagonal matrix having $c_i$ as its ith diagonal entry, real eigenvalues of $W$ imply that the eigenvalues of $L$ are also real.
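As a closing sketch, the local stability condition implied by Eqs. (7.3.15)-(7.3.17) (all eigenvalues of $L^*$ having positive real parts, which are shared by $L^*$ and its transpose) can be checked numerically at a fixed point; the function name and the tanh activation are illustrative assumptions.

```python
import numpy as np

def is_locally_stable(W, a_star):
    """Check whether all eigenvalues of L* have positive real parts (Eqs. 7.3.15-7.3.17)."""
    fprime = 1.0 - np.tanh(a_star) ** 2                      # f'(a_i*), tanh assumed
    L_star = np.eye(len(a_star)) - fprime[:, None] * W.T     # L*_ij = delta_ij - f'(a_i*) w_ji
    return bool(np.all(np.linalg.eigvals(L_star).real > 0))
```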