Neural Networks for Solving Systems of Linear Equations


  • 8/11/2019 Neura Networks for Solving Systems of Linear

    1/56

Artificial Neural Networks (Spring 2007)

    Neural Networks for Solving Systems of

    Linear Equations

    Seyed Jalal Kazemitabar

    Reza Sadraei

    Instructor: Dr. Saeed Bagheri

    Artificial Neural Networks Course (Spring 2007)


    Outline

    Historical Introduction

Problem Formulation

Standard Least Squares Solution

    General ANN Solution

    Minimax Solution

    Least Absolute Value Solution

    Conclusion




History

70s: Kohonen solved optimization problems using neural networks.

80s: Hopfield used the Lyapunov function (energy function) for proving the convergence of iterative methods in optimization problems; this established the mapping between differential equations and neural networks.


History

Many problems in science and engineering involve solving a large system of linear equations: machine learning, physics, image processing, statistics, ...

In many applications an on-line solution of a set of linear equations is desired.


History

40s: Kaczmarz introduced a method to solve linear equations.

50s-80s: Different methods based on Kaczmarz's were proposed in different fields (e.g. the conjugate gradient method).

Still, there was no good method for on-line solution of large systems.


1990: Andrzej Cichocki, a mathematician who received his PhD in Electrical Engineering, proposed a neural network for solving systems of linear equations in real time.




Problem Formulation

Linear parameter estimation model:

$$Ax = b = b_{true} + r$$

- $A = [a_{ij}] \in \mathbb{R}^{m \times n}$ : model matrix
- $x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ : unknown vector of the system parameters to be estimated
- $b \in \mathbb{R}^m$ : vector of observations
- $r \in \mathbb{R}^m$ : unknown measurement errors
- $b_{true} \in \mathbb{R}^m$ : vector of true values (usually unknown)

In expanded form:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} b_{true,1} \\ b_{true,2} \\ \vdots \\ b_{true,m} \end{bmatrix} + \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix}$$


Types of Equations

A set of linear equations is said to be overdetermined if m > n. Such a set is usually inconsistent due to noise and errors; e.g. linear parameter estimation problems arising in signal processing, biology, medicine and automatic control.

A set of linear equations is said to be underdetermined if m < n (due to the lack of information); e.g. inverse and extrapolation problems. This case involves far fewer problems than the overdetermined one.

$$Ax = b = b_{true} + r, \qquad A = [a_{ij}] \in \mathbb{R}^{m \times n}$$


Mathematical Solutions

Why not use $x = A^{-1}b$? It is not applicable: since $m \neq n$ most of the time, $A$ is not invertible.

What if we use the least-squares error method?

$$y = (Ax - b)^T (Ax - b), \qquad y' = 2A^T (Ax - b) = 0,$$
$$A^T A x = A^T b, \qquad x = (A^T A)^{-1} A^T b$$

Inverting $A^T A$ is considered too time-consuming for large $A$ in real-time systems.
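The normal-equations formula above can be checked numerically. A minimal sketch in NumPy (the matrix and vector values here are illustrative, not from the slides):

```python
import numpy as np

# Illustrative overdetermined system: m = 3 equations, n = 2 unknowns.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([7.0, 8.0, 9.0])

# Least-squares solution via the normal equations: x = (A^T A)^(-1) A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# lstsq solves the same problem without forming A^T A explicitly,
# which is numerically safer for large or ill-conditioned A.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)
```

At the optimum the residual $Ax - b$ is orthogonal to the columns of $A$, which is exactly the condition $A^T(Ax - b) = 0$ used in the derivation.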




Gradient Descent Approach

Basic idea: compute a trajectory $x(t)$ starting at the initial point $x(0)$ that has the solution $x^*$ as a limit point ($x(t) \to x^*$ for $t \to \infty$).

General gradient approach for the minimization of a function $E(x)$:

$$\frac{dX}{dt} = -\mu \nabla E(x), \qquad \begin{bmatrix} dx_1/dt \\ dx_2/dt \\ \vdots \\ dx_n/dt \end{bmatrix} = -\begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{n1} & \mu_{n2} & \cdots & \mu_{nn} \end{bmatrix} \begin{bmatrix} \partial E/\partial x_1 \\ \partial E/\partial x_2 \\ \vdots \\ \partial E/\partial x_n \end{bmatrix}$$

The matrix $\mu = [\mu_{jp}]$ is chosen in a way that ensures the stability of the differential equations and an appropriate convergence speed.


Solving LE Using the Least Squares Criterion

Gradient of the energy function $E(x) = \frac{1}{2}(Ax - b)^T(Ax - b)$:

$$\nabla E = \left[ \frac{\partial E}{\partial x_1}, \frac{\partial E}{\partial x_2}, \ldots, \frac{\partial E}{\partial x_n} \right]^T = A^T (Ax - b)$$

So

$$\frac{dX}{dt} = -\mu A^T (Ax - b)$$

Scalar representation:

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip} \left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right), \qquad x_j(0) = x_j^{(0)}, \quad j = 1, 2, \ldots, n$$
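The trajectory $x(t)$ defined by these dynamics can be simulated with a forward-Euler discretization. A sketch (the step size, iteration count, data values, and the choice $\mu = I$ are my assumptions):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([7.0, 8.0, 9.0])

# Forward-Euler discretization of dx/dt = -A^T (A x - b), with mu = I.
eta = 0.01       # step size; must be < 2 / lambda_max(A^T A) for stability
x = np.zeros(2)  # initial point x(0)
for _ in range(20000):
    x -= eta * A.T @ (A @ x - b)

# The limit point of the trajectory is the least-squares solution.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x, x_star)
```

The analog circuit integrates these same dynamics continuously; the Euler loop is only a software stand-in.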


$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip} \left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right)$$

[Figure: network realization of these dynamics]


    ANN With Identity Activation Function




General ANN Solution

The key step in designing an algorithm for neural networks: construct an appropriate computational energy function $E(x)$ (Lyapunov function) whose lowest energy state corresponds to the desired solution $x^*$.

By differentiation, the energy-function minimization problem is transformed into a set of ordinary differential equations.


General ANN Solution

In general, the optimization problem can be formulated as: find the vector $x^* \in \mathbb{R}^n$ that minimizes the energy function

$$E(x) = \sum_{i=1}^{m} \sigma\big( (Ax - b)_i \big) = \sum_{i=1}^{m} \sigma\big( r_i(x) \big)$$

$\sigma(r_i(x))$ is called the weighting function. The derivative of the weighting function is called the activation function:

$$g(r_i) = \frac{\partial \sigma(r_i)}{\partial r_i}$$


General ANN Solution

Gradient descent approach:

$$\frac{dX}{dt} = -\mu \nabla E(x), \qquad \begin{bmatrix} dx_1/dt \\ \vdots \\ dx_n/dt \end{bmatrix} = -\begin{bmatrix} \mu_{11} & \cdots & \mu_{1n} \\ \vdots & \ddots & \vdots \\ \mu_{n1} & \cdots & \mu_{nn} \end{bmatrix} \begin{bmatrix} \partial E/\partial x_1 \\ \vdots \\ \partial E/\partial x_n \end{bmatrix}$$

The minimization of the energy function leads to the set of differential equations

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \frac{\partial E}{\partial x_p} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} \frac{\partial E}{\partial r_i} \frac{\partial r_i}{\partial x_p} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip}\, g\!\left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right)$$
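These general dynamics differ from the least-squares case only in the activation $g$. A sketch with a pluggable activation, here $g(e) = \tanh(e)$, an illustrative bounded choice corresponding to a log-cosh weighting function (data values and solver parameters are made up):

```python
import numpy as np

def solve_ode(A, b, g, eta=0.05, iters=20000):
    """Euler-integrate dx_j/dt = -sum_i a_ij * g(r_i(x)), taking mu = I."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x -= eta * A.T @ g(A @ x - b)
    return x

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.5, -0.3, 0.1])

# Bounded activation: derivative of the log-cosh weighting function.
x = solve_ode(A, b, np.tanh)

# At a stationary point the back-propagated residual A^T g(Ax - b) vanishes.
print(np.linalg.norm(A.T @ np.tanh(A @ x - b)))
```

Swapping in a different `g` (identity, Huber, sign-like functions) changes the statistical criterion being minimized without changing the network structure.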


General ANN Architecture

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip}\, g_i\!\left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right)$$

[Figure: network with one activation unit $g_1, g_2, \ldots, g_m$ per equation. Remember that $g$ is the activation function.]


Drawbacks of the Least Squares Error Criterion

Why not always use the least-squares energy function?

- It is not so good in the case of existence of large outliers.
- It is only optimal for a Gaussian distribution of the error.

The proper choice of the criterion depends on:

- the specific application;
- the distribution of the errors in the measurement vector b: Gaussian dist.* → least-squares criterion; uniform dist. → Chebyshev-norm criterion.

*However, the assumption that the set of measurements or observations has a Gaussian error distribution is frequently unrealistic due to different sources of errors such as instrument errors, modeling errors, sampling errors, and human errors.


Special Energy Functions

Huber's function:

$$\sigma_H(e) = \begin{cases} \dfrac{e^2}{2}, & |e| \le \beta \\[4pt] \beta |e| - \dfrac{\beta^2}{2}, & |e| > \beta \end{cases}$$

Weighting function → activation function:

$$g_H(e) = \begin{cases} e, & |e| \le \beta \\ \beta\, \mathrm{sign}(e), & |e| > \beta \end{cases}$$
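The weighting/activation pairing can be sanity-checked numerically, since the activation is the derivative of the weighting function. A sketch for Huber's function with $\beta = 1$ (an arbitrary illustrative value):

```python
import numpy as np

beta = 1.0

def rho_huber(e):
    """Huber weighting function: quadratic near 0, linear in the tails."""
    return np.where(np.abs(e) <= beta,
                    e**2 / 2,
                    beta * np.abs(e) - beta**2 / 2)

def g_huber(e):
    """Its derivative, the activation: identity near 0, clipped to +/-beta."""
    return np.clip(e, -beta, beta)

# Check g = d(sigma)/de by central differences, away from the kinks at |e| = beta.
e = np.array([-3.0, -0.5, 0.2, 2.5])
h = 1e-6
num_deriv = (rho_huber(e + h) - rho_huber(e - h)) / (2 * h)
print(np.max(np.abs(num_deriv - g_huber(e))))
```

The same check works for the Talwar and logistic pairs below; only the two function definitions change.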


Special Energy Functions

Talwar's function:

$$\sigma_T(e) = \begin{cases} \dfrac{e^2}{2}, & |e| \le \beta \\[4pt] \dfrac{\beta^2}{2}, & |e| > \beta \end{cases}$$

Weighting function → activation function:

$$g_T(e) = \begin{cases} e, & |e| \le \beta \\ 0, & |e| > \beta \end{cases}$$

This function has a direct implementation.


Special Energy Functions

Logistic function:

$$\sigma_L(e) = \beta^2 \ln \cosh\!\left( \frac{e}{\beta} \right)$$

Weighting function → activation function:

$$g_L(e) = \beta \tanh\!\left( \frac{e}{\beta} \right)$$

The iteratively reweighted method uses this activation function.


Special Energy Functions

$L_p$-normed function:

$$E_p(x) = \frac{1}{p} \sum_{i=1}^{m} |r_i|^p$$

Activation function:

$$g(r_i) = |r_i|^{p-1}\, \mathrm{sign}(r_i)$$


$L_p$-Norm Energy Functions

A well-known criterion is the $L_1$-norm energy function

$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|$$

Weighting function → activation function: $g(r_i) = \mathrm{sign}(r_i)$.


Special Energy Functions

Another well-known criterion is the $L_\infty$-norm (Chebyshev) criterion, which can be formulated as the minimax problem:

$$\min_{x \in \mathbb{R}^n} \left\{ \max_{1 \le i \le m} |r_i(x)| \right\}$$

This criterion is optimal for a uniform distribution of the error.




Minimax ($L_\infty$-Norm) Criterion

For the case $p = \infty$ of the $L_p$-norm problem, the activation function $g[r_i(x)]$ cannot be explicitly mathematically expressed by $|r_i(x)|^{p-1}$.

The error function can be defined as

$$E_\infty(x) = \max_{1 \le i \le m} |r_i(x)|$$

resulting in the following activation function:

$$g[r_i(x)] = \begin{cases} \mathrm{sign}[r_i(x)] & \text{if } |r_i(x)| = \max_{1 \le k \le m} |r_k(x)| \\ 0 & \text{otherwise} \end{cases}$$


Minimax ($L_\infty$-Norm) Criterion

Although straightforward, some problems arise in practical implementations of the system of differential equations:

- Exact realization of the signum functions is rather difficult (electrically).
- $E_\infty$ has a derivative discontinuity at $x$ if $|r_i(x)| = |r_k(x)| = E_\infty(x)$ for some $i \neq k$.*

*This is often responsible for various anomalous results (e.g. hysteresis phenomena).


Transforming the Problem to an Equivalent One

Rather than directly implementing the proposed system, we transform the minimax problem

$$\min_{x \in \mathbb{R}^n} \max_{1 \le i \le m} |r_i(x)|$$

into an equivalent one: minimize $\varepsilon$ subject to the constraints $-\varepsilon \le r_i(x) \le \varepsilon$ $(i = 1, \ldots, m)$.

Thus the problem can be viewed as finding the smallest non-negative value $\varepsilon^* = E_\infty(x^*) \ge 0$, where $x^*$ is a vector of the optimal values of the parameters.


New Energy Function

Applying the standard quadratic penalty function we can consider the cost function:

$$E(x, \varepsilon) = \kappa_0 \varepsilon + \frac{\kappa_1}{2} \sum_{i=1}^{m} \left\{ \big( [\,r_i(x) + \varepsilon\,]^- \big)^2 + \big( [\,\varepsilon - r_i(x)\,]^- \big)^2 \right\}$$

where $\kappa_0, \kappa_1 > 0$ are coefficients and $[y]^- = \min\{0, y\}$.


New Energy Function

Applying now the gradient strategy we obtain the associated system of differential equations:

$$\frac{d\varepsilon}{dt} = -\mu_0 \left[ \kappa_0 + \kappa_1 \sum_{i=1}^{m} \big( (r_i(x) + \varepsilon)\, S_{i1} - (r_i(x) - \varepsilon)\, S_{i2} \big) \right]$$

$$\frac{dx_j}{dt} = -\mu_j \sum_{i=1}^{m} a_{ij} \big[ (r_i(x) + \varepsilon)\, S_{i1} + (r_i(x) - \varepsilon)\, S_{i2} \big] \qquad (j = 1, 2, \ldots, n)$$

where

$$S_{i1} = \begin{cases} 0 & \text{if } r_i(x) + \varepsilon \ge 0 \\ 1 & \text{otherwise} \end{cases} \qquad S_{i2} = \begin{cases} 0 & \text{if } r_i(x) - \varepsilon \le 0 \\ 1 & \text{otherwise} \end{cases}$$


Simplifying Architecture

It is interesting to note that the system of differential equations can be simplified by introducing the nonlinear function

$$\Psi(r_i(x), \varepsilon) = \begin{cases} r_i(x) - \varepsilon & \text{if } r_i(x) > \varepsilon \\ 0 & \text{if } |r_i(x)| \le \varepsilon \\ r_i(x) + \varepsilon & \text{if } r_i(x) < -\varepsilon \end{cases}$$

This nonlinear function represents a typical dead-zone function.


Simplifying Architecture

It is easy to check:

$$(r_i(x) + \varepsilon)\, S_{i1} + (r_i(x) - \varepsilon)\, S_{i2} = \Psi(r_i(x), \varepsilon)$$
$$(r_i(x) + \varepsilon)\, S_{i1} - (r_i(x) - \varepsilon)\, S_{i2} = -\left| \Psi(r_i(x), \varepsilon) \right|$$

Thus the system of differential equations can be simplified to the form:

$$\frac{d\varepsilon}{dt} = -\mu_0 \left( \kappa_0 - \kappa_1 \sum_{i=1}^{m} \left| \Psi(r_i(x), \varepsilon) \right| \right), \qquad \varepsilon(0) = \varepsilon^{(0)}$$

$$\frac{dx_j}{dt} = -\mu_j \sum_{i=1}^{m} a_{ij}\, \Psi(r_i(x), \varepsilon), \qquad x_j(0) = x_j^{(0)} \quad (j = 1, 2, \ldots, n)$$
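The two identities can be spot-checked numerically. A sketch (random test points, with $\varepsilon \ge 0$ as in the problem formulation):

```python
import numpy as np

def psi(r, eps):
    """Dead-zone function: zero inside [-eps, eps], linear outside."""
    return np.where(r > eps, r - eps, np.where(r < -eps, r + eps, 0.0))

rng = np.random.default_rng(0)
r = rng.uniform(-5, 5, size=1000)
eps = rng.uniform(0, 3, size=1000)

S1 = (r + eps < 0).astype(float)  # S_i1 = 1 iff r_i + eps < 0
S2 = (r - eps > 0).astype(float)  # S_i2 = 1 iff r_i - eps > 0

# The switched expressions collapse to the dead-zone function and its
# negated absolute value, as claimed.
assert np.allclose((r + eps) * S1 + (r - eps) * S2, psi(r, eps))
assert np.allclose((r + eps) * S1 - (r - eps) * S2, -np.abs(psi(r, eps)))
```

This is why the simplified circuit needs only one dead-zone nonlinearity per equation instead of two switched branches.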






Least Absolute Values ($L_1$-Norm) Energy Function

Find the design vector $x^* \in \mathbb{R}^n$ that minimizes the error function

$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|, \qquad \text{where } r_i(x) = \sum_{j=1}^{n} a_{ij} x_j - b_i$$

Why should one choose this function knowing that it has differentiation problems?



Important $L_1$-Norm Properties

1. Least absolute value problems are equivalent to linear programming problems and vice versa.

2. Although the energy function $E_1(x)$ is not differentiable, the terms $|r_i(x)|$ can be approximated very closely by smoothly differentiable functions.

3. For a full-rank* matrix $A$, there always exists a minimum $L_1$-norm solution which passes through at least $n$ of the $m$ data points; the $L_2$-norm solution does not in general interpolate any of the points.

These properties are not shared by the $L_2$-norm.

*Matrix $A$ is said to be of full rank if all its rows or columns are linearly independent.
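Property 2 can be illustrated with one common smooth surrogate, $\sqrt{r^2 + \delta^2}$ (my choice of surrogate; the slides do not name a specific smoothing): it is everywhere differentiable and never off by more than $\delta$.

```python
import numpy as np

delta = 1e-3

def smooth_abs(r):
    """Smooth, everywhere-differentiable approximation of |r|."""
    return np.sqrt(r**2 + delta**2)

r = np.linspace(-10, 10, 100001)
gap = smooth_abs(r) - np.abs(r)

# The surrogate always overestimates |r|, but never by more than delta.
print(gap.min(), gap.max())
```

Shrinking `delta` tightens the approximation at the cost of a sharper (harder to integrate) kink near zero.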


Important $L_1$-Norm Properties

Theorem: There is a minimizer $x^* \in \mathbb{R}^n$ of the energy function $E_1(x) = \sum_{i=1}^{m} |r_i(x)|$ for which the residuals $r_i(x^*) = 0$ for at least $n$ values of $i$, say $i_1, i_2, \ldots, i_n$, where $n$ denotes the rank of the matrix $A$.

We can say that the $L_1$-norm solution is the median solution, while the $L_2$-norm solution is the mean solution.



Least Absolute Error Implementation

The algorithm is as follows:

1. First phase: solve the problem using the ordinary least-squares technique, compute all $m$ residuals, and select from them the $n$ residuals which are smallest in absolute value.

2. Second phase: discarding the rest of the equations, the $n$ equations related to the selected residuals are solved by driving their residuals to zero.

The ANN implementation is done in three layers using an inhibition control circuit.
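The two phases can be sketched as a batch NumPy procedure (a simplified software stand-in for the circuit, with made-up data; note the greedy selection is a heuristic and need not return the exact $L_1$ minimizer):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.normal(size=(m, n))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
b[0] += 10.0  # one gross outlier in the observations

# Phase 1: ordinary least squares, then rank the m residuals.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
r = A @ x_ls - b
keep = np.argsort(np.abs(r))[:n]  # the n residuals smallest in absolute value

# Phase 2: discard the other equations and drive the kept residuals to zero
# by solving the remaining square n-by-n system exactly.
x_l1 = np.linalg.solve(A[keep], b[keep])

print(x_ls, x_l1)  # x_l1 interpolates the n selected equations exactly
```

The interpolation behavior in phase 2 is exactly the "passes through at least n of the m data points" property from the previous slides.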



ANN Architecture for Solving $L_1$-Norm Estimation

[Figure: three-layer architecture with inhibition control (Problem, Phase #1, Phase #2)]






Example

Consider matrix $A$ and observation vector $b$ as below. Find the solution to $Ax = b$ using the least absolute error energy function.

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \\ 16 & 4 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \\ -10 \end{bmatrix}, \qquad Ax - b = 0$$


In the first phase all the switches (S1-S5) were closed and the network was able to find the following standard least-squares solution:

$$x^*_I = \begin{bmatrix} -1.5 \\ 3.5 \\ 0.6 \end{bmatrix}, \qquad r(x^*_I) = \begin{bmatrix} -0.4 \\ 0.6 \\ 0.6 \\ -1.4 \\ 0.6 \end{bmatrix}$$

In this case it is impossible to select the two largest, in absolute value, residuals because $|r_2| = |r_3| = |r_5| = 0.6$.

Phase one was rerun while switch S4 was opened, and the network then found

$$x^*_{II} = \begin{bmatrix} -1.3409 \\ 2.6409 \\ 0.9182 \end{bmatrix}, \qquad r(x^*_{II}) = \begin{bmatrix} -0.0818 \\ 0.2182 \\ -0.1636 \\ -2.2273 \\ 0.0273 \end{bmatrix}$$
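Both phase-one solutions can be reproduced numerically. A sketch, with $A$ and $b$ as reconstructed from the example slide:

```python
import numpy as np

A = np.array([[ 0.0, 0.0, 1.0],
              [ 1.0, 1.0, 1.0],
              [ 4.0, 2.0, 1.0],
              [ 9.0, 3.0, 1.0],
              [16.0, 4.0, 1.0]])
b = np.array([1.0, 2.0, 1.0, -1.0, -10.0])

# First run: all five equations (switches S1-S5 closed).
x1, *_ = np.linalg.lstsq(A, b, rcond=None)
r1 = A @ x1 - b
print(x1, r1)  # x*_I = (-1.5, 3.5, 0.6); |r2| = |r3| = |r5| = 0.6

# Second run: equation 4 removed (switch S4 open).
mask = np.array([True, True, True, False, True])
x2, *_ = np.linalg.lstsq(A[mask], b[mask], rcond=None)
print(x2)      # approx (-1.3409, 2.6409, 0.9182)
```

The tie among the three residuals of magnitude 0.6 is exactly what prevents the inhibition circuit from picking the two largest residuals on the first pass.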


Cichocki's Circuit Simulation Results


The residuals for $n = 3$ of the $m = 5$ equations converge to zero within 50 nanoseconds.



    Conclusion


There is a great need for real-time solution of linear equations.

Cichocki's proposed ANN is different from classical ANNs.

Consider a proper energy function whose minimization yields the optimal solution to $Ax = b$; "proper" may have a different meaning in different applications.

The standard least-squares error function gives the optimal answer for a Gaussian distribution of the error.

    Conclusion (Cont.)


The least-squares function does not behave well when there are large outliers in the observations. Various energy functions have been proposed to solve the outlier problem (e.g. the logistic function).

Minimax results in the optimal answer for the uniform distribution of the error. It also has some implementation and mathematical problems that lead to an indirect approach to solving the problem.

The least absolute error function has some properties that distinguish it from other error functions.
