Neural Networks for Solving Systems of Linear Equations


  • 8/11/2019 Neura Networks for Solving Systems of Linear

    1/56

Artificial Neural Networks (Spring 2007)

    Neural Networks for Solving Systems of

    Linear Equations

    Seyed Jalal Kazemitabar

    Reza Sadraei

    Instructor: Dr. Saeed Bagheri

    Artificial Neural Networks Course (Spring 2007)


    Outline

    Historical Introduction

Problem Formulation

Standard Least Squares Solution

    General ANN Solution

    Minimax Solution

    Least Absolute Value Solution

    Conclusion




History

70s: Kohonen solved optimization problems using neural networks.

80s: Hopfield used the Lyapunov function (energy function) for proving the convergence of iterative methods in optimization problems; this established the mapping between differential equations and neural networks.


History

Many problems in science and engineering involve solving a large system of linear equations: machine learning, physics, image processing, statistics, ...

In many applications an on-line solution of a set of linear equations is desired.


History

40s: Kaczmarz introduced a method to solve linear equations.

50s-80s: Different methods based on Kaczmarz's were proposed in different fields (e.g. the conjugate gradient method).

Still, there was no good method for on-line solution of large systems.


1990: Andrzej Cichocki, a mathematician who received his PhD in Electrical Engineering, proposed a neural network for solving systems of linear equations in real time.




Problem Formulation

Linear parameter estimation model:

$$Ax = b = b_{true} + r$$

- $A = [a_{ij}] \in \mathbb{R}^{m \times n}$ : model matrix
- $x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ : unknown vector of the system parameters to be estimated
- $b \in \mathbb{R}^m$ : vector of observations
- $r \in \mathbb{R}^m$ : unknown measurement errors
- $b_{true} \in \mathbb{R}^m$ : vector of true values (usually unknown)

In expanded form:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} b_{true,1} \\ b_{true,2} \\ \vdots \\ b_{true,m} \end{bmatrix} + \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix}$$


Types of Equations

A set of linear equations is said to be overdetermined if m > n. Such a set is usually inconsistent due to noise and errors; e.g. linear parameter estimation problems arising in signal processing, biology, medicine and automatic control.

A set of linear equations is said to be underdetermined if m < n (due to the lack of information); e.g. inverse and extrapolation problems. This case involves far fewer problems than the overdetermined one.

$$Ax = b = b_{true} + r, \qquad A = [a_{ij}] \in \mathbb{R}^{m \times n}$$


Mathematical Solutions

Why not use $x = A^{-1}b$? It is not applicable: since $m \neq n$ most of the time, $A$ is not invertible.

What if we use the least-squares error method?

$$y = (Ax - b)^T (Ax - b), \qquad y' = 2A^T (Ax - b) = 0,$$
$$A^T A x = A^T b, \qquad x = (A^T A)^{-1} A^T b$$

Inverting $A^T A$ is considered too time-consuming for large $A$ in real-time systems.
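The normal-equations formula above can be checked numerically. A minimal sketch in NumPy (the matrix and vector values here are illustrative, not from the slides):

```python
import numpy as np

# Illustrative overdetermined system: m = 3 equations, n = 2 unknowns.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([7.0, 8.0, 9.0])

# Least-squares solution via the normal equations: x = (A^T A)^(-1) A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# lstsq solves the same problem without forming A^T A explicitly,
# which is numerically safer for large or ill-conditioned A.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)
```

At the optimum the residual $Ax - b$ is orthogonal to the columns of $A$, which is exactly the condition $A^T(Ax - b) = 0$ used in the derivation.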




Gradient Descent Approach

Basic idea: compute a trajectory $x(t)$ starting at the initial point $x(0)$ that has the solution $x^*$ as a limit point ($x(t) \to x^*$ for $t \to \infty$).

General gradient approach for the minimization of a function $E(x)$:

$$\frac{dX}{dt} = -\mu \nabla E(x), \qquad \begin{bmatrix} dx_1/dt \\ dx_2/dt \\ \vdots \\ dx_n/dt \end{bmatrix} = -\begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{n1} & \mu_{n2} & \cdots & \mu_{nn} \end{bmatrix} \begin{bmatrix} \partial E/\partial x_1 \\ \partial E/\partial x_2 \\ \vdots \\ \partial E/\partial x_n \end{bmatrix}$$

The matrix $\mu = [\mu_{jp}]$ is chosen in a way that ensures the stability of the differential equations and an appropriate convergence speed.


Solving LE Using the Least Squares Criterion

Gradient of the energy function $E(x) = \frac{1}{2}(Ax - b)^T(Ax - b)$:

$$\nabla E = \left[ \frac{\partial E}{\partial x_1}, \frac{\partial E}{\partial x_2}, \ldots, \frac{\partial E}{\partial x_n} \right]^T = A^T (Ax - b)$$

So

$$\frac{dX}{dt} = -\mu A^T (Ax - b)$$

Scalar representation:

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip} \left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right), \qquad x_j(0) = x_j^{(0)}, \quad j = 1, 2, \ldots, n$$
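The trajectory $x(t)$ defined by these dynamics can be simulated with a forward-Euler discretization. A sketch (the step size, iteration count, data values, and the choice $\mu = I$ are my assumptions):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([7.0, 8.0, 9.0])

# Forward-Euler discretization of dx/dt = -A^T (A x - b), with mu = I.
eta = 0.01       # step size; must be < 2 / lambda_max(A^T A) for stability
x = np.zeros(2)  # initial point x(0)
for _ in range(20000):
    x -= eta * A.T @ (A @ x - b)

# The limit point of the trajectory is the least-squares solution.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x, x_star)
```

The analog circuit integrates these same dynamics continuously; the Euler loop is only a software stand-in.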


$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip} \left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right)$$

[Figure: network realization of these dynamics]


    ANN With Identity Activation Function




General ANN Solution

The key step in designing an algorithm for neural networks: construct an appropriate computational energy function $E(x)$ (Lyapunov function) whose lowest energy state corresponds to the desired solution $x^*$.

By differentiation, the energy-function minimization problem is transformed into a set of ordinary differential equations.


General ANN Solution

In general, the optimization problem can be formulated as: find the vector $x^* \in \mathbb{R}^n$ that minimizes the energy function

$$E(x) = \sum_{i=1}^{m} \sigma\big( (Ax - b)_i \big) = \sum_{i=1}^{m} \sigma\big( r_i(x) \big)$$

$\sigma(r_i(x))$ is called the weighting function. The derivative of the weighting function is called the activation function:

$$g(r_i) = \frac{\partial \sigma(r_i)}{\partial r_i}$$


General ANN Solution

Gradient descent approach:

$$\frac{dX}{dt} = -\mu \nabla E(x), \qquad \begin{bmatrix} dx_1/dt \\ \vdots \\ dx_n/dt \end{bmatrix} = -\begin{bmatrix} \mu_{11} & \cdots & \mu_{1n} \\ \vdots & \ddots & \vdots \\ \mu_{n1} & \cdots & \mu_{nn} \end{bmatrix} \begin{bmatrix} \partial E/\partial x_1 \\ \vdots \\ \partial E/\partial x_n \end{bmatrix}$$

The minimization of the energy function leads to the set of differential equations

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \frac{\partial E}{\partial x_p} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} \frac{\partial E}{\partial r_i} \frac{\partial r_i}{\partial x_p} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip}\, g\!\left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right)$$
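These general dynamics differ from the least-squares case only in the activation $g$. A sketch with a pluggable activation, here $g(e) = \tanh(e)$, an illustrative bounded choice corresponding to a log-cosh weighting function (data values and solver parameters are made up):

```python
import numpy as np

def solve_ode(A, b, g, eta=0.05, iters=20000):
    """Euler-integrate dx_j/dt = -sum_i a_ij * g(r_i(x)), taking mu = I."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x -= eta * A.T @ g(A @ x - b)
    return x

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.5, -0.3, 0.1])

# Bounded activation: derivative of the log-cosh weighting function.
x = solve_ode(A, b, np.tanh)

# At a stationary point the back-propagated residual A^T g(Ax - b) vanishes.
print(np.linalg.norm(A.T @ np.tanh(A @ x - b)))
```

Swapping in a different `g` (identity, Huber, sign-like functions) changes the statistical criterion being minimized without changing the network structure.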


General ANN Architecture

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip}\, g_i\!\left( \sum_{k=1}^{n} a_{ik} x_k - b_i \right)$$

[Figure: network with one activation unit $g_1, g_2, \ldots, g_m$ per equation. Remember that $g$ is the activation function.]


Drawbacks of the Least Squares Error Criterion

Why not always use the least-squares energy function?

- It is not so good in the case of existence of large outliers.
- It is only optimal for a Gaussian distribution of the error.

The proper choice of the criterion depends on:

- the specific application;
- the distribution of the errors in the measurement vector b: Gaussian dist.* → least-squares criterion; uniform dist. → Chebyshev-norm criterion.

*However, the assumption that the set of measurements or observations has a Gaussian error distribution is frequently unrealistic due to different sources of errors such as instrument errors, modeling errors, sampling errors, and human errors.


Special Energy Functions

Huber's function:

$$\sigma_H(e) = \begin{cases} \dfrac{e^2}{2}, & |e| \le \beta \\[4pt] \beta |e| - \dfrac{\beta^2}{2}, & |e| > \beta \end{cases}$$

Weighting function → activation function:

$$g_H(e) = \begin{cases} e, & |e| \le \beta \\ \beta\, \mathrm{sign}(e), & |e| > \beta \end{cases}$$
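The weighting/activation pairing can be sanity-checked numerically, since the activation is the derivative of the weighting function. A sketch for Huber's function with $\beta = 1$ (an arbitrary illustrative value):

```python
import numpy as np

beta = 1.0

def rho_huber(e):
    """Huber weighting function: quadratic near 0, linear in the tails."""
    return np.where(np.abs(e) <= beta,
                    e**2 / 2,
                    beta * np.abs(e) - beta**2 / 2)

def g_huber(e):
    """Its derivative, the activation: identity near 0, clipped to +/-beta."""
    return np.clip(e, -beta, beta)

# Check g = d(sigma)/de by central differences, away from the kinks at |e| = beta.
e = np.array([-3.0, -0.5, 0.2, 2.5])
h = 1e-6
num_deriv = (rho_huber(e + h) - rho_huber(e - h)) / (2 * h)
print(np.max(np.abs(num_deriv - g_huber(e))))
```

The same check works for the Talwar and logistic pairs below; only the two function definitions change.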


Special Energy Functions

Talwar's function:

$$\sigma_T(e) = \begin{cases} \dfrac{e^2}{2}, & |e| \le \beta \\[4pt] \dfrac{\beta^2}{2}, & |e| > \beta \end{cases}$$

Weighting function → activation function:

$$g_T(e) = \begin{cases} e, & |e| \le \beta \\ 0, & |e| > \beta \end{cases}$$

This function has a direct implementation.


Special Energy Functions

Logistic function:

$$\sigma_L(e) = \beta^2 \ln \cosh\!\left( \frac{e}{\beta} \right)$$

Weighting function → activation function:

$$g_L(e) = \beta \tanh\!\left( \frac{e}{\beta} \right)$$

The iteratively reweighted method uses this activation function.


Special Energy Functions

$L_p$-normed function:

$$E_p(x) = \frac{1}{p} \sum_{i=1}^{m} |r_i|^p$$

Activation function:

$$g(r_i) = |r_i|^{p-1}\, \mathrm{sign}(r_i)$$


$L_p$-Norm Energy Functions

A well-known criterion is the $L_1$-norm energy function

$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|$$

Weighting function → activation function: $g(r_i) = \mathrm{sign}(r_i)$.


Special Energy Functions

Another well-known criterion is the $L_\infty$-norm (Chebyshev) criterion, which can be formulated as the minimax problem:

$$\min_{x \in \mathbb{R}^n} \left\{ \max_{1 \le i \le m} |r_i(x)| \right\}$$

This criterion is optimal for a uniform distribution of the error.




Minimax ($L_\infty$-Norm) Criterion

For the case $p = \infty$ of the $L_p$-norm problem, the activation function $g[r_i(x)]$ cannot be explicitly mathematically expressed by $|r_i(x)|^{p-1}$.

The error function can be defined as

$$E_\infty(x) = \max_{1 \le i \le m} |r_i(x)|$$

resulting in the following activation function:

$$g[r_i(x)] = \begin{cases} \mathrm{sign}[r_i(x)] & \text{if } |r_i(x)| = \max_{1 \le k \le m} |r_k(x)| \\ 0 & \text{otherwise} \end{cases}$$


Minimax ($L_\infty$-Norm) Criterion

Although straightforward, some problems arise in practical implementations of the system of differential equations:

- Exact realization of the signum functions is rather difficult (electrically).
- $E_\infty$ has a derivative discontinuity at $x$ if $|r_i(x)| = |r_k(x)| = E_\infty(x)$ for some $i \neq k$.*

*This is often responsible for various anomalous results (e.g. hysteresis phenomena).


Transforming the Problem to an Equivalent One

Rather than directly implementing the proposed system, we transform the minimax problem

$$\min_{x \in \mathbb{R}^n} \max_{1 \le i \le m} |r_i(x)|$$

into an equivalent one: minimize $\varepsilon$ subject to the constraints $-\varepsilon \le r_i(x) \le \varepsilon$ $(i = 1, \ldots, m)$.

Thus the problem can be viewed as finding the smallest non-negative value $\varepsilon^* = E_\infty(x^*) \ge 0$, where $x^*$ is a vector of the optimal values of the parameters.


New Energy Function

Applying the standard quadratic penalty function we can consider the cost function:

$$E(x, \varepsilon) = \kappa_0 \varepsilon + \frac{\kappa_1}{2} \sum_{i=1}^{m} \left\{ \big( [\,r_i(x) + \varepsilon\,]^- \big)^2 + \big( [\,\varepsilon - r_i(x)\,]^- \big)^2 \right\}$$

where $\kappa_0, \kappa_1 > 0$ are coefficients and $[y]^- = \min\{0, y\}$.


New Energy Function

Applying now the gradient strategy we obtain the associated system of differential equations:

$$\frac{d\varepsilon}{dt} = -\mu_0 \left[ \kappa_0 + \kappa_1 \sum_{i=1}^{m} \big( (r_i(x) + \varepsilon)\, S_{i1} - (r_i(x) - \varepsilon)\, S_{i2} \big) \right]$$

$$\frac{dx_j}{dt} = -\mu_j \sum_{i=1}^{m} a_{ij} \big[ (r_i(x) + \varepsilon)\, S_{i1} + (r_i(x) - \varepsilon)\, S_{i2} \big] \qquad (j = 1, 2, \ldots, n)$$

where

$$S_{i1} = \begin{cases} 0 & \text{if } r_i(x) + \varepsilon \ge 0 \\ 1 & \text{otherwise} \end{cases} \qquad S_{i2} = \begin{cases} 0 & \text{if } r_i(x) - \varepsilon \le 0 \\ 1 & \text{otherwise} \end{cases}$$


Simplifying Architecture

It is interesting to note that the system of differential equations can be simplified by introducing the nonlinear function

$$\Psi(r_i(x), \varepsilon) = \begin{cases} r_i(x) - \varepsilon & \text{if } r_i(x) > \varepsilon \\ 0 & \text{if } |r_i(x)| \le \varepsilon \\ r_i(x) + \varepsilon & \text{if } r_i(x) < -\varepsilon \end{cases}$$

This nonlinear function represents a typical dead-zone function.


Simplifying Architecture

It is easy to check:

$$(r_i(x) + \varepsilon)\, S_{i1} + (r_i(x) - \varepsilon)\, S_{i2} = \Psi(r_i(x), \varepsilon)$$
$$(r_i(x) + \varepsilon)\, S_{i1} - (r_i(x) - \varepsilon)\, S_{i2} = -\left| \Psi(r_i(x), \varepsilon) \right|$$

Thus the system of differential equations can be simplified to the form:

$$\frac{d\varepsilon}{dt} = -\mu_0 \left( \kappa_0 - \kappa_1 \sum_{i=1}^{m} \left| \Psi(r_i(x), \varepsilon) \right| \right), \qquad \varepsilon(0) = \varepsilon^{(0)}$$

$$\frac{dx_j}{dt} = -\mu_j \sum_{i=1}^{m} a_{ij}\, \Psi(r_i(x), \varepsilon), \qquad x_j(0) = x_j^{(0)} \quad (j = 1, 2, \ldots, n)$$
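The two identities can be spot-checked numerically. A sketch (random test points, with $\varepsilon \ge 0$ as in the problem formulation):

```python
import numpy as np

def psi(r, eps):
    """Dead-zone function: zero inside [-eps, eps], linear outside."""
    return np.where(r > eps, r - eps, np.where(r < -eps, r + eps, 0.0))

rng = np.random.default_rng(0)
r = rng.uniform(-5, 5, size=1000)
eps = rng.uniform(0, 3, size=1000)

S1 = (r + eps < 0).astype(float)  # S_i1 = 1 iff r_i + eps < 0
S2 = (r - eps > 0).astype(float)  # S_i2 = 1 iff r_i - eps > 0

# The switched expressions collapse to the dead-zone function and its
# negated absolute value, as claimed.
assert np.allclose((r + eps) * S1 + (r - eps) * S2, psi(r, eps))
assert np.allclose((r + eps) * S1 - (r - eps) * S2, -np.abs(psi(r, eps)))
```

This is why the simplified circuit needs only one dead-zone nonlinearity per equation instead of two switched branches.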






Least Absolute Values ($L_1$-Norm) Energy Function

Find the design vector $x^* \in \mathbb{R}^n$ that minimizes the error function

$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|, \qquad \text{where } r_i(x) = \sum_{j=1}^{n} a_{ij} x_j - b_i$$

Why should one choose this function knowing that it has differentiation problems?



Important $L_1$-Norm Properties

1. Least absolute value problems are equivalent to linear programming problems and vice versa.

2. Although the energy function $E_1(x)$ is not differentiable, the terms $|r_i(x)|$ can be approximated very closely by smoothly differentiable functions.

3. For a full-rank* matrix $A$, there always exists a minimum $L_1$-norm solution which passes through at least $n$ of the $m$ data points; the $L_2$-norm solution does not in general interpolate any of the points.

These properties are not shared by the $L_2$-norm.

*Matrix $A$ is said to be of full rank if all its rows or columns are linearly independent.
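Property 2 can be illustrated with one common smooth surrogate, $\sqrt{r^2 + \delta^2}$ (my choice of surrogate; the slides do not name a specific smoothing): it is everywhere differentiable and never off by more than $\delta$.

```python
import numpy as np

delta = 1e-3

def smooth_abs(r):
    """Smooth, everywhere-differentiable approximation of |r|."""
    return np.sqrt(r**2 + delta**2)

r = np.linspace(-10, 10, 100001)
gap = smooth_abs(r) - np.abs(r)

# The surrogate always overestimates |r|, but never by more than delta.
print(gap.min(), gap.max())
```

Shrinking `delta` tightens the approximation at the cost of a sharper (harder to integrate) kink near zero.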


Important $L_1$-Norm Properties

Theorem: There is a minimizer $x^* \in \mathbb{R}^n$ of the energy function $E_1(x) = \sum_{i=1}^{m} |r_i(x)|$ for which the residuals $r_i(x^*) = 0$ for at least $n$ values of $i$, say $i_1, i_2, \ldots, i_n$, where $n$ denotes the rank of the matrix $A$.

We can say that the $L_1$-norm solution is the median solution, while the $L_2$-norm solution is the mean solution.



Least Absolute Error Implementation

The algorithm is as follows:

1. First phase: solve the problem using the ordinary least-squares technique, compute all $m$ residuals, and select from them the $n$ residuals which are smallest in absolute value.

2. Second phase: discarding the rest of the equations, the $n$ equations related to the selected residuals are solved by driving their residuals to zero.

The ANN implementation is done in three layers using an inhibition control circuit.
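The two phases can be sketched as a batch NumPy procedure (a simplified software stand-in for the circuit, with made-up data; note the greedy selection is a heuristic and need not return the exact $L_1$ minimizer):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.normal(size=(m, n))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
b[0] += 10.0  # one gross outlier in the observations

# Phase 1: ordinary least squares, then rank the m residuals.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
r = A @ x_ls - b
keep = np.argsort(np.abs(r))[:n]  # the n residuals smallest in absolute value

# Phase 2: discard the other equations and drive the kept residuals to zero
# by solving the remaining square n-by-n system exactly.
x_l1 = np.linalg.solve(A[keep], b[keep])

print(x_ls, x_l1)  # x_l1 interpolates the n selected equations exactly
```

The interpolation behavior in phase 2 is exactly the "passes through at least n of the m data points" property from the previous slides.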



ANN Architecture for Solving $L_1$-Norm Estimation

[Figure: three-layer architecture with inhibition control (Problem, Phase #1, Phase #2)]






Example

Consider matrix $A$ and observation vector $b$ as below. Find the solution to $Ax = b$ using the least absolute error energy function.

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \\ 16 & 4 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \\ -10 \end{bmatrix}, \qquad Ax - b = 0$$


In the first phase all the switches (S1-S5) were closed and the network was able to find the following standard least-squares solution:

$$x^*_I = \begin{bmatrix} -1.5 \\ 3.5 \\ 0.6 \end{bmatrix}, \qquad r(x^*_I) = \begin{bmatrix} -0.4 \\ 0.6 \\ 0.6 \\ -1.4 \\ 0.6 \end{bmatrix}$$

In this case it is impossible to select the two largest, in absolute value, residuals because $|r_2| = |r_3| = |r_5| = 0.6$.

Phase one was rerun while switch S4 was opened, and the network then found

$$x^*_{II} = \begin{bmatrix} -1.3409 \\ 2.6409 \\ 0.9182 \end{bmatrix}, \qquad r(x^*_{II}) = \begin{bmatrix} -0.0818 \\ 0.2182 \\ -0.1636 \\ -2.2273 \\ 0.0273 \end{bmatrix}$$
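Both phase-one solutions can be reproduced numerically. A sketch, with $A$ and $b$ as reconstructed from the example slide:

```python
import numpy as np

A = np.array([[ 0.0, 0.0, 1.0],
              [ 1.0, 1.0, 1.0],
              [ 4.0, 2.0, 1.0],
              [ 9.0, 3.0, 1.0],
              [16.0, 4.0, 1.0]])
b = np.array([1.0, 2.0, 1.0, -1.0, -10.0])

# First run: all five equations (switches S1-S5 closed).
x1, *_ = np.linalg.lstsq(A, b, rcond=None)
r1 = A @ x1 - b
print(x1, r1)  # x*_I = (-1.5, 3.5, 0.6); |r2| = |r3| = |r5| = 0.6

# Second run: equation 4 removed (switch S4 open).
mask = np.array([True, True, True, False, True])
x2, *_ = np.linalg.lstsq(A[mask], b[mask], rcond=None)
print(x2)      # approx (-1.3409, 2.6409, 0.9182)
```

The tie among the three residuals of magnitude 0.6 is exactly what prevents the inhibition circuit from picking the two largest residuals on the first pass.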


Cichocki's Circuit Simulation Results


The residuals for $n = 3$ of the $m = 5$ equations converge to zero within 50 nanoseconds.



    Conclusion


There is a great need for real-time solution of linear equations.

Cichocki's proposed ANN is different from classical ANNs.

Consider a proper energy function whose minimization yields the optimal solution to $Ax = b$; "proper" may have a different meaning in different applications.

The standard least-squares error function gives the optimal answer for a Gaussian distribution of the error.

    Conclusion (Cont.)


The least-squares function does not behave well when there are large outliers in the observations. Various energy functions have been proposed to solve the outlier problem (e.g. the logistic function).

Minimax results in the optimal answer for the uniform distribution of the error. It also has some implementation and mathematical problems that lead to an indirect approach to solving the problem.

The least absolute error function has some properties that distinguish it from other error functions.
