Matrix Differential Calculus

By Dr. Md. Nurul Haque Mollah, Professor, Dept. of Statistics, University of Rajshahi, Bangladesh

01-10-11



Outline

1. Differentiable Functions
2. Classification of Functions and Variables for Derivatives
3. Derivatives of Scalar Functions w. r. to Vector Variable
4. Derivative of Scalar Functions w. r. to a Matrix Variable
5. Derivatives of Vector Function w. r. to a Scalar Variable
6. Derivatives of Vector Function w. r. to a Vector Variable
7. Derivatives of Vector Function w. r. to a Matrix Variable
8. Derivatives of Matrix Function w. r. to a Scalar Variable
9. Derivatives of Matrix Function w. r. to a Vector Variable
10. Derivatives of Matrix Function w. r. to a Matrix Variable
11. Some Applications of Matrix Differential Calculus

1. Differentiable Functions

A real-valued function $f: X \to \mathbb{R}$, where $X \subset \mathbb{R}^n$ is an open set, is said to be continuously differentiable if the partial derivatives $\partial f(x)/\partial x_1, \dots, \partial f(x)/\partial x_n$ exist for each $x \in X$ and are continuous functions of $x$ over $X$. In this case we write $f \in C^1$ over $X$.

Generally, we write $f \in C^p$ over $X$ if all partial derivatives of order $p$ exist and are continuous as functions of $x$ over $X$.

If $f \in C^p$ over $\mathbb{R}^n$, we simply write $f \in C^p$.

If $f \in C^1$ on $X$, the gradient of $f$ at a point $x \in X$ is defined as

$$Df(x) = \left[\frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n}\right]^T$$

If $f \in C^2$ over $X$, the Hessian of $f$ at $x$ is defined to be the $n \times n$ symmetric matrix having $\partial^2 f(x)/\partial x_i \partial x_j$ as the $ij$th element:

$$D^2 f(x) = \left[\frac{\partial^2 f(x)}{\partial x_i \partial x_j}\right]_{n \times n}$$

If $f: X \to \mathbb{R}^m$ where $X \subset \mathbb{R}^n$, then $f$ is represented by the column vector of its component functions $f_1, f_2, \dots, f_m$ as

$$f(x) = \left[f_1(x), \dots, f_m(x)\right]^T$$

If $X$ is open, we can write $f \in C^p$ on $X$ if $f_1 \in C^p, f_2 \in C^p, \dots, f_m \in C^p$ on $X$. Then the derivative of the vector function $f(x)$ with respect to the vector variable $x$ is defined by

$$Df(x) = \left[Df_1(x) \ \cdots \ Df_m(x)\right]_{n \times m}$$

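If $f: \mathbb{R}^{n+r} \to \mathbb{R}$ is a real-valued function of $(x, y)$, where $x = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$ and $y = (y_1, y_2, \dots, y_r)^T \in \mathbb{R}^r$, we write

$$D_x f(x,y) = \left[\frac{\partial f(x,y)}{\partial x_1}, \dots, \frac{\partial f(x,y)}{\partial x_n}\right]^T_{n \times 1}, \qquad D_y f(x,y) = \left[\frac{\partial f(x,y)}{\partial y_1}, \dots, \frac{\partial f(x,y)}{\partial y_r}\right]^T_{r \times 1}$$

$$D_{xx} f(x,y) = \left[\frac{\partial^2 f(x,y)}{\partial x_i \partial x_j}\right]_{n \times n}, \qquad D_{xy} f(x,y) = \left[\frac{\partial^2 f(x,y)}{\partial x_i \partial y_j}\right]_{n \times r}, \qquad D_{yy} f(x,y) = \left[\frac{\partial^2 f(x,y)}{\partial y_i \partial y_j}\right]_{r \times r}$$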

If $f: \mathbb{R}^{n+r} \to \mathbb{R}^m$, $f = (f_1, f_2, \dots, f_m)$, then

$$D_x f(x,y) = \left[D_x f_1(x,y) \ \cdots \ D_x f_m(x,y)\right]_{n \times m}, \qquad D_y f(x,y) = \left[D_y f_1(x,y) \ \cdots \ D_y f_m(x,y)\right]_{r \times m}$$

For $h: \mathbb{R}^r \to \mathbb{R}^m$ and $g: \mathbb{R}^n \to \mathbb{R}^r$, consider the function $f: \mathbb{R}^n \to \mathbb{R}^m$ defined by $f(x) = h(g(x))$.

Then if $h \in C^p$, $g \in C^p$ and $f \in C^p$, the chain rule of differentiation is stated as

$$Df(x) = Dg(x)\, Dh(g(x))$$
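The chain rule $Df(x) = Dg(x)\,Dh(g(x))$, with $Df$ stored as an $n \times m$ gradient matrix (entry $[i][k]$ is $\partial f_k/\partial x_i$), can be checked numerically. The sketch below is only an illustration: the functions `g` and `h`, the evaluation point `x0` and the helpers `grad_matrix` and `matmul` are hypothetical choices, not part of the slides.

```python
def grad_matrix(func, x, eps=1e-6):
    """Central-difference gradient matrix: entry [i][k] is d func_k / d x_i."""
    out_dim = len(func(x))
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        rows.append([(f_plus[k] - f_minus[k]) / (2 * eps) for k in range(out_dim)])
    return rows


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def g(x):  # hypothetical g: R^2 -> R^2
    return [x[0] * x[1], x[0] + x[1]]


def h(y):  # hypothetical h: R^2 -> R^2
    return [y[0] ** 2, y[0] * y[1]]


def f(x):  # composite f(x) = h(g(x))
    return h(g(x))


x0 = [1.5, -0.5]
lhs = grad_matrix(f, x0)                                  # Df(x)
rhs = matmul(grad_matrix(g, x0), grad_matrix(h, g(x0)))   # Dg(x) Dh(g(x))
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # expected to be tiny (finite-difference noise only)
```

Note the order of the factors: because gradients are stored with one row per input variable, $Dg$ comes first, the transpose of the usual Jacobian ordering.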

2. Classification of Functions and Variables for Derivatives

Let us consider scalar functions $g$, vector functions $\mathbf{f}$ and matrix functions $\mathbf{F}$. Each of these may depend on one real variable $x$, a vector of real variables $\mathbf{x}$, or a matrix of real variables $\mathbf{X}$. We thus obtain the classification of functions and variables shown in the following table.

$$\begin{array}{l|ccc}
 & \text{Scalar variable} & \text{Vector variable} & \text{Matrix variable} \\ \hline
\text{Scalar function} & g(x) & g(\mathbf{x}) & g(\mathbf{X}) \\
\text{Vector function} & \mathbf{f}(x) & \mathbf{f}(\mathbf{x}) & \mathbf{f}(\mathbf{X}) \\
\text{Matrix function} & \mathbf{F}(x) & \mathbf{F}(\mathbf{x}) & \mathbf{F}(\mathbf{X})
\end{array}$$

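Some Examples of Scalar, Vector and Matrix Functions

$$\begin{array}{l|lll}
 & \text{Scalar variable} & \text{Vector variable} & \text{Matrix variable} \\ \hline
\text{Scalar function} & g(x):\ ax^2 & g(\mathbf{x}):\ \mathbf{a}^T\mathbf{x},\ \mathbf{x}^T\mathbf{A}\mathbf{x} & g(\mathbf{X}):\ \mathbf{a}^T\mathbf{X}\mathbf{b},\ \mathrm{tr}(\mathbf{X}^T\mathbf{X}) \\
\text{Vector function} & \mathbf{f}(x):\ (ax,\ bx^2)^T & \mathbf{f}(\mathbf{x}):\ \mathbf{A}\mathbf{x} & \mathbf{f}(\mathbf{X}):\ \mathbf{X}\mathbf{a} \\
\text{Matrix function} & \mathbf{F}(x):\ \begin{bmatrix} 1 & x \\ x & x^2 \end{bmatrix} & \mathbf{F}(\mathbf{x}):\ \mathbf{x}\mathbf{x}^T & \mathbf{F}(\mathbf{X}):\ \mathbf{A}\mathbf{X}\mathbf{B},\ \mathbf{X}^T\mathbf{X}
\end{array}$$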

3. Derivatives of Scalar Functions w. r. to Vector Variable

3.1 Definition

Consider a scalar-valued function $g$ of $m$ variables

$$g = g(x_1, x_2, \dots, x_m) = g(\mathbf{x}),$$

where $\mathbf{x} = (x_1, x_2, \dots, x_m)^T$. Assuming the function $g$ is differentiable, its vector gradient with respect to $\mathbf{x}$ is the $m$-dimensional column vector of partial derivatives:

$$\frac{\partial g}{\partial \mathbf{x}} = \left[\frac{\partial g}{\partial x_1}, \dots, \frac{\partial g}{\partial x_m}\right]^T$$

3.2 Example 1

Consider the simple linear functional of $\mathbf{x}$

$$g(\mathbf{x}) = \mathbf{a}^T\mathbf{x} = \sum_{i=1}^{m} a_i x_i,$$

where $\mathbf{a} = (a_1, \dots, a_m)^T$ is a constant vector. Then the gradient of $g$ w. r. to $\mathbf{x}$ is given by

$$\frac{\partial g}{\partial \mathbf{x}} = \left[a_1, \dots, a_m\right]^T = \mathbf{a}$$

Also we can write it as

$$\frac{\partial\, \mathbf{a}^T\mathbf{x}}{\partial \mathbf{x}} = \mathbf{a}$$

Because the gradient is constant (independent of $\mathbf{x}$), the Hessian matrix of $g(\mathbf{x}) = \mathbf{a}^T\mathbf{x}$ is zero.

Example 2

Consider the quadratic form

$$g(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x} = \sum_{i=1}^{m}\sum_{j=1}^{m} x_i x_j a_{ij},$$

where $\mathbf{A} = (a_{ij})$ is an $m \times m$ square matrix. Then the gradient of $g(\mathbf{x})$ w. r. to $\mathbf{x}$ is given by

$$\frac{\partial g}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial g}{\partial x_1} \\ \vdots \\ \frac{\partial g}{\partial x_m} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{m} x_j a_{1j} + \sum_{i=1}^{m} x_i a_{i1} \\ \vdots \\ \sum_{j=1}^{m} x_j a_{mj} + \sum_{i=1}^{m} x_i a_{im} \end{bmatrix} = \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} = 2\mathbf{A}\mathbf{x} \quad (\text{if } \mathbf{A} \text{ is a symmetric matrix})$$

Then the second-order gradient or Hessian matrix of $g(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}$ w. r. to $\mathbf{x}$ becomes

$$\frac{\partial^2 (\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}^2} = \begin{bmatrix} 2a_{11} & \cdots & a_{1m} + a_{m1} \\ \vdots & & \vdots \\ a_{m1} + a_{1m} & \cdots & 2a_{mm} \end{bmatrix} = \mathbf{A} + \mathbf{A}^T = 2\mathbf{A} \quad (\text{if } \mathbf{A} \text{ is a symmetric matrix})$$
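The gradient $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$ of the quadratic form $\mathbf{x}^T\mathbf{A}\mathbf{x}$ can be compared against central finite differences. This is a minimal sketch: the matrix `A`, the point `x0` and the helper names are assumptions made for illustration, and `A` is deliberately non-symmetric so that the general formula is exercised.

```python
def quad_form(A, x):
    """g(x) = x^T A x as a double sum over x_i * a_ij * x_j."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numeric_gradient(fn, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((fn(x_plus) - fn(x_minus)) / (2 * eps))
    return grad


A = [[2.0, 1.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 1.0]]          # hypothetical, not symmetric
x0 = [1.0, -2.0, 0.5]
n = len(x0)

# Analytic gradient (A + A^T) x and Hessian A + A^T.
analytic = [sum((A[i][j] + A[j][i]) * x0[j] for j in range(n)) for i in range(n)]
hessian = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]

numeric = numeric_gradient(lambda x: quad_form(A, x), x0)
grad_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
print(grad_gap)  # expected to be close to zero
```

The Hessian `A + A^T` is symmetric by construction even when `A` itself is not, which is why the simplification to `2A` needs the symmetry assumption.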

3.3 Some useful rules for derivative of scalar functions w. r. to vectors

For computing the gradients of products and quotients of functions, as well as of composite functions, the same rules apply as for ordinary functions of one variable. Thus

$$\frac{\partial\, [f(\mathbf{x})\, g(\mathbf{x})]}{\partial \mathbf{x}} = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\, g(\mathbf{x}) + f(\mathbf{x})\, \frac{\partial g(\mathbf{x})}{\partial \mathbf{x}}$$

$$\frac{\partial\, [f(\mathbf{x}) / g(\mathbf{x})]}{\partial \mathbf{x}} = \left[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\, g(\mathbf{x}) - f(\mathbf{x})\, \frac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\right] \Big/\, g^2(\mathbf{x})$$

$$\frac{\partial f(g(\mathbf{x}))}{\partial \mathbf{x}} = f'(g(\mathbf{x}))\, \frac{\partial g(\mathbf{x})}{\partial \mathbf{x}}$$

The gradient of the composite function $f(g(\mathbf{x}))$ can be generalized to any number of nested functions, giving the same chain rule of differentiation that is valid for functions of one variable.

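3.3 Fundamental Rules for Matrix Differential Calculus

1. $\dfrac{\partial \mathbf{A}}{\partial \mathbf{x}} = 0$, where $\mathbf{A} \neq f(\mathbf{x})$ (that is, $\mathbf{A}$ does not depend on $\mathbf{x}$)
2. $\dfrac{\partial\, [\alpha f(\mathbf{x})]}{\partial \mathbf{x}} = \alpha\, \dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}}$
3. $\dfrac{\partial\, [f(\mathbf{x}) \pm g(\mathbf{x})]}{\partial \mathbf{x}} = \dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \pm \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}$
4. $\dfrac{\partial\, [f(\mathbf{x})\, g(\mathbf{x})]}{\partial \mathbf{x}} = \dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\, g(\mathbf{x}) + f(\mathbf{x})\, \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}$
5. $\dfrac{\partial\, [f(\mathbf{x}) / g(\mathbf{x})]}{\partial \mathbf{x}} = \left[\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\, g(\mathbf{x}) - f(\mathbf{x})\, \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\right] \Big/\, g^2(\mathbf{x})$
6. $\dfrac{\partial f(g(\mathbf{x}))}{\partial \mathbf{x}} = f'(g(\mathbf{x}))\, \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}$
7. $\dfrac{\partial\, [f(\mathbf{x}) \otimes g(\mathbf{x})]}{\partial \mathbf{x}} = \left[\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \otimes g(\mathbf{x})\right] + \left[f(\mathbf{x}) \otimes \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\right]$
8. $\dfrac{\partial\, [f(\mathbf{x}) \,\#\, g(\mathbf{x})]}{\partial \mathbf{x}} = \left[\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \,\#\, g(\mathbf{x})\right] + \left[f(\mathbf{x}) \,\#\, \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\right]$
9. $\dfrac{\partial f(\mathbf{x})^T}{\partial \mathbf{x}} = \left[\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\right]^T$
10. $\dfrac{\partial\, \mathrm{vec} f(\mathbf{x})}{\partial \mathbf{x}} = \mathrm{vec}\, \dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}}$
11. $\dfrac{\partial\, \mathrm{trace}[f(\mathbf{x})]}{\partial \mathbf{x}} = \mathrm{trace}\left[\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\right]$

(Note: $\otimes$ for the Kronecker product and $\#$ for the Hadamard product.)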

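3.4 Some useful derivatives of scalar functions w. r. to a vector variable

$$\frac{\partial (\mathbf{x}^T\mathbf{y})}{\partial \mathbf{x}} = \frac{\partial (\mathbf{y}^T\mathbf{x})}{\partial \mathbf{x}} = \mathbf{y}$$

$$\frac{\partial (\mathbf{x}^T\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{x}$$

$$\frac{\partial (\mathbf{x}^T\mathbf{A}\mathbf{y})}{\partial \mathbf{x}} = \mathbf{A}\mathbf{y}$$

$$\frac{\partial (\mathbf{y}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{A}^T\mathbf{y}$$

$$\frac{\partial (\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$$

$$\frac{\partial (\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x} \quad (\text{if } \mathbf{A} \text{ is symmetric})$$

$$\frac{\partial\, [\mathbf{a}(\mathbf{x})^T\mathbf{b}(\mathbf{x})]}{\partial \mathbf{x}} = \frac{\partial \mathbf{a}(\mathbf{x})}{\partial \mathbf{x}}\, \mathbf{b}(\mathbf{x}) + \frac{\partial \mathbf{b}(\mathbf{x})}{\partial \mathbf{x}}\, \mathbf{a}(\mathbf{x})$$

$$\frac{\partial\, [\mathbf{a}(\mathbf{x})^T\mathbf{a}(\mathbf{x})]}{\partial \mathbf{x}} = 2\, D_{\mathbf{x}}(\mathbf{a}(\mathbf{x}))\, \mathbf{a}(\mathbf{x})$$

$$\nabla_{\mathbf{x}} (\mathbf{x}^T\mathbf{Q}\mathbf{x}) = \mathbf{Q}\mathbf{x} + \mathbf{Q}^T\mathbf{x}$$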

4. Derivative of Scalar Functions w. r. to a Matrix Variable

4.1 Definition

Consider a scalar-valued function $f$ of the elements of a matrix $\mathbf{X} = (x_{ij})$:

$$f = f(\mathbf{X}) = f(x_{11}, x_{12}, \dots, x_{ij}, \dots, x_{mn})$$

Assuming that the function $f$ is differentiable, its matrix gradient with respect to $\mathbf{X}$ is the $m \times n$ matrix of partial derivatives:

$$\frac{\partial f}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \vdots & & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{m \times n}$$

4.2 Example 1

The trace of a matrix is a scalar function of the matrix elements. Let $\mathbf{X} = (x_{ij})$ be an $m \times m$ square matrix whose trace is denoted by $\mathrm{tr}(\mathbf{X})$. Then

$$\frac{\partial\, \mathrm{tr}(\mathbf{X})}{\partial \mathbf{X}} = \mathbf{I}_m$$

Proof: The trace of $\mathbf{X}$ is defined by

$$\mathrm{tr}(\mathbf{X}) = \sum_{i=1}^{m} x_{ii}$$

Taking the partial derivative of $\mathrm{tr}(\mathbf{X})$ with respect to one of the elements, say $x_{ij}$, gives

$$\frac{\partial\, \mathrm{tr}(\mathbf{X})}{\partial x_{ij}} = \begin{cases} 1 & \text{for } i = j \\ 0 & \text{for } i \neq j \end{cases}$$

Thus we get

$$\frac{\partial\, \mathrm{tr}(\mathbf{X})}{\partial \mathbf{X}} = \left[\frac{\partial\, \mathrm{tr}(\mathbf{X})}{\partial x_{ij}}\right]_{m \times m} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \mathbf{I}_m$$

4.2 Example 2

The determinant of a matrix is a scalar function of the matrix elements. Let $\mathbf{X} = (x_{ij})$ be an $m \times m$ invertible square matrix whose determinant is denoted $|\mathbf{X}|$. Then

$$\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = (\mathbf{X}^T)^{-1} |\mathbf{X}|$$

Proof: The inverse of a matrix $\mathbf{X}$ is obtained as

$$\mathbf{X}^{-1} = \frac{1}{|\mathbf{X}|}\, \mathrm{adj}(\mathbf{X}),$$

where $\mathrm{adj}(\mathbf{X})$ is known as the adjoint matrix of $\mathbf{X}$. It is defined by

$$\mathrm{adj}(\mathbf{X}) = \begin{bmatrix} C_{11} & \cdots & C_{m1} \\ \vdots & & \vdots \\ C_{1m} & \cdots & C_{mm} \end{bmatrix},$$

where $C_{ij} = (-1)^{i+j} M_{ij}$ is the cofactor w. r. to $x_{ij}$ and $M_{ij}$ is the minor w. r. to $x_{ij}$.

The minor $M_{ij}$ is obtained by first taking the $(m-1) \times (m-1)$ sub-matrix of $\mathbf{X}$ that remains when the $i$-th row and $j$-th column of $\mathbf{X}$ are removed, then computing the determinant of this sub-matrix. Thus the determinant $|\mathbf{X}|$ can be expressed in terms of the cofactors as follows:

$$|\mathbf{X}| = \sum_{k=1}^{m} x_{ik} C_{ik}$$

Row $i$ can be any row and the result is always the same. In the cofactors $C_{ik}$ none of the matrix elements of the $i$-th row appear, so the determinant is a linear function of these elements. Taking now the partial derivative of $|\mathbf{X}|$ with respect to one of the elements, say $x_{ij}$, gives

$$\frac{\partial |\mathbf{X}|}{\partial x_{ij}} = C_{ij}$$

Thus we get

$$\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \left[\frac{\partial |\mathbf{X}|}{\partial x_{ij}}\right]_{m \times m} = \left[C_{ij}\right]_{m \times m} = \mathrm{adj}(\mathbf{X})^T = |\mathbf{X}|\, (\mathbf{X}^T)^{-1}$$

This also implies that

$$\frac{\partial \log |\mathbf{X}|}{\partial \mathbf{X}} = \frac{1}{|\mathbf{X}|}\, \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = (\mathbf{X}^T)^{-1}$$
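The cofactor argument for the derivative of a determinant can be tried on a small matrix. The following sketch, with a hypothetical $3 \times 3$ matrix `X` and helper names chosen only for illustration, checks that $\partial|\mathbf{X}|/\partial x_{ij}$ equals the cofactor $C_{ij}$ and that $\mathbf{X}\,\mathrm{adj}(\mathbf{X})/|\mathbf{X}|$ is the identity, where $\mathrm{adj}(\mathbf{X})$ is the transposed cofactor matrix.

```python
def minor(X, i, j):
    """Sub-matrix of X with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(X) if k != i]


def det(X):
    """Determinant by cofactor expansion along the first row."""
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** k * X[0][k] * det(minor(X, 0, k)) for k in range(len(X)))


def cofactors(X):
    """Matrix of cofactors C_ij = (-1)^(i+j) M_ij."""
    m = len(X)
    return [[(-1) ** (i + j) * det(minor(X, i, j)) for j in range(m)]
            for i in range(m)]


def numeric_det_gradient(X, eps=1e-6):
    """Central-difference estimate of d|X| / d x_ij for every element."""
    m = len(X)
    grad = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            grad[i][j] = (det(X_plus) - det(X_minus)) / (2 * eps)
    return grad


X = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.5, 0.0, 4.0]]           # hypothetical invertible matrix
C = cofactors(X)
G = numeric_det_gradient(X)
m = len(X)
det_gap = max(abs(C[i][j] - G[i][j]) for i in range(m) for j in range(m))

# X * adj(X) / |X| should be the identity, with adj(X)[k][j] = C[j][k].
d = det(X)
identity_gap = max(
    abs(sum(X[i][k] * C[j][k] for k in range(m)) / d - (1.0 if i == j else 0.0))
    for i in range(m) for j in range(m)
)
print(det_gap, identity_gap)
```

Because the matrix of cofactors is $\mathrm{adj}(\mathbf{X})^T = |\mathbf{X}|(\mathbf{X}^T)^{-1}$, the two checks together cover the result stated in the example.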

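4.3 Some useful derivatives of scalar functions w. r. to matrix

1. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{a}\mathbf{b}^T$
2. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}^T\mathbf{b})}{\partial \mathbf{X}} = \mathbf{b}\mathbf{a}^T$
3. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{X}(\mathbf{a}\mathbf{b}^T + \mathbf{b}\mathbf{a}^T)$
4. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{a})}{\partial \mathbf{X}} = 2\mathbf{X}\mathbf{a}\mathbf{a}^T$
5. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}^T\mathbf{C}\mathbf{X}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{C}^T\mathbf{X}\mathbf{a}\mathbf{b}^T + \mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^T$
6. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}^T\mathbf{C}\mathbf{X}\mathbf{a})}{\partial \mathbf{X}} = (\mathbf{C} + \mathbf{C}^T)\mathbf{X}\mathbf{a}\mathbf{a}^T = 2\mathbf{C}\mathbf{X}\mathbf{a}\mathbf{a}^T$, if $\mathbf{C} = \mathbf{C}^T$
7. $\dfrac{\partial\, [(\mathbf{X}\mathbf{a} + \mathbf{b})^T\mathbf{C}(\mathbf{X}\mathbf{a} + \mathbf{b})]}{\partial \mathbf{X}} = (\mathbf{C} + \mathbf{C}^T)(\mathbf{X}\mathbf{a} + \mathbf{b})\mathbf{a}^T = 2\mathbf{C}(\mathbf{X}\mathbf{a} + \mathbf{b})\mathbf{a}^T$, if $\mathbf{C} = \mathbf{C}^T$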

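Derivatives of trace w. r. to matrix

8. $\dfrac{\partial\, \mathrm{tr}(\mathbf{X})}{\partial \mathbf{X}} = \mathbf{I}$
9. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}^T$
10. $\dfrac{\partial\, \mathrm{tr}(\mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}^T$
11. $\dfrac{\partial\, \mathrm{tr}(\mathbf{X}^k)}{\partial \mathbf{X}} = k\,(\mathbf{X}^{k-1})^T$
12. $\dfrac{\partial\, \mathrm{tr}(\mathbf{X}^2)}{\partial \mathbf{X}} = 2\mathbf{X}^T$
13. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}^k)}{\partial \mathbf{X}} = \sum_{i=0}^{k-1} \left[\mathbf{X}^i \mathbf{A} \mathbf{X}^{k-i-1}\right]^T$
14. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}^{-1}\mathbf{B})}{\partial \mathbf{X}} = -(\mathbf{X}^{-1}\mathbf{B}\mathbf{A}\mathbf{X}^{-1})^T$
15. $\dfrac{\partial\, \mathrm{tr}(e^{\mathbf{X}})}{\partial \mathbf{X}} = e^{\mathbf{X}^T}$
16. $\dfrac{\partial\, \mathrm{tr}[\log(\mathbf{I}_n - \mathbf{X})]}{\partial \mathbf{X}} = -\left[(\mathbf{I}_n - \mathbf{X})^{-1}\right]^T$, where $\log(\mathbf{I}_n - \mathbf{X}) = -\sum_{k=1}^{\infty} \dfrac{\mathbf{X}^k}{k}$
17. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{X}^T)}{\partial \mathbf{X}} = (\mathbf{A} + \mathbf{A}^T)\mathbf{X}$
18. $\dfrac{\partial\, \mathrm{tr}(\mathbf{X}\mathbf{A}\mathbf{X}^T)}{\partial \mathbf{X}} = \mathbf{X}(\mathbf{A} + \mathbf{A}^T)$
19. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}^T\mathbf{B})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{A}$
20. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \mathbf{A}^T\mathbf{B}^T$
21. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}^T\mathbf{X}^T\mathbf{B}^T + \mathbf{B}^T\mathbf{X}^T\mathbf{A}^T$
22. $\dfrac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^T)}{\partial \mathbf{X}} = \mathbf{A}^T\mathbf{X}\mathbf{B}^T + \mathbf{A}\mathbf{X}\mathbf{B}$
23. $\dfrac{\partial\, \mathrm{tr}(\mathbf{B}\mathbf{A}\mathbf{X}\mathbf{X}^T)}{\partial \mathbf{X}} = (\mathbf{B}\mathbf{A} + \mathbf{A}^T\mathbf{B}^T)\mathbf{X}$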

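Derivatives of determinants w. r. to matrix

24. $\dfrac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}|\, (\mathbf{X}^{-1})^T$
25. $\dfrac{\partial |\mathbf{X}^k|}{\partial \mathbf{X}} = k\, |\mathbf{X}^k|\, (\mathbf{X}^{-1})^T$
26. $\dfrac{\partial \log |\mathbf{X}|}{\partial \mathbf{X}} = (\mathbf{X}^{-1})^T$
27. $\dfrac{\partial |\mathbf{A}\mathbf{X}\mathbf{B}|}{\partial \mathbf{X}} = |\mathbf{A}\mathbf{X}\mathbf{B}|\, \mathbf{A}^T \left[(\mathbf{A}\mathbf{X}\mathbf{B})^{-1}\right]^T \mathbf{B}^T$
28. $\dfrac{\partial |\mathbf{A}\mathbf{X}\mathbf{B}|}{\partial \mathbf{X}} = |\mathbf{A}\mathbf{X}\mathbf{B}|\, (\mathbf{X}^{-1})^T$, if $|\mathbf{A}| \neq 0$, $|\mathbf{X}| \neq 0$, $|\mathbf{B}| \neq 0$
29. $\dfrac{\partial \log |\mathbf{X}^T\mathbf{X}|}{\partial \mathbf{X}} = 2\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$
30. $\dfrac{\partial |\mathbf{X}^T\mathbf{C}\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}^T\mathbf{C}\mathbf{X}| \left[\mathbf{C}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1} + \mathbf{C}^T\mathbf{X}(\mathbf{X}^T\mathbf{C}^T\mathbf{X})^{-1}\right]$, if $\mathbf{C}$ is a real matrix
    $= 2\, |\mathbf{X}^T\mathbf{C}\mathbf{X}|\, \mathbf{C}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}$, if $\mathbf{C}$ is real and symmetric
31. $\dfrac{\partial \log |\mathbf{X}^T\mathbf{C}\mathbf{X}|}{\partial \mathbf{X}} = 2\mathbf{C}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}$, if $\mathbf{C}$ is real and symmetric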

5. Derivatives of Vector Function w. r. to a Scalar Variable

5.1 Definition

Consider the vector-valued function $\mathbf{f}$ of a scalar variable $x$

$$\mathbf{f}(x) = \left[f_1(x), f_2(x), \dots, f_n(x)\right]^T$$

Assuming the function $\mathbf{f}$ is differentiable, its scalar gradient with respect to $x$ is the $n$-dimensional row vector of derivatives:

$$\frac{\partial \mathbf{f}}{\partial x} = \left[\frac{\partial f_1(x)}{\partial x}, \dots, \frac{\partial f_n(x)}{\partial x}\right]_{1 \times n}$$

5.2 Example

Let

$$\mathbf{f}(x) = (x, x^2, \dots, x^m)^T$$

Then the gradient of $\mathbf{f}$ with respect to $x$ is given by

$$\frac{\partial \mathbf{f}}{\partial x} = \left[\frac{\partial (x)}{\partial x}, \frac{\partial (x^2)}{\partial x}, \dots, \frac{\partial (x^m)}{\partial x}\right] = \left[1, 2x, \dots, m x^{m-1}\right]_{1 \times m}$$

Also we can write it as $\partial \mathbf{y} / \partial x$, where $\mathbf{y} = \mathbf{f}(x)$ with $y_1 = f_1(x) = x$, $y_2 = f_2(x) = x^2$, and so on up to $y_m = f_m(x) = x^m$.

6. Derivatives of Vector Function w. r. to a Vector Variable

6.1 Definition

Consider the vector-valued function $\mathbf{f}$ of a vector variable $\mathbf{x} = (x_1, x_2, \dots, x_m)^T$

$$\mathbf{f}(\mathbf{x}) = \mathbf{y} = \left[y_1 = f_1(\mathbf{x}),\ y_2 = f_2(\mathbf{x}),\ \dots,\ y_n = f_n(\mathbf{x})\right]^T$$

Assuming the function $\mathbf{f}$ is differentiable, its vector gradient with respect to $\mathbf{x}$ is the $m \times n$ matrix of partial derivatives:

$$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \left[\frac{\partial f_1(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_n(\mathbf{x})}{\partial \mathbf{x}}\right]_{m \times n} = \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial f_n(\mathbf{x})}{\partial x_1} \\ \vdots & & \vdots \\ \frac{\partial f_1(\mathbf{x})}{\partial x_m} & \cdots & \frac{\partial f_n(\mathbf{x})}{\partial x_m} \end{bmatrix}_{m \times n}$$

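6.2 Example

Let

$$\mathbf{f}(\mathbf{x}) = \left[f_1(\mathbf{x}) = \mathbf{A}_1\mathbf{x},\ f_2(\mathbf{x}) = \mathbf{A}_2\mathbf{x},\ \dots,\ f_n(\mathbf{x}) = \mathbf{A}_n\mathbf{x}\right]^T = \mathbf{A}\mathbf{x},$$

where $\mathbf{x} = (x_1, x_2, \dots, x_m)^T$, $\mathbf{A} = (a_{ij})_{n \times m}$ and $\mathbf{A}_i = [a_{i1}, a_{i2}, \dots, a_{im}]$ is the $i$th row of $\mathbf{A}$ ($i = 1, 2, \dots, n$).

Then the gradient of $\mathbf{f}(\mathbf{x})$ with respect to $\mathbf{x}$ is given by

$$\frac{\partial \mathbf{f}(\mathbf{x})}{\partial \mathbf{x}} = \left[\frac{\partial f_1(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_n(\mathbf{x})}{\partial \mathbf{x}}\right]_{m \times n} = \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial f_n(\mathbf{x})}{\partial x_1} \\ \vdots & & \vdots \\ \frac{\partial f_1(\mathbf{x})}{\partial x_m} & \cdots & \frac{\partial f_n(\mathbf{x})}{\partial x_m} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ \vdots & \vdots & & \vdots \\ a_{1m} & a_{2m} & \cdots & a_{nm} \end{bmatrix}_{m \times n} = \mathbf{A}^T$$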

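7. Derivatives of Vector Function w. r. to a Matrix Variable

7.1 Definition

Consider the vector-valued function $\mathbf{f}$ of a matrix variable $\mathbf{X} = (x_{ij})$ of order $m \times n$

$$\mathbf{f}(\mathbf{X}) = \mathbf{y} = \left[y_1 = f_1(\mathbf{X}),\ y_2 = f_2(\mathbf{X}),\ \dots,\ y_q = f_q(\mathbf{X})\right]^T$$

Assuming that the function $\mathbf{f}$ is differentiable, its matrix gradient with respect to $\mathbf{X}$ is the $mn \times q$ matrix of partial derivatives:

$$\frac{\partial \mathbf{f}}{\partial \mathbf{X}} = \left[\frac{\partial f_1(\mathbf{X})}{\partial\, \mathrm{vec}[\mathbf{X}]}, \dots, \frac{\partial f_q(\mathbf{X})}{\partial\, \mathrm{vec}[\mathbf{X}]}\right]_{mn \times q} = \begin{bmatrix} \frac{\partial f_1(\mathbf{X})}{\partial x_{11}} & \frac{\partial f_2(\mathbf{X})}{\partial x_{11}} & \cdots & \frac{\partial f_q(\mathbf{X})}{\partial x_{11}} \\ \frac{\partial f_1(\mathbf{X})}{\partial x_{21}} & \frac{\partial f_2(\mathbf{X})}{\partial x_{21}} & \cdots & \frac{\partial f_q(\mathbf{X})}{\partial x_{21}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_1(\mathbf{X})}{\partial x_{mn}} & \frac{\partial f_2(\mathbf{X})}{\partial x_{mn}} & \cdots & \frac{\partial f_q(\mathbf{X})}{\partial x_{mn}} \end{bmatrix}_{mn \times q}$$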

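7.2 Example

Let

$$\mathbf{f}(\mathbf{X}) = \left[f_1(\mathbf{X}) = \mathbf{X}_1\mathbf{a},\ f_2(\mathbf{X}) = \mathbf{X}_2\mathbf{a},\ \dots,\ f_m(\mathbf{X}) = \mathbf{X}_m\mathbf{a}\right]^T = \mathbf{X}\mathbf{a},$$

where $\mathbf{a} = (a_1, a_2, \dots, a_n)^T$, $\mathbf{X} = (x_{ij})_{m \times n}$ and $\mathbf{X}_i = [x_{i1}, x_{i2}, \dots, x_{in}]$ is the $i$th row of $\mathbf{X}$ ($i = 1, 2, \dots, m$).

Then the gradient of $\mathbf{f}(\mathbf{X})$ w. r. to the matrix variable $\mathbf{X}$ is given by

$$\frac{\partial \mathbf{f}(\mathbf{X})}{\partial \mathbf{X}} = \left[\frac{\partial f_1(\mathbf{X})}{\partial\, \mathrm{vec}[\mathbf{X}]}, \dots, \frac{\partial f_m(\mathbf{X})}{\partial\, \mathrm{vec}[\mathbf{X}]}\right]_{mn \times m} = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_1 \\ \vdots & & & \vdots \\ a_n & 0 & \cdots & 0 \\ 0 & a_n & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{bmatrix}_{mn \times m} = \mathbf{a} \otimes \mathbf{I}_m$$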
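The Kronecker structure of the gradient of $\mathbf{f}(\mathbf{X}) = \mathbf{X}\mathbf{a}$ can be reproduced by differentiating with respect to each element of $\mathrm{vec}[\mathbf{X}]$ (columns of $\mathbf{X}$ stacked) and comparing with $\mathbf{a} \otimes \mathbf{I}_m$. The sketch below assumes a small hypothetical `X` and `a`; `kron` and `gradient_wrt_vec` are illustrative helpers, not library functions.

```python
def f(X, a):
    """f(X) = X a for an m x n matrix X and an n-vector a."""
    return [sum(X[i][j] * a[j] for j in range(len(a))) for i in range(len(X))]


def gradient_wrt_vec(X, a, eps=1e-6):
    """Rows follow vec(X) (x_11, x_21, ..., x_mn); columns follow f_1..f_m."""
    m, n = len(X), len(X[0])
    rows = []
    for j in range(n):            # vec stacks column j of X ...
        for i in range(m):        # ... with entries x_1j, ..., x_mj
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            f_plus, f_minus = f(X_plus, a), f(X_minus, a)
            rows.append([(f_plus[k] - f_minus[k]) / (2 * eps) for k in range(m)])
    return rows


def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]             # hypothetical, m = 2, n = 3
a = [2.0, -1.0, 0.5]
m = len(X)

identity_m = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
expected = kron([[v] for v in a], identity_m)     # a (as a column) kron I_m
numeric = gradient_wrt_vec(X, a)
kron_gap = max(abs(expected[r][c] - numeric[r][c])
               for r in range(len(expected)) for c in range(m))
print(len(expected), len(expected[0]), kron_gap)
```

The result has $mn = 6$ rows and $m = 2$ columns, matching the $mn \times m$ order of the gradient.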

8. Derivatives of Matrix Function w. r. to a Scalar Variable

8.1 Definition

Consider the matrix-valued function $\mathbf{F}$ of a scalar variable $x$

$$\mathbf{F}(x) = \mathbf{Y} = \left[y_{ij} = f_{ij}(x)\right]_{m \times n}$$

Assuming that the function $\mathbf{F}$ is differentiable, its scalar gradient with respect to the scalar $x$ is the $m \times n$ order matrix of derivatives:

$$\frac{\partial \mathbf{F}(x)}{\partial x} = \left[\frac{\partial f_{ij}(x)}{\partial x}\right]_{m \times n}$$

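8.2 Example

Let

$$\mathbf{F}(x) = x\mathbf{A} = \left[f_{ij}(x) = x\, a_{ij}\right]_{m \times n},$$

where $\mathbf{A} = (a_{ij})$ is an $m \times n$ order matrix.

Then the gradient of $\mathbf{F}(x)$ w. r. to the scalar variable $x$ is given by

$$\frac{\partial \mathbf{F}(x)}{\partial x} = \left[\frac{\partial f_{ij}(x)}{\partial x}\right]_{m \times n} = \begin{bmatrix} \frac{\partial f_{11}(x)}{\partial x} & \frac{\partial f_{12}(x)}{\partial x} & \cdots & \frac{\partial f_{1n}(x)}{\partial x} \\ \frac{\partial f_{21}(x)}{\partial x} & \frac{\partial f_{22}(x)}{\partial x} & \cdots & \frac{\partial f_{2n}(x)}{\partial x} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_{m1}(x)}{\partial x} & \frac{\partial f_{m2}(x)}{\partial x} & \cdots & \frac{\partial f_{mn}(x)}{\partial x} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \mathbf{A}$$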

9. Derivatives of Matrix Function w. r. to a Vector Variable

9.1 Definition

Consider the matrix-valued function $\mathbf{F}$ of a vector variable $\mathbf{x} = (x_1, x_2, \dots, x_m)^T$

$$\mathbf{F}(\mathbf{x}) = \mathbf{Y} = \left[y_{ij} = f_{ij}(\mathbf{x})\right]_{n \times q}$$

Assuming that the function $\mathbf{F}$ is differentiable, its vector gradient with respect to the vector $\mathbf{x}$ is the $m \times nq$ order matrix of partial derivatives:

$$\frac{\partial \mathbf{F}(\mathbf{x})}{\partial \mathbf{x}} = \left[\frac{\partial f_{11}(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_{n1}(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_{1q}(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_{nq}(\mathbf{x})}{\partial \mathbf{x}}\right]_{m \times nq}$$

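9.2 Example

Let

$$\mathbf{F}(\mathbf{x}) = \mathbf{x}\mathbf{a}^T = \left[f_{ij}(\mathbf{x}) = x_i a_j\right]_{m \times n},$$

where $\mathbf{x} = (x_1, x_2, \dots, x_m)^T$ and $\mathbf{a} = (a_1, a_2, \dots, a_n)^T$.

Then the gradient of $\mathbf{F}(\mathbf{x})$ w. r. to the vector variable $\mathbf{x}$ is given by

$$\frac{\partial \mathbf{F}(\mathbf{x})}{\partial \mathbf{x}} = \left[\frac{\partial f_{11}(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_{m1}(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_{1n}(\mathbf{x})}{\partial \mathbf{x}}, \dots, \frac{\partial f_{mn}(\mathbf{x})}{\partial \mathbf{x}}\right]_{m \times mn} = \left[\begin{array}{cccc|c|cccc} a_1 & 0 & \cdots & 0 & & a_n & 0 & \cdots & 0 \\ 0 & a_1 & \cdots & 0 & \cdots & 0 & a_n & \cdots & 0 \\ \vdots & & \ddots & \vdots & & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_1 & & 0 & 0 & \cdots & a_n \end{array}\right]_{m \times mn} = \mathbf{a}^T \otimes \mathbf{I}_m$$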

10. Derivatives of Matrix Function w. r. to a Matrix Variable

10.1 Definition

Consider the matrix-valued function $\mathbf{F}$ of a matrix variable $\mathbf{X} = (x_{ij})_{m \times p}$

$$\mathbf{F}(\mathbf{X}) = \mathbf{Y} = \left[y_{ij} = f_{ij}(\mathbf{X})\right]_{n \times q}$$

Assuming that the function $\mathbf{F}$ is differentiable, its matrix gradient with respect to the matrix $\mathbf{X}$ is the $mp \times nq$ order matrix of partial derivatives:

$$\frac{\partial \mathbf{F}(\mathbf{X})}{\partial \mathbf{X}} = \frac{\partial\, [\mathrm{vec}\,\mathbf{F}(\mathbf{X})]^T}{\partial\, \mathrm{vec}\,\mathbf{X}} = \left[\frac{\partial f_{11}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}, \dots, \frac{\partial f_{n1}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}, \dots, \frac{\partial f_{1q}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}, \dots, \frac{\partial f_{nq}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}\right]_{mp \times nq}$$

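10.2 Example

Let

$$\mathbf{F}(\mathbf{X}) = \mathbf{A}\mathbf{X} = \left[f_{ij}(\mathbf{X}) = \sum_{k=1}^{n} a_{ik} x_{kj}\right]_{m \times q},$$

where $\mathbf{A} = [a_{ik}]_{m \times n}$ and $\mathbf{X} = [x_{kj}]_{n \times q}$.

Then the gradient of $\mathbf{F}(\mathbf{X})$ w. r. to the matrix variable $\mathbf{X}$ is given by

$$\frac{\partial \mathbf{F}(\mathbf{X})}{\partial \mathbf{X}} = \frac{\partial\, [\mathrm{vec}\,\mathbf{F}(\mathbf{X})]^T}{\partial\, \mathrm{vec}\,\mathbf{X}} = \left[\frac{\partial f_{11}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}, \dots, \frac{\partial f_{m1}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}, \dots, \frac{\partial f_{1q}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}, \dots, \frac{\partial f_{mq}(\mathbf{X})}{\partial\, \mathrm{vec}\,\mathbf{X}}\right]_{nq \times mq}$$

$$= \begin{bmatrix} \frac{\partial f_{11}(\mathbf{X})}{\partial x_{11}} & \cdots & \frac{\partial f_{m1}(\mathbf{X})}{\partial x_{11}} & \cdots & \frac{\partial f_{1q}(\mathbf{X})}{\partial x_{11}} & \cdots & \frac{\partial f_{mq}(\mathbf{X})}{\partial x_{11}} \\ \vdots & & \vdots & & \vdots & & \vdots \\ \frac{\partial f_{11}(\mathbf{X})}{\partial x_{n1}} & \cdots & \frac{\partial f_{m1}(\mathbf{X})}{\partial x_{n1}} & \cdots & \frac{\partial f_{1q}(\mathbf{X})}{\partial x_{n1}} & \cdots & \frac{\partial f_{mq}(\mathbf{X})}{\partial x_{n1}} \\ \vdots & & \vdots & & \vdots & & \vdots \\ \frac{\partial f_{11}(\mathbf{X})}{\partial x_{nq}} & \cdots & \frac{\partial f_{m1}(\mathbf{X})}{\partial x_{nq}} & \cdots & \frac{\partial f_{1q}(\mathbf{X})}{\partial x_{nq}} & \cdots & \frac{\partial f_{mq}(\mathbf{X})}{\partial x_{nq}} \end{bmatrix} = \begin{bmatrix} \mathbf{A}^T & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{A}^T & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}^T \end{bmatrix}_{nq \times mq} = \mathbf{I}_q \otimes \mathbf{A}^T,$$

where each diagonal block is

$$\mathbf{A}^T = \begin{bmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{mn} \end{bmatrix}_{n \times m}$$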

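Some important rules for matrix differentiation

1. $\dfrac{d}{dt}(\mathbf{A} + \mathbf{B}) = \dfrac{d\mathbf{A}}{dt} + \dfrac{d\mathbf{B}}{dt}$
2. $\dfrac{d}{dt}(\mathbf{A}\mathbf{B}\mathbf{C}) = \dfrac{d\mathbf{A}}{dt}\mathbf{B}\mathbf{C} + \mathbf{A}\dfrac{d\mathbf{B}}{dt}\mathbf{C} + \mathbf{A}\mathbf{B}\dfrac{d\mathbf{C}}{dt}$
3. $\dfrac{d}{dt}\mathbf{A}^n = \dfrac{d\mathbf{A}}{dt}\mathbf{A}^{n-1} + \mathbf{A}\dfrac{d\mathbf{A}}{dt}\mathbf{A}^{n-2} + \dots + \mathbf{A}^{n-1}\dfrac{d\mathbf{A}}{dt}$
4. $\dfrac{d}{dt}\mathbf{A}^{-1} = -\mathbf{A}^{-1}\dfrac{d\mathbf{A}}{dt}\mathbf{A}^{-1}$
5. $\dfrac{d}{dt}(\det(\mathbf{A})) = \det(\mathbf{A})\, \mathrm{tr}\left(\mathbf{A}^{-1}\dfrac{d\mathbf{A}}{dt}\right)$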
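Rules 4 and 5 (the derivative of an inverse and of a determinant with respect to a scalar $t$) lend themselves to a quick numerical check. The sketch below uses a hypothetical $2 \times 2$ matrix function `A_of(t)` with a hand-written derivative `dA_of(t)`; rule 5 is taken in the form $\frac{d}{dt}\det(\mathbf{A}) = \det(\mathbf{A})\,\mathrm{tr}(\mathbf{A}^{-1}\, d\mathbf{A}/dt)$, which is an assumption about the intended statement.

```python
import math


def A_of(t):
    """Hypothetical 2 x 2 matrix function of the scalar t."""
    return [[2.0 + t, t * t], [math.sin(t), 3.0]]


def dA_of(t):
    """Element-wise derivative dA/dt of A_of."""
    return [[1.0, 2.0 * t], [math.cos(t), 0.0]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def matmul2(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


t0, eps = 0.3, 1e-6
A, dA = A_of(t0), dA_of(t0)
A_inv = inv2(A)

# Rule 4: d(A^-1)/dt = -A^-1 (dA/dt) A^-1, compared with central differences.
analytic_dinv = [[-v for v in row] for row in matmul2(matmul2(A_inv, dA), A_inv)]
inv_plus, inv_minus = inv2(A_of(t0 + eps)), inv2(A_of(t0 - eps))
numeric_dinv = [[(inv_plus[i][j] - inv_minus[i][j]) / (2 * eps) for j in range(2)]
                for i in range(2)]
inv_gap = max(abs(analytic_dinv[i][j] - numeric_dinv[i][j])
              for i in range(2) for j in range(2))

# Rule 5: d det(A)/dt = det(A) * tr(A^-1 dA/dt).
P = matmul2(A_inv, dA)
analytic_ddet = det2(A) * (P[0][0] + P[1][1])
numeric_ddet = (det2(A_of(t0 + eps)) - det2(A_of(t0 - eps))) / (2 * eps)
det_gap = abs(analytic_ddet - numeric_ddet)
print(inv_gap, det_gap)
```

Both gaps should be at the level of finite-difference noise; a large value would indicate a sign or ordering mistake in the matrix products.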

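Homework

1. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{a}\mathbf{b}^T$
2. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}^T\mathbf{b})}{\partial \mathbf{X}} = \mathbf{b}\mathbf{a}^T$
3. $\dfrac{\partial (\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{b})}{\partial \mathbf{X}} = \mathbf{X}(\mathbf{a}\mathbf{b}^T + \mathbf{b}\mathbf{a}^T)$
4. $\dfrac{\partial \mathbf{X}^{-1}}{\partial \mathbf{X}} = -\mathbf{X}^{-1} \otimes (\mathbf{X}^T)^{-1}$
5. $\dfrac{\partial (\mathbf{A}\mathbf{X}^{-1}\mathbf{B})}{\partial \mathbf{X}} = -(\mathbf{X}^{-1}\mathbf{B}) \otimes (\mathbf{A}\mathbf{X}^{-1})^T$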

11. Some Applications of Matrix Differential Calculus

1. Test of independence between functions
2. Expansion of Taylor series
3. Transformations of multivariate density functions
4. Multiple integrations
5. And so on.

Test of Independence

A set of functions $f_1(\mathbf{x}), f_2(\mathbf{x}), \dots, f_n(\mathbf{x})$ are said to be dependent on one another if their Jacobian is zero. That is,

$$J(f_1, f_2, \dots, f_n) = \frac{\partial (f_1, f_2, \dots, f_n)}{\partial (x_1, x_2, \dots, x_n)} = 0$$

Example: Show that the functions

$$f_1(\mathbf{x}) = x_1 + x_2 - x_3, \qquad f_2(\mathbf{x}) = x_1 - x_2 + x_3, \qquad f_3(\mathbf{x}) = x_1^2 + x_2^2 + x_3^2 - 2x_2x_3$$

are not independent of one another. Show that

$$f_1^2(\mathbf{x}) + f_2^2(\mathbf{x}) = 2 f_3(\mathbf{x})$$

Proof:

$$J(f_1, f_2, f_3) = \frac{\partial (f_1, f_2, f_3)}{\partial (x_1, x_2, x_3)} = 0$$

So the functions are not independent.
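A Jacobian test of this kind can be carried out numerically at a few points. The sketch below assumes, purely for illustration, the three functions $f_1 = x_1 + x_2 - x_3$, $f_2 = x_1 - x_2 + x_3$ and $f_3 = x_1^2 + x_2^2 + x_3^2 - 2x_2x_3$ (the signs are an assumption), writes out their Jacobian by hand, and evaluates its determinant together with the relation $f_1^2 + f_2^2 = 2f_3$.

```python
def functions(x1, x2, x3):
    """Assumed example functions f1, f2, f3."""
    return (x1 + x2 - x3,
            x1 - x2 + x3,
            x1 ** 2 + x2 ** 2 + x3 ** 2 - 2 * x2 * x3)


def jacobian(x1, x2, x3):
    """Rows are the partial derivatives of f1, f2, f3 w. r. to x1, x2, x3."""
    return [[1.0, 1.0, -1.0],
            [1.0, -1.0, 1.0],
            [2 * x1, 2 * x2 - 2 * x3, 2 * x3 - 2 * x2]]


def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


points = [(1.0, 2.0, 3.0), (-0.5, 4.0, 0.25), (2.0, -1.0, 7.0)]
dets = [det3(jacobian(*p)) for p in points]

# Residual of the functional relation f1^2 + f2^2 - 2 f3 at each point.
residuals = []
for p in points:
    f1, f2, f3 = functions(*p)
    residuals.append(f1 ** 2 + f2 ** 2 - 2 * f3)

print(dets, residuals)
```

A determinant that vanishes at every test point is consistent with the functional relation; it does not replace the symbolic proof, which shows the Jacobian is identically zero.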

Taylor series expansions of multivariate functions

In deriving some of the gradient type learning algorithms, we have to resort to the Taylor series expansion of a function $g(x)$ of a scalar variable $x$,

$$g(x') = g(x) + \frac{dg}{dx}(x' - x) + \frac{1}{2}\frac{d^2 g}{dx^2}(x' - x)^2 + \dots \qquad (3.19)$$

We can do a similar expansion for a function $g(\mathbf{x}) = g(x_1, x_2, \dots, x_m)$ of $m$ variables. We have

$$g(\mathbf{x}') = g(\mathbf{x}) + \left(\frac{\partial g}{\partial \mathbf{x}}\right)^T (\mathbf{x}' - \mathbf{x}) + \frac{1}{2}(\mathbf{x}' - \mathbf{x})^T \frac{\partial^2 g}{\partial \mathbf{x}^2} (\mathbf{x}' - \mathbf{x}) + \dots \qquad (3.20)$$

where the derivatives are evaluated at the point $\mathbf{x}$. The second term is the inner product of the gradient vector with the vector $\mathbf{x}' - \mathbf{x}$, and the third term is a quadratic form with the symmetric Hessian matrix $\partial^2 g / \partial \mathbf{x}^2$. The truncation error depends on the distance $\|\mathbf{x}' - \mathbf{x}\|$; the distance has to be small if $g(\mathbf{x}')$ is approximated using only the first- and second-order terms.
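The effect of truncating the multivariate Taylor expansion after the first-order or second-order term can be seen on a small example. The function `g`, its hand-derived gradient and Hessian, and the step direction below are hypothetical choices made for this sketch.

```python
import math


def g(x):
    """Hypothetical test function g(x1, x2) = exp(x1) + x1 * x2^2."""
    return math.exp(x[0]) + x[0] * x[1] ** 2


def grad_g(x):
    """Gradient vector of g, derived by hand."""
    return [math.exp(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]]


def hess_g(x):
    """Symmetric Hessian matrix of g, derived by hand."""
    return [[math.exp(x[0]), 2.0 * x[1]],
            [2.0 * x[1], 2.0 * x[0]]]


def taylor(x, x_new, order):
    """Taylor approximation of g(x_new) around x, truncated at `order`."""
    d = [x_new[i] - x[i] for i in range(2)]
    value = g(x)
    if order >= 1:
        value += sum(grad_g(x)[i] * d[i] for i in range(2))
    if order >= 2:
        H = hess_g(x)
        value += 0.5 * sum(d[i] * H[i][j] * d[j]
                           for i in range(2) for j in range(2))
    return value


x = [0.5, 1.0]
errors = []
for step in (0.1, 0.01):
    x_new = [x[0] + step, x[1] - step]
    first_order_error = abs(g(x_new) - taylor(x, x_new, 1))
    second_order_error = abs(g(x_new) - taylor(x, x_new, 2))
    errors.append((step, first_order_error, second_order_error))

for row in errors:
    print(row)
```

Shrinking the step by a factor of ten should reduce the first-order error by roughly a hundred and the second-order error by roughly a thousand, which is the sense in which the truncation error depends on the distance between the two points.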

The same expansion can be made for a scalar function of a matrix variable. The second-order term already becomes complicated because the second-order gradient is a four-dimensional tensor. But we can easily extend the first-order term in (3.20), the inner product of the gradient with the vector $\mathbf{x}' - \mathbf{x}$, to the matrix case. Remember that the vector inner product is defined as

$$\left(\frac{\partial g}{\partial \mathbf{x}}\right)^T (\mathbf{x}' - \mathbf{x}) = \sum_{i=1}^{m} \left(\frac{\partial g}{\partial \mathbf{x}}\right)_i (x_i' - x_i)$$

For the matrix case, this must become the sum

$$\sum_{i=1}^{m}\sum_{j=1}^{m} \left(\frac{\partial g}{\partial \mathbf{X}}\right)_{ij} (x_{ij}' - x_{ij})$$

This is the sum of the products of corresponding elements, just like in the vectorial inner product. This can be nicely presented in matrix form when we remember that for any two matrices, say $\mathbf{A}$ and $\mathbf{B}$,

$$\mathrm{trace}(\mathbf{A}^T\mathbf{B}) = \sum_{i=1}^{m} (\mathbf{A}^T\mathbf{B})_{ii} = \sum_{i=1}^{m}\sum_{j=1}^{m} (\mathbf{A})_{ij} (\mathbf{B})_{ij}$$

with obvious notation. So, we have

$$g(\mathbf{X}') = g(\mathbf{X}) + \mathrm{trace}\left[\left(\frac{\partial g}{\partial \mathbf{X}}\right)^T (\mathbf{X}' - \mathbf{X})\right] + \dots \qquad (3.21)$$

for the first two terms in the Taylor series of a function $g$ of a matrix variable.
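The identity $\mathrm{trace}(\mathbf{A}^T\mathbf{B}) = \sum_i\sum_j (\mathbf{A})_{ij}(\mathbf{B})_{ij}$ used for the first-order matrix term is easy to confirm on a small case. The two integer matrices below are arbitrary illustrative values.

```python
def trace_of_At_B(A, B):
    """trace(A^T B): sum over i of the (i, i) entry of A^T B."""
    n_rows, n_cols = len(A), len(A[0])
    return sum(sum(A[k][i] * B[k][i] for k in range(n_rows))
               for i in range(n_cols))


def elementwise_sum(A, B):
    """Sum of products of corresponding elements of A and B."""
    return sum(A[i][j] * B[i][j]
               for i in range(len(A)) for j in range(len(A[0])))


A = [[1, 2], [3, 4]]   # arbitrary example values
B = [[5, 6], [7, 8]]

lhs = trace_of_At_B(A, B)
rhs = elementwise_sum(A, B)
print(lhs, rhs)  # prints "70 70"
```

This is why the first-order term of the matrix Taylor expansion can be written as a single trace instead of a double sum.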