The Multivariate Gaussian
TRANSCRIPT
Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
The Multivariate Gaussian Prof. Nicholas Zabaras
School of Engineering
University of Warwick
Coventry CV4 7AL
United Kingdom
Email: [email protected]
URL: http://www.zabaras.com/
August 7, 2014
Contents

- Multivariate Gaussian Distribution
- Mahalanobis Distance, Geometric Interpretation
- Derivation of Mean and Moments
- Restricted Forms of the Gaussian, 2D Examples and Generalizations

References: Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 4; Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2.
Multivariate Gaussian

A random vector $x \in \mathbb{R}^D$ is Gaussian if its probability density is

$$\mathcal{N}(x \,|\, \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,(\det\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right),$$

where $\mu \in \mathbb{R}^D$ is the mean and $\Sigma$ is a symmetric positive definite $D \times D$ matrix (the covariance matrix).
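As a minimal numerical check of the formula above (all values here are arbitrary illustrations, not from the lecture), the density can be evaluated directly with numpy and verified to integrate to one over a wide grid:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) directly from the formula above.

    x may be a single point of shape (D,) or an array of points (N, D).
    """
    D = mu.shape[0]
    diff = np.atleast_2d(x) - mu
    # (x - mu)^T Sigma^{-1} (x - mu) for each row of x
    maha_sq = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.squeeze(np.exp(-0.5 * maha_sq) / norm)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])   # symmetric positive definite

# Sanity check: the density should integrate to ~1 over a wide grid.
g1 = np.linspace(mu[0] - 8, mu[0] + 8, 400)
g2 = np.linspace(mu[1] - 8, mu[1] + 8, 400)
G1, G2 = np.meshgrid(g1, g2)
pts = np.column_stack([G1.ravel(), G2.ravel()])
P = gaussian_pdf(pts, mu, Sigma).reshape(G1.shape)
total = P.sum() * (g1[1] - g1[0]) * (g2[1] - g2[0])   # Riemann sum
```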
Multivariate Gaussian

The Gaussian distribution is closed under linear transformations, i.e. for independent

$$X_1 \sim \mathcal{N}(\mu_1, \Sigma_1), \quad X_2 \sim \mathcal{N}(\mu_2, \Sigma_2),$$

with $X_1, X_2 \in \mathbb{R}^d$, matrices $A, B \in \mathbb{R}^{c \times d}$, and a vector $c \in \mathbb{R}^c$,

$$AX_1 + BX_2 + c \sim \mathcal{N}\left(A\mu_1 + B\mu_2 + c,\; A\Sigma_1 A^T + B\Sigma_2 B^T\right).$$
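This closure property can be checked by Monte Carlo (the matrices and moments below are arbitrary illustrative choices): sample the two independent Gaussians, apply the linear map, and compare the empirical moments against the theoretical ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent Gaussians X1 ~ N(mu1, S1), X2 ~ N(mu2, S2) in R^2
mu1, S1 = np.array([1.0, 0.0]), np.array([[1.0, 0.2], [0.2, 0.5]])
mu2, S2 = np.array([-1.0, 2.0]), np.array([[0.8, -0.1], [-0.1, 1.2]])
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])  # 3x2
B = np.array([[0.5, 0.0], [1.0, 1.0], [0.0, 2.0]])   # 3x2
c = np.array([1.0, -1.0, 0.5])

n = 200_000
X1 = rng.multivariate_normal(mu1, S1, size=n)
X2 = rng.multivariate_normal(mu2, S2, size=n)
Y = X1 @ A.T + X2 @ B.T + c          # samples of A X1 + B X2 + c

mean_theory = A @ mu1 + B @ mu2 + c
cov_theory = A @ S1 @ A.T + B @ S2 @ B.T

mean_mc = Y.mean(axis=0)
cov_mc = np.cov(Y, rowvar=False)
```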
Mahalanobis Distance

The functional dependence of the Gaussian on x is through the quadratic form (the squared Mahalanobis distance)

$$\Delta^2 = (x - \mu)^T \Sigma^{-1} (x - \mu).$$

The Mahalanobis distance from μ to x reduces to the Euclidean distance when Σ is the identity matrix.

The Gaussian distribution is constant on surfaces in x-space for which this quadratic form is constant.

Σ can be taken to be symmetric, without loss of generality, because any antisymmetric component would disappear from the exponent.
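A short sketch of the distance (the points and covariances are arbitrary examples): with Σ = I it coincides with the Euclidean distance, and with a non-spherical Σ it weights directions differently.

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance Delta = sqrt((x-mu)^T Sigma^{-1} (x-mu))."""
    diff = x - mu
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
x = np.array([3.0, 4.0])

# With Sigma = I, the Mahalanobis distance reduces to the Euclidean distance.
d_identity = mahalanobis(x, mu, np.eye(2))
d_euclid = np.linalg.norm(x - mu)

# With a non-spherical Sigma, the two generally differ.
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
d_general = mahalanobis(x, mu, Sigma)   # sqrt(9/4 + 16) here
```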
Multivariate Gaussian

Now consider the eigenvector equation for the covariance matrix,

$$\Sigma u_i = \lambda_i u_i, \quad i = 1, \ldots, D.$$

Because Σ is real and symmetric, its eigenvalues are real and its eigenvectors can be chosen to form an orthonormal set,

$$u_i^T u_j = I_{ij}, \quad \text{where } I_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}$$
Multivariate Gaussian

The covariance matrix Σ can be expressed as an expansion in terms of its eigenvectors,

$$\Sigma = \sum_{i=1}^{D} \lambda_i\, u_i u_i^T,$$

and similarly the inverse covariance matrix Σ⁻¹ can be expressed as

$$\Sigma^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i}\, u_i u_i^T.$$
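Both expansions can be verified numerically with a small arbitrary covariance matrix; `numpy.linalg.eigh` is the appropriate routine for real symmetric matrices.

```python
import numpy as np

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])   # arbitrary symmetric positive definite example

# eigh returns eigenvalues and orthonormal eigenvectors (as columns) of a
# real symmetric matrix.
lam, U_cols = np.linalg.eigh(Sigma)

# Reconstruct Sigma and Sigma^{-1} from the eigenvector expansions above.
Sigma_rec = sum(lam[i] * np.outer(U_cols[:, i], U_cols[:, i]) for i in range(2))
Sigma_inv_rec = sum((1 / lam[i]) * np.outer(U_cols[:, i], U_cols[:, i]) for i in range(2))
```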
Multivariate Gaussian

The Mahalanobis distance now becomes

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu).$$

We can interpret {y_i} as a new coordinate system, defined by the orthonormal vectors u_i, that is shifted and rotated with respect to the original x_i coordinates. Forming the vector y = (y_1, …, y_D)^T, we have

$$y = U(x - \mu),$$

where U is the matrix whose rows are given by $u_i^T$. U is an orthogonal matrix: $UU^T = U^T U = I$, with I the identity matrix.
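The equivalence of the two expressions for Δ² can be checked directly (the point and covariance below are arbitrary examples): build U from the eigenvectors, rotate into y-coordinates, and compare against the quadratic form in x-coordinates.

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([2.0, 0.5])

lam, U_cols = np.linalg.eigh(Sigma)
U = U_cols.T                       # rows of U are the eigenvectors u_i^T

y = U @ (x - mu)                   # shifted, rotated coordinates
maha_sq_y = np.sum(y**2 / lam)     # sum_i y_i^2 / lambda_i
maha_sq_x = (x - mu) @ np.linalg.solve(Sigma, x - mu)
```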
Multivariate Gaussian: Geometric Interpretation

The quadratic form, and thus the Gaussian density, is constant on ellipsoids

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T(x - \mu),$$

with their centers at μ, their axes oriented along $u_i$, and scaling factors in the directions of the axes given by $\lambda_i^{1/2}$.

Note that the volume within the hyper-ellipsoid above can easily be computed. With the change of variables $z_i = y_i / \lambda_i^{1/2}$,

$$\int \prod_{i=1}^{D} dy_i = \prod_{i=1}^{D} \lambda_i^{1/2} \int_{\text{sphere}} \prod_{i=1}^{D} dz_i = |\Sigma|^{1/2}\, \Delta^D\, V_D,$$

where the z-integral is over the sphere $\|z\| \le \Delta$ and $V_D$ is the volume of the unit sphere in D dimensions.
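In 2D, $V_2 = \pi$, so the area inside the ellipse is $\pi\,\Delta^2\,|\Sigma|^{1/2}$. A Monte Carlo sketch (arbitrary Σ and Δ) confirms the formula by rejection sampling over a bounding box:

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
Delta = 1.5

# Formula: volume of {x : x^T Sigma^{-1} x <= Delta^2} is
# |Sigma|^{1/2} * Delta^D * V_D; in 2D, V_2 = pi.
area_formula = np.sqrt(np.linalg.det(Sigma)) * Delta**2 * np.pi

# Monte Carlo check: sample a bounding box and count points in the ellipse.
lam, _ = np.linalg.eigh(Sigma)
half = Delta * np.sqrt(lam.max())      # half-width of a box containing the ellipse
n = 400_000
pts = rng.uniform(-half, half, size=(n, 2))
maha_sq = np.einsum('ni,ij,nj->n', pts, np.linalg.inv(Sigma), pts)
area_mc = (2 * half) ** 2 * np.mean(maha_sq <= Delta**2)
```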
Multivariate Gaussian

From $y_j = u_j^T(x - \mu)$, i.e. $y = U(x - \mu)$, and using the orthogonality of U, we can derive

$$x = \mu + U^T y, \quad \text{i.e. } x_k = \mu_k + \sum_{j=1}^{D} U_{jk}\, y_j.$$

The Jacobian of the transformation from x to y has elements

$$J_{ij} = \frac{\partial x_i}{\partial y_j} = U_{ji},$$

i.e. the elements of $U^T$. The square of the determinant of the Jacobian is

$$|J|^2 = |U^T|^2 = |U^T|\,|U| = |U^T U| = |I| = 1, \quad \text{so } |J| = 1.$$
Multivariate Gaussian

Also, $|\Sigma|^{1/2}$ can be written as

$$|\Sigma|^{1/2} = \prod_{i=1}^{D} \lambda_i^{1/2}.$$

The multivariate Gaussian distribution can now be written in the y-coordinate system as

$$p(y) = p(x)\,|J| = \prod_{j=1}^{D} \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right).$$

In the y-coordinates, the multivariate Gaussian factorizes into a product of independent Gaussian distributions. This verifies that p(y) is correctly normalized:

$$\int p(y)\, dy = \prod_{j=1}^{D} \int \frac{1}{(2\pi\lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right) dy_j = 1.$$
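The factorization can be checked at a single (arbitrary) point: the multivariate density in x-coordinates should equal the product of independent 1D Gaussian densities evaluated at the rotated coordinates.

```python
import numpy as np

mu = np.array([0.5, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([1.0, 1.0])
D = 2

# Density in x-coordinates, from the multivariate formula.
diff = x - mu
p_x = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / (
    (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))

# Density in y-coordinates: product of independent 1D Gaussians N(0, lambda_j).
lam, U_cols = np.linalg.eigh(Sigma)
y = U_cols.T @ diff
p_y = np.prod(np.exp(-y**2 / (2 * lam)) / np.sqrt(2 * np.pi * lam))
```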
Mean of the Multivariate Gaussian

The mean of the multivariate Gaussian can be computed as:

$$\mathbb{E}[x] = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) x\, dx$$

$$= \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z + \mu)\, dz, \quad z = x - \mu.$$

The exponent is an even function of the components of z and, because the integrals over these are taken over the range (−∞, ∞), the term in z in the factor (z + μ) will vanish by symmetry. Thus

$$\mathbb{E}[x] = \mu.$$
Second Moment of the Multivariate Gaussian

The second moment of the multivariate Gaussian can be computed as:

$$\mathbb{E}[xx^T] = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) xx^T\, dx$$

$$= \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z+\mu)(z+\mu)^T\, dz, \quad z = x - \mu,$$

$$= \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) \left(\mu\mu^T + \mu z^T + z\mu^T + zz^T\right) dz.$$
Second Moment of the Multivariate Gaussian

The terms with $\mu z^T$ and $z\mu^T$ are zero due to symmetry. The term $\mu\mu^T$ integrates to $\mu\mu^T$ by the normalization of the multivariate Gaussian. It remains to compute the last term:

$$\mathbb{E}[xx^T] = \mu\mu^T + \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T\, dz.$$
Second Moment of the Multivariate Gaussian

We can simplify using the change of variables (with |J| = 1):

$$z = U^T y = \sum_{j=1}^{D} y_j\, u_j, \quad \text{i.e. } z_k = \sum_{j=1}^{D} U_{jk}\, y_j,$$

so that

$$\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T\, dz = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \sum_{i=1}^{D}\sum_{j=1}^{D} u_i u_j^T \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i\, y_j\, dy.$$

The integral with terms $i \neq j$ drops due to symmetry, leaving

$$\sum_{i=1}^{D} u_i u_i^T\, \lambda_i = \Sigma.$$
Second Moment of the Multivariate Gaussian

In the last step, we used the expression for the second moment of a univariate Gaussian,

$$\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \sum_{i=1}^{D}\sum_{j=1}^{D} u_i u_j^T \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i\, y_j\, dy = \sum_{i=1}^{D} u_i u_i^T\, \lambda_i = \Sigma,$$

and:

$$y_i = u_i^T(x - \mu), \quad \mathbb{E}[y_i] = u_i^T\,\mathbb{E}[x - \mu] = 0,$$

$$\mathrm{var}(y_i) = \mathbb{E}[y_i^2] = u_i^T\,\mathbb{E}\left[(x-\mu)(x-\mu)^T\right] u_i = u_i^T \Sigma\, u_i = \lambda_i.$$
Second Moment of the Multivariate Gaussian

We finally conclude that

$$\mathbb{E}[xx^T] = \mu\mu^T + \Sigma.$$

From this, we can derive the covariance as

$$\mathrm{cov}[x] = \mathbb{E}\left[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^T\right] = \Sigma.$$

The number of parameters in the Gaussian distribution increases with dimensionality. A general symmetric covariance matrix Σ has D(D + 1)/2 independent parameters. This, together with the D independent parameters in μ, gives D(D + 3)/2 parameters, which grows quadratically with D.
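The conclusion $\mathbb{E}[xx^T] = \mu\mu^T + \Sigma$ can be sanity-checked by Monte Carlo (μ and Σ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 0.8]])

n = 300_000
X = rng.multivariate_normal(mu, Sigma, size=n)

second_moment_mc = (X.T @ X) / n              # Monte Carlo estimate of E[x x^T]
second_moment_theory = np.outer(mu, mu) + Sigma
cov_mc = np.cov(X, rowvar=False)              # should approximate Sigma
```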
Restricted Forms of the Multivariate Gaussian

For a diagonal covariance matrix, $\Sigma = \mathrm{diag}(\sigma_i^2)$, we have only 2D total parameters. The corresponding contours of constant density are axis-aligned ellipsoids.

We could further restrict the covariance matrix to $\Sigma = \sigma^2 I$, and in this (isotropic covariance) case we have a total of D + 1 parameters. The constant density contours are now circles (hyperspheres in D dimensions).
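The parameter counts for the three forms can be tabulated with a small helper (the function name is my own, for illustration):

```python
def gaussian_param_count(D, cov_type):
    """Free parameters (mean + covariance) for each restricted Gaussian form."""
    if cov_type == "full":
        return D + D * (D + 1) // 2      # D + D(D+1)/2 = D(D+3)/2
    if cov_type == "diagonal":
        return D + D                     # 2D
    if cov_type == "spherical":
        return D + 1
    raise ValueError(f"unknown covariance type: {cov_type}")

counts = {c: gaussian_param_count(10, c) for c in ("full", "diagonal", "spherical")}
```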
2D Gaussian

[Figure: level sets and surface plots of 2D Gaussian densities with full, diagonal, and spherical covariance matrices; produced with gaussPlot2DDemo from PMTK.]
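Such level sets can be generated without any plotting library (the covariances below are arbitrary examples): points on the contour Δ² = const are obtained by scaling a unit circle by $\Delta\sqrt{\lambda_i}$ along the eigenvector axes, rotating by U, and shifting by μ.

```python
import numpy as np

def ellipse_points(mu, Sigma, Delta=1.0, n=200):
    """Points on the 2D level set (x-mu)^T Sigma^{-1} (x-mu) = Delta^2."""
    lam, U_cols = np.linalg.eigh(Sigma)
    theta = np.linspace(0, 2 * np.pi, n)
    circle = np.stack([np.cos(theta), np.sin(theta)])            # unit circle
    # Scale axes by Delta*sqrt(lambda_i), rotate by the eigenvectors, shift by mu.
    return (U_cols @ (Delta * np.sqrt(lam)[:, None] * circle)).T + mu

mu = np.zeros(2)
full_cov = ellipse_points(mu, np.array([[2.0, 1.8], [1.8, 2.0]]))  # rotated ellipse
diag_cov = ellipse_points(mu, np.diag([2.0, 0.5]))                 # axis-aligned ellipse
sphere_cov = ellipse_points(mu, 1.5 * np.eye(2))                   # circle
```

Feeding these point sets to any 2D plotting routine reproduces the three contour shapes shown in the figure.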
Restricted Forms of the Multivariate Gaussian

The Gaussian distribution is flexible but has two limitations: it requires many parameters in high dimensions, and it can represent only unimodal distributions.
Restricted Forms of the Multivariate Gaussian

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed:

- Multimodal distributions can be obtained by introducing discrete latent variables (mixtures of Gaussians).
- Introducing continuous latent variables leads to models in which the number of free parameters can be controlled independently of the dimensionality D of the data space, while still allowing the model to capture the dominant correlations in the data set.

These two approaches can be combined, leading to hierarchical models useful in many applications.
Restricted Forms of the Multivariate Gaussian

In probabilistic models of images, we often use the Gaussian version of the Markov random field. It is a Gaussian distribution over the joint space of pixel intensities. It is tractable because of the structure imposed to reflect the spatial organization of the pixels.
Restricted Forms of the Multivariate Gaussian

Similarly, the linear dynamical system used to model time-series data for tracking is also a joint Gaussian distribution over a large number of observed and hidden variables. It is tractable due to the structure imposed on the distribution. Graphical models are often used to introduce the structure for such complex models.