The Multivariate Gaussian

Page 1: The Multivariate Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

The Multivariate Gaussian Prof. Nicholas Zabaras

School of Engineering

University of Warwick

Coventry CV4 7AL

United Kingdom

Email: [email protected]

URL: http://www.zabaras.com/

August 7, 2014


Page 2: The Multivariate Gaussian

Contents

Multivariate Gaussian Distribution

Mahalanobis Distance, Geometric Interpretation

Derivation of Mean and Moments

Restricted Forms of the Gaussian, 2D Examples and Generalizations

References: Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 4; Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2

Page 3: The Multivariate Gaussian

Multivariate Gaussian

A random vector \( x \in \mathbb{R}^D \) is multivariate Gaussian if its probability density is

\[
\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} \det(\Sigma)^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),
\]

where \( \mu \in \mathbb{R}^D \) and \( \Sigma \) is a \( D \times D \) symmetric positive definite matrix (the covariance matrix).
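As a quick numerical companion to the density formula above, the following sketch evaluates it directly in NumPy. The deck's own demos use PMTK (MATLAB); this standalone snippet and its values of mu, Sigma, and x are illustrative, not from the slides.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density N(x | mu, Sigma) evaluated directly from the formula."""
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)     # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # symmetric positive definite
x = np.array([0.5, -1.5])
print(mvn_pdf(x, mu, Sigma))
```

Using `np.linalg.solve` instead of forming `Sigma^{-1}` explicitly is the standard numerically safer choice.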

Page 4: The Multivariate Gaussian

Multivariate Gaussian

The Gaussian distribution is invariant under linear transformations, i.e. for

\[
X_1 \sim \mathcal{N}(\mu_1, \Sigma_1), \quad X_2 \sim \mathcal{N}(\mu_2, \Sigma_2), \quad X_1, X_2 \text{ independent},
\]

and for \( A, B \in \mathcal{M}_{c,d}(\mathbb{R}) \), \( c \in \mathbb{R}^c \),

\[
A X_1 + B X_2 + c \sim \mathcal{N}\left( A\mu_1 + B\mu_2 + c,\; A \Sigma_1 A^T + B \Sigma_2 B^T \right).
\]
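The closure property above can be checked by Monte Carlo: sample the two independent Gaussians, apply the affine map, and compare the empirical mean and covariance of the result with the predicted ones. All matrices and vectors below are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, S1 = np.array([1.0, 0.0]), np.array([[1.0, 0.3], [0.3, 2.0]])
mu2, S2 = np.array([-1.0, 2.0]), np.array([[0.5, 0.0], [0.0, 0.5]])
A = np.array([[2.0, 0.0], [1.0, 1.0]])
B = np.array([[1.0, -1.0], [0.0, 3.0]])
c = np.array([0.5, 0.5])

n = 200_000
X1 = rng.multivariate_normal(mu1, S1, size=n)   # independent draws
X2 = rng.multivariate_normal(mu2, S2, size=n)
Y = X1 @ A.T + X2 @ B.T + c                     # A X1 + B X2 + c, row-wise

mean_pred = A @ mu1 + B @ mu2 + c               # predicted mean
cov_pred = A @ S1 @ A.T + B @ S2 @ B.T          # predicted covariance
print(Y.mean(axis=0), mean_pred)
print(np.cov(Y.T))
print(cov_pred)
```

The empirical statistics should match the predictions to Monte Carlo accuracy.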

Page 5: The Multivariate Gaussian

Mahalanobis Distance

The functional dependence of the Gaussian on x is through the quadratic form (the squared Mahalanobis distance)

\[
\Delta^2 = (x - \mu)^T \Sigma^{-1} (x - \mu).
\]

The Mahalanobis distance from μ to x reduces to the Euclidean distance when Σ is the identity matrix.

The Gaussian distribution is constant on surfaces in x-space for which this quadratic form is constant.

Σ can be taken to be symmetric without loss of generality, because any antisymmetric component would disappear from the exponent.
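A minimal sketch of the Mahalanobis distance, showing the reduction to the Euclidean distance for the identity covariance; the test points and covariances are illustrative.

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance Delta = sqrt((x-mu)^T Sigma^{-1} (x-mu))."""
    diff = x - mu
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

mu = np.zeros(3)
x = np.array([1.0, 2.0, 2.0])

# With the identity covariance the distance is Euclidean: ||x - mu||.
print(mahalanobis(x, mu, np.eye(3)))                 # 3.0
# A non-identity covariance rescales each direction by its variance.
print(mahalanobis(x, mu, np.diag([1.0, 4.0, 4.0])))
```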

Page 6: The Multivariate Gaussian

Multivariate Gaussian

Now consider the eigenvector equation for the covariance matrix,

\[
\Sigma u_i = \lambda_i u_i, \quad i = 1, \ldots, D.
\]

Because Σ is real and symmetric, its eigenvalues are real and its eigenvectors can be chosen to form an orthonormal set,

\[
u_i^T u_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}
\]

Page 7: The Multivariate Gaussian

Multivariate Gaussian

The covariance matrix Σ can be expressed as an expansion in terms of its eigenvectors,

\[
\Sigma = \sum_{i=1}^{D} \lambda_i u_i u_i^T,
\]

and similarly the inverse covariance matrix Σ⁻¹ can be expressed as

\[
\Sigma^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i} u_i u_i^T.
\]
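The two eigen-expansions can be verified numerically in a few lines. This sketch uses an illustrative 2×2 covariance; `np.linalg.eigh` returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix.

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])       # illustrative symmetric PD matrix
lam, U = np.linalg.eigh(Sigma)       # columns of U are eigenvectors u_i

# Sigma = sum_i lambda_i u_i u_i^T
Sigma_rec = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(2))
# Sigma^{-1} = sum_i (1/lambda_i) u_i u_i^T
Sigma_inv = sum((1 / lam[i]) * np.outer(U[:, i], U[:, i]) for i in range(2))

print(np.allclose(Sigma_rec, Sigma))                  # True
print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))   # True
```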

Page 8: The Multivariate Gaussian

Multivariate Gaussian

The Mahalanobis distance now becomes

\[
\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu).
\]

We can interpret {yᵢ} as a new coordinate system, defined by the orthonormal vectors uᵢ, that is shifted and rotated with respect to the original xᵢ coordinates.

Forming the vector y = (y₁, …, y_D)ᵀ, we have

\[
y = U (x - \mu),
\]

where U is the matrix whose rows are given by \( u_i^T \). U is an orthogonal matrix: \( U U^T = U^T U = I \), with I the identity matrix.
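A short sketch of this change of coordinates: forming y = U(x − μ) from the eigenvectors and checking that the Mahalanobis quadratic form equals the weighted sum Σᵢ yᵢ²/λᵢ. The numbers are illustrative.

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x = np.array([2.0, 0.5])

lam, evecs = np.linalg.eigh(Sigma)
U = evecs.T                         # rows of U are the eigenvectors u_i^T
y = U @ (x - mu)                    # rotated, shifted coordinates

direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)   # (x-mu)^T Sigma^{-1} (x-mu)
via_y = np.sum(y ** 2 / lam)                         # sum_i y_i^2 / lambda_i
print(direct, via_y)                                 # the two agree
```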

Page 9: The Multivariate Gaussian

Multivariate Gaussian: Geometric Interpretation

The quadratic form, and thus the Gaussian density, is constant on the ellipsoids

\[
\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu),
\]

with centers at μ, axes oriented along the uᵢ, and scaling factors in the directions of the axes given by \( \lambda_i^{1/2} \).

The volume within this hyper-ellipsoid is easily computed: substituting \( z_i = y_i / \lambda_i^{1/2} \),

\[
\int \prod_{i=1}^{D} dy_i = \prod_{i=1}^{D} \lambda_i^{1/2} \int \prod_{i=1}^{D} dz_i = |\Sigma|^{1/2} \Delta^D V_D,
\]

where the z-integral is over the sphere \( \sum_i z_i^2 \le \Delta^2 \) and \( V_D \) is the volume of the unit sphere in D dimensions.

Page 10: The Multivariate Gaussian

Multivariate Gaussian

From \( y_j = u_j^T (x - \mu) \), i.e. \( y = U(x - \mu) \), and using the orthogonality of U, we can derive

\[
U^T y = U^T U (x - \mu) = x - \mu \quad \Rightarrow \quad x = \mu + U^T y.
\]

The Jacobian of the transformation from x to y has elements

\[
J_{ij} = \frac{\partial x_i}{\partial y_j} = (U^T)_{ij} = U_{ji}.
\]

The square of the determinant of the Jacobian is

\[
|J|^2 = |\det U^T|^2 = \det U^T \det U = \det(U^T U) = \det I = 1, \quad \text{so } |J| = 1.
\]

Page 11: The Multivariate Gaussian

Multivariate Gaussian

Also, \( |\Sigma|^{1/2} \) can be written as

\[
|\Sigma|^{1/2} = \prod_{i=1}^{D} \lambda_i^{1/2}.
\]

The multivariate Gaussian distribution can now be written in the y-coordinate system as

\[
p(y) = p(x)\,|J| = \prod_{j=1}^{D} \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left( -\frac{y_j^2}{2\lambda_j} \right).
\]

In the y-coordinates, the multivariate Gaussian factorizes into a product of independent univariate Gaussians. This verifies that p(y) is correctly normalized:

\[
\int p(y)\,dy = \prod_{j=1}^{D} \int \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left( -\frac{y_j^2}{2\lambda_j} \right) dy_j = 1.
\]
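The factorization can be checked pointwise: the joint density p(x) should equal the product of the D one-dimensional Gaussians N(yⱼ | 0, λⱼ) evaluated at y = Uᵀ... more precisely y = U(x − μ). The sketch below does this for an illustrative 2-D case.

```python
import numpy as np

mu = np.array([0.5, -0.5])
Sigma = np.array([[1.5, 0.6],
                  [0.6, 0.9]])
x = np.array([1.0, 0.0])

lam, evecs = np.linalg.eigh(Sigma)
y = evecs.T @ (x - mu)              # eigenbasis coordinates

# Joint density evaluated directly (D = 2, so (2*pi)^{D/2} = 2*pi).
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)
p_joint = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

# Product of independent 1-D Gaussians N(y_j | 0, lambda_j).
p_factored = np.prod(np.exp(-0.5 * y ** 2 / lam) / np.sqrt(2 * np.pi * lam))
print(p_joint, p_factored)          # the two agree
```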

Page 12: The Multivariate Gaussian

Mean of the Multivariate Gaussian

The mean of the multivariate Gaussian can be computed (substituting z = x − μ) as:

\[
\mathbb{E}[x] = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) x \, dx
= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2} z^T \Sigma^{-1} z \right) (z + \mu) \, dz.
\]

The exponent is an even function of the components of z and, because the integrals over these are taken over the range (−∞, ∞), the term in z in the factor (z + μ) will vanish by symmetry. Thus

\[
\mathbb{E}[x] = \mu.
\]

Page 13: The Multivariate Gaussian

Second Moment of the Multivariate Gaussian

The second moment of the multivariate Gaussian can be computed (again substituting z = x − μ) as:

\[
\mathbb{E}[x x^T] = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) x x^T \, dx
\]
\[
= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2} z^T \Sigma^{-1} z \right) (z + \mu)(z + \mu)^T \, dz
\]
\[
= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2} z^T \Sigma^{-1} z \right) \left( z z^T + z \mu^T + \mu z^T + \mu \mu^T \right) dz.
\]

Page 14: The Multivariate Gaussian

Second Moment of the Multivariate Gaussian

The terms with \( z \mu^T \) and \( \mu z^T \) are zero due to symmetry. The \( \mu \mu^T \) term equals \( \mu \mu^T \) from the normalization of the multivariate Gaussian. It remains to compute the last term:

\[
\mathbb{E}[x x^T] = \mu \mu^T + \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2} z^T \Sigma^{-1} z \right) z z^T \, dz.
\]

Page 15: The Multivariate Gaussian

Second Moment of the Multivariate Gaussian

We can simplify using \( x = \mu + U^T y \), i.e. \( z = U^T y = \sum_{j=1}^{D} y_j u_j \) (with |J| = 1):

\[
\frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2} z^T \Sigma^{-1} z \right) z z^T \, dz
= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \sum_{i=1}^{D} \sum_{j=1}^{D} u_i u_j^T \int \exp\left( -\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k} \right) y_i y_j \, dy.
\]

The integrals with i ≠ j drop due to symmetry, leaving

\[
\frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \sum_{i=1}^{D} u_i u_i^T \int \exp\left( -\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k} \right) y_i^2 \, dy
= \sum_{i=1}^{D} u_i u_i^T \lambda_i = \Sigma.
\]

Page 16: The Multivariate Gaussian

Second Moment of the Multivariate Gaussian

In the last step,

\[
\frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \sum_{i=1}^{D} u_i u_i^T \int \exp\left( -\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k} \right) y_i^2 \, dy = \sum_{i=1}^{D} u_i u_i^T \lambda_i = \Sigma,
\]

we used the expression for the second moment of a univariate Gaussian together with

\[
\mathbb{E}[y_i] = u_i^T (\mathbb{E}[x] - \mu) = 0 \quad \text{and} \quad
\mathbb{E}[y_i^2] = \mathrm{var}(y_i) = u_i^T \Sigma u_i = u_i^T \lambda_i u_i = \lambda_i.
\]

Page 17: The Multivariate Gaussian

Second Moment of the Multivariate Gaussian

We finally conclude that

\[
\mathbb{E}[x x^T] = \mu \mu^T + \Sigma.
\]

From this, we can derive the covariance as

\[
\mathrm{cov}[x] = \mathbb{E}\left[ (x - \mathbb{E}[x]) (x - \mathbb{E}[x])^T \right] = \Sigma.
\]

The number of parameters in the Gaussian distribution increases with dimensionality. A general symmetric covariance matrix Σ has D(D + 1)/2 independent parameters. Together with the D independent parameters in μ, this gives D(D + 3)/2 parameters, which grows quadratically with D.
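The second-moment identity derived above can be checked by Monte Carlo: the sample average of x xᵀ should approach μμᵀ + Σ. The mean and covariance below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])

n = 500_000
X = rng.multivariate_normal(mu, Sigma, size=n)
second_moment = X.T @ X / n                  # Monte Carlo estimate of E[x x^T]

print(second_moment)
print(np.outer(mu, mu) + Sigma)              # mu mu^T + Sigma
```

The two matrices should agree to Monte Carlo accuracy.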

Page 18: The Multivariate Gaussian

Restricted Forms of the Multivariate Gaussian

For a diagonal covariance matrix, \( \Sigma = \mathrm{diag}(\sigma_i^2) \), we have only 2D parameters in total. The corresponding contours of constant density are axis-aligned ellipsoids.

We could further restrict the covariance matrix to the isotropic form \( \Sigma = \sigma^2 I \), in which case we have a total of D + 1 parameters. The constant-density contours are now circles.
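The parameter counts for the three covariance structures (full, diagonal, isotropic) can be summarized in a small helper. This is an illustrative sketch; the function name is my own.

```python
def gaussian_param_count(D, structure="full"):
    """Free parameters of a D-dimensional Gaussian: D for the mean,
    plus D(D+1)/2 (full), D (diagonal), or 1 (isotropic) for Sigma."""
    mean_params = D
    if structure == "full":
        cov_params = D * (D + 1) // 2
    elif structure == "diagonal":
        cov_params = D
    elif structure == "isotropic":
        cov_params = 1
    else:
        raise ValueError(structure)
    return mean_params + cov_params

for D in (2, 10):
    print(D, [gaussian_param_count(D, s)
              for s in ("full", "diagonal", "isotropic")])
```

For D = 10 this gives 65, 20, and 11 parameters respectively, showing the quadratic versus linear versus constant growth of the covariance part.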

Page 19: The Multivariate Gaussian

2D Gaussian

[Figure: level sets of 2D Gaussians with full, diagonal, and spherical covariance matrices, shown as contour and surface plots; generated with gaussPlot2DDemo from PMTK.]

Page 20: The Multivariate Gaussian

Restricted Forms of the Multivariate Gaussian

The Gaussian distribution is flexible, but its full form has many parameters, and it is limited to unimodal distributions.

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed.

Page 21: The Multivariate Gaussian

Restricted Forms of the Multivariate Gaussian

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed:

Multimodal distributions can be obtained by introducing discrete latent variables (mixtures of Gaussians).

Introducing continuous latent variables leads to models in which the number of free parameters can be controlled independently of the dimensionality D of the data space, while still allowing the model to capture the dominant correlations in the data set.

These two approaches can be combined, leading to hierarchical models useful in many applications.
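The first point, that a discrete latent variable yields a multimodal mixture of Gaussians, can be sketched in a few lines: sample a component index, then sample from the Gaussian it selects. The weights, means, and standard deviations below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
weights = np.array([0.5, 0.5])      # mixing proportions
means = np.array([-3.0, 3.0])       # component means
stds = np.array([1.0, 1.0])         # component standard deviations

# Sample the discrete latent variable, then the Gaussian it selects.
k = rng.choice(2, size=100_000, p=weights)
x = rng.normal(means[k], stds[k])

# The samples pile up around -3 and +3: two modes, which no single
# Gaussian can produce.  By symmetry about half the mass is below 0.
print((x < 0).mean())
```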

Page 22: The Multivariate Gaussian

Restricted Forms of the Multivariate Gaussian

In probabilistic models of images, we often use the Gaussian version of the Markov random field. It is a Gaussian distribution over the joint space of pixel intensities, and it is tractable because of the structure imposed by the spatial organization of the pixels.

Page 23: The Multivariate Gaussian

Restricted Forms of the Multivariate Gaussian

Similarly, the linear dynamical system used to model time series data for tracking is also a joint Gaussian distribution over a large number of observed and hidden variables. It is tractable due to the structure imposed on the distribution.

Graphical models are often used to introduce the structure for such complex models.