The Multivariate Gaussian
TRANSCRIPT
Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
The Multivariate Gaussian Prof. Nicholas Zabaras
School of Engineering
University of Warwick
Coventry CV4 7AL
United Kingdom
Email: [email protected]
URL: http://www.zabaras.com/
August 7, 2014
Contents

- Multivariate Gaussian Distribution
- Mahalanobis Distance, Geometric Interpretation
- Derivation of Mean and Moments
- Restricted Forms of the Gaussian, 2D Examples and Generalizations

References: Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 4; Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2.
Multivariate Gaussian

A random vector $x \in \mathbb{R}^D$ is Gaussian if its probability density is

$$\mathcal{N}(x \,|\, \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,(\det\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right),$$

where $\mu \in \mathbb{R}^D$ is the mean and $\Sigma$ is a symmetric positive definite $D \times D$ matrix (the covariance matrix).
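As a minimal numerical check of the formula above (all values here are arbitrary illustrations, not from the lecture), the density can be evaluated directly with numpy and verified to integrate to one over a wide grid:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) directly from the formula above.

    x may be a single point of shape (D,) or an array of points (N, D).
    """
    D = mu.shape[0]
    diff = np.atleast_2d(x) - mu
    # (x - mu)^T Sigma^{-1} (x - mu) for each row of x
    maha_sq = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.squeeze(np.exp(-0.5 * maha_sq) / norm)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])   # symmetric positive definite

# Sanity check: the density should integrate to ~1 over a wide grid.
g1 = np.linspace(mu[0] - 8, mu[0] + 8, 400)
g2 = np.linspace(mu[1] - 8, mu[1] + 8, 400)
G1, G2 = np.meshgrid(g1, g2)
pts = np.column_stack([G1.ravel(), G2.ravel()])
P = gaussian_pdf(pts, mu, Sigma).reshape(G1.shape)
total = P.sum() * (g1[1] - g1[0]) * (g2[1] - g2[0])   # Riemann sum
```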
Multivariate Gaussian

The Gaussian distribution is closed under linear transformations, i.e. for independent

$$X_1 \sim \mathcal{N}(\mu_1, \Sigma_1), \quad X_2 \sim \mathcal{N}(\mu_2, \Sigma_2),$$

with $X_1, X_2 \in \mathbb{R}^d$, matrices $A, B \in \mathbb{R}^{c \times d}$, and a vector $c \in \mathbb{R}^c$,

$$AX_1 + BX_2 + c \sim \mathcal{N}\left(A\mu_1 + B\mu_2 + c,\; A\Sigma_1 A^T + B\Sigma_2 B^T\right).$$
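This closure property can be checked by Monte Carlo (the matrices and moments below are arbitrary illustrative choices): sample the two independent Gaussians, apply the linear map, and compare the empirical moments against the theoretical ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent Gaussians X1 ~ N(mu1, S1), X2 ~ N(mu2, S2) in R^2
mu1, S1 = np.array([1.0, 0.0]), np.array([[1.0, 0.2], [0.2, 0.5]])
mu2, S2 = np.array([-1.0, 2.0]), np.array([[0.8, -0.1], [-0.1, 1.2]])
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])  # 3x2
B = np.array([[0.5, 0.0], [1.0, 1.0], [0.0, 2.0]])   # 3x2
c = np.array([1.0, -1.0, 0.5])

n = 200_000
X1 = rng.multivariate_normal(mu1, S1, size=n)
X2 = rng.multivariate_normal(mu2, S2, size=n)
Y = X1 @ A.T + X2 @ B.T + c          # samples of A X1 + B X2 + c

mean_theory = A @ mu1 + B @ mu2 + c
cov_theory = A @ S1 @ A.T + B @ S2 @ B.T

mean_mc = Y.mean(axis=0)
cov_mc = np.cov(Y, rowvar=False)
```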
Mahalanobis Distance

The functional dependence of the Gaussian on x is through the quadratic form (the squared Mahalanobis distance)

$$\Delta^2 = (x - \mu)^T \Sigma^{-1} (x - \mu).$$

The Mahalanobis distance from μ to x reduces to the Euclidean distance when Σ is the identity matrix.

The Gaussian distribution is constant on surfaces in x-space for which this quadratic form is constant.

Σ can be taken to be symmetric, without loss of generality, because any antisymmetric component would disappear from the exponent.
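A short sketch of the distance (the points and covariances are arbitrary examples): with Σ = I it coincides with the Euclidean distance, and with a non-spherical Σ it weights directions differently.

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance Delta = sqrt((x-mu)^T Sigma^{-1} (x-mu))."""
    diff = x - mu
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
x = np.array([3.0, 4.0])

# With Sigma = I, the Mahalanobis distance reduces to the Euclidean distance.
d_identity = mahalanobis(x, mu, np.eye(2))
d_euclid = np.linalg.norm(x - mu)

# With a non-spherical Sigma, the two generally differ.
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
d_general = mahalanobis(x, mu, Sigma)   # sqrt(9/4 + 16) here
```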
Multivariate Gaussian

Now consider the eigenvector equation for the covariance matrix,

$$\Sigma u_i = \lambda_i u_i, \quad i = 1, \ldots, D.$$

Because Σ is real and symmetric, its eigenvalues are real and its eigenvectors can be chosen to form an orthonormal set,

$$u_i^T u_j = I_{ij}, \quad \text{where } I_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}$$
Multivariate Gaussian

The covariance matrix Σ can be expressed as an expansion in terms of its eigenvectors,

$$\Sigma = \sum_{i=1}^{D} \lambda_i\, u_i u_i^T,$$

and similarly the inverse covariance matrix Σ⁻¹ can be expressed as

$$\Sigma^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i}\, u_i u_i^T.$$
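Both expansions can be verified numerically with a small arbitrary covariance matrix; `numpy.linalg.eigh` is the appropriate routine for real symmetric matrices.

```python
import numpy as np

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])   # arbitrary symmetric positive definite example

# eigh returns eigenvalues and orthonormal eigenvectors (as columns) of a
# real symmetric matrix.
lam, U_cols = np.linalg.eigh(Sigma)

# Reconstruct Sigma and Sigma^{-1} from the eigenvector expansions above.
Sigma_rec = sum(lam[i] * np.outer(U_cols[:, i], U_cols[:, i]) for i in range(2))
Sigma_inv_rec = sum((1 / lam[i]) * np.outer(U_cols[:, i], U_cols[:, i]) for i in range(2))
```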
Multivariate Gaussian

The Mahalanobis distance now becomes

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu).$$

We can interpret {y_i} as a new coordinate system, defined by the orthonormal vectors u_i, that is shifted and rotated with respect to the original x_i coordinates. Forming the vector y = (y_1, …, y_D)^T, we have

$$y = U(x - \mu),$$

where U is the matrix whose rows are given by $u_i^T$. U is an orthogonal matrix: $UU^T = U^T U = I$, with I the identity matrix.
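The equivalence of the two expressions for Δ² can be checked directly (the point and covariance below are arbitrary examples): build U from the eigenvectors, rotate into y-coordinates, and compare against the quadratic form in x-coordinates.

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([2.0, 0.5])

lam, U_cols = np.linalg.eigh(Sigma)
U = U_cols.T                       # rows of U are the eigenvectors u_i^T

y = U @ (x - mu)                   # shifted, rotated coordinates
maha_sq_y = np.sum(y**2 / lam)     # sum_i y_i^2 / lambda_i
maha_sq_x = (x - mu) @ np.linalg.solve(Sigma, x - mu)
```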
Multivariate Gaussian: Geometric Interpretation

The quadratic form, and thus the Gaussian density, is constant on ellipsoids

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T(x - \mu),$$

with their centers at μ, their axes oriented along $u_i$, and scaling factors in the directions of the axes given by $\lambda_i^{1/2}$.

Note that the volume within the hyper-ellipsoid above can easily be computed. With the change of variables $z_i = y_i / \lambda_i^{1/2}$,

$$\int \prod_{i=1}^{D} dy_i = \prod_{i=1}^{D} \lambda_i^{1/2} \int_{\text{sphere}} \prod_{i=1}^{D} dz_i = |\Sigma|^{1/2}\, \Delta^D\, V_D,$$

where the z-integral is over the sphere $\|z\| \le \Delta$ and $V_D$ is the volume of the unit sphere in D dimensions.
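In 2D, $V_2 = \pi$, so the area inside the ellipse is $\pi\,\Delta^2\,|\Sigma|^{1/2}$. A Monte Carlo sketch (arbitrary Σ and Δ) confirms the formula by rejection sampling over a bounding box:

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
Delta = 1.5

# Formula: volume of {x : x^T Sigma^{-1} x <= Delta^2} is
# |Sigma|^{1/2} * Delta^D * V_D; in 2D, V_2 = pi.
area_formula = np.sqrt(np.linalg.det(Sigma)) * Delta**2 * np.pi

# Monte Carlo check: sample a bounding box and count points in the ellipse.
lam, _ = np.linalg.eigh(Sigma)
half = Delta * np.sqrt(lam.max())      # half-width of a box containing the ellipse
n = 400_000
pts = rng.uniform(-half, half, size=(n, 2))
maha_sq = np.einsum('ni,ij,nj->n', pts, np.linalg.inv(Sigma), pts)
area_mc = (2 * half) ** 2 * np.mean(maha_sq <= Delta**2)
```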
Multivariate Gaussian

From $y_j = u_j^T(x - \mu)$, i.e. $y = U(x - \mu)$, and using the orthogonality of U, we can derive

$$x = \mu + U^T y, \quad \text{i.e. } x_k = \mu_k + \sum_{j=1}^{D} U_{jk}\, y_j.$$

The Jacobian of the transformation from x to y has elements

$$J_{ij} = \frac{\partial x_i}{\partial y_j} = U_{ji},$$

i.e. the elements of $U^T$. The square of the determinant of the Jacobian is

$$|J|^2 = |U^T|^2 = |U^T|\,|U| = |U^T U| = |I| = 1, \quad \text{so } |J| = 1.$$
Multivariate Gaussian

Also, $|\Sigma|^{1/2}$ can be written as

$$|\Sigma|^{1/2} = \prod_{i=1}^{D} \lambda_i^{1/2}.$$

The multivariate Gaussian distribution can now be written in the y-coordinate system as

$$p(y) = p(x)\,|J| = \prod_{j=1}^{D} \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right).$$

In the y-coordinates, the multivariate Gaussian factorizes into a product of independent Gaussian distributions. This verifies that p(y) is correctly normalized:

$$\int p(y)\, dy = \prod_{j=1}^{D} \int \frac{1}{(2\pi\lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right) dy_j = 1.$$
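The factorization can be checked at a single (arbitrary) point: the multivariate density in x-coordinates should equal the product of independent 1D Gaussian densities evaluated at the rotated coordinates.

```python
import numpy as np

mu = np.array([0.5, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([1.0, 1.0])
D = 2

# Density in x-coordinates, from the multivariate formula.
diff = x - mu
p_x = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / (
    (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))

# Density in y-coordinates: product of independent 1D Gaussians N(0, lambda_j).
lam, U_cols = np.linalg.eigh(Sigma)
y = U_cols.T @ diff
p_y = np.prod(np.exp(-y**2 / (2 * lam)) / np.sqrt(2 * np.pi * lam))
```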
Mean of the Multivariate Gaussian

The mean of the multivariate Gaussian can be computed as:

$$\mathbb{E}[x] = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) x\, dx$$

$$= \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z + \mu)\, dz, \quad z = x - \mu.$$

The exponent is an even function of the components of z and, because the integrals over these are taken over the range (−∞, ∞), the term in z in the factor (z + μ) will vanish by symmetry. Thus

$$\mathbb{E}[x] = \mu.$$
Second Moment of the Multivariate Gaussian

The second moment of the multivariate Gaussian can be computed as:

$$\mathbb{E}[xx^T] = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) xx^T\, dx$$

$$= \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z+\mu)(z+\mu)^T\, dz, \quad z = x - \mu,$$

$$= \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) \left(\mu\mu^T + \mu z^T + z\mu^T + zz^T\right) dz.$$
Second Moment of the Multivariate Gaussian

The terms with $\mu z^T$ and $z\mu^T$ are zero due to symmetry. The term $\mu\mu^T$ integrates to $\mu\mu^T$ by the normalization of the multivariate Gaussian. It remains to compute the last term:

$$\mathbb{E}[xx^T] = \mu\mu^T + \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T\, dz.$$
Second Moment of the Multivariate Gaussian

We can simplify using the change of variables (with |J| = 1):

$$z = U^T y = \sum_{j=1}^{D} y_j\, u_j, \quad \text{i.e. } z_k = \sum_{j=1}^{D} U_{jk}\, y_j,$$

so that

$$\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T\, dz = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \sum_{i=1}^{D}\sum_{j=1}^{D} u_i u_j^T \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i\, y_j\, dy.$$

The integral with terms $i \neq j$ drops due to symmetry, leaving

$$\sum_{i=1}^{D} u_i u_i^T\, \lambda_i = \Sigma.$$
Second Moment of the Multivariate Gaussian

In the last step, we used the expression for the second moment of a univariate Gaussian,

$$\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \sum_{i=1}^{D}\sum_{j=1}^{D} u_i u_j^T \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i\, y_j\, dy = \sum_{i=1}^{D} u_i u_i^T\, \lambda_i = \Sigma,$$

and:

$$y_i = u_i^T(x - \mu), \quad \mathbb{E}[y_i] = u_i^T\,\mathbb{E}[x - \mu] = 0,$$

$$\mathrm{var}(y_i) = \mathbb{E}[y_i^2] = u_i^T\,\mathbb{E}\left[(x-\mu)(x-\mu)^T\right] u_i = u_i^T \Sigma\, u_i = \lambda_i.$$
Second Moment of the Multivariate Gaussian

We finally conclude that

$$\mathbb{E}[xx^T] = \mu\mu^T + \Sigma.$$

From this, we can derive the covariance as

$$\mathrm{cov}[x] = \mathbb{E}\left[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^T\right] = \Sigma.$$

The number of parameters in the Gaussian distribution increases with dimensionality. A general symmetric covariance matrix Σ has D(D + 1)/2 independent parameters. This, together with the D independent parameters in μ, gives D(D + 3)/2 parameters, which grows quadratically with D.
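The conclusion $\mathbb{E}[xx^T] = \mu\mu^T + \Sigma$ can be sanity-checked by Monte Carlo (μ and Σ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 0.8]])

n = 300_000
X = rng.multivariate_normal(mu, Sigma, size=n)

second_moment_mc = (X.T @ X) / n              # Monte Carlo estimate of E[x x^T]
second_moment_theory = np.outer(mu, mu) + Sigma
cov_mc = np.cov(X, rowvar=False)              # should approximate Sigma
```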
Restricted Forms of the Multivariate Gaussian

For a diagonal covariance matrix, $\Sigma = \mathrm{diag}(\sigma_i^2)$, we have only 2D total parameters. The corresponding contours of constant density are axis-aligned ellipsoids.

We could further restrict the covariance matrix to $\Sigma = \sigma^2 I$, and in this (isotropic covariance) case we have a total of D + 1 parameters. The constant density contours are now circles (hyperspheres in D dimensions).
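The parameter counts for the three forms can be tabulated with a small helper (the function name is my own, for illustration):

```python
def gaussian_param_count(D, cov_type):
    """Free parameters (mean + covariance) for each restricted Gaussian form."""
    if cov_type == "full":
        return D + D * (D + 1) // 2      # D + D(D+1)/2 = D(D+3)/2
    if cov_type == "diagonal":
        return D + D                     # 2D
    if cov_type == "spherical":
        return D + 1
    raise ValueError(f"unknown covariance type: {cov_type}")

counts = {c: gaussian_param_count(10, c) for c in ("full", "diagonal", "spherical")}
```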
2D Gaussian

[Figure: level sets and surface plots of 2D Gaussian densities with full, diagonal, and spherical covariance matrices; produced with gaussPlot2DDemo from PMTK.]
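Such level sets can be generated without any plotting library (the covariances below are arbitrary examples): points on the contour Δ² = const are obtained by scaling a unit circle by $\Delta\sqrt{\lambda_i}$ along the eigenvector axes, rotating by U, and shifting by μ.

```python
import numpy as np

def ellipse_points(mu, Sigma, Delta=1.0, n=200):
    """Points on the 2D level set (x-mu)^T Sigma^{-1} (x-mu) = Delta^2."""
    lam, U_cols = np.linalg.eigh(Sigma)
    theta = np.linspace(0, 2 * np.pi, n)
    circle = np.stack([np.cos(theta), np.sin(theta)])            # unit circle
    # Scale axes by Delta*sqrt(lambda_i), rotate by the eigenvectors, shift by mu.
    return (U_cols @ (Delta * np.sqrt(lam)[:, None] * circle)).T + mu

mu = np.zeros(2)
full_cov = ellipse_points(mu, np.array([[2.0, 1.8], [1.8, 2.0]]))  # rotated ellipse
diag_cov = ellipse_points(mu, np.diag([2.0, 0.5]))                 # axis-aligned ellipse
sphere_cov = ellipse_points(mu, 1.5 * np.eye(2))                   # circle
```

Feeding these point sets to any 2D plotting routine reproduces the three contour shapes shown in the figure.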
Restricted Forms of the Multivariate Gaussian

The Gaussian distribution is flexible but has two limitations: it requires many parameters in high dimensions, and it can represent only unimodal distributions.
Restricted Forms of the Multivariate Gaussian

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed:

- Multimodal distributions can be obtained by introducing discrete latent variables (mixtures of Gaussians).
- Introducing continuous latent variables leads to models in which the number of free parameters can be controlled independently of the dimensionality D of the data space, while still allowing the model to capture the dominant correlations in the data set.

These two approaches can be combined, leading to hierarchical models useful in many applications.
Restricted Forms of the Multivariate Gaussian

In probabilistic models of images, we often use the Gaussian version of the Markov random field. It is a Gaussian distribution over the joint space of pixel intensities. It is tractable because of the structure imposed to reflect the spatial organization of the pixels.
Restricted Forms of the Multivariate Gaussian

Similarly, the linear dynamical system used to model time-series data for tracking is also a joint Gaussian distribution over a large number of observed and hidden variables. It is tractable due to the structure imposed on the distribution. Graphical models are often used to introduce the structure for such complex models.