
Multivariate Normal Distribution


Brunero Liseo, Sapienza Università di Roma


February 10, 2014



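Inference on $N_p(\mu, \Sigma)$

Let $X_1, \dots, X_n \overset{iid}{\sim} N_p(\mu, \Sigma)$, with density

$$f_X(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu) \right\}.$$

The likelihood is

$$L(\mu, \Sigma) \propto \frac{1}{|\Sigma|^{n/2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^n (x_i-\mu)^\top \Sigma^{-1} (x_i-\mu) \right\}.$$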



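Alternative expression of quadratic form

$$\begin{aligned}
\sum_{i=1}^n (x_i-\mu)^\top \Sigma^{-1} (x_i-\mu)
&= \sum_{i=1}^n (x_i - \bar{x} + \bar{x} - \mu)^\top \Sigma^{-1} (x_i - \bar{x} + \bar{x} - \mu) \\
&= \sum_{i=1}^n (x_i-\bar{x})^\top \Sigma^{-1} (x_i-\bar{x}) + n(\bar{x}-\mu)^\top \Sigma^{-1} (\bar{x}-\mu) \\
&= \mathrm{tr}\left( \Sigma^{-1} \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top \right) + n(\bar{x}-\mu)^\top \Sigma^{-1} (\bar{x}-\mu) \\
&= \mathrm{tr}\left( \Sigma^{-1} S \right) + n(\bar{x}-\mu)^\top \Sigma^{-1} (\bar{x}-\mu),
\end{aligned}$$

where $S = \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top$. The cross terms vanish because $\sum_{i=1}^n (x_i - \bar{x}) = 0$.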
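The decomposition of the quadratic form can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original slides; the simulated data and the values of `mu` and `Sigma` are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
x = rng.normal(size=(n, p))           # rows are the observations x_i (simulated)
mu = np.array([0.5, -1.0, 2.0])       # arbitrary value of mu (assumption)
M = rng.normal(size=(p, p))
Sigma = M @ M.T + p * np.eye(p)       # arbitrary positive definite Sigma (assumption)
Sigma_inv = np.linalg.inv(Sigma)

xbar = x.mean(axis=0)
centered = x - xbar
S = centered.T @ centered             # S = sum_i (x_i - xbar)(x_i - xbar)'

# Left-hand side: sum_i (x_i - mu)' Sigma^{-1} (x_i - mu)
d = x - mu
lhs = np.einsum("ij,jk,ik->", d, Sigma_inv, d)

# Right-hand side: tr(Sigma^{-1} S) + n (xbar - mu)' Sigma^{-1} (xbar - mu)
rhs = np.trace(Sigma_inv @ S) + n * (xbar - mu) @ Sigma_inv @ (xbar - mu)

print(np.isclose(lhs, rhs))
```

The two sides agree up to floating-point error for any choice of `mu` and positive definite `Sigma`, which is why the likelihood depends on the data only through $\bar{x}$ and $S$.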

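Then the likelihood can be written as

$$L(\mu, \Sigma) \propto \frac{1}{|\Sigma|^{n/2}} \exp\left\{ -\frac{1}{2} \left[ \mathrm{tr}\left( \Sigma^{-1} S \right) + n(\bar{x}-\mu)^\top \Sigma^{-1} (\bar{x}-\mu) \right] \right\}.$$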


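Wishart distribution

The Wishart distribution $W_k(m, \Sigma)$ has support on the space of all symmetric positive definite matrices.

We say that the positive definite $k \times k$ matrix $V$ has a Wishart distribution with $m$ degrees of freedom and scale parameter $\Sigma$, a positive definite matrix, and we write $V \sim W_k(m, \Sigma)$, if its density is

$$f(V) = \frac{1}{2^{mk/2}\, \Gamma_k(m/2)\, |\Sigma|^{m/2}}\, |V|^{(m-k-1)/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} V \right) \right\},$$

with

$$\Gamma_k(u) = \pi^{k(k-1)/4} \prod_{i=1}^k \Gamma\left( u - \frac{1}{2}(i-1) \right), \qquad u > \frac{k-1}{2}.$$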



Construction of a random Wishart matrix

Let $Z_1, \dots, Z_m \overset{iid}{\sim} N_k(0, I)$; then the quantity

$$W = \sum_{i=1}^m Z_i Z_i^\top$$

has a Wishart distribution $W_k(m, I)$.

Each diagonal element of $W$, say $W_{jj}$, follows a $\chi^2_m$ distribution.

Starting from $Z_1, \dots, Z_m \overset{iid}{\sim} N_k(0, \Sigma)$ we obtain the more general $W \sim W_k(m, \Sigma)$.
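The construction translates directly into a sampler. The sketch below is in Python with NumPy and is not taken from the slides; the function name `rwishart`, the example `Sigma` and the number of replications are illustrative assumptions. It uses the known fact that $E(W) = m\Sigma$ as a sanity check.

```python
import numpy as np

def rwishart(m, Sigma, rng):
    """Draw W = sum_i Z_i Z_i' with Z_1, ..., Z_m iid N_k(0, Sigma).

    m must be a positive integer for this construction.
    """
    k = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)              # Sigma = L L'
    Z = rng.standard_normal((m, k)) @ L.T      # each row is a draw from N_k(0, Sigma)
    return Z.T @ Z                             # sum of the outer products Z_i Z_i'

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                 # example scale matrix (assumption)
m = 7
draws = np.array([rwishart(m, Sigma, rng) for _ in range(20000)])
mean_W = draws.mean(axis=0)

print(mean_W)       # Monte Carlo estimate of E(W)
print(m * Sigma)    # theoretical value m * Sigma
```

The construction only covers integer degrees of freedom; for non-integer $m$ a different algorithm (for example the Bartlett decomposition) would be needed.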



Inverse Wishart

An Inverse Wishart (IW) random matrix, denoted $W_k^{-1}(m, \Sigma)$, has support on the space of all symmetric positive definite matrices.

An IW r.v. describes the distribution of the inverse of a Wishart matrix.

In Bayesian statistics it is often used as the conjugate prior for the covariance matrix of a multivariate Gaussian model.



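Let $V \sim W_k(m, \Sigma)$. Since $V$ is positive definite with probability 1, it is easy to compute the density function of $Z = V^{-1}$:

$$f(Z) = \frac{|Z|^{-(m+k+1)/2}}{2^{mk/2}\, \Gamma_k(m/2)\, |\Sigma|^{m/2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} Z^{-1} \right) \right\}.$$

Also, for $m > k + 1$,

$$E(Z) = \frac{\Sigma^{-1}}{m - k - 1}.$$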



A useful Lemma

Lemma. Let $A$ and $B$ be positive real numbers and let $a$ and $b$ be any real numbers. Then

$$A(\mu - a)^2 + B(\mu - b)^2 = (A + B)\left( \mu - \frac{aA + bB}{A + B} \right)^2 + \frac{AB}{A + B}(a - b)^2 \qquad (1)$$

Proof: see later.



(Multivariate version)

Let $x$, $a$, $b$ be vectors in $\mathbb{R}^k$ and let $A$, $B$ be symmetric $k \times k$ matrices such that $(A + B)^{-1}$ exists. Then,

$$(x - a)^\top A (x - a) + (x - b)^\top B (x - b) = (x - c)^\top (A + B)(x - c) + (a - b)^\top A (A + B)^{-1} B (a - b),$$

where

$$c = (A + B)^{-1}(Aa + Bb).$$

When $x \in \mathbb{R}$ the result is exactly (1).
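Before going through the proof, the multivariate identity can be verified numerically. This is a minimal Python/NumPy sketch, not from the slides; the helper `random_spd` and the random inputs are illustrative assumptions (the Lemma only requires symmetric matrices with $A + B$ invertible, positive definite ones are used here for convenience).

```python
import numpy as np

def random_spd(k, rng):
    """Return an arbitrary symmetric positive definite k x k matrix."""
    M = rng.normal(size=(k, k))
    return M @ M.T + k * np.eye(k)

rng = np.random.default_rng(2)
k = 4
A, B = random_spd(k, rng), random_spd(k, rng)
x, a, b = rng.normal(size=(3, k))              # three arbitrary vectors in R^k

# c = (A + B)^{-1} (A a + B b)
c = np.linalg.solve(A + B, A @ a + B @ b)

lhs = (x - a) @ A @ (x - a) + (x - b) @ B @ (x - b)
rhs = ((x - c) @ (A + B) @ (x - c)
       + (a - b) @ A @ np.linalg.solve(A + B, B @ (a - b)))

print(np.isclose(lhs, rhs))
```

Only the first term on the right depends on `x`, which is what makes the identity useful for completing the square in a posterior density.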



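Proof

$$(x - a)^\top A (x - a) + (x - b)^\top B (x - b) = x^\top (A + B) x - 2x^\top (Aa + Bb) + a^\top A a + b^\top B b.$$

Add and subtract $c^\top (A + B) c$ to obtain

$$(x - c)^\top (A + B)(x - c) + G,$$

where $G = a^\top A a + b^\top B b - c^\top (A + B) c$. Also

$$c^\top (A + B) c = (Aa + Bb)^\top (A + B)^{-1} (Aa + Bb).$$

Add and subtract $Ab$ in the first factor and $Ba$ in the third factor:

$$\begin{aligned}
c^\top (A + B) c
&= [A(a - b) + (A + B)b]^\top (A + B)^{-1} [(A + B)a - B(a - b)] \\
&= -(a - b)^\top A (A + B)^{-1} B (a - b) + (a - b)^\top A a + b^\top (A + B) a - b^\top B (a - b) \\
&= -(a - b)^\top A (A + B)^{-1} B (a - b) + a^\top A a + b^\top B b.
\end{aligned}$$

Therefore

$$G = (a - b)^\top A (A + B)^{-1} B (a - b).$$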


Dickey's Theorem

Theorem. Let $X$ be a $k$-dimensional random vector and $Y$ be a scalar r.v. such that

$$X \mid Y \sim N_k(\mu, Y\Sigma), \qquad Y \sim GI(a, b).$$

Then the marginal distribution of $X$ is multivariate Student:

$$X \sim St_k\left( 2a, \mu, \frac{b}{a}\Sigma \right).$$

In particular, setting $a = \nu/2$ and $b = 1/2$, then

$$Y^{-1} \sim \chi^2_\nu; \qquad X \sim St_k(\nu, \mu, \Sigma/\nu).$$

Proof: Easy.
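A possible sketch of the omitted proof follows. It assumes that $GI(a, b)$ denotes the inverse gamma distribution with density proportional to $y^{-(a+1)} e^{-b/y}$, and that $St_k(\nu, \mu, \Sigma)$ has density proportional to $[1 + (x-\mu)^\top \Sigma^{-1} (x-\mu)/\nu]^{-(\nu+k)/2}$; both conventions are assumptions, since the slides do not state them.

```latex
\begin{aligned}
f(x) &= \int_0^\infty N_k(x \mid \mu, y\Sigma)\, GI(y \mid a, b)\, dy \\
     &\propto \int_0^\infty y^{-k/2} \exp\left\{ -\frac{Q}{2y} \right\}
        y^{-(a+1)} \exp\left\{ -\frac{b}{y} \right\} dy,
        \qquad Q = (x-\mu)^\top \Sigma^{-1} (x-\mu), \\
     &= \int_0^\infty y^{-(a + k/2 + 1)} \exp\left\{ -\frac{b + Q/2}{y} \right\} dy
      = \Gamma\left( a + \tfrac{k}{2} \right) \left( b + \tfrac{Q}{2} \right)^{-(a + k/2)} \\
     &\propto \left[ 1 + \frac{1}{2a} (x-\mu)^\top \left( \tfrac{b}{a}\Sigma \right)^{-1} (x-\mu) \right]^{-(2a+k)/2}.
\end{aligned}
```

The last line is the kernel of a $St_k(2a, \mu, (b/a)\Sigma)$ density under the stated convention, with $2a$ degrees of freedom.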


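The Posterior

Using the Lemma, one gets $\mu \mid \Sigma, x \sim N_p(\mu^*, (c + n)^{-1}\Sigma)$, with

$$\mu^* = \frac{c\mu_0 + n\bar{x}}{c + n},$$

and $\Sigma \mid x \sim IW_p(m + n, \Psi^*)$, where

$$\Psi^* = \left[ S + \Psi^{-1} + \frac{nc}{n + c}(\bar{x} - \mu_0)(\bar{x} - \mu_0)^\top \right]^{-1}.$$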
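The posterior update can be written as a small function. The sketch below, in Python with NumPy, assumes the conjugate prior $\mu \mid \Sigma \sim N_p(\mu_0, \Sigma/c)$, $\Sigma \sim IW_p(m, \Psi)$, which is the prior that this posterior form implies; the function name `niw_posterior` and the toy data are hypothetical and chosen so that the result can be checked by hand.

```python
import numpy as np

def niw_posterior(x, mu0, c, m, Psi):
    """Posterior hyperparameters (mu_star, c + n, m + n, Psi_star).

    x is an (n, p) array whose rows are the observations.
    Psi is the scale of the prior in the parameterisation where
    Sigma ~ IW_p(m, Psi) means Sigma^{-1} ~ W_p(m, Psi).
    """
    n, p = x.shape
    xbar = x.mean(axis=0)
    centered = x - xbar
    S = centered.T @ centered                         # sum of squares about xbar
    mu_star = (c * mu0 + n * xbar) / (c + n)          # weighted mean of mu0 and xbar
    d = np.outer(xbar - mu0, xbar - mu0)
    Psi_star_inv = S + np.linalg.inv(Psi) + (n * c / (n + c)) * d
    return mu_star, c + n, m + n, np.linalg.inv(Psi_star_inv)

# Toy example: two observations in R^2
x = np.array([[0.0, 0.0],
              [2.0, 2.0]])
mu_star, c_star, m_star, Psi_star = niw_posterior(
    x, mu0=np.zeros(2), c=2.0, m=5, Psi=np.eye(2)
)
print(mu_star)                     # (c*mu0 + n*xbar)/(c + n) = [0.5, 0.5]
print(np.linalg.inv(Psi_star))     # S + Psi^{-1} + nc/(n+c) d d' = [[4, 3], [3, 4]]
```

In the toy example $\bar{x} = (1, 1)$, $S$ has all entries equal to 2, $\Psi^{-1} = I$ and $nc/(n+c) = 1$, so the inverse posterior scale is the sum of those three matrices.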


The hyperparameters

We need to specify the following parameters:

- $\mu_0$, the prior mean for $\mu$, the most reasonable estimate before the experiment;
- $c$, the degree of belief in your elicitation about $\mu$; smaller values of $c$ make the prior less informative;
- $\Psi$ and $m$, the hyperparameters about $\Sigma^{-1}$; they can be elicited by taking into account the moments of an Inverse Wishart: for example,

$$E(\Sigma) = \frac{\Psi^{-1}}{m - p - 1}.$$
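One way to use the moment formula for elicitation is to fix $m$ and a prior guess for $E(\Sigma)$, then solve for $\Psi$. The following Python/NumPy sketch illustrates this; the function name `psi_from_prior_mean` and the example values are hypothetical, and the approach is only one of several reasonable elicitation strategies.

```python
import numpy as np

def psi_from_prior_mean(Sigma0, m):
    """Return Psi such that E(Sigma) = Psi^{-1} / (m - p - 1) equals Sigma0."""
    p = Sigma0.shape[0]
    if m <= p + 1:
        raise ValueError("the prior mean exists only for m > p + 1")
    return np.linalg.inv((m - p - 1) * Sigma0)

Sigma0 = np.array([[1.0, 0.3],
                   [0.3, 2.0]])      # prior guess for the covariance (assumption)
m = 10                               # prior degrees of freedom (assumption)
Psi = psi_from_prior_mean(Sigma0, m)

# Plugging Psi back into the moment formula recovers the prior guess
implied_mean = np.linalg.inv(Psi) / (m - Sigma0.shape[0] - 1)
print(implied_mean)
```

Larger values of `m` concentrate the prior more tightly around `Sigma0`, in the same way that larger values of $c$ make the prior on $\mu$ more informative.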


Non informative case

You get a noninformative prior if you set the hyperparameters equal to zero.

When

$$c \to 0, \qquad \Psi^{-1} = 0, \qquad m = 0,$$

you get the Jeffreys prior

$$\pi(\mu, \Sigma) \propto \sqrt{\det(I(\mu, \Sigma))} \propto \frac{1}{|\Sigma|^{(p+1)/2}}.$$


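Consequences of the use of the Jeffreys prior

$\Sigma$ is positive definite and symmetric. Using the Spectral Decomposition Theorem one can write $\Sigma$ as $\Sigma = H D H^\top$, where $H$ is a matrix whose columns are the eigenvectors and $D$ is the diagonal matrix with the eigenvalues in non-increasing order:

$$H^\top H = I_p, \qquad D = \mathrm{diag}(\lambda_1, \dots, \lambda_p).$$

Then, assuming that all the eigenvalues are different,

$$\pi(\Sigma)\, d\Sigma = \pi(H, D)\, I[\lambda_1 > \lambda_2 > \dots > \lambda_p]\, dH\, dD.$$

With a change of variable,

$$\pi(H, D) = \pi(H D H^\top) \prod_{i<j} (\lambda_i - \lambda_j).$$

Then

$$\pi^J(\mu, \Sigma) = \pi^J(\mu, H, D) \propto \frac{1}{|D|^{(p+1)/2}} \prod_{i<j} (\lambda_i - \lambda_j).$$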
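The spectral decomposition and the product of eigenvalue differences can be computed numerically. This is a minimal Python/NumPy sketch, not from the slides; the example matrix `Sigma` is an arbitrary assumption. Note that `np.linalg.eigh` returns the eigenvalues in ascending order, so they are reordered to match the non-increasing convention used here.

```python
import numpy as np

Sigma = np.array([[2.0, 0.5, 0.1],
                  [0.5, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])            # example covariance matrix (assumption)

eigvals, eigvecs = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]              # indices for non-increasing order
lam = eigvals[order]
H = eigvecs[:, order]                          # columns are the eigenvectors
D = np.diag(lam)

recon = H @ D @ H.T                            # should reproduce Sigma

# Product over i < j of (lambda_i - lambda_j)
p = len(lam)
jac = np.prod([lam[i] - lam[j] for i in range(p) for j in range(i + 1, p)])

print(np.allclose(recon, Sigma), np.allclose(H.T @ H, np.eye(p)))
print(jac)
```

The factor `jac` shrinks to zero as two eigenvalues approach each other, so a prior on $(H, D)$ that carries this factor puts little mass on nearly equal eigenvalues.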



Gibbs sampling for $N(\mu, \Sigma)$

Also in the multivariate case, it can be useful and convenient to adopt a computational approach rather than perform closed-form calculations.

This solution is particularly important when you are interested in functions of $\mu$ and $\Sigma$.


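Full conditionals

We need to write down the two full conditionals, that is $\mu \mid \Sigma, x$ and $\Sigma \mid \mu, x$.

The first one is already known:

$$\mu \mid \Sigma, x \sim N_p(\mu^*, (c + n)^{-1}\Sigma). \qquad (2)$$

The second one follows by multiplying the likelihood by both prior factors; the prior of $\mu$ given $\Sigma$, $N_p(\mu_0, \Sigma/c)$, also depends on $\Sigma$ and so contributes one degree of freedom and one extra term:

$$\Sigma \mid \mu, x \sim IW_p\left( m + n + 1, \left[ \Psi^{-1} + \sum_{i=1}^n (x_i - \mu)(x_i - \mu)^\top + c(\mu - \mu_0)(\mu - \mu_0)^\top \right]^{-1} \right). \qquad (3)$$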
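A Gibbs sampler that alternates between the two full conditionals might look as follows. This is a sketch in Python with NumPy, not the author's R code. It assumes the conjugate prior $\mu \mid \Sigma \sim N_p(\mu_0, \Sigma/c)$, $\Sigma \sim IW_p(m, \Psi)$ with integer $m$, and it draws $\Sigma$ from $IW_p(m + n + 1, [\Psi^{-1} + \sum_i (x_i - \mu)(x_i - \mu)^\top + c(\mu - \mu_0)(\mu - \mu_0)^\top]^{-1})$, the full conditional derived under that prior. The names `rwishart` and `gibbs_niw` and the simulated data are hypothetical.

```python
import numpy as np

def rwishart(df, scale, rng):
    """Draw from W_p(df, scale) as a sum of df outer products (integer df)."""
    L = np.linalg.cholesky(scale)
    Z = rng.standard_normal((df, scale.shape[0])) @ L.T
    return Z.T @ Z

def gibbs_niw(x, mu0, c, m, Psi, n_iter=2000, seed=0):
    """Gibbs sampler for (mu, Sigma) under the assumed conjugate prior."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    xbar = x.mean(axis=0)
    mu_star = (c * mu0 + n * xbar) / (c + n)
    Psi_inv = np.linalg.inv(Psi)
    Sigma = np.cov(x, rowvar=False)                 # starting value
    mus = np.empty((n_iter, p))
    Sigmas = np.empty((n_iter, p, p))
    for t in range(n_iter):
        # mu | Sigma, x ~ N_p(mu_star, Sigma / (c + n))
        mu = rng.multivariate_normal(mu_star, Sigma / (c + n))
        # Sigma | mu, x: draw V from the Wishart and invert it
        d = x - mu
        e = np.outer(mu - mu0, mu - mu0)
        scale_inv = Psi_inv + d.T @ d + c * e
        V = rwishart(m + n + 1, np.linalg.inv(scale_inv), rng)
        Sigma = np.linalg.inv(V)
        Sigma = (Sigma + Sigma.T) / 2               # remove rounding asymmetry
        mus[t], Sigmas[t] = mu, Sigma
    return mus, Sigmas

# Simulated data and hyperparameters (all values are assumptions)
rng = np.random.default_rng(42)
true_Sigma = np.array([[1.0, 0.6],
                       [0.6, 2.0]])
x = rng.multivariate_normal([1.0, -1.0], true_Sigma, size=100)
mu0, c, m, Psi = np.zeros(2), 1.0, 5, np.eye(2)

mus, Sigmas = gibbs_niw(x, mu0, c, m, Psi)
burn = 500
print(mus[burn:].mean(axis=0))       # Monte Carlo estimate of E(mu | x)
print(Sigmas[burn:].mean(axis=0))    # Monte Carlo estimate of E(Sigma | x)
```

Since the joint posterior is available in closed form here, the draws can be compared with $\mu^*$ and with the posterior mean of $\Sigma$; the sampler becomes genuinely useful for functions of $\mu$ and $\Sigma$, such as correlations or eigenvalues, which are computed draw by draw from `mus` and `Sigmas`.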


R code
