The Wishart and Inverse-Wishart Distribution
Pankaj Das
M.Sc. (Agricultural Statistics)
Roll no. – 20394
Introduction
Mathematical background
Wishart distribution
Inverse-Wishart distribution
Relationship of the Wishart and Inverse-Wishart distributions
Conclusion
In the modern era of science and information technology, there has been a huge influx of high-dimensional data from various fields such as genomics, the environmental sciences, finance and the social sciences.

Making sense of the many complex relationships and multivariate dependencies present in these data, formulating correct models, and developing inferential procedures are the major challenges in modern-day statistics.
In parametric models the covariance or correlation matrix (or its
inverse) is the fundamental object that quantifies relationships
between random variables.
Estimating the covariance matrix in a sparse way is crucial in high-dimensional problems and enables the detection of the most important relationships.

Covariance matrices provide the simplest measure of dependency, and therefore much attention has been placed on modelling covariance matrices, which has a significant impact on statistical inference.
In short, the correlation matrix plays a vital role in multivariate statistics. The correlation/covariance matrix is directly involved in a variety of statistical models, so estimation of the correlation matrix is important.

The Wishart and Inverse-Wishart distributions are used in the estimation of covariance matrices in multivariate statistics.
Suppose $x_\alpha \sim N_p(\mu, \Sigma)$, $\alpha = 1, \dots, n$, are $p$-dimensional random vectors drawn independently from a $p$-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. Then the joint density function of $x_1, \dots, x_n$ is given by

$$f(x_1, \dots, x_n \mid \mu, \Sigma) = \prod_{\alpha=1}^{n} \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}(x_\alpha - \mu)'\Sigma^{-1}(x_\alpha - \mu)\right] = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}} \exp\!\left[-\tfrac{1}{2}\sum_{\alpha=1}^{n}(x_\alpha - \mu)'\Sigma^{-1}(x_\alpha - \mu)\right] \quad \dots(1)$$
The data matrix $X_{p\times n}$ is given by

$$X_{p\times n} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{pmatrix}$$

The mean vector $\bar{x}_{p\times 1}$ is

$$\bar{x} = \frac{1}{n}\sum_{\alpha=1}^{n} x_\alpha = \frac{1}{n}\left(x_1 + x_2 + \dots + x_n\right)$$
The variance-covariance matrix is given by

$$\hat{\Sigma} = \begin{pmatrix} \hat{\sigma}_{11} & \cdots & \hat{\sigma}_{1p} \\ \vdots & & \vdots \\ \hat{\sigma}_{p1} & \cdots & \hat{\sigma}_{pp} \end{pmatrix}$$

Another way:

$$\hat{\Sigma} = \frac{1}{n}\sum_{\alpha=1}^{n}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'$$

This dispersion matrix is positive semi-definite.
The sample covariance matrix is

$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$

where

$$s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}(x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k)$$
The sampling distribution of the sample covariance matrix $S$ (and of $\hat{\Sigma} = \frac{n-1}{n}S$) follows a distribution known as the Wishart distribution:

$$(n-1)S = \sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})' \sim W_p(n-1, \Sigma)$$

It is named in honor of John Wishart, who first formulated the distribution in 1928.
The Wishart distribution is a multivariate extension of the Gamma distribution; it can also be viewed as a multivariate generalization of the χ² distribution.

The χ² distribution describes the sum of squares of n draws from a univariate normal distribution, whereas the Wishart distribution represents the sums of squares (and cross-products) of n draws from a multivariate normal distribution.

It is a family of probability distributions defined over symmetric, nonnegative-definite matrix-valued random variables.
The Wishart distribution arises as the distribution of the sample
covariance matrix for a sample from a multivariate normal
distribution.
Let $z_1, z_2, \dots, z_k$ be $k$ independent random $p$-vectors, each having a $p$-variate normal distribution with mean vector $0_{p\times 1}$ and covariance matrix $\Sigma$.

Let

$$U_{p\times p} = z_1 z_1' + z_2 z_2' + \dots + z_k z_k'$$

Then $U$ is said to have the $p$-variate Wishart distribution with $k$ degrees of freedom and covariance matrix $\Sigma$, written

$$U \sim W_p(k, \Sigma)$$
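The definition above suggests a direct way to simulate a Wishart draw. A minimal sketch in Python with NumPy (the function name, seed and matrix values are illustrative assumptions; the analysis in this presentation was done in R):

```python
import numpy as np

def wishart_draw(k, Sigma, rng):
    """One draw U = z1 z1' + ... + zk zk' with z_i ~ N_p(0, Sigma), so U ~ W_p(k, Sigma)."""
    p = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)          # Sigma = L L'
    Z = L @ rng.standard_normal((p, k))    # columns are independent N_p(0, Sigma) vectors
    return Z @ Z.T                         # sum of the outer products z_i z_i'

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
U = wishart_draw(5, Sigma, rng)            # symmetric, positive semi-definite
```

By construction U = ZZ' is symmetric and positive semi-definite, matching the family described above.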
The joint density of the p-variate Wishart distribution is

$$f(u) = \frac{|u|^{(k-p-1)/2}\,\exp\!\left[-\tfrac{1}{2}\operatorname{tr}(\Sigma^{-1}u)\right]}{2^{kp/2}\,|\Sigma|^{k/2}\,\Gamma_p(k/2)} \quad \dots(2)$$

where $\Gamma_p(\cdot)$ is the multivariate gamma function, i.e.

$$\Gamma_p(k/2) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(\frac{k+1-j}{2}\right) \quad \dots(3)$$

This is known as the central Wishart distribution.
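Equations (2) and (3) can be evaluated numerically. A Python/NumPy sketch (the function names are our own, not from any particular library); for p = 1 and Σ = 1 it reduces to the χ²(k) log-density, which gives a convenient correctness check:

```python
import math
import numpy as np

def multigammaln(a, p):
    """Log of the multivariate gamma function Gamma_p(a), equation (3)."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2) for j in range(1, p + 1))

def wishart_logpdf(U, k, Sigma):
    """Log-density of W_p(k, Sigma) at a positive definite matrix U, equation (2)."""
    p = Sigma.shape[0]
    _, logdet_u = np.linalg.slogdet(U)
    _, logdet_s = np.linalg.slogdet(Sigma)
    trace_term = np.trace(np.linalg.solve(Sigma, U))   # tr(Sigma^{-1} U)
    return ((k - p - 1) / 2 * logdet_u - trace_term / 2
            - k * p / 2 * math.log(2) - k / 2 * logdet_s
            - multigammaln(k / 2, p))
```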
Let $x_\alpha \sim N_p(\mu_\alpha, \Sigma)$, $\alpha = 1, 2, \dots, k$, be independent.

Then

$$\sum_{\alpha=1}^{k} x_\alpha x_\alpha' \sim W_p(k, \Sigma, M)$$

where $M$ is a $p \times k$ matrix with columns $\mu_1, \mu_2, \dots, \mu_k$, i.e.

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pk} \end{pmatrix}, \qquad M = (\mu_1, \mu_2, \dots, \mu_k)$$

This is called the non-central Wishart distribution.
When M = 0, this is the central Wishart distribution and we write $U \sim W_p(k, \Sigma)$.

It can easily be checked that when p = 1 and Σ = 1, the Wishart distribution reduces to the chi-square distribution with k degrees of freedom. Note that we must have k > p − 1 to ensure that U is (almost surely) nonsingular. If k > p − 1 does not hold, the distribution is called the singular Wishart distribution, because U is then a singular matrix.
Let $S_i \sim W(k, \Sigma)$, $i = 1, \dots, n$. The expected value of S is

$$E(S) = k\Sigma \quad \dots(4)$$

The expected value of a Wishart distribution depends on the number of draws one makes from the multivariate normal distribution.

In comparison, the expected value of a χ²(k) distribution is k, so the only differences between a Wishart expectation and a χ²(k) expectation are the underlying dimensionality of the data and a scale component.
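Equation (4) can be verified by simulation. A small Python/NumPy sketch (parameter values, seed and replication count are arbitrary illustrations; the presentation's own checks used R):

```python
import numpy as np

rng = np.random.default_rng(1)
k, reps = 5, 20000
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)                  # Sigma = L L'

# reps independent W_2(k, Sigma) draws, each a sum of k outer products
X = L @ rng.standard_normal((reps, 2, k))      # batches of k N_2(0, Sigma) columns
S = X @ X.transpose(0, 2, 1)
S_bar = S.mean(axis=0)                         # should approach E(S) = k * Sigma
```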
We can find the individual variances of the elements of S. For instance, the variance of the ijth element of S is

$$\operatorname{Var}(S_{ij}) = k\left(\sigma_{ij}^2 + \sigma_{ii}\sigma_{jj}\right) \quad \dots(5)$$

where $\sigma_{ij}$ is the ijth element of the matrix $\Sigma$ and can be thought of as the population covariance between variable i and variable j. When p = 1, the only element of the variance/covariance matrix is $\sigma_{11} = \sigma^2 = 1$. Therefore, we get Var(X) = k(1 + 1 × 1) = 2k, which is the familiar variance of a χ²(k) variable.
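Equation (5) can likewise be checked by Monte Carlo. A Python/NumPy sketch under the same illustrative parameter choices (all values assumed for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
k, reps = 5, 20000
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)

X = L @ rng.standard_normal((reps, 2, k))
S = X @ X.transpose(0, 2, 1)                   # reps draws from W_2(k, Sigma)

var_emp = S[:, 0, 1].var()                     # empirical Var(S_12)
var_theory = k * (Sigma[0, 1]**2 + Sigma[0, 0] * Sigma[1, 1])   # equation (5)
```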
Equation (5) gives a set of variances rather than the full variance/covariance matrix, because every observation of the Wishart distribution is itself a matrix.

Therefore, describing all combinations of variances and covariances of S requires either a higher-order array, or a Kronecker operation to represent that higher-order array as a matrix.
Eaton (2007) presented an approach for deriving the covariance matrix of S. The covariance matrix of S can be represented as

$$\operatorname{Cov}(S) = \operatorname{Cov}\!\left(\sum_{i=1}^{k} x_i x_i'\right) = k\operatorname{Cov}(x_i x_i') = k\operatorname{Cov}\!\left[(Cz)(Cz)'\right] = k\operatorname{Cov}(Czz'C') \quad \dots(6)$$

where $C$ is the Cholesky decomposition of the (square, symmetric) matrix $\Sigma$, so that $\Sigma = CC'$, $z$ is a standard normal $p$-vector, and $E(zz') = I_p$.
Applying the vec operator to S, which forms a long vector by stacking the columns of S, so that Cov[vec(S)] is a matrix rather than an array, we have

$$\operatorname{Cov}[\operatorname{vec}(S)] = k\operatorname{Cov}[\operatorname{vec}(Czz'C')]$$
$$= k\operatorname{Cov}[(C \otimes C)\operatorname{vec}(zz')] \quad \text{(by the vec-to-Kronecker property)}$$
$$= k(C \otimes C)\operatorname{Cov}[\operatorname{vec}(zz')](C \otimes C)'$$
$$= k(C \otimes C)\operatorname{Cov}(z \otimes z)(C \otimes C)' \quad \text{(by vec and Kronecker properties, since } \operatorname{vec}(zz') = z \otimes z)$$
To determine Cov[vec(S)] (as a proxy for Cov(S)), one would only need to know $\operatorname{Cov}(z \otimes z)$, i.e.:

(1) the variance of $Z_n^2$ (where $Z_n$ is any element of z);
(2) the variance of $Z_n Z_o$ (where $Z_n, Z_o$ are any two elements of z);
(3) the covariance between $Z_n Z_o$ and $Z_o Z_n$;
(4) the covariance between $Z_n^2$ and $Z_o^2$; and
(5) the covariance between $Z_i Z_j$ and $Z_n Z_o$ (where at most two of i, j, n, or o are the same).
1. $Z_n$ is standard normally distributed, so $Z_n^2$ follows a χ²(1) distribution with variance equal to 2(1) = 2. Therefore $\operatorname{Var}(Z_n^2) = 2$ for all n.

2. $Z_n$ and $Z_o$ are uncorrelated standard normal random variables, which implies that they are also independent. Therefore, due to independence,

$$\operatorname{Var}(Z_n Z_o) = E(Z_n^2 Z_o^2) - \left[E(Z_n Z_o)\right]^2 = E(Z_n^2)E(Z_o^2) - \left[E(Z_n)E(Z_o)\right]^2 = (1)(1) - 0 = 1$$

Here $Z_n^2$ and $Z_o^2$ both follow a χ²(1) distribution.
3. $Z_n Z_o$ and $Z_o Z_n$ are products of uncorrelated standard normal random variables, so

$$\operatorname{Cov}(Z_n Z_o, Z_o Z_n) = E(Z_n Z_o Z_o Z_n) - E(Z_n Z_o)E(Z_o Z_n) = E(Z_n^2 Z_o^2) - 0 = (1)(1) - 0 = 1$$

4. $$\operatorname{Cov}(Z_n^2, Z_o^2) = E(Z_n^2 Z_o^2) - E(Z_n^2)E(Z_o^2) = (1)(1) - (1)(1) = 0$$

5. $$\operatorname{Cov}(Z_i Z_j, Z_n Z_o) = 0$$ (where at most two of i, j, n, or o are the same).
Therefore, the [p(n−1)+n, p(n−1)+n] elements of $\operatorname{Cov}(z \otimes z)$ will all be 2, because $\operatorname{Var}(Z_n^2) = 2$, and the remaining diagonal elements of $\operatorname{Cov}(z \otimes z)$ will all be 1, because $\operatorname{Var}(Z_k Z_l) = 1$ for all k ≠ l. The off-diagonal elements must be 0, except for those elements symbolizing the covariance between $Z_i Z_j$ and $Z_j Z_i$, which will be 1.

Ultimately $\operatorname{Cov}(z \otimes z)$ can be written as

$$\operatorname{Cov}(z \otimes z) = I_p \otimes I_p + M_p$$

where $M_p$ is a $p^2 \times p^2$ matrix of 1s and 0s.
Therefore,

$$\operatorname{Cov}[\operatorname{vec}(S)] = k(C \otimes C)\operatorname{Cov}(z \otimes z)(C \otimes C)'$$
$$= k(C \otimes C)(I_p \otimes I_p + M_p)(C \otimes C)'$$
$$= k\left[(C \otimes C)(C \otimes C)' + (C \otimes C)M_p(C \otimes C)'\right]$$
$$= k\left[(\Sigma \otimes \Sigma) + (C \otimes C)M_p(C \otimes C)'\right] \quad \dots(7)$$
We can check the derivation by simulating draws from a Wishart distribution and comparing the empirical covariance matrix of the simulated draws with the theoretical covariance matrix calculated using equation (7). The analysis is done with the help of the R console.
Results show that the number of replications must be very large for the empirical covariance matrix to be close to the theoretical covariance matrix.
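The check described above can be sketched as follows, in Python/NumPy rather than the original R (p, k, Σ, the seed and the replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, reps = 2, 5, 40000
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
C = np.linalg.cholesky(Sigma)                     # Sigma = C C'

# M_p: 1s in the positions pairing Z_i Z_j with Z_j Z_i (column-stacked vec ordering)
M = np.zeros((p * p, p * p))
for a in range(p):
    for b in range(p):
        M[a + b * p, b + a * p] = 1.0

CC = np.kron(C, C)
theory = k * CC @ (np.eye(p * p) + M) @ CC.T      # equation (7)

X = C @ rng.standard_normal((reps, p, k))         # batches of k N_p(0, Sigma) columns
S = X @ X.transpose(0, 2, 1)                      # reps draws from W_p(k, Sigma)
vecS = S.reshape(reps, p * p)                     # S is symmetric, so flatten order is immaterial
empirical = np.cov(vecS, rowvar=False)
```

Even with tens of thousands of replications, the empirical matrix only matches `theory` to within a few percent, consistent with the remark above.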
Some Theorems on Wishart Distribution
The Wishart distribution has great importance in the estimation of covariance matrices in multivariate statistics.

The Wishart distribution is frequently used as the prior on the precision matrix parameter of a multivariate normal distribution.

The Wishart distribution arises as a model for random variation and a description of uncertainty about variance and precision matrices. It is of particular interest for sampling and inference on covariance and association structure in multivariate normal models, and in a range of extensions in regression.
The Inverse-Wishart distribution is the multivariate extension of the
Inverse-Gamma distribution.
Even though the Wishart distribution generates sums-of-squares matrices, one can think of the Inverse-Wishart distribution as generating random covariance matrices.

However, those covariance matrices would be the inverses of the covariance matrices generated under the Wishart distribution.
Let $T \sim \text{InvWish}_p(m, \Psi)$, where $\Psi$ denotes a $p \times p$ positive definite scale matrix, m denotes the degrees of freedom, and p indicates the dimension of T (i.e. $T \in \mathbb{R}^{p\times p}$).

Then T is positive definite with probability density function given as

$$f(T) = \frac{|\Psi|^{m/2}}{2^{mp/2}\,\Gamma_p(m/2)\,|T|^{(m+p+1)/2}} \exp\!\left[-\tfrac{1}{2}\operatorname{tr}(\Psi T^{-1})\right] \quad \dots(8)$$
The expected value of T is

$$E(T) = \frac{\Psi}{m - p - 1} \quad \dots(9)$$

With respect to the χ² distribution, the only differences between the Inverse-Wishart expectation and the inverse-χ² expectation are the dimensionality of the data and a scale component.

The Inverse-Wishart distribution has a finite expectation only when m > p + 1.
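Equation (9) can also be checked by simulation, using the fact (stated in a later section) that the inverse of a Wishart draw is Inverse-Wishart distributed. A Python/NumPy sketch (Ψ = I, m = 10, the seed and replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, reps = 2, 10, 20000
Psi = np.eye(p)

# If S ~ W_p(m, Psi^{-1}) then T = S^{-1} ~ InvWish_p(m, Psi); with Psi = I the
# underlying normal draws are simply standard normal.
X = rng.standard_normal((reps, p, m))
S = X @ X.transpose(0, 2, 1)           # reps Wishart draws from W_p(m, I)
T = np.linalg.inv(S)                   # reps Inverse-Wishart draws
T_bar = T.mean(axis=0)

expected = Psi / (m - p - 1)           # equation (9): E(T) = Psi / (m - p - 1)
```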
The variance of the ijth element of T is

$$\operatorname{Var}(T_{ij}) = \frac{(m-p+1)\psi_{ij}^2 + (m-p-1)\psi_{ii}\psi_{jj}}{(m-p)(m-p-1)^2(m-p-3)}$$

where $\psi_{ij}$ is the ijth element of the matrix $\Psi$.

If p = 1, then the only element of the variance/covariance matrix is $\psi_{11} = \sigma^2 = 1$, and the variance expression becomes

$$\operatorname{Var}(X) = \frac{m(1) + (m-2)(1)(1)}{(m-1)(m-2)^2(m-4)} = \frac{2(m-1)}{(m-1)(m-2)^2(m-4)} = \frac{2}{(m-2)^2(m-4)}$$

which is the same as the variance of an Inv-χ²(m) variable.
The Wishart distribution is related to the normal, chi-square and Gamma distributions. The Inverse-Wishart distribution is related to those distributions in a similar way.

Let

$$S \sim W_p(k, \Sigma)$$

then

$$S^{-1} \sim \text{InvWish}_p(m, \Sigma^{-1})$$

where m = k is the degrees of freedom.
The Inverse-Wishart distribution is frequently used as the prior on the variance/covariance matrix parameter (Σ) of a multivariate normal distribution. Note that the Inverse-Gamma distribution is the conjugate prior for the variance parameter of a univariate normal distribution, and the Inverse-Wishart distribution (as its multivariate generalization) extends this conjugacy to the multivariate normal distribution.
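This conjugacy gives a closed-form posterior update. A minimal sketch of the standard update for Σ when the mean μ is known (the function name and toy data are our own illustrations, not part of the original presentation):

```python
import numpy as np

def posterior_params(m0, Psi0, X, mu):
    """Conjugate Inverse-Wishart update for Sigma in a normal model with known mean mu.

    Prior:      Sigma ~ InvWish_p(m0, Psi0)
    Likelihood: rows of X are iid N_p(mu, Sigma)
    Posterior:  Sigma ~ InvWish_p(m0 + n, Psi0 + sum_i (x_i - mu)(x_i - mu)')
    """
    n = X.shape[0]
    D = X - mu
    return m0 + n, Psi0 + D.T @ D

# toy data, purely for illustration
rng = np.random.default_rng(4)
X = rng.standard_normal((50, 2))
m_n, Psi_n = posterior_params(m0=4, Psi0=np.eye(2), X=X, mu=np.zeros(2))
```

The posterior degrees of freedom grow with the sample size n, and the posterior scale matrix accumulates the observed sum of squares.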
The Wishart and Inverse-Wishart distributions are important distributions with good and useful statistical properties. They play an important role in estimating parameters in multivariate studies.

The Wishart distribution helps to develop a framework for Bayesian inference for Gaussian covariance graph models.

Hence this work completes the powerful theory that has been developed in the mathematical statistics literature for decomposable models. These models help to draw valid inferences.
Conclusion:

The generalized Wishart process (GWP) can be used to model time-varying covariance matrices Σ(t). In the future, the GWP could be applied to study how Σ depends on covariates like interest rates.
References

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis. Wiley India (P) Ltd., New Delhi.

Chatfield, C. and Collins, A. J. (1980). Introduction to Multivariate Analysis. Chapman and Hall-CRC, London.

Chib, S. and Greenberg, E. (1998). Bayesian analysis of multivariate probit models. Biometrika, 85, 347-361.

Cook, R. D. (2011). On the mean and variance of the generalized inverse of a singular Wishart matrix. Electronic Journal of Statistics, 5, 146-158.
Eaton, M. L. (2007). Multivariate Statistics: A Vector Space Approach. Wiley India (P) Ltd., New Delhi.

Nydick, S. W. (2012). The Wishart and inverse Wishart distributions. International Journal of Electronics and Communication, 22, 119-139.

Pourahmadi, M., Daniels, M. J. and Park, T. (2006). Simultaneous modelling of the Cholesky decomposition of several covariance matrices. Journal of Multivariate Analysis, 97, 125-135.
Rao, C. R. (1965). Linear Statistical Inference and Its Applications. Wiley India (P) Ltd., New Delhi.
Cholesky decomposition

The Cholesky decomposition of a Hermitian positive-definite matrix A is a decomposition into the product of a lower triangular matrix and its conjugate transpose:

$$A = LL^*$$

where L is a lower triangular matrix with real and positive diagonal entries, and L* denotes the conjugate transpose of L.
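For a real symmetric positive-definite matrix (the case needed above, where Σ = CC'), the factor can be computed with a standard linear-algebra routine. A small Python/NumPy illustration (the matrix values are arbitrary):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])             # symmetric positive definite
L = np.linalg.cholesky(A)              # lower triangular with positive diagonal, A = L L'
```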
R CODE
Any Questions?