multivariate normal distribution { i · pdf filemultivariate normal distribution { i ......

Multivariate Normal Distribution – I

• We will almost always assume that the joint distribution ofthe p×1 vectors of measurements on each sample unit is thep-dimensional multivariate normal distribution.

• The MVN assumption is often appropriate:

– Variables can sometimes be assumed to be multivariatenormal (perhaps after transformation)

– Central limit theorem tells us that distribution of manymultivariate sample statistics is approximately normal, re-gardless of the form of the population distribution.

• As a bonus, the MVN assumption leads to tractable results(but mathematical convenience should not be the reason forchoosing the MVN as the probability model).

107

Multivariate Normal Distribution

• The MVN is a generalization of the univariate normal

distribution for the case p ≥ 2.

• Recall that if X is normal with mean µ and variance σ2,

and its density function is given by

f(x) =1

(2πσ2)1/2exp[−

1

2σ2(x− µ)2], −∞ < x <∞.

• We can write the kernel of the density as:

exp[−1

2σ2(x− µ)2] = exp[−

1

2(x− µ)′(σ2)−1(x− µ)].

108


Now consider a p× 1 random vector X = [X1, X2, . . . , Xp]′.

The kernel shown above generalizes to

exp[−1

2(x− µ)′Σ−1(x− µ)],

where

x =

x1x2...xp

and µ =

µ1µ2...µp

and Σ =

σ11 σ12 · · · σ1pσ21 σ22 · · · σ2p

... ... . . . ...σp1 σp2 · · · σpp

is the p× p population covariance matrix. This is the kernel

of the MVN distribution.

109


• The quadratic form (x − µ)′Σ−1(x − µ) in the kernel is astatistical distance measure, of the type we described ear-lier. For any value of x, the quadratic form gives the squaredstatistical distance of x from µ accounting for the fact thatthe variances of the p variables may be different and that thevariables may be correlated.

• This quadratic form is often referred to as Mahalanobisdistance.

• The density function of the MVN distribution is:

f(x) =1

(2π)p/2|Σ|1/2exp[−

1

2(x− µ)′Σ−1(x− µ)],

where the normalizing constant (2π)p/2|Σ|1/2 makes thevolume under the MVN density equal to 1.

110


• We use the notation

X ∼ Np(µ,Σ) or X ∼MVN(µ,Σ)

to indicate that the p-dimensional random vector X has

a multivariate normal distribution with mean vector µ and

covariance matrix Σ.

111

Example: Bivariate Normal Densityσ11 = σ22, and ρ12 = 0

112

Example: Bivariate Normal Densityσ11 = σ22, and ρ12 > 0

113

Density Contours

• The x values that yield a constant height for the density formellipsoids centered at µ.

• The MVN density is constant on surfaces or contours where

(x− µ)′Σ−1(x− µ) = c2.

• Definition of a constant probability density contour is all x’sthat satisfy the expression above.

• The axes of the ellipses are in the directions of theeigenvectors of Σ and the length of the j − th longest axis isproportional to (

√λj), where λj is the eigenvalue associated

with the j − th eigenvector of Σ.

114

Density Contours• Recall that if (λj, ej) is an eigenvalue-eigenvector pair for Σ

and Σ is positive definite, then (λ−1j , ej) is an eigenvalue-

eigenvector pair of Σ−1.

• The jth axis is ±c√λjej, for j = 1, ..., p.

• We show later that if

c2 = χ2p(α),

where χ2p(α) is the upper (100α)th percentile of a χ2 distri-

bution with p degrees of freedom, then the probability is 1−αthat the value of a random vector will be inside the ellipsoiddefined by

(x− µ)′Σ−1(x− µ) ≤ χ2p(α)

115

Bivariate Normal Density Contours

• Eigenvalues and eigenvectors of Σ are obtained from

|Σ− λI| = 0. Using σ12 = σ21 = ρ√σ11√σ22, we have

0 =

∣∣∣∣∣[σ11 σ12σ12 σ22

]−[λ 00 λ

] ∣∣∣∣∣=

∣∣∣∣∣ σ11 − λ ρ√σ11√σ22

ρ√σ11√σ22 σ22 − λ

∣∣∣∣∣= (σ11 − λ)(σ22 − λ)− ρ2σ11σ22

= λ2 − (σ11 + σ22)λ+ σ11σ22(1− ρ2).

116


• Solutions to this quadratic equation are:

λ1 =1

2

[σ11 + σ22 +

√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)

]

λ2 =1

2

[σ11 + σ22 −

√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)

]

Solving quadratic equations:

ax2 + bx+ c = 0

The solutions are

x =−b+

√b2 − 4ac

2aand x =

−b−√b2 − 4ac

2a117


• An eigenvector associated with λ1 satisfies[σ11 σ12σ12 σ11

] [e11e12

]= λ1

[e11e12

],

where eij denotes the jth element in the ith eigenvector. We

must solve

(σ11 − λ1)e11 + ρ√σ11√σ22e12 = 0

(σ22 − λ1)e12 + ρ√σ11√σ22e11 = 0

118


Solutions:

e1 =

[e11e12

]=

[ddb

]

where d is any scalar and

b =σ22 − σ11 +

√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)

2ρ√σ11√σ22

119


Choose d to satisfy 1 = e′1e1 = d2(1 + b2). Then

d =1√

1 + b2or d =

−1√1 + b2

and

e1 =

1√

1+b2

b√1+b2

or

−1√1+b2

−b√1+b2

120


The eigenvector corresponding to λ2 is

e2 =

1√1+c2

c√1+c2

or

−1√1+c2

−c√1+c2

where

c =σ22 − σ11 −

√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)

2ρ√σ11√σ22

These eigenvectors have the following properties:

• ||ei|| =√

e′iei = 1

• e′iej = 0 for i 6= j

121

Bivariate Normal Contours (σ11 = σ22)

• When σ11 = σ22, the formulas simplify to

λ1 = σ11(1+ρ) = σ11+σ12 and λ2 = σ11(1−ρ) = σ11−σ12

e1 =

1√2

1√2

, e2 =

1√2

− 1√2

.

• If σ12 > 0, the major axis of the ellipse will be in the direction

of the 45o line. The actual value of σ12 does not matter. If

σ12 < 0, then the major axis will be perpendicular to the 45o

line.

122

Bivariate Normal Contours (σ11 = σ22)

123

More Bivariate Normal Examples

• 50% and 90% contours of two bivariate normal densities.

Density is the highest when x = µ.

124

Central (1− α)× 100% Region of aBivariate Normal Distribution

• The ratio of the lengths of the major and minor axes is

Length of major axis

Length of minor axis=

√λ1√λ2

• If 1 − α is the probability that a randomly selected memberof the population is observed inside the ellipse, then the half-length of the axes are given by√

χ22(α)

√λi

• This is the smallest region that has probability 1− α ofcontaining a randomly selected member of the population

125


• The area of the ellipse containing the central (1−α)×100%

of a bivariate nornal population is

area = πχ22(α)

√λ1

√λ2 = πχ2

2(α)|Σ|1/2

• Note that

det(Σ) = det

σ11 σ12

σ21 σ22

= det

σ11 ρσ1σ2

ρσ1σ2 σ22

= σ11σ22(1−ρ2)

126


• For fixed variances, σ11 and σ22, area of the ellipse is largestwhen ρ = 0

• The area becomes smaller as ρ approaches 1 or as ρapproches −1.

• For σ11 = σ22 and ρ = 0 the contours of constant densityare concentric circles and λ1 = λ2

• For σ11 > σ22 the axes of the ellipse are parallel to thecoordinate axes, with the major axis parallel to the horizontalaxis.

• For σ22 > σ11 the axes of the ellipse are parallel to thecoordinate axes, with the major axis parallel to the verticalaxis.

127

Central (1− α)× 100% Region of aMultivariate Normal Distribution

• For a p-dimensional normal distribution, the smallest region

such that there is probability 1− α that a randomly selected

observation will fall in the region is

– a p-dimensional ellipsoid

– with hypervolume

2πp/2

pΓ(p/2)

[χ2p(α)

]p/2|Σ|1/2

where Γ(·) is the gamma function

128

Gamma Function

Γ(p

2

)=(p

2− 1

)(p

2− 2

)· · · (2)(1)

when p is an even integer, and

Γ(p

2) =

(p− 2)(p− 4) · · · (3)(1)

2(p−1)/2

√π

when p is an odd integer

129

Overall Measures of Variability

• Generalized variance:

|Σ| = λ1λ2 · · ·λp

• Generalized standard deviation

|Σ|1/2 =√λ1λ2 · · ·λp

• Total variance

trace(Σ) = σ11 + σ22 + · · ·+ σpp

= λ1 + λ2 + · · ·+ λp

130

Sample Estimates: Air Samples

Xi =

[Xi1 ← CO concentrationXi2 ← N2O concentration

]

X1 =

[7

12

]X2 =

[49

]X3 =

[45

]X4 =

[58

]X5 =

[48

]

131

Sample Estimates: Air Samples

• Sample mean vector:

X̄ =

[4.88.4

]

• Sample covariance matrix:

S =

[1.7 2.62.6 6.3

]

• Sample correlation: r12=0.7945

• Generalized variance: |S| = 3.95

• Total variance; trace(S) = 1.7 + 6.3 = 8.0

132

Example 3.8

S =

[5 44 5

]S =

[5 −4−4 5

]S =

[3 00 3

]r12 = 0.8 r12 = −0.8 r12 = 0

|S| = 9 |S| = 9 |S| = 9

tr(S) = 10 tr(S) = 10 tr(S) = 6

133

multivariate normal distribution { i · pdf filemultivariate normal distribution { i ......

Documents