multivariate normal distribution { i · pdf filemultivariate normal distribution { i ......
TRANSCRIPT
Multivariate Normal Distribution – I
• We will almost always assume that the joint distribution ofthe p×1 vectors of measurements on each sample unit is thep-dimensional multivariate normal distribution.
• The MVN assumption is often appropriate:
– Variables can sometimes be assumed to be multivariatenormal (perhaps after transformation)
– Central limit theorem tells us that distribution of manymultivariate sample statistics is approximately normal, re-gardless of the form of the population distribution.
• As a bonus, the MVN assumption leads to tractable results(but mathematical convenience should not be the reason forchoosing the MVN as the probability model).
107
Multivariate Normal Distribution
• The MVN is a generalization of the univariate normal
distribution for the case p ≥ 2.
• Recall that if X is normal with mean µ and variance σ2,
and its density function is given by
f(x) =1
(2πσ2)1/2exp[−
1
2σ2(x− µ)2], −∞ < x <∞.
• We can write the kernel of the density as:
exp[−1
2σ2(x− µ)2] = exp[−
1
2(x− µ)′(σ2)−1(x− µ)].
108
Multivariate Normal Distribution
Now consider a p× 1 random vector X = [X1, X2, . . . , Xp]′.
The kernel shown above generalizes to
exp[−1
2(x− µ)′Σ−1(x− µ)],
where
x =
x1x2...xp
and µ =
µ1µ2...µp
and Σ =
σ11 σ12 · · · σ1pσ21 σ22 · · · σ2p
... ... . . . ...σp1 σp2 · · · σpp
is the p× p population covariance matrix. This is the kernel
of the MVN distribution.
109
Multivariate Normal Distribution
• The quadratic form (x − µ)′Σ−1(x − µ) in the kernel is astatistical distance measure, of the type we described ear-lier. For any value of x, the quadratic form gives the squaredstatistical distance of x from µ accounting for the fact thatthe variances of the p variables may be different and that thevariables may be correlated.
• This quadratic form is often referred to as Mahalanobisdistance.
• The density function of the MVN distribution is:
f(x) =1
(2π)p/2|Σ|1/2exp[−
1
2(x− µ)′Σ−1(x− µ)],
where the normalizing constant (2π)p/2|Σ|1/2 makes thevolume under the MVN density equal to 1.
110
Multivariate Normal Distribution
• We use the notation
X ∼ Np(µ,Σ) or X ∼MVN(µ,Σ)
to indicate that the p-dimensional random vector X has
a multivariate normal distribution with mean vector µ and
covariance matrix Σ.
111
Example: Bivariate Normal Densityσ11 = σ22, and ρ12 = 0
112
Example: Bivariate Normal Densityσ11 = σ22, and ρ12 > 0
113
Density Contours
• The x values that yield a constant height for the density formellipsoids centered at µ.
• The MVN density is constant on surfaces or contours where
(x− µ)′Σ−1(x− µ) = c2.
• Definition of a constant probability density contour is all x’sthat satisfy the expression above.
• The axes of the ellipses are in the directions of theeigenvectors of Σ and the length of the j − th longest axis isproportional to (
√λj), where λj is the eigenvalue associated
with the j − th eigenvector of Σ.
114
Density Contours• Recall that if (λj, ej) is an eigenvalue-eigenvector pair for Σ
and Σ is positive definite, then (λ−1j , ej) is an eigenvalue-
eigenvector pair of Σ−1.
• The jth axis is ±c√λjej, for j = 1, ..., p.
• We show later that if
c2 = χ2p(α),
where χ2p(α) is the upper (100α)th percentile of a χ2 distri-
bution with p degrees of freedom, then the probability is 1−αthat the value of a random vector will be inside the ellipsoiddefined by
(x− µ)′Σ−1(x− µ) ≤ χ2p(α)
115
Bivariate Normal Density Contours
• Eigenvalues and eigenvectors of Σ are obtained from
|Σ− λI| = 0. Using σ12 = σ21 = ρ√σ11√σ22, we have
0 =
∣∣∣∣∣[σ11 σ12σ12 σ22
]−[λ 00 λ
] ∣∣∣∣∣=
∣∣∣∣∣ σ11 − λ ρ√σ11√σ22
ρ√σ11√σ22 σ22 − λ
∣∣∣∣∣= (σ11 − λ)(σ22 − λ)− ρ2σ11σ22
= λ2 − (σ11 + σ22)λ+ σ11σ22(1− ρ2).
116
Bivariate Normal Density Contours
• Solutions to this quadratic equation are:
λ1 =1
2
[σ11 + σ22 +
√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)
]
λ2 =1
2
[σ11 + σ22 −
√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)
]
Solving quadratic equations:
ax2 + bx+ c = 0
The solutions are
x =−b+
√b2 − 4ac
2aand x =
−b−√b2 − 4ac
2a117
Bivariate Normal Density Contours
• An eigenvector associated with λ1 satisfies[σ11 σ12σ12 σ11
] [e11e12
]= λ1
[e11e12
],
where eij denotes the jth element in the ith eigenvector. We
must solve
(σ11 − λ1)e11 + ρ√σ11√σ22e12 = 0
(σ22 − λ1)e12 + ρ√σ11√σ22e11 = 0
118
Bivariate Normal Density Contours
Solutions:
e1 =
[e11e12
]=
[ddb
]
where d is any scalar and
b =σ22 − σ11 +
√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)
2ρ√σ11√σ22
119
Bivariate Normal Density Contours
Choose d to satisfy 1 = e′1e1 = d2(1 + b2). Then
d =1√
1 + b2or d =
−1√1 + b2
and
e1 =
1√
1+b2
b√1+b2
or
−1√1+b2
−b√1+b2
120
Bivariate Normal Density Contours
The eigenvector corresponding to λ2 is
e2 =
1√1+c2
c√1+c2
or
−1√1+c2
−c√1+c2
where
c =σ22 − σ11 −
√(σ11 + σ22)2 − 4σ11σ22(1− ρ2)
2ρ√σ11√σ22
These eigenvectors have the following properties:
• ||ei|| =√
e′iei = 1
• e′iej = 0 for i 6= j
121
Bivariate Normal Contours (σ11 = σ22)
• When σ11 = σ22, the formulas simplify to
λ1 = σ11(1+ρ) = σ11+σ12 and λ2 = σ11(1−ρ) = σ11−σ12
e1 =
1√2
1√2
, e2 =
1√2
− 1√2
.
• If σ12 > 0, the major axis of the ellipse will be in the direction
of the 45o line. The actual value of σ12 does not matter. If
σ12 < 0, then the major axis will be perpendicular to the 45o
line.
122
Bivariate Normal Contours (σ11 = σ22)
123
More Bivariate Normal Examples
• 50% and 90% contours of two bivariate normal densities.
Density is the highest when x = µ.
124
Central (1− α)× 100% Region of aBivariate Normal Distribution
• The ratio of the lengths of the major and minor axes is
Length of major axis
Length of minor axis=
√λ1√λ2
• If 1 − α is the probability that a randomly selected memberof the population is observed inside the ellipse, then the half-length of the axes are given by√
χ22(α)
√λi
• This is the smallest region that has probability 1− α ofcontaining a randomly selected member of the population
125
Central (1− α)× 100% Region of aBivariate Normal Distribution
• The area of the ellipse containing the central (1−α)×100%
of a bivariate nornal population is
area = πχ22(α)
√λ1
√λ2 = πχ2
2(α)|Σ|1/2
• Note that
det(Σ) = det
σ11 σ12
σ21 σ22
= det
σ11 ρσ1σ2
ρσ1σ2 σ22
= σ11σ22(1−ρ2)
126
Central (1− α)× 100% Region of aBivariate Normal Distribution
• For fixed variances, σ11 and σ22, area of the ellipse is largestwhen ρ = 0
• The area becomes smaller as ρ approaches 1 or as ρapproches −1.
• For σ11 = σ22 and ρ = 0 the contours of constant densityare concentric circles and λ1 = λ2
• For σ11 > σ22 the axes of the ellipse are parallel to thecoordinate axes, with the major axis parallel to the horizontalaxis.
• For σ22 > σ11 the axes of the ellipse are parallel to thecoordinate axes, with the major axis parallel to the verticalaxis.
127
Central (1− α)× 100% Region of aMultivariate Normal Distribution
• For a p-dimensional normal distribution, the smallest region
such that there is probability 1− α that a randomly selected
observation will fall in the region is
– a p-dimensional ellipsoid
– with hypervolume
2πp/2
pΓ(p/2)
[χ2p(α)
]p/2|Σ|1/2
where Γ(·) is the gamma function
128
Gamma Function
Γ(p
2
)=(p
2− 1
)(p
2− 2
)· · · (2)(1)
when p is an even integer, and
Γ(p
2) =
(p− 2)(p− 4) · · · (3)(1)
2(p−1)/2
√π
when p is an odd integer
129
Overall Measures of Variability
• Generalized variance:
|Σ| = λ1λ2 · · ·λp
• Generalized standard deviation
|Σ|1/2 =√λ1λ2 · · ·λp
• Total variance
trace(Σ) = σ11 + σ22 + · · ·+ σpp
= λ1 + λ2 + · · ·+ λp
130
Sample Estimates: Air Samples
Xi =
[Xi1 ← CO concentrationXi2 ← N2O concentration
]
X1 =
[7
12
]X2 =
[49
]X3 =
[45
]X4 =
[58
]X5 =
[48
]
131
Sample Estimates: Air Samples
• Sample mean vector:
X̄ =
[4.88.4
]
• Sample covariance matrix:
S =
[1.7 2.62.6 6.3
]
• Sample correlation: r12=0.7945
• Generalized variance: |S| = 3.95
• Total variance; trace(S) = 1.7 + 6.3 = 8.0
132
Example 3.8
S =
[5 44 5
]S =
[5 −4−4 5
]S =
[3 00 3
]r12 = 0.8 r12 = −0.8 r12 = 0
|S| = 9 |S| = 9 |S| = 9
tr(S) = 10 tr(S) = 10 tr(S) = 6
133