generalized variance multivariate normal distribution
TRANSCRIPT
![Page 1: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/1.jpg)
Lecture #4 - 9/14/2005 Slide 1 of 57
Generalized VarianceMultivariate Normal Distribution
Lecture 4September 14, 2005Multivariate Analysis
![Page 2: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/2.jpg)
Overview
Last Time
Today’s Lecture
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 2 of 57
Last Time
■ Matrices and vectors.
◆ Eigenvalues.
◆ Eigenvectors.
◆ Determinants.
■ Basic descriptive statistics using matrices:◆ Mean vectors.
◆ Covariance Matrices.
◆ Correlation Matrices.
![Page 3: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/3.jpg)
Overview
Last Time
Today’s Lecture
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 3 of 57
Today’s Lecture
■ Generalized Variance (Chapter 3, Section 4).
■ Multivariate Normal Distribution (Chapter 4).
![Page 4: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/4.jpg)
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 4 of 57
Sample Covariance Matrix
■ Recall the sample covariance matrix:
S =1
n − 1
n∑
i=1
(xi − x)2 =1
n − 1(X − 1x′)′(X − 1x′).
■ The overall sample covariance matrix gives a picture of thecovariation between each variable in the sample.
■ A single-number summary of this matrix can be provided bythe generalized variance.
■ Although the generalized variance is not used frequently, youwill see in later slides that this value is part of the multivariatenormal distribution.
![Page 5: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/5.jpg)
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 5 of 57
Generalized Sample Variance
■ Generalized Sample Variance is computed by |S| (thedeterminant of the sample covariance matrix).
■ To begin, imagine a multidimensional cube (an ellipsoid) thatrepresents the end points of all the column vectors in thesample matrix X.
◆ Covariance matrix describes the overall spread of thatshape in each direction (if we could plot it).
◆ The equation (x − x)S−1(x − x) = a2 describes anequation where all points are equal distant from the meanx (i.e., it will form an ellipsoid).
◆ That ellipsoid will have axes proportional to the squareroots of the eigenvalues of S.
◆ The volume of that ellipsoid is equal to |S|1/2 (notice whathappens if we have a zero eigenvalue?
![Page 6: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/6.jpg)
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 6 of 57
Generalized Sample Variance With R
■ The generalized sample variance is dependent upon thescale of the variables in the sample.
■ Because of the scale of these variables, the generalizedsample variance can be hard to interpret (much likevariances and covariances).
◆ The scale of a single variable may have a disproportionateimpact on the generalized variance (e.g., your sample hasthe variables of GPA and income in US dollars).
![Page 7: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/7.jpg)
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 7 of 57
Generalized Sample Variance With R
■ Rather than using |S|, the sample correlation matrix can beused |R|.
■ The interpretation of the GSV is the same - the volume of theellipsoid that is formed by the standardized variables in thesample.
■ The difference in magnitude is proportional to the product ofthe variances of the variables in the sample.
■ SAS Example #1...
![Page 8: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/8.jpg)
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 8 of 57
Total Sample Variance
■ Another way to characterize the sample variance is with thetotal variance.
■ Total variance equals tr(S) = s11 + s22 + . . . + spp.
■ Describes the variability of the data without taking in toaccount the covariances.
■ Like generalized variance, the total sample variance reflectsthe overall spread of the data.
■ Many multivariate techniques refer use total sample variancein computation of variance accounted for.
![Page 9: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/9.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 9 of 57
Multivariate Normal Distribution
■ The generalization of the well-known normal distribution tomultiple variables is called the multivariate normaldistribution (MVN).
■ Many multivariate techniques rely on this distribution in somemanner.
■ Although real data may never come from a true MVN, theMVN provides a robust approximation, and has many nicemathematical properties.
■ Furthermore, because of the central limit theorem, manymultivariate statistics converge to the MVN distribution as thesample size increases.
![Page 10: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/10.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 10 of 57
Univariate Normal Distribution
■ The univariate normal distribution function is:
f(x) =1√
2πσ2e−[(x−µ)/σ]2/2
■ The mean is µ.
■ The variance is σ2.
■ The standard deviation is σ.
■ Standard notation for normal distributions is N(µ, σ2), whichwill be extended for the MVN distribution.
![Page 11: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/11.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 11 of 57
Univariate Normal Distribution
N(0, 1)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Univariate Normal Distribution
x
f(x)
![Page 12: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/12.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 12 of 57
Univariate Normal Distribution
N(0, 2)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Univariate Normal Distribution
x
f(x)
![Page 13: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/13.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 13 of 57
Univariate Normal Distribution
N(3, 1)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Univariate Normal Distribution
x
f(x)
![Page 14: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/14.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 14 of 57
UVN - Notes
■ Recall that the area under the curve for the univariate normaldistribution is a function of the variance/standard deviation.
■ In particular:
P (µ − σ ≤ X ≤ µ + σ) = 0.683
P (µ − 2σ ≤ X ≤ µ + 2σ) = 0.954
■ Also note the term in the exponent:
(
(x − µ)
σ
)2
= (x − µ)(σ2)−1(x − µ)
■ This is the square of the distance from x to µ in standarddeviation units, and will be generalized for the MVN.
![Page 15: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/15.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 15 of 57
MVN
■ The multivariate normal distribution function is:
f(x) =1
(2π)p/2|Σ|1/2e−(x−µ)Σ
−1
(x−µ)/2
■ The mean vector is µ.
■ The covariance matrix is Σ.
■ Standard notation for multivariate normal distributions isNp(µ,Σ).
■ Visualizing the MVN is difficult for more than two dimensions,so I will demonstrate some plots with two variables - thebivariate normal distribution.
![Page 16: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/16.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 16 of 57
Bivariate Normal Plot #1
µ =
[
0
0
]
,Σ =
[
1 0
0 1
]
−4
−2
0
2
4
−4
−2
0
2
40
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
![Page 17: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/17.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 17 of 57
Bivariate Normal Plot #1a
µ =
[
0
0
]
,Σ =
[
1 0
0 1
]
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
![Page 18: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/18.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 18 of 57
Bivariate Normal Plot #2
µ =
[
0
0
]
,Σ =
[
1 0.5
0.5 1
]
−4
−2
0
2
4
−4
−2
0
2
40
0.05
0.1
0.15
0.2
![Page 19: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/19.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 19 of 57
Bivariate Normal Plot #2
µ =
[
0
0
]
,Σ =
[
1 0.5
0.5 1
]
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
![Page 20: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/20.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 20 of 57
MVN Contours
■ The lines of the contour plots denote places of equalprobability mass for the MVN distribution.
■ These contours can be constructed from the eigenvaluesand eigenvectors of the covariance matrix.
◆ The direction of the ellipse axes are in the direction of theeigenvalues.
◆ The length of the ellipse axes are proportional to theconstant times the eigenvector.
■ Specifically:
(x − µ)Σ−1(x − µ) = c2
has ellipsoids centered at µ, and has axes ±c√
λiei.
![Page 21: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/21.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 21 of 57
MVN Contours, Continued
■ Contours are useful because they provide confidenceregions for data points from the MVN distribution.
■ The multivariate analog of a confidence interval is given byan ellipsoid, where c is from the Chi-Squared distributionwith p degrees of freedom.
■ Specifically:
(x − µ)Σ−1(x − µ) = χ2p(α)
provides the confidence region containing 1 − α of theprobability mass of the MVN distribution.
![Page 22: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/22.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 22 of 57
MVN Contour Example
■ Imagine we had a bivariate normal distribution with:
µ =
[
0
0
]
,Σ =
[
1 0.5
0.5 1
]
■ The covariance matrix has eigenvalues and eigenvectors:
λ =
[
1.5
0.5
]
, E =
[
0.707 −0.707
0.707 0.707
]
■ We want to find a contour where 95% of the probability willfall, corresponding to χ2
2(0.05) = 5.99
![Page 23: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/23.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 23 of 57
MVN Contour Example
■ This contour will be centered at µ.
■ Axis 1:
µ ±√
5.99 × 1.5
[
0.707
0.707
]
=
[
2.12
2.12
]
,
[
−2.12
−2.12
]
■ Axis 2:
µ ±√
5.99 × 0.5
[
−0.707
0.707
]
=
[
−1.22
1.22
]
,
[
1.22
−1.22
]
![Page 24: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/24.jpg)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 24 of 57
MVN Properties
■ The MVN distribution has some convenient properties.
■ If X has a multivariate normal distribution, then:
1. Linear combinations of X are normally distributed.
2. All subsets of the components of X have a MVNdistribution.
3. Zero covariance implies that the correspondingcomponents are independently distributed.
4. The conditional distributions of the components are MVN.
![Page 25: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/25.jpg)
Overview
Generalized Variance
MVN
Distributions
UVN CLT
Multi CLT
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 25 of 57
Distribution of x and S
Recall back in Univariate statistics you discussed the CentralLimit Theorem (CLT)
It stated that, if the set of n observations x1, x2, . . . , xn werenormal or not...
■ The distribution of x would be normal with mean equal to µand variance σ2/n
■ We were also told that (n − 1)s2/σ2 had a Chi-Squaredistribution with n − 1 degrees of freedom
■ Note: We ended up using these pieces of information forhypothesis testing such as t-test and ANOVA.
![Page 26: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/26.jpg)
Overview
Generalized Variance
MVN
Distributions
UVN CLT
Multi CLT
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 26 of 57
Distribution of x and S
We also have a Multivariate Central Limit Theorem (CLT)
It states that, if the set of n observations x1, x2, . . . , xn aremultivariate normal or not...
■ The distribution of x would be normal with mean equal to µ
and variance/covariance matrix Σ/n
■ We are also told that (n − 1)S will have a Wishartdistribution, Wp(n − 1,Σ), with n − 1 degrees of freedom
◆ This is the multivariate analogue to a Chi-Squaredistribution.
■ Note: We will end up using some of this information formultivariate hypothesis testing.
![Page 27: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/27.jpg)
Overview
Generalized Variance
MVN
Distributions
UVN CLT
Multi CLT
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 27 of 57
Distribution of x and S
■ Therefore, let x1, x2, . . . , xn be independent observationsfrom a population with mean µ and covariance Σ
■ The following are true:
◆√
n(
X − µ)
is approximately Np(0,Σ).
◆ n (X − µ)′ S−1 (X − µ) is approximately χ2
p.
![Page 28: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/28.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 28 of 57
Assessing Normality
■ Recall from earlier that IF the data have a Multivariatenormal distribution then all of the previously discussedproperties will hold.
■ So we will want to have at least a set of test/methods toassess Multivariate Normality.
◆ We said that if all marginal distributions are NOT normalthen the joint distribution can not be MVN. So we will firsttalk about the assessing normality
◆ We also said even if all marginals are normally distributedthe joint is not necessarily MVN, So we will then assessMultivariate Normality.
![Page 29: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/29.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 29 of 57
Assessing Normality
■ We will find that there are two ways to assessnormality/MVN.
1. By comparing the distribution of your observations (orsome transformation of your observations) to some knowndistribution. (These are commonly called Q-Q plots)
2. By computing some set of statistics and obtaining ap-value (i.e., compute a statistic with a known distributionand determine how extreme the statistic is compared to anull hypothesis).
![Page 30: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/30.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 30 of 57
Assessing Univariate Normality
We begin with Assessing Univariate Normality using a Q-Qplot.■ A Q-Q plot is a plot that matches the Quantiles of the
observed data with the Quantiles of a specific distribution.
■ A Quantile (commonly called a percentile) is that value suchthat a specific proportion p of the population will score at orbelow.
◆ For example the .5 quantile of a N(0,1) is 0.
■ In our case the Quantiles of a specific distribution will be anormal, N(0, 1).
◆ It could be a N(x, s2x), if preferred.
■ There should be a linear relationship between the quantilesof the observed data with their theoretical quantiles(assuming the distribution) if they follow the samedistribution.
![Page 31: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/31.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 31 of 57
Constructing a Q-Q plot
Lets assume that we have n observations x1, x2, . . . , xn. Toconstruct a Q-Q plot we:
1. Order the observations from smallest to largest (i.e.,x(1) ≤ y(2) ≤ . . . ≤ x(n) ).
2. Next we define the ith point, x(i), as the (i − .5)/n quantile.
■ We could use i/n but can cause problems.
3. Based on a N(0, 1) distribution we compute the quantilevalues q1, q2, . . . , qn (this is typically done using a table orcomputer).
4. Finally plot (x(i), qi), and if they follow the same distribution(Normal) they should form a line.
![Page 32: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/32.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 32 of 57
Example Q-Q plot
Lets assume that we have 5 observations: 3, 6, 4, 5, 2:
First we order them
y(i) (i − .5)/n qi
23456
![Page 33: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/33.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 33 of 57
Example Q-Q plot
Lets assume that we have 5 observations: 3, 6, 4, 5, 2:
Next compute quantiles
y(i) (i − .5)/n qi
2 (1 − .5)/5 = .1
3 (2 − .5)/5 = .3
4 (3 − .5)/5 = .5
5 (4 − .5)/5 = .7
6 (5 − .5)/5 = .9
![Page 34: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/34.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 34 of 57
Example Q-Q plot
Lets assume that we have 5 observations: 3, 6, 4, 5, 2:
Finally compute quantiles values assuming N(0, 1) (i.e., this isa z-score)
y(i) (i − .5)/n qi
2 (1 − .5)/5 = .1 -1.283 (2 − .5)/5 = .3 -0.524 (3 − .5)/5 = .5 0.005 (4 − .5)/5 = .7 0.526 (5 − .5)/5 = .9 1.28
and plot
![Page 35: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/35.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 35 of 57
Example Q-Q plot
Notice how it follows nearly a straight line
Figure 1: Q-Q plot
Q
1.5 1.0 .5 0.0 -.5 -1.0 -1.5
Y
7
6
5
4
3
2
1
![Page 36: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/36.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 36 of 57
Three Tests for Normality
■ The remaining methods are tests for normality
■ In each case it is computing a statistic and checking forsignificance.
■ I will mention these because in some journals this ismentioned.
■ However, because we will be mostly interested in MVN onecould quickly check for normality and the if happy, test forMVN.
![Page 37: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/37.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 37 of 57
Test 1
Begin by computing Skewness and Kurtosis
Skewness,√
b1
√
b1 =
√n
∑ni=1(yi − y)3
[∑n
i=1(yi − y)2]3/2
Kurtosis, b2
b2 =n
∑ni=1(yi − y)4
[∑n
i=1(yi − y)2]2
![Page 38: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/38.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 38 of 57
Other Tests
Other tests can be gathered in SAS (note the null hypothesis isalways that the data is normally distributed).proc univariate data=mydata normal plot;var x1-x5;run;
![Page 39: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/39.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 39 of 57
Assessing MVN
■ Notice that many of the procedures that we discussed(Including the Q-Q plot) required that we rank order theobservations.
■ If instead of single observations, we have a set of nobservations of p variables (x1, x1, . . . , xn) ordering allobservations is much more difficult.
■ In fact, the book mentions that any test for MVN is far moredifficult and often has little power because of the number ofobservations on p-space, but some check should be done.
![Page 40: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/40.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 40 of 57
Scatter Plots
Here we will discuss two graphical methods
■ Possibly the easiest method to assess MVN is to use scatterplots.
■ If there are not a large number of variables consider lookingscatter plots of all pairs of variables.
◆ Recall that one property of a MVN is that all subsets ofvariables should also be multivariate (in this casebivariate) normal.
◆ This means any relationship between variables should belinear and otherwise should have a random pattern
■ Could also look at all sets of three variables should also havetri-variate normal distributions.
![Page 41: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/41.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 41 of 57
Q-Q Plots
As an alternative (for example if there are too many variableswe could use a Q-Q plot
■ Our Q-Q plot will be based on D2 = (x − x)′S−1(y − y).Note, x is each entity.
◆ Recall that D2 = (x − µ)′Σ−1(x − µ) has a χ2p distribution
with p degrees of freedom.
◆ We could use that to compute a Q-Q plot (i.e., use the χ2p
distribution to get the values of qi).
◆ However in estimating the mean vector and variancecovariance matrix this plot could be misleading.
■ Instead, some suggest using a function of D2.
![Page 42: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/42.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 42 of 57
Q-Q Plots
We will define a new variable ui, where
ui = D2i .
1. Order the observations from smallest to largest (i.e.,u(1) ≤ u(2) ≤ . . . ≤ u(n) ).
2. Next we define the ith point, u(i), as the (i − .5)/n quantile.
3. Based on a χ2p distribution, we compute the quantile values
q1, q2, . . . , qn (this is typically done using a table orcomputer) where:
![Page 43: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/43.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 43 of 57
Tests
■ Tests based on the Skewness and Kurtosis also exist.
■ These are a generalizations of the univariate tests.
■ You will not be tested on this, but know where it is just incase you ever need a test.
![Page 44: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/44.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 44 of 57
Outliers
■ Detecting outliers is difficult in Multiple dimensions.
◆ Can’t simply order them and look for extreme values.
◆ May not be able to see in only 2-D plots.
◆ Could be different degrees of recording errors.
■ Here I discuss one method for detecting Outliers
![Page 45: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/45.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 45 of 57
Outliers Using D2
Wilk’s statistics was designed to detect a single outlier:■ Compute w where
w = maxi
|(n − 2)S−i||(n − 1)S|
■ To simplify it can be shown that w also equals:
w = 1 −nD2
(n)
(n − 1)2
■ The distribution for w is the F distribution, however the onlyunknown is D2
(n) and so we can just make decisions basedon it.
■ Therefore a test can be made that is based on the D2, whichis also computed to assess MVN
![Page 46: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/46.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 46 of 57
Example
Here we consider three variables: time it takes people to get toclass, number of years in school, overall happiness.
■ Just as a generic procedure I would
1. Evaluate each variable individually
2. Evaluate each pair (if this is reasonable)
3. Evaluate MVN using a Q-Q plot
4. Evaluate the reasonableness of any outliers (this can alsobe done in the previous steps.
![Page 47: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/47.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 47 of 57
Example Q-Q plots
First we us a Q-Q plot for each variable.
Figure 2: Q-Q plot of Time
![Page 48: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/48.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 48 of 57
Example Q-Q plots
Figure 3: Q-Q plot of Year
![Page 49: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/49.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 49 of 57
Example Q-Q plots
Figure 4: Q-Q plot of Happy
![Page 50: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/50.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 50 of 57
Example Bivariate plots
■ So if we decide that everything looks OK then we move on tothe bivariate plots.
■ If they do not look OK we can consider using some kind ofrobust analyses or consider a transformation.
![Page 51: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/51.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 51 of 57
Example Bivariate plots
Figure 5: Time versus Year
![Page 52: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/52.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 52 of 57
Example Bivariate plots
Figure 6: Time versus Happy
![Page 53: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/53.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 53 of 57
Example Bivariate plots
Figure 7: Year versus Happy
![Page 54: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/54.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 54 of 57
MV Q-Q Plot
Finally, the Multivariate Q-Q plot.
Figure 8: Year versus Happy
■ We can also look for an outlier.■ Largest D2 = 13.4
■ Compare it to Table A.6 for 30 observation and 3 variables.◆ Max D2 is 12.24 with p-value=.05 and 14.14 with
p-value=.01.
![Page 55: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/55.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
TransformationsTransformations to Near
Normality
Wrapping Up
Lecture #4 - 9/14/2005 Slide 55 of 57
Transformations to Near Normality
A few types of data must be transformed prior to doinganalyses that assume data comes from a MVN distribution:
■ Counts, y, are transformed with√
y.
■ Proportions, p, are transformed with logit(p) = 12 log
(
p1−p
)
■ Correlations, r, are transformed with (Fisher’s)
z(r) = 12 log
(
1+r1−r
)
.
Variations of transformations exist, as do other techniques.
![Page 56: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/56.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Final Thought
Next Class
Lecture #4 - 9/14/2005 Slide 56 of 57
Final Thought
■ The multivariate normaldistribution is an analog tothe univariate normaldistribution.
■ The MVN distribution willplay a large role in theupcoming weeks.
■ We can finally put the background material to rest, and beginlearning some practical statistics.
![Page 57: Generalized Variance Multivariate Normal Distribution](https://reader034.vdocuments.us/reader034/viewer/2022042800/5895a5181a28abdd438bcb1b/html5/thumbnails/57.jpg)
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Final Thought
Next Class
Lecture #4 - 9/14/2005 Slide 57 of 57
Next Time
■ Statistical Analyses - Mean Vector Inference.