Notes8 Handout


    Overview

Main topics in multivariate statistics

Exploratory methods
  • Graphics for multivariate data
  • Principal component analysis (PCA)
  • Possible uses of PCA
  • Factor analysis: idea, model, estimation
  • Linear discriminant analysis
  • Cluster analysis
  • Multidimensional scaling

More formal methods
  • Normal distribution theory
  • Tests of significance for multivariate data
  • Canonical correlation
  • Remaining topics we did not cover



    Main topics in multivariate statistics

• We have data on several variables, there is some interdependence between the variables, and none of them is clearly the main variable of interest.

• Methods that are mostly of an exploratory nature:

  ◦ Graphics for multivariate data
  ◦ Principal component analysis (PCA)
  ◦ Factor analysis
  ◦ Linear discriminant analysis (LDA)
  ◦ Cluster analysis
  ◦ Multidimensional scaling
  ◦ ...


    Main topics in multivariate statistics

• More formal topics:

  ◦ Normal distribution theory
  ◦ Tests of significance for multivariate data
  ◦ Multivariate analysis of variance (MANOVA)
  ◦ Canonical correlation analysis
  ◦ ...


Exploratory methods

    Graphics for multivariate data

• Goal: visualize multivariate data

• We covered (a minimal R sketch follows below):

  ◦ Scatterplot matrix: pairs()
  ◦ Star plots and segment plots: stars()
  ◦ Conditioning plots: coplot()
  ◦ Bi-plot of the first two principal components: biplot()

• Other techniques:

  ◦ Interactive 3-dimensional plots
  ◦ Plots based on multidimensional scaling (more about this later)
  ◦ ...
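The sketch below uses R's built-in iris data purely as an illustration; the handout does not prescribe a data set.

    pairs(iris[, 1:4], col = iris$Species)                     # scatterplot matrix
    stars(iris[1:20, 1:4])                                     # star/segment plots for 20 cases
    coplot(Sepal.Length ~ Sepal.Width | Species, data = iris)  # conditioning plot
    biplot(prcomp(iris[, 1:4], scale. = TRUE))                 # bi-plot of first two PCs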


    Principal component analysis (PCA)

• Main idea:

  ◦ Start with variables X1, . . . , Xp.
  ◦ Find a rotation of these variables, say Y1, . . . , Yp (called principal components), so that:

    – Y1, . . . , Yp are uncorrelated. Idea: they measure different dimensions of the data.
    – Var(Y1) ≥ Var(Y2) ≥ . . . ≥ Var(Yp). Idea: Y1 is most important, then Y2, etc.

• The method is based on the spectral decomposition of the covariance matrix (see the sketch below).

• No need to make distributional assumptions.
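A minimal base-R sketch of this construction (the iris data and object names are illustrative):

    X <- as.matrix(iris[, 1:4])
    e <- eigen(cov(X))                          # spectral decomposition of the covariance matrix
    Y <- scale(X, scale = FALSE) %*% e$vectors  # principal component scores Y1, ..., Yp
    round(cov(Y), 10)                           # diagonal: the components are uncorrelated,
                                                # with variances e$values in decreasing order
    pc <- prcomp(X)                             # same analysis via prcomp(); pc$sdev^2 = e$values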


    Possible uses of PCA

• Interest in the first principal component:

  ◦ Example: How do we combine the scores on 5 different examinations into a total score? Since the first principal component maximizes the variance, it spreads out the scores as much as possible.

• Interest in the 2nd to p-th principal components:

  ◦ When all measurements are positively correlated, the first principal component is often some kind of average of the measurements (e.g., size of birds, severity index of psychiatric symptoms).
  ◦ Then the other principal components give important information about the remaining pattern (e.g., shape of birds, pattern of psychiatric symptoms).


    Possible uses of PCA

• Interest in the first few principal components:

  ◦ Dimension reduction: summarize the data with a smaller number of variables, losing as little information as possible.
  ◦ Can be used for graphical representations of the data (bi-plot).

• Use PCA as input for regression analysis (a sketch follows below):

  ◦ Highly correlated explanatory variables are problematic in regression analysis.
  ◦ One can replace them by their principal components, which are uncorrelated by definition.
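A sketch of this principal-component regression idea on simulated data (all data and names here are hypothetical):

    set.seed(1)
    X <- matrix(rnorm(100 * 3), 100, 3)
    X[, 3] <- X[, 1] + 0.01 * rnorm(100)   # make two explanatory variables highly correlated
    y <- X[, 1] - X[, 2] + rnorm(100)
    pc  <- prcomp(X, scale. = TRUE)
    fit <- lm(y ~ pc$x[, 1:2])             # regress on the first two (uncorrelated) components
    summary(fit)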


    Factor analysis: idea

• Idea:

  ◦ In the social sciences (e.g., psychology), it is often not possible to measure the variables of interest directly (e.g., intelligence, social class). Such variables are called latent variables or common factors.
  ◦ Researchers examine such variables indirectly, by measuring variables that can be measured and that are believed to be indicators of the latent variables of interest (e.g., examination scores on various tests).
  ◦ We want to relate the latent variables of interest to the measured variables.


    Factor analysis: model

• Multiple linear regression model:

      x1 = λ11 f1 + · · · + λ1k fk + u1
      x2 = λ21 f1 + · · · + λ2k fk + u2
       ⋮
      xp = λp1 f1 + · · · + λpk fk + up

  where

  ◦ x = (x1, . . . , xp) are the observed variables (random)
  ◦ f = (f1, . . . , fk) are the common factors (random)
  ◦ u = (u1, . . . , up) are the specific factors (random)
  ◦ λij are the factor loadings (constants)

• Note: f1, . . . , fk are not observed.

• Main goal: estimate the factor loadings.


    Factor analysis

• Assumptions:

  ◦ E(x) = 0 (if this is not the case, simply subtract the mean vector)
  ◦ E(f) = 0, Cov(f) = I
  ◦ E(u) = 0, Cov(ui, uj) = 0 for i ≠ j
  ◦ Cov(f, u) = 0

• Estimation:

  ◦ Under the above assumptions, Cov(x) = Σ = ΛΛ' + Ψ, where Λ = (λij) is the matrix of factor loadings and Ψ = Cov(u) is diagonal.
  ◦ Two estimation methods: principal factor analysis and maximum likelihood (the latter is sketched below).

• Factor loadings are non-unique; factor rotation can be used to ease interpretation.
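Maximum likelihood estimation (with a rotation) is available in base R via factanal(); a minimal sketch, where the choice of six mtcars variables and two factors is purely illustrative:

    fa <- factanal(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")],
                   factors = 2, rotation = "varimax")  # ML estimation, varimax rotation
    fa$loadings       # estimated factor loadings (Lambda)
    fa$uniquenesses   # estimated diagonal of Psi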


    Linear discriminant analysis

• Goal: Suppose that we have an n × p data matrix consisting of g different groups. How can we classify new observations into one of these groups? This is sometimes called supervised learning.

• Fisher:

  ◦ Look for the linear combination a'x which maximizes the ratio of the between-groups sum of squares to the within-groups sum of squares.
  ◦ Compute the average score x̄i'a for each group i = 1, . . . , g.
  ◦ Compute the score xnew'a for the new observation.
  ◦ Classify the new observation into group j if |xnew'a − x̄j'a| < |xnew'a − x̄i'a| for all i ≠ j.


    Linear discriminant analysis

• Maximum likelihood:

  ◦ Suppose the exact distributions of the populations π1, . . . , πg are known.
  ◦ Then the maximum likelihood discriminant rule is to allocate an observation x to the population which gives the largest likelihood to x, i.e., to the population with the highest density at the point x.
  ◦ If the exact distributions are unknown, but we know the shape of the distributions, then we can first estimate their parameters and then use the above rule. This is the sample maximum likelihood discriminant rule.

• For two groups from two multivariate normal distributions with the same covariance matrix, Fisher's linear discriminant analysis equals the maximum likelihood rule (a sketch in R follows below).
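A minimal sketch using lda() from the MASS package (shipped with R); the iris data and the pretend "new" observation are illustrative:

    library(MASS)
    fit  <- lda(Species ~ ., data = iris)  # estimate the discriminant functions
    xnew <- iris[1, 1:4]                   # treat this row as a new observation
    predict(fit, xnew)$class               # allocate it to one of the g groups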


    Cluster analysis

• We have multivariate data without group labels.

• We want to see if there are clusters in the data, i.e., groups of observations that are homogeneous and separated from the other groups. This is sometimes called unsupervised learning.

• Methods we discussed (a sketch of the first two follows below):

  ◦ Hierarchical clustering
  ◦ k-means clustering
  ◦ Model-based clustering

• Possible applications:

  ◦ Marketing: find groups of customers with similar behavior
  ◦ Biology: classify plants or animals
  ◦ Internet: cluster text documents
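A minimal base-R sketch of the first two methods (model-based clustering needs an add-on package such as mclust); the standardized iris measurements are an illustrative choice:

    X  <- scale(iris[, 1:4])                 # standardize the variables
    hc <- hclust(dist(X))                    # hierarchical clustering on Euclidean distances
    plot(hc)                                 # dendrogram
    grp_h <- cutree(hc, k = 3)               # cut the tree into 3 clusters
    grp_k <- kmeans(X, centers = 3)$cluster  # k-means with k = 3
    table(grp_h, grp_k)                      # compare the two clusterings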


    Multidimensional scaling

• Not discussed in class.

• Goal: Construct a map from a distance matrix, where the map should represent the distances between the objects as accurately as possible.

• Possible applications:

  ◦ Psychology/sociology: subjects say how similar/different pairs of objects are. Multidimensional scaling then creates a picture showing the overall relationships between the objects.

• Can be used to aid clustering.

• See the overhead slides and R code; a minimal sketch also follows below.
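A minimal sketch of classical (metric) multidimensional scaling with base R's cmdscale(), again on illustrative data:

    D   <- dist(scale(iris[, 1:4]))   # distance matrix between the objects
    map <- cmdscale(D, k = 2)         # 2-dimensional map reproducing D as well as possible
    plot(map, col = iris$Species, xlab = "Coordinate 1", ylab = "Coordinate 2")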


More formal methods

Normal distribution theory

• Multivariate normal distribution

• Wishart distribution (for the sample covariance matrix)

• Hotelling's T² distribution (for the Mahalanobis distance; closely related to the F-distribution)


    Tests of significance for multivariate data

• Discussed in class:

  ◦ Comparison of mean values for two samples, when the covariance matrices are assumed to be identical: multivariate T²-test (written out below)

• Other tests:

  ◦ Comparison of mean values for several samples
  ◦ Comparison of mean values for several samples when the covariance matrices are not the same
  ◦ Comparison of variation for two samples
  ◦ Comparison of variation for several samples
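The two-sample T²-test can be written out in a few lines of base R. In this sketch, the comparison of two iris species is illustrative:

    x1 <- as.matrix(iris[iris$Species == "setosa", 1:4])
    x2 <- as.matrix(iris[iris$Species == "versicolor", 1:4])
    n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
    d  <- colMeans(x1) - colMeans(x2)
    Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)  # pooled covariance
    T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(Sp, d))       # Hotelling's T^2
    Fstat <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2           # equivalent F statistic
    1 - pf(Fstat, p, n1 + n2 - p - 1)                               # p-value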


    Canonical correlation

• We study the relationship between a group of variables Y1, . . . , Yp and another group of variables X1, . . . , Xq by searching for linear combinations ai'X and bi'Y that are most highly correlated (a base-R sketch follows below).

• The size of the canonical correlation ρ(ai'X, bi'Y) tells us about the strength of the relationship between X and Y.

• The loadings in ai and bi tell us about the type of relationship between X and Y.

• One can test if the true canonical correlation is different from zero (not discussed in class).

• Possible application: find clusters among the variables (instead of among the observations).
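Base R provides cancor(); a minimal sketch in which the split of the iris columns into two groups is illustrative:

    X  <- iris[, c("Sepal.Length", "Sepal.Width")]
    Y  <- iris[, c("Petal.Length", "Petal.Width")]
    cc <- cancor(X, Y)
    cc$cor    # canonical correlations, largest first
    cc$xcoef  # loadings a_i
    cc$ycoef  # loadings b_i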


    Remaining topics we did not cover

• MANOVA: multivariate version of ANOVA (analysis of variance)

• Multivariate regression: multivariate version of multiple regression (when doing least squares, the estimates are the same as when doing multiple regression for each dependent variable separately)
