

A Tutorial on Multivariate Statistical Analysis

Craig A. Tracy, UC Davis

SAMSI, September 2006


    ELEMENTARY STATISTICS

    Collection of (real-valued) data from a sequence of experiments

$$X_1, X_2, \ldots, X_n$$

One might assume the underlying law is $N(\mu, \sigma^2)$ with unknown mean $\mu$ and variance $\sigma^2$. We want to estimate $\mu$ and $\sigma^2$ from the data.

Sample Mean & Sample Variance:
$$\bar{X} = \frac{1}{n}\sum_j X_j, \qquad S^2 = \frac{1}{n-1}\sum_j \left(X_j - \bar{X}\right)^2$$

These estimators are unbiased:
$$E(\bar{X}) = \mu, \qquad E(S^2) = \sigma^2$$
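
A quick numerical sanity check of the unbiasedness claim (a minimal numpy sketch of our own; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, trials = 2.0, 3.0, 10, 200_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
xbar = samples.mean(axis=1)            # sample mean of each trial
s2 = samples.var(axis=1, ddof=1)       # sample variance with the 1/(n-1) factor

print(xbar.mean())   # approximately mu = 2.0
print(s2.mean())     # approximately sigma^2 = 3.0
```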


Theorem: If $X_1, X_2, \ldots$ are independent $N(\mu, \sigma^2)$ variables, then $\bar{X}$ and $S^2$ are independent. Moreover, $\bar{X}$ is $N(\mu, \sigma^2/n)$ and $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$.

Recall $\chi^2(d)$ denotes the chi-squared distribution with $d$ degrees of freedom. Its density is
$$f_{\chi^2}(x) = \frac{1}{2^{d/2}\,\Gamma(d/2)}\, x^{d/2-1} e^{-x/2}, \qquad x \ge 0,$$
where
$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt, \qquad \Re(z) > 0.$$


    MULTIVARIATE GENERALIZATIONS

From the classic textbook of Anderson [1]:

Multivariate statistical analysis is concerned with data that consists of sets of measurements on a number of individuals or objects. The sample data may be heights and weights of some individuals drawn randomly from a population of school children in a given city, or the statistical treatment may be made on a collection of measurements, such as lengths and widths of petals and lengths and widths of sepals of iris plants taken from two species, or one may study the scores on batteries of mental tests administered to a number of students.

$p$ = # of sets of measurements on a given individual,
$n$ = # of observations = sample size


    Remarks:

In the above examples, one can assume that $p \ll n$ since typically many measurements will be taken.

Today it is common for $p \gg 1$, so $n/p$ is no longer necessarily large.

Vehicle Sound Signature Recognition: Vehicle noise is a stochastic signal. The power spectrum is discretized to a vector of length $p = 1200$ with $n \approx 1200$ samples from the same kind of vehicle.

Astrophysics: The Sloan Digital Sky Survey typically has many observations (say of quasar spectra), with the spectrum of each quasar binned, resulting in a large $p$.

Financial data: S&P 500 stocks observed over monthly intervals for twenty years.


    GAUSSIAN DATA MATRICES

The data are now $n$ independent column vectors of length $p$,
$$x_1, x_2, \ldots, x_n,$$
from which we construct the $n \times p$ data matrix
$$X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}.$$
The Gaussian assumption is that
$$x_j \sim N_p(\mu, \Sigma).$$
Many applications assume the mean has already been subtracted out of the data, i.e. $\mu = 0$.


    Multivariate Gaussian Distribution

If $x$ and $y$ are vectors, the matrix $xy^T$ is defined by
$$(xy^T)_{jk} = x_j y_k.$$
If $\mu = E(x)$ is the mean of the random vector $x$, then the covariance matrix of $x$ is the $p \times p$ matrix
$$\Sigma = E\left[(x-\mu)(x-\mu)^T\right].$$
$\Sigma$ is a symmetric, non-negative definite matrix. If $\Sigma > 0$ (positive definite) and $X \sim N_p(\mu, \Sigma)$, then the density function of $X$ is
$$f_X(x) = (2\pi)^{-p/2}(\det\Sigma)^{-1/2}\exp\!\left(-\tfrac12 (x-\mu)^T\Sigma^{-1}(x-\mu)\right), \qquad x \in \mathbb{R}^p.$$
Sample mean:
$$\bar{x} = \frac{1}{n}\sum_j x_j, \qquad E(\bar{x}) = \mu.$$


Sample covariance matrix:
$$S = \frac{1}{n-1}\sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})^T$$
For $\mu = 0$ the sample covariance matrix can be written simply as
$$\frac{1}{n-1}X^TX.$$
Some Notation: If $X$ is an $n \times p$ data matrix formed from the $n$ independent column vectors $x_j$, $\mathrm{cov}(x_j) = \Sigma$, we can form one column vector $\mathrm{vec}(X)$ of length $pn$:
$$\mathrm{vec}(X) = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
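
For concreteness, a minimal numpy sketch (our own illustration, not part of the slides) that builds an $n \times p$ Gaussian data matrix, forms the sample covariance matrix $S$, and checks it against numpy's built-in estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
Sigma = np.array([[2.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.3, 0.0],
                  [0.0, 0.3, 1.5, 0.2],
                  [0.0, 0.0, 0.2, 1.0]])

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # n x p data matrix, rows x_j^T
xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / (n - 1)                    # sample covariance matrix

print(np.allclose(S, np.cov(X, rowvar=False)))             # True: same estimator as np.cov
```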


The covariance matrix of $\mathrm{vec}(X)$ is the $np \times np$ matrix
$$I_n \otimes \Sigma = \begin{pmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{pmatrix}.$$
In this case we say the data matrix $X$ constructed from $n$ independent $x_j \sim N_p(\mu, \Sigma)$ has distribution
$$N_p(M, I_n \otimes \Sigma),$$
where $M = E(X) = \mathbf{1}\mu^T$ and $\mathbf{1}$ is the column vector of all 1s.


    WISHART DISTRIBUTION

Definition: If $A = X^TX$ where the $n \times p$ matrix $X$ is $N_p(0, I_n \otimes \Sigma)$, $\Sigma > 0$, then $A$ is said to have the Wishart distribution with $n$ degrees of freedom and covariance matrix $\Sigma$. We will say $A$ is $W_p(n, \Sigma)$.

Remarks:

The Wishart distribution is the multivariate generalization of the chi-squared distribution.

$A \sim W_p(n, \Sigma)$ is positive definite with probability one if and only if $n \ge p$.

The sample covariance matrix
$$S = \frac{1}{n-1}A$$
is $W_p\!\left(n-1, \tfrac{1}{n-1}\Sigma\right)$.
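
A small simulation (our own numpy sketch, not from the slides) that forms $A = X^TX$ from mean-zero Gaussian rows and checks the elementary fact $E(A) = n\Sigma$ for a $W_p(n, \Sigma)$ matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, trials = 20, 3, 5_000
Sigma = np.array([[1.0, 0.4, 0.0],
                  [0.4, 2.0, 0.3],
                  [0.0, 0.3, 1.5]])

A_mean = np.zeros((p, p))
for _ in range(trials):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # rows are N_p(0, Sigma)
    A_mean += X.T @ X / trials                                 # A = X^T X is W_p(n, Sigma)

print(np.round(A_mean / n, 2))   # approximately Sigma
```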


WISHART DENSITY FUNCTION, $n \ge p$

Let $\mathcal{S}_p$ denote the space of $p \times p$ positive definite (symmetric) matrices. If $A = (a_{jk}) \in \mathcal{S}_p$, let
$$(dA) = \text{volume element of } A = \bigwedge_{j \le k} da_{jk}.$$
The multivariate gamma function is
$$\Gamma_p(a) = \int_{\mathcal{S}_p} e^{-\mathrm{tr}(A)}(\det A)^{a-(p+1)/2}\,(dA), \qquad \Re(a) > (p-1)/2.$$
Theorem: If $A$ is $W_p(n, \Sigma)$ with $n \ge p$, then the density function of $A$ is
$$\frac{1}{2^{np/2}\,\Gamma_p(n/2)\,(\det\Sigma)^{n/2}}\; e^{-\frac12\mathrm{tr}(\Sigma^{-1}A)}\,(\det A)^{(n-p-1)/2}.$$


    Sketch of Proof:

The density function for $X$ is the multivariate Gaussian (including the volume element $(dX)$)
$$(2\pi)^{-np/2}(\det\Sigma)^{-n/2}\, e^{-\frac12\mathrm{tr}(\Sigma^{-1}X^TX)}\,(dX).$$
Recall the QR factorization [8]: Let $X$ denote an $n \times p$ matrix with $n \ge p$ and full column rank. Then there exist a unique $n \times p$ matrix $Q$ with $Q^TQ = I_p$ and a unique $p \times p$ upper triangular matrix $R$ with positive diagonal elements such that $X = QR$. Note $A = X^TX = R^TR$.

A Jacobian calculation [1, 13]: If $A = X^TX$, then
$$(dX) = 2^{-p}(\det A)^{(n-p-1)/2}\,(dA)(Q^TdQ),$$


where
$$(Q^TdQ) = \bigwedge_{j=1}^{p}\bigwedge_{k=j+1}^{n} q_k^T\,dq_j$$
and $Q = (q_1, \ldots, q_p)$ is the column representation of $Q$.

Thus the joint distribution of $A$ and $Q$ is
$$(2\pi)^{-np/2}(\det\Sigma)^{-n/2}\, e^{-\frac12\mathrm{tr}(\Sigma^{-1}A)}\; 2^{-p}(\det A)^{(n-p-1)/2}\,(dA)(Q^TdQ).$$
Now integrate over all $Q$. Use the fact that
$$\int_{V_{n,p}} (Q^TdQ) = \frac{2^p\,\pi^{np/2}}{\Gamma_p(n/2)},$$
where $V_{n,p}$ is the set of real $n \times p$ matrices $Q$ satisfying $Q^TQ = I_p$. (When $n = p$ this is the orthogonal group.)


    Remarks regarding the Wishart density function

The case $p = 2$ was obtained by R. A. Fisher in 1915; general $p$ by J. Wishart in 1928 by geometrical arguments. The proof outlined above came later. (See [1, 13] for a complete proof.)

When $Q$ is a $p \times p$ orthogonal matrix,
$$\frac{\Gamma_p(p/2)}{2^p\,\pi^{p^2/2}}\,(Q^TdQ)$$
is normalized Haar measure for the orthogonal group $O(p)$. We denote this Haar measure by $(dQ)$.

Siegel proved (see, e.g., [13])
$$\Gamma_p(a) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma\!\left(a - \tfrac12(j-1)\right).$$


    EIGENVALUES OF A WISHART MATRIX

Theorem: If $A$ is $W_p(n, \Sigma)$ with $n \ge p$, the joint density function for the eigenvalues $\ell_1, \ldots, \ell_p$ of $A$ is
$$\frac{\pi^{p^2/2}\, 2^{-np/2}\,(\det\Sigma)^{-n/2}}{\Gamma_p(p/2)\,\Gamma_p(n/2)}\;\prod_{j=1}^{p}\ell_j^{(n-p-1)/2}\;\Delta(\ell)\int_{O(p)} e^{-\frac12\mathrm{tr}(\Sigma^{-1}QLQ^T)}\,(dQ), \qquad \ell_1 > \cdots > \ell_p,$$
where $L = \mathrm{diag}(\ell_1, \ldots, \ell_p)$ and $(dQ)$ is normalized Haar measure. Note that $\Delta(\ell) := \prod_{j<k}(\ell_j - \ell_k)$ is the Vandermonde determinant.


Proof: Recall that the Wishart density function (times the volume element) is
$$\frac{1}{2^{np/2}\,\Gamma_p(n/2)\,(\det\Sigma)^{n/2}}\; e^{-\frac12\mathrm{tr}(\Sigma^{-1}A)}\,(\det A)^{(n-p-1)/2}\,(dA).$$
The idea is to diagonalize $A$ by an orthogonal transformation and then integrate over the orthogonal group, thereby giving the density function for the eigenvalues of $A$.

Let $\ell_1 > \cdots > \ell_p$ be the ordered eigenvalues of $A$:
$$A = QLQ^T, \qquad L = \mathrm{diag}(\ell_1, \ldots, \ell_p), \qquad Q \in O(p).$$
The $j$th column of $Q$ is a normalized eigenvector of $A$. The transformation is not 1-1 since $Q = [\pm q_1, \ldots, \pm q_p]$ works for each fixed $A$. The transformation is made 1-1 by requiring that the first element of each $q_j$ be nonnegative. This restricts $Q$ (as $A$ varies) to a $2^{-p}$ part of $O(p)$. We compensate for this at the end.


We need an expression for the volume element $(dA)$ in terms of $Q$ and $L$. First we compute the differential of $A$:
$$dA = dQ\,L\,Q^T + Q\,dL\,Q^T + Q\,L\,dQ^T,$$
$$Q^T\,dA\,Q = Q^TdQ\,L + dL + L\,dQ^TQ = -dQ^TQ\,L + dL + L\,dQ^TQ = [L,\, dQ^TQ] + dL.$$
(We used the fact that $Q^TQ = I$ implies $Q^TdQ = -dQ^TQ$.)

We now use the following fact (see, e.g., page 58 in [13]): If $X = BYB^T$ where $X$ and $Y$ are $p \times p$ symmetric matrices and $B$ is a nonsingular $p \times p$ matrix, then $(dX) = (\det B)^{p+1}(dY)$. In our case $Q$ is orthogonal, so the volume element $(dA)$ equals the volume element $(Q^T dA\, Q)$. The volume element is the exterior product of the diagonal elements of $Q^T dA\, Q$ times the exterior product of the elements above the diagonal.


Since $L$ is diagonal, the commutator $[L,\, dQ^TQ]$ has zero diagonal elements. Thus the exterior product of the diagonal elements of $Q^T dA\, Q$ is $\bigwedge_j d\ell_j$.

The exterior product of the elements coming from the commutator is
$$\prod_{j<k}(\ell_j - \ell_k)\,\bigwedge_{j<k} q_k^T\,dq_j,$$
so that
$$(dA) = \prod_{j<k}(\ell_j - \ell_k)\,(dL)\,(Q^TdQ).$$
Substituting this into the Wishart density, converting $(Q^TdQ)$ to normalized Haar measure $(dQ)$, and accounting for the factor $2^{-p}$ from the sign restriction on the columns of $Q$ gives the eigenvalue density stated above.


One is interested in limit laws as $n, p \to \infty$. For $\Sigma = I_p$, Johnstone [11] proved, using RMT methods, that with the centering and scaling constants
$$\mu_{np} = \left(\sqrt{n-1} + \sqrt{p}\right)^2, \qquad \sigma_{np} = \left(\sqrt{n-1} + \sqrt{p}\right)\left(\frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}}\right)^{1/3},$$
the quantity
$$\frac{\ell_1 - \mu_{np}}{\sigma_{np}}$$
converges in distribution as $n, p \to \infty$, $n/p \to \gamma < \infty$, to the GOE largest eigenvalue distribution [15].

El Karoui [6] has extended the result to $p/n \to \infty$. The case $p \gg n$ appears, for example, in microarray data.

Soshnikov [14] has lifted the Gaussian assumption under the additional restriction $n - p = O(p^{1/3})$.
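
The sketch below (our own numpy illustration) computes Johnstone's centering and scaling constants and rescales the largest eigenvalue of a white Wishart matrix ($\Sigma = I_p$); for moderately large $n, p$ the rescaled values should approximately follow the GOE (Tracy-Widom $\beta = 1$) largest eigenvalue law, whose mean is about $-1.2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, trials = 400, 100, 500

mu_np = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
sigma_np = (np.sqrt(n - 1) + np.sqrt(p)) * (1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)

rescaled = np.empty(trials)
for t in range(trials):
    X = rng.standard_normal((n, p))              # Sigma = I_p
    ell1 = np.linalg.eigvalsh(X.T @ X)[-1]       # largest eigenvalue of A = X^T X
    rescaled[t] = (ell1 - mu_np) / sigma_np

print(rescaled.mean())   # should be in the neighborhood of the GOE Tracy-Widom mean
```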


For $\Sigma \ne I_p$, the difficulty in establishing limit theorems comes from the integral
$$\int_{O(p)} e^{-\frac12\mathrm{tr}(\Sigma^{-1}QLQ^T)}\,(dQ).$$
Using zonal polynomials, infinite series expansions have been derived for this integral, but these expansions are difficult to analyze. See Muirhead [13].

For complex Gaussian data matrices $X$, similar density formulas are known for the eigenvalues of $X^*X$. Limit theorems for $\Sigma \ne I_p$ are known since the analogous group integral, now over the unitary group, is known explicitly: the Harish-Chandra/Itzykson-Zuber (HCIZ) integral (see, e.g., [17]). See the work of Baik, Ben Arous and Péché [2, 3] and El Karoui [7].


PRINCIPAL COMPONENT ANALYSIS (PCA), H. Hotelling, 1933

Population Principal Components: Let $x$ be a $p \times 1$ random vector with $E(x) = \mu$ and $\mathrm{cov}(x) = \Sigma > 0$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ denote the eigenvalues of $\Sigma$ and $H$ an orthogonal matrix diagonalizing $\Sigma$: $H^T\Sigma H = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$. We write $H$ in column vector form
$$H = [h_1, \ldots, h_p],$$
so that $h_j$ is the $p \times 1$ eigenvector of $\Sigma$ corresponding to eigenvalue $\lambda_j$. Define the $p \times 1$ vector $u = H^Tx = (u_1, \ldots, u_p)^T$; then
$$\mathrm{cov}(u) = E\!\left[(H^Tx - H^T\mu)(H^Tx - H^T\mu)^T\right] = H^T E\!\left[(x-\mu)(x-\mu)^T\right] H = H^T\Sigma H = \Lambda$$
$\Longrightarrow$ the $u_j$ are uncorrelated.


Definition: $u_j$ is called the $j$th principal component of $x$. Note $\mathrm{var}(u_j) = \lambda_j$.

Statistical interpretation: The claim is that $u_1$ is the linear combination of the components of $x$ that has maximum variance.

Proof: For simplicity of notation, set $\mu = 0$. Let $b$ denote any $p \times 1$ vector with $b^Tb = 1$, and form $b^Tx$. Then
$$\mathrm{var}(b^Tx) = E\!\left(b^Tx\, b^Tx\right) = E\!\left(b^Tx\,(b^Tx)^T\right) = b^T E(xx^T)\, b = b^T\Sigma b.$$
We want to maximize the right-hand side subject to the constraint $b^Tb = 1$. By the method of Lagrange multipliers we maximize
$$b^T\Sigma b - \lambda(b^Tb - 1).$$
Since $\Sigma$ is symmetric, the vector of partial derivatives is
$$2\Sigma b - 2\lambda b.$$
Thus $b$ must be an eigenvector of $\Sigma$ with eigenvalue $\lambda$.


The largest variance corresponds to choosing the largest eigenvalue.

The general result is that $u_r$ has maximum variance among all normalized linear combinations uncorrelated with $u_1, \ldots, u_{r-1}$.

Sample principal components: Let $S$ denote the sample covariance matrix of the data matrix $X$ and let $Q = [q_1, \ldots, q_p]$ be a $p \times p$ orthogonal matrix diagonalizing $S$:
$$Q^TSQ = \mathrm{diag}(\ell_1, \ldots, \ell_p).$$
The $\ell_j$ are the sample variances that are estimates for $\lambda_j$. The vectors $q_j$ are sample estimates for the vectors $h_j$.

If $x$ is the random vector and $u = H^Tx$ is the vector of principal components, then $\hat{u} = Q^Tx$ is the vector of sample principal components.


    SCREE PLOTS

In applications: How many of the $\ell_j$'s are significant?
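
A minimal numpy sketch of sample PCA (our own illustration, with an arbitrary population covariance): diagonalize the sample covariance matrix $S$, read off the sample variances $\ell_j$ (the values one would show in a scree plot), and form the sample principal components $\hat{u} = Q^Tx$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 300, 5
Sigma = np.diag([5.0, 2.0, 1.0, 0.5, 0.25])     # one dominant direction, for illustration
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / (n - 1)

ell, Q = np.linalg.eigh(S)                      # eigenvalues in ascending order
ell, Q = ell[::-1], Q[:, ::-1]                  # reorder so ell_1 >= ... >= ell_p

print(ell)                                      # scree-plot values
U = (X - xbar) @ Q                              # rows are the sample principal components
print(np.round(np.cov(U, rowvar=False), 2))     # approximately diag(ell): PCs uncorrelated
```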


    CANONICAL CORRELATION ANALYSIS (CCA)

H. Hotelling, 1936

Suppose a large data set is naturally decomposed into two groups. For example, $p \times 1$ random vectors $x_1, \ldots, x_n$ make up one set and $q \times 1$ random vectors $y_1, \ldots, y_m$ the other. We are interested in the correlations between these two data sets. For example, in medicine we might have $n$ measurements of age, height, and weight ($p = 3$) and $m$ measurements of systolic and diastolic blood pressures ($q = 2$). We are interested in what combination of the components of $x$ is most correlated with a combination of the components of $y$.

Population Canonical Correlations: Let $x$ and $y$ be two random vectors of size $p \times 1$ and $q \times 1$, respectively. We assume $E(x) = E(y) = 0$ and $p \ge q$. Form the $(p+q) \times 1$ vector
$$\begin{pmatrix} x \\ y \end{pmatrix}$$


and its $(p+q) \times (p+q)$ covariance matrix
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$
Let $u := \alpha^T x \in \mathbb{R}$, $v := \beta^T y \in \mathbb{R}$, where $\alpha$ and $\beta$ are vectors to be determined. We want to maximize the correlation
$$\mathrm{corr}(u, v) = \frac{\mathrm{cov}(u, v)}{\sqrt{\mathrm{var}(u)\,\mathrm{var}(v)}}.$$
The correlation does not change under scale transformations $u \to cu$, etc., so we can maximize this correlation subject to the constraints
$$E(u^2) = E(\alpha^Tx\,\alpha^Tx) = \alpha^T\Sigma_{11}\alpha = 1 \qquad (1)$$
$$E(v^2) = \beta^T\Sigma_{22}\beta = 1 \qquad (2)$$


Under these constraints
$$\mathrm{corr}(u, v) = E(\alpha^Tx\,\beta^Ty) = \alpha^T\Sigma_{12}\beta.$$
Let
$$\psi = \alpha^T\Sigma_{12}\beta - \tfrac12\rho\left(\alpha^T\Sigma_{11}\alpha - 1\right) - \tfrac12\lambda\left(\beta^T\Sigma_{22}\beta - 1\right),$$
where $\rho$ and $\lambda$ are Lagrange multipliers. Set the vector of partial derivatives to zero:
$$\frac{\partial\psi}{\partial\alpha} = \Sigma_{12}\beta - \rho\,\Sigma_{11}\alpha = 0 \qquad (3)$$
$$\frac{\partial\psi}{\partial\beta} = \Sigma_{21}\alpha - \lambda\,\Sigma_{22}\beta = 0 \qquad (4)$$
If we left multiply (3) by $\alpha^T$ and (4) by $\beta^T$ and use the normalization conditions (1) and (2), we conclude $\rho = \lambda$. Thus (3) and (4) become
$$\begin{pmatrix} -\rho\,\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\,\Sigma_{22} \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = 0 \qquad (5)$$


with $\mathrm{corr}(u, v) = \rho$, and $\rho \ge 0$ is a solution to
$$\det\begin{pmatrix} -\rho\,\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\,\Sigma_{22} \end{pmatrix} = 0.$$
This is a polynomial in $\rho$ of degree $(p+q)$. Let $\rho_1$ denote the maximum root and $\alpha^{(1)}$ and $\beta^{(1)}$ corresponding solutions to (5).

Definition: $u_1 = (\alpha^{(1)})^Tx$ and $v_1 = (\beta^{(1)})^Ty$ are called the first canonical variables and their correlation $\rho_1 = \mathrm{corr}(u_1, v_1)$ is called the first canonical correlation coefficient.

More generally, the $r$th pair of canonical variables is the pair of linear combinations $u_r = (\alpha^{(r)})^Tx$ and $v_r = (\beta^{(r)})^Ty$, each of unit variance and uncorrelated with the first $r-1$ pairs of canonical variables and having maximum correlation. The correlation $\mathrm{corr}(u_r, v_r)$ is the $r$th canonical correlation coefficient.


    REDUCTION TO AN EIGENVALUE PROBLEM

Since
$$\begin{pmatrix} -\rho\,\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\,\Sigma_{22} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}\begin{pmatrix} -\rho I & \Sigma_{11}^{-1}\Sigma_{12} \\ \Sigma_{22}^{-1}\Sigma_{21} & -\rho I \end{pmatrix},$$
the determinantal equation becomes
$$\det\left(\rho^2 I - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\right) = 0.$$
The nonzero roots $\rho_1, \ldots, \rho_k$ are called the population canonical correlation coefficients; $k = \mathrm{rank}(\Sigma_{12})$.

In applications $\Sigma$ is not known. One uses the sample covariance matrix $S$ to obtain sample canonical correlation coefficients.
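
Numerically, the squared canonical correlations are the nonzero eigenvalues of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, which has the same nonzero roots as the determinantal equation above. A minimal numpy sketch (our own illustration on simulated data, not the wheat/flour example below):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 1000, 3, 2

# Simulated mean-zero data sharing one latent factor.
z = rng.standard_normal(n)
x = np.column_stack([z + rng.standard_normal(n) for _ in range(p)])
y = np.column_stack([z + rng.standard_normal(n) for _ in range(q)])

S = np.cov(np.column_stack([x, y]), rowvar=False)   # sample covariance of (x, y)
S11, S12 = S[:p, :p], S[:p, p:]
S21, S22 = S[p:, :p], S[p:, p:]

# Squared sample canonical correlations: eigenvalues of S22^{-1} S21 S11^{-1} S12.
M = np.linalg.solve(S22, S21) @ np.linalg.solve(S11, S12)
rho = np.sqrt(np.sort(np.linalg.eigvals(M).real)[::-1])
print(rho)   # sample canonical correlation coefficients, rho_1 >= rho_2
```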


    AN EXAMPLE

The first application of Hotelling's canonical correlations is by F. Waugh [16] in 1942. He begins his paper with

"Professor Hotelling's paper, 'Relations between Two Sets of Variates,' should be widely known and his method used by practical statisticians. Yet, few practical statisticians seem to know of the paper, and perhaps those few are inclined to regard it as a mathematical curiosity rather than an important and useful method for analyzing concrete problems. This may be due to . . ."

The Practical Problem: Relation of wheat characteristics to flour characteristics. Guiding principle:

"The grade of the raw material should give a good indication of the probable grade of the finished product."


The data: 138 samples of Canadian Hard Red Spring wheat and the flour made from each of these samples.^a

wheat quality                    flour quality
x1 = kernel texture              y1 = wheat per barrel of flour
x2 = test weight                 y2 = ash in flour
x3 = damaged kernels             y3 = crude protein in flour
x4 = crude protein in wheat      y4 = gluten quality index
x5 = foreign material

$u_1 = (\alpha^{(1)})^Tx$ = index of wheat quality, $\quad v_1 = (\beta^{(1)})^Ty$ = index of flour quality

$\mathrm{corr}(u_1, v_1) = 0.909$

^a In this example $p > q$. The data are normalized to mean 0 and variance 1.


Here $c_{p,q,n}$ is a normalization constant and $\Delta(\cdot)$ denotes the Vandermonde determinant, $\Delta(x) = \prod_{j<k}(x_j - x_k)$.


    ZONAL POLYNOMIALS

Zonal polynomials naturally arise in multivariate analysis when considering group integrals such as
$$\int_{O(m)} e^{\mathrm{tr}(XHYH^T)}\,(dH),$$
where $X$ and $Y$ are $m \times m$ symmetric, positive definite matrices and $(dH)$ is normalized Haar measure. Zonal polynomials can be defined either through the representation theory of $GL(m, \mathbb{R})$ [12] or as eigenfunctions of certain Laplacians [5, 13].

Let $\mathcal{S}_m$ denote the space of $m \times m$ symmetric, positive definite matrices. Zonal polynomials $C_\lambda(X)$, $X \in \mathcal{S}_m$, are certain homogeneous polynomials in the eigenvalues of $X$ that are indexed by partitions $\lambda$.

To give the precise definition we need some preliminary definitions.


Definition: A partition $\lambda$ of $n$ is a sequence $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_\ell)$ where the $\lambda_j \ge 0$ are weakly decreasing and $\sum_j \lambda_j = n$. We denote this by $\lambda \vdash n$. For example, $\lambda = (5, 3, 3, 1)$ is a partition of 12. The number of nonzero parts of $\lambda$ is called the length of $\lambda$, denoted $\ell(\lambda)$.

If $\lambda$ and $\mu$ are two partitions of $n$, we say $\mu < \lambda$ (lexicographic order) if, for some index $i$, $\lambda_j = \mu_j$ for $j < i$ and $\mu_i < \lambda_i$. For example,
$$(1,1,1,1) < (2,1,1) < (2,2) < (3,1) < (4).$$
If $\lambda \vdash n$ with $\ell(\lambda) \le m$, we define the monomial
$$x^\lambda = x_1^{\lambda_1} x_2^{\lambda_2} \cdots x_m^{\lambda_m}$$
and say $x^\lambda$ is of higher weight than $x^\mu$ if $\lambda > \mu$.
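
The ordering above is ordinary lexicographic order on the sequences of parts. A small Python sketch (our own illustration) that enumerates partitions and reproduces the example ordering:

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

# Lexicographic comparison of tuples matches the ordering used above.
print(sorted(partitions(4)))
# [(1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1), (4)]
```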


    Metrics and Laplacians

Let $X \in \mathcal{S}_m$ and $dX = (dx_{jk})$ the matrix of differentials of $X$. We define a metric on $\mathcal{S}_m$ by
$$(ds)^2 = \mathrm{tr}\!\left(X^{-1}dX\,X^{-1}dX\right).$$
A simple computation shows this metric is invariant under $X \to LXL^T$, $L \in GL(m, \mathbb{R})$.

Let $n = m(m+1)/2$ = # of independent elements of $X$. We denote by $\mathrm{vec}(X) \in \mathbb{R}^n$ the column vector representation of $X$, e.g.
$$X = \begin{pmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{pmatrix}, \qquad x := \mathrm{vec}(X) = \begin{pmatrix} x_{11} \\ x_{12} \\ x_{22} \end{pmatrix}.$$
The metric $(ds)^2$ is a quadratic differential in $dx$.


For example,
$$(ds)^2 = \begin{pmatrix} dx_{11} & dx_{12} & dx_{22} \end{pmatrix} G(x) \begin{pmatrix} dx_{11} \\ dx_{12} \\ dx_{22} \end{pmatrix},$$
where
$$G(x) = (g_{ij}) = \frac{1}{(x_{11}x_{22} - x_{12}^2)^2}\begin{pmatrix} x_{22}^2 & -2x_{12}x_{22} & x_{12}^2 \\ -2x_{12}x_{22} & 2x_{11}x_{22} + 2x_{12}^2 & -2x_{11}x_{12} \\ x_{12}^2 & -2x_{11}x_{12} & x_{11}^2 \end{pmatrix}.$$
Labeling $x = \mathrm{vec}(X)$ with a single index, the differential is in the standard form
$$(ds)^2 = \sum_{i,j} g_{ij}\,dx_i\,dx_j.$$
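
The $2 \times 2$ case can be checked symbolically. The sketch below (assuming sympy is available; the variable names are our own) verifies that $\mathrm{tr}(X^{-1}dX\,X^{-1}dX)$ equals the quadratic form built from the matrix $G(x)$ displayed above.

```python
import sympy as sp

x11, x12, x22, d11, d12, d22 = sp.symbols('x11 x12 x22 d11 d12 d22')
X  = sp.Matrix([[x11, x12], [x12, x22]])
dX = sp.Matrix([[d11, d12], [d12, d22]])

ds2 = sp.expand((X.inv() * dX * X.inv() * dX).trace())     # tr(X^{-1} dX X^{-1} dX)

det = x11 * x22 - x12**2
G = sp.Matrix([[x22**2,      -2*x12*x22,            x12**2],
               [-2*x12*x22,   2*x11*x22 + 2*x12**2, -2*x11*x12],
               [x12**2,      -2*x11*x12,            x11**2]]) / det**2
dx = sp.Matrix([d11, d12, d22])
quad = sp.expand((dx.T * G * dx)[0, 0])                     # the quadratic form above

print(sp.simplify(ds2 - quad))   # prints 0
```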


The Laplacian associated to this metric is
$$\Delta_X = (\det G)^{-1/2}\sum_{j=1}^{n}\frac{\partial}{\partial x_j}\left[(\det G)^{1/2}\sum_{i=1}^{n} g^{ij}\frac{\partial}{\partial x_i}\right],$$
where
$$G^{-1} = (g^{ij}).$$
More succinctly, if
$$\nabla_X = \begin{pmatrix} \dfrac{\partial}{\partial x_1} \\ \vdots \\ \dfrac{\partial}{\partial x_n} \end{pmatrix},$$
then
$$\Delta_X = (\det G)^{-1/2}\left(\nabla_X,\; (\det G)^{1/2}\, G^{-1}\nabla_X\right),$$
where $(\cdot\,,\cdot)$ is the standard inner product on $\mathbb{R}^n$.


One can show that $\Delta_X$ is an invariant differential operator:
$$\Delta_{LXL^T} = \Delta_X, \qquad L \in GL(m, \mathbb{R}).$$
We now diagonalize $X$:
$$X = HYH^T, \qquad Y = \mathrm{diag}(y_1, \ldots, y_m), \qquad H \in O(m).$$
The Laplacian $\Delta_X$ is now expressed in terms of a radial part and an angular part. The radial part of $\Delta_X$ is the differential operator
$$\sum_{j=1}^{m} y_j^2\,\frac{\partial^2}{\partial y_j^2} + \sum_{j=1}^{m}\sum_{\substack{k=1 \\ k \ne j}}^{m}\frac{y_j^2}{y_j - y_k}\,\frac{\partial}{\partial y_j} + \sum_{j} y_j\,\frac{\partial}{\partial y_j}.$$
We now let $\Delta_X$ denote only the radial part.

We can now define zonal polynomials!


Definition: Let $X \in \mathcal{S}_m$ with eigenvalues $x_1, \ldots, x_m$ and let $\lambda = (\lambda_1, \ldots, \lambda_m) \vdash k$ be a partition of $k$ into not more than $m$ parts. Then $C_\lambda(X)$ is the symmetric, homogeneous polynomial of degree $k$ in the $x_j$ such that

1. The term of highest weight in $C_\lambda(X)$ is $x^\lambda$;
2. $C_\lambda(X)$ is an eigenfunction of the Laplacian $\Delta_X$;
3. $(\mathrm{tr}(X))^k = (x_1 + \cdots + x_m)^k = \sum_{\lambda \vdash k,\ \ell(\lambda) \le m} C_\lambda(X)$.

Remarks:

One must show there is a unique polynomial satisfying these requirements.

The eigenvalue in (2) equals $\alpha_\lambda := \sum_j \lambda_j(\lambda_j - j) + k(m+1)/2$.

The program MOPS [5] computes zonal polynomials.


The following theorem is at the core of why zonal polynomials appear in multivariate statistical analysis.

Theorem: If $X, Y \in \mathcal{S}_p$, then
$$\int_{O(p)} C_\lambda(XHYH^T)\,(dH) = \frac{C_\lambda(X)\,C_\lambda(Y)}{C_\lambda(I_p)}, \qquad (6)$$
where $(dH)$ is normalized Haar measure.

Proof: Let $f(Y)$ denote the left-hand side of (6) and let $Q \in O(p)$. Then $f(QYQ^T) = f(Y)$ (let $H \to HQ$ in the integral and use the invariance of the measure). Thus $f$ is a symmetric function of the eigenvalues of $Y$. Since $C_\lambda$ is homogeneous of degree $|\lambda|$, so is $f$. Now apply the Laplacian to $f$:


$$\begin{aligned}
\Delta_Y f(Y) &= \int_{O(p)} \Delta_Y\, C_\lambda(XHYH^T)\,(dH) \\
&= \int \Delta_Y\, C_\lambda(X^{1/2}HYH^TX^{1/2})\,(dH) \\
&= \int \Delta_Y\, C_\lambda(LYL^T)\,(dH) \qquad (L = X^{1/2}H) \\
&= \int \Delta_{LYL^T}\, C_\lambda(LYL^T)\,(dH) \qquad \text{(invariance of } \Delta_Y) \\
&= \alpha_\lambda \int C_\lambda(LYL^T)\,(dH) = \alpha_\lambda\, f(Y).
\end{aligned}$$
By the definition of $C_\lambda(Y)$ we have $f(Y) = d_\lambda\, C_\lambda(Y)$ for some constant $d_\lambda$. Since $f(I_p) = C_\lambda(X)$, we find $d_\lambda = C_\lambda(X)/C_\lambda(I_p)$ and the theorem follows.


    Using Zonal Polynomials to Evaluate Group Integrals

$$\begin{aligned}
\int_{O(p)} e^{\mathrm{tr}(XHYH^T)}\,(dH) &= \sum_{k=0}^{\infty}\frac{1}{k!}\int_{O(p)}\left[\mathrm{tr}(XHYH^T)\right]^k\,(dH) \\
&= \sum_{k=0}^{\infty}\frac{1}{k!}\sum_{\lambda \vdash k,\ \ell(\lambda) \le p}\int_{O(p)} C_\lambda(XHYH^T)\,(dH) \\
&= \sum_{k=0}^{\infty}\frac{1}{k!}\sum_{\lambda \vdash k,\ \ell(\lambda) \le p}\frac{C_\lambda(X)\,C_\lambda(Y)}{C_\lambda(I_p)}.
\end{aligned}$$


    References

[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, third edition, John Wiley & Sons, Inc., 2003.

[2] J. Baik, G. Ben Arous, and S. Péché, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Ann. Probab. 33 (2005), 1643–1697.

[3] J. Baik, Painlevé formulas of the limiting distributions for nonnull complex sample covariance matrices, Duke Math. J. 133 (2006), 205–235.

[4] M. Dieng and C. A. Tracy, Application of random matrix theory to multivariate statistics, arXiv: math.PR/0603543.

[5] I. Dumitriu, A. Edelman and G. Shuman, MOPS: Multivariate Orthogonal Polynomials (symbolically), arXiv: math-ph/0409066.

[6] N. El Karoui, On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n tend to infinity, arXiv: math.ST/0309355.

[7] N. El Karoui, Tracy-Widom limit for the largest eigenvalue of a large class of complex Wishart matrices, arXiv: math.PR/0503109.

[8] G. H. Golub and C. F. Van Loan, Matrix Computations, second edition, Johns Hopkins University Press, 1989.

[9] H. Hotelling, Analysis of a complex of statistical variables into principal components, J. of Educational Psychology 24 (1933), 417–441, 498–520.

[10] H. Hotelling, Relations between two sets of variates, Biometrika 28 (1936), 321–377.

[11] I. M. Johnstone, On the distribution of the largest eigenvalue in principal component analysis, Ann. Stat. 29 (2001), 295–327.

[12] I. G. Macdonald, Symmetric Functions and Hall Polynomials, 2nd ed., Oxford University Press, 1995.

[13] R. J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc., 1982.

[14] A. Soshnikov, A note on universality of the distribution of the largest eigenvalue in certain sample covariance matrices, J. Statistical Physics 108 (2002), 1033–1056.

[15] C. A. Tracy and H. Widom, On orthogonal and symplectic matrix ensembles, Commun. Math. Phys. 177 (1996), 727–754.

[16] F. V. Waugh, Regression between sets of variables, Econometrica 10 (1942), 290–310.

[17] P. Zinn-Justin and J.-B. Zuber, On some integrals over the U(N) unitary group and their large N limit, J. Phys. A: Math. Gen. 36 (2003), 3173–3193.
