samsitutorial
TRANSCRIPT
-
8/8/2019 SAMSItutorial
1/46
A Tutorial on
MultivariateStatistical Analysis
Craig A. TracyUC Davis
SAMSISeptember 2006
1
-
8/8/2019 SAMSItutorial
2/46
ELEMENTARY STATISTICS
Collection of (real-valued) data from a sequence of experiments
X 1 , X 2 , . . . , X n
Might make assumption underlying law is N (, 2) with unknownmean and variance 2 . Want to estimate and 2 from the data.
Sample Mean & Sample Variance :
X = 1n
j
X j , S = 1n 1 jX j X 2
Estimators are unbiased
E (X ) = , E (S ) = 2
2
-
8/8/2019 SAMSItutorial
3/46
Theorem: If X 1 , X 2 , . . . are independent N (, 2) variables thenX and S are independent. We have that X is N (, 2 /n ) and(n 1)S/ 2 is 2(n 1).
Recall 2(d) denotes the chi-squared distribution with d degrees of freedom. Its density is
f 2 (x) =1
2d/ 2 (d/ 2)xd/ 2 1 e x/ 2 , x 0,
where
(z) =
0tz 1 e t dt, (z) > 0.
3
-
8/8/2019 SAMSItutorial
4/46
MULTIVARIATE GENERALIZATIONS
From the classic textbook of Anderson[1]:
Multivariate statistical analysis is concerned with data thatconsists of sets of measurements on a number of individualsor objects. The sample data may be heights and weights of some individuals drawn randomly from a population of school children in a given city, or the statistical treatment
may be made on a collection of measurements, such aslengths and widths of petals and lengths and widths of sepals of iris plants taken from two species, or one maystudy the scores on batteries of mental tests administered
to a number of students.
p = # of sets of measurements on a given individual,
n = # of observations = sample size
4
-
8/8/2019 SAMSItutorial
5/46
Remarks:
In above examples, one can assume that pn since typicallymany measurements will be taken.
Today it is common for p
1, so n/p is no longer necessarilylarge.
Vehicle Sound Signature Recognition: Vehicle noise is astochastic signal. The power spectrum is discretized to a
vector of length p = 1200 with n 1200 samples from thesame kind of vehicle.Astrophysics: Sloan Digital Sky Survey typically has many
observations (say of quasar spectrum) with the spectra of each quasar binned resulting in a large p.
Financial data: S&P 500 stocks observed over monthlyintervals for twenty years.
5
-
8/8/2019 SAMSItutorial
6/46
GAUSSIAN DATA MATRICES
The data are now n independent column vectors of length p
x1 , x2 , . . . , xn
from which we construct the n p data matrix
X =
xT 1 xT 2
...
xT n The Gaussian assumption is that
xj N p(, )
Many applications assume the mean has been already substractedout of the data, i.e. = 0.
6
-
8/8/2019 SAMSItutorial
7/46
Multivariate Gaussian Distribution
If x and y are vectors, the matrix xy is dened by
(xy)jk = x j yk
If = E (x) is the mean of the random vector x, then thecovariance matrix of x is the p p matrix
= E [(x )(x )]] is a symmetric, non-negative denite matrix. If > 0 (positivedenite) and X N p(, ), then the density function of X is
f X (x) = (2 ) p/ 2(det ) 1/ 2 exp
1
2x
, 1(x
, > x
R p
Sample mean:
x =1n
j
xj , E (x) =
7
-
8/8/2019 SAMSItutorial
8/46
Sample covariance matrix:
S =1
n 1n
j =1
( xj x)( xj x)
For = 0 the sample covariance matrix can be written simply as1
n 1X T X
Some Notation: If X is a n p data matrix formed from the nindependent column vectors x j , cov(x j ) = , we can form onecolumn vector vec( X ) of length pn
vec(X ) =
x1
x2...
xn
8
-
8/8/2019 SAMSItutorial
9/46
The covariance of vec( X ) is the np
np matrix
I n
=
0 0 00 0 0...
.
.....
.
..0 0 0
In this case we say the data matrix X constructed from n
independent x j N p(, ) has distribution
N p(M, I n )
where M = E (X ) = 1
, 1 is the column vector of all 1s.
9
-
8/8/2019 SAMSItutorial
10/46
WISHART DISTRIBUTION
Denition: If A = X T X where the n p matrix X isN p(0, I n ), > 0, then A is said to have Wishart distribution with n degrees of freedom and covariance matrix . We will say A
is W p(n, ).Remarks:
The Wishart distribution is the multivariate generalization of the chi-squared distribution.
AW p(n, ) is positive denite with probability one if andonly if n p.
The sample covariance matrix,
S =1
n 1A
is W p(n
1, 1n 1 ).
10
-
8/8/2019 SAMSItutorial
11/46
WISHART DENSITY FUNCTION , n
p
Let S p denote the space of p p positive denite (symmetric)matrices. If A = ( a jk )S p , let
(dA) = volume element of A =j k
da jk
The multivariate gamma function is
p(a) = S p e tr( A )
(det A)a ( p+1) / 2
(dA),(a) > ( p 1)/ 2.Theorem: If A is W p(n, ) with n p, then the density functionof A is
12np p(n/ 2)(det) n/ 2
e12 tr(
1 A ) (det A)(n p 1) / 2
11
-
8/8/2019 SAMSItutorial
12/46
Sketch of Proof:
The density function for X is the multivariate Gaussian(including volume element ( dX ))
(2) np/ 2 (det ) n/ 2 e12 tr(
1 X T X ) (dX )
Recall the QR factorization [8]: Let X denote an n p matrixwith n p with full column rank. Then there exists an uniquen p matrix Q, QT Q = I p, and an unique n p uppertriangular matrix R with positive diagonal elements so thatX = QR . Note A = X T X = RT R.
A Jacobian calculation [1, 13]: If A = X T X , then(dX ) = 2 p (det A)(n p 1) / 2 (dA)(QT dQ)
12
-
8/8/2019 SAMSItutorial
13/46
where
(QT dQ) = p
j =1
n
k = j +1
qT k dqj
and Q = ( q1 , . . . , q p) is the column representation of Q.
Thus the joint distribution of A and Q is(2) np/ 2 (det ) n/ 2 e
12 tr(
1 A ) 2
p
(det A)(n p 1) / 2
(dA)(QT
dQ)
Now integrate over all Q. Use fact that
V n,p
(QT dQ) =2 pnp/ 2
p(n/ 2)and V n,p is the set of real n p matrices Q satisfyingQT Q = I p. (When n = p this is the orthogonal group.)
13
-
8/8/2019 SAMSItutorial
14/46
Remarks regarding the Wishart density function
Case p = 2 obtain by R. A. Fisher in 1915. General p by J. Wishart in 1928 by geometrical arguments.
Proof outlined above came later. (See [1, 13] for completeproof.)
When Q is a p p orthogonal matrix p( p/ 2)2 p p2 / 2 Q
T
dQis normalized Haar measure for the orthogonal group O( p). Wedenote this Haar measure by ( dQ).
Siegel proved (see, e.g. [13]) p(a) = p( p 1) / 4
p
j =1
a 12
( j 1)
14
-
8/8/2019 SAMSItutorial
15/46
EIGENVALUES OF A WISHART MATRIX
Theorem: If A is W p(n, ) with n p the joint density functionfor the eigenvalues 1 , . . . , p of A is
p2 / 2 2 np/ 2 (det ) n/ 2
p( p/ 2) p(n/ 2)
p
j =1
(n p 1) / 2j
j > p)where L = diag( 1 , . . . , p) and ( dQ) is normalized Haar measure.Note that ( ) := j
-
8/8/2019 SAMSItutorial
16/46
Proof: Recall that the Wishart density function (times the volumeelement) is
12np p(n/ 2)(det) n/ 2
e12 tr(
1 A ) (det A)(n p 1) / 2 (dA)
The idea is to diagonalize A by an orthogonal transformation andthen integrate over the orthogonal group thereby giving the densityfunction for the eigenvalues of A.
Let 1 > > p be the ordered eigenvalues of A.A = QLQ T , L = diag( 1 , . . . , p), QO( p)
The j th column of Q is a normalized eigenvector of A. The
transformation is not 11 since Q = [q1 , . . . , q p] works for eachxed A. The transformation is made 11 by requiring that the 1 stelement of each qj is nonnegative. This restricts Q (as A varies) toa 2 p part of
O( p). We compensate for this at the end.
16
-
8/8/2019 SAMSItutorial
17/46
We need an expression for the volume element ( dA) in terms of Qand L. First we compute the differential of A
dA = dQLQ T + QdLQ T + QLdQ T
QT dA Q = QT dQ L + dL + L dQ T Q
= dQ t Q dL + L dQ T Q + dL= L,dQ T Q + dL
(We used QT Q = I implies QT dQ =
dQT Q.)
We now use the following fact (see, e.g., page 58 in [13]): If X = BY BT where X and Y are p p symmetric matrices, B is anonsingular p p matrix, then ( dX ) = (det B ) p+1 (dY ). In our caseQ is orthogonal so the volume element ( dA) equals the volumeelement ( QT dAQ). The volume element is the exterior product of the diagonal elements of QT dA Q times the exterior product of theelements above the diagonal.
17
-
8/8/2019 SAMSItutorial
18/46
Since L is diagonal, the commutator
L,dQ T Q
has zero diagonal elements. Thus the exterior product of thediagonal elements of QT dA Q is j dj .
The exterior product of the elements coming from the commutatoris
j
-
8/8/2019 SAMSItutorial
19/46
One is interested in limit laws as n, p
. For = I p,
Johnstone [11] proved, using RMT methods, for centering andscaling constants
np = n 1 + p2
,
np = n 1 + p 1n 1+ 1 p
1/ 3
that1
np
npconverges in distribution as n, p , n/p < , to theGOE largest eigenvalue distribution [15].
El Karoui [6] has extended the result to . The case pn appears, for example, in microarray data. Soshnikov [14] has lifted Gaussian assumption under the
additional restriction n
p = O( p1/ 3).
19
-
8/8/2019 SAMSItutorial
20/46
For = I p, the difficulty in establishing limit theorms comes
from the integral
O ( p) e 12 tr( 1 Q Q T ) (dQ)Using zonal polynomials innite series expansions have beenderived for this integral, but these expansions are difficult toanalyze. See Muirhead [13].
For complex Gaussian data matrices X similar density formulasare known for the eigenvalues of X X . Limit theorems for = I p are known since the analogous group integral, now overthe unitary group, is known explicitlythe Harish
ChandraItzyksonZuber (HCIZ) integral (see, e.g. [17]). Seethe work of Baik, Ben Arous and Peche [2, 3] and El Karoui [7].
20
-
8/8/2019 SAMSItutorial
21/46
PRINCIPAL COMPONENT ANALYSIS (PCA) ,
H. Hotelling, 1933Population Principal Components: Let x be a p 1 randomvector with E (x) = and cov( x) = > 0. Let 1 2 pdenote the eigenvalues of and H an orthogonal matrixdiagonalizing : H T H = = diag( 1 , . . . , p). We write H incolumn vector form
H = [h1 , . . . , h p]
so that h j is the p 1 eigenvector of corresponding to eigenvalue j . Dene the p 1 vectoru = H T x = ( u1 , . . . , u p)T
then cov(u) = E (H T x H T )(H T x H T )= H T E ((x )(x )) H = H T H =
u j uncorrelated .
21
-
8/8/2019 SAMSItutorial
22/46
Denition: u j is called the j th principal component of x. Notevar( u j ) = j .Statistical interpretations: The claim is that u1 is that linearcombination of components of x that has maximum variance .
Proof: For simplicity of notation, set = 0. Let b denote any p 1vector, bT b = 1, and form bT x.var( bT x) = E bT x bT x = E bT x (bT x)T = bT E (xx T )b = bT b.We want to maximize the right hand side subject to the constraintbT b = 1. By the method of Lagrange multipliers we maximize
bT b(bT b1)Since is symmetric the vector of partial derivatives is
2 b2bThus b must be an eigenvector with eigenvalue .
22
-
8/8/2019 SAMSItutorial
23/46
The largest variance corresponds to choosing the largest eigenvalue.
The general result is that u r has maximum variance of allnormalized combinations uncorrelated with u1 , . . . , u r 1 .
Sample principal components: Let S denote the sample
covariance matrix of the data matrix X and let Q = [q1 , . . . , q p] a p p orthogonal matrix diagonalizing S :
QT S Q = diag( 1 , . . . , p)
The j are the sample variances that are estimates for j . Thevectors qj are sample estimates for the vectors h j .
If x is the random vector and u = H T x is the vector of principal
components, then u = QT x is the vector of sample principal components .
23
-
8/8/2019 SAMSItutorial
24/46
SCREE PLOTS
In applications: How many of the j s are signicant?
24
-
8/8/2019 SAMSItutorial
25/46
CANONICAL CORRELATION ANALYSIS (CCA)
H. Hotelling, 1936Suppose a large data set is naturally decomposed into two groups.For example, p 1 random vectors x1 , . . . , xn make up one set andq
1 random vectors y1 , . . . , ym the other. We are interested in the
correlations between these two data sets. For example, in medicinewe might have n measurements of age, height, and weight ( p = 3)and m measurements of systolic and diastolic blood pressures
(q = 2). We are interested in what combination of the componentsof x is most correlated with a combination of the components of y.
Population Canonical Correlations: Let x and y be tworandom vectors of size p
1 and q
1, respectively. We assume
E (x) = E (y) = 0 and p q. Form the ( p + q) 1 vectorx
y
25
-
8/8/2019 SAMSItutorial
26/46
and its ( p + q)
( p + q) covariance matrix
= 11 12 21 22
.
Let u := T xR , v := T y
R
where and are vectors to be determined. We want to maximizethe correlation
corr( u, v) = cov(u, v)
var( u)var( v)The correlation does not change under scale transformationsu
cu, etc. so we can maximize this correlation subject to the
constraints
E (u2) = E ( T x T x) = T 11 = 1 (1)E (v2) = T 22 = 1 (2)
26
-
8/8/2019 SAMSItutorial
27/46
Under these constraints
corr( u, v ) = E ( T x T y) = T 12 .Let
= T 12
1
2 ( T 11
1)
1
2 ( t 22
1)
where and are Lagrange multipliers. Set the vector of partialderivatives to zero:
= 12
11 = 0 (3)
= T 12 22 = 0 (4)If we left multiply (3) by T and (4) by T , use the normalization
conditions (1) and (2) we conclude = . Thus (3) and (4) become
11 12 21
22
= 0 (5)
27
-
8/8/2019 SAMSItutorial
28/46
with corr( u, v ) =
and 0 is a solution to
det 11 12 21
22
= 0
This is a polynomial in of degree ( p + q). Let 1 denote themaximum root and 1 and 1 corresponding solutions to (5).
Denition: u1 and v1 are called the rst canonical variables andtheir correlation 1 = corr( u1 , v1) is called the rst canonical correlation coefficient .
More generally, the rth pair of canonical variables is the pair of
linear combinations u r = ( ( r ) )T x and vr = ( ( r ) )T y, each of unitvariance and uncorrelated with the rst r 1 pairs of canonicalvariables and having maximum correlation. The correlationcorr( ur , vr ) is the r th canonical correlation coefficient .
28
-
8/8/2019 SAMSItutorial
29/46
REDUCTION TO AN EIGENVALUE PROBLEM
Since
11 12 21 22
=
11 0
0 1I 111 12 122 21 I
1 0
0 22
Thus the determinantal equation becomesdet 2I 21 111 12 122 = 0
The nonzero roots 1 , . . . , k are called the population canonical
correlation coefficients . k = rank( 12 ).In applications is not known. One uses the sample covariancematrix S to obtain sample canonical correlation coefficients .
29
-
8/8/2019 SAMSItutorial
30/46
AN EXAMPLE
The rst application of Hotellings canonical correlations is byF. Waugh [16] in 1942. He begins his paper with
Professor Hotellings paper, Relations between Two Sets
of Variates, should be widely known and his method usedby practical statisticians. Yet, few practical statisticiansseem to know of the paper, and perhaps those few areinclined to regard it as a mathematical curiosity ratherthan an important and useful method for analyzingconcrete problems. This may be due to . . .
The Practical Problem: Relation of wheat characteristics toour characteristics. Guiding principle
The grade of the raw material should give a goodindication of the probable grade of the nished product.
30
-
8/8/2019 SAMSItutorial
31/46
The data: 138 samples of Canadian Hard Red Spring wheat and
the our made from each these samples. a
wheat quality our quality
x1 = kernel texture y1 = wheat per barrels of our
x2 = test weight y2 = ash in our
x3 = damaged kernels y3 = crude protein in our
x4 = crude protein in wheat y4 = gluten quality index
x5 = foreign material
u1 = T 1 x = index of wheat quality , v1 = T 1 y = index of our quality
corr( u1, v
1) = 0 .909
a In this example p > q . The data are normalized to mean 0 and variance 1.
31
-
8/8/2019 SAMSItutorial
32/46
-
8/8/2019 SAMSItutorial
33/46
c p,q,n is a normalization constant. ( ) = Vandermonde determinant = pj
-
8/8/2019 SAMSItutorial
34/46
ZONAL POLYNOMIALS
Zonal polynomials naturally arise in multivariate analysis whenconsidering group integrals such as
O (m )
e tr( X H Y H T ) (dH )
where X and Y are m m symmetric, positive denite matricesand ( dH ) is normalized Haar measure. Zonal polynomials can be
dened either through the representation theory of GL(m,R
) [12]or as eigenfunctions of certain Laplacians [5, 13].
Let S m denote the space of m m symmetric, positive denitematrices. Zonal polynomials, C (X ), X
S m are certain
homogeneous polynomials in the eigenvalues of X that are indexedby partitions .
To give the precise denition we need some preliminary denitions.
34
-
8/8/2019 SAMSItutorial
35/46
Denition: A partition of n is a sequence = ( 1 , 2 , . . . , )
where the j 0 are weakly decreasing and j j = n. We denotethis by n.For example, = (5 , 3, 3, 1) is a partition of 12. The number of
nonzero parts of is called the length of , denoted ().If and are two partitions of n, we say < (lexicographicorder ) if, for some index i, j = j for j < i and j < j . Forexample
(1, 1, 1, 1) < (2, 1, 1) < (2, 2) < (3, 1) < (4)
If n with () = m, we dene the monomial
x
= x 11 x
22 x
mm
and say x is of higher weight than x if > .
35
-
8/8/2019 SAMSItutorial
36/46
Metrics and Laplacians
Let X S m and dX = ( dx jk ) the matrix of differentials of X . Wedene a metric on S m by
(ds)2 = tr X 1dX X 1dX A simple computation shows this metric is invariant under
X LXL T , LGL(m, R ).
Let n = m(m + 1) / 2 = # of independent elements of X . We denoteby vec(X )
R n the column vector representation of X , e.g.
X =x11 x12x12 x22
, x := vec( X ) =
x11
x12x22
The metric ( ds)2 is a quadratic differential in dx.
36
-
8/8/2019 SAMSItutorial
37/46
For example,
(ds)2 = dx11 dx12 dx22 G(x) dx11dx12dx22
whereG(x) = ( gij ) =
1(x11 x22 x212 )2
x222 2x12 x22 x
212
2x12 x22 2x11 x22 + 2 x212 2x11 x12x212 2x11 x12 x211
Labeling x = vec( X ) with a single index the differential is in thestandard form
(ds)2 =i
-
8/8/2019 SAMSItutorial
38/46
and the Laplacian associated to this metric is
X = (det G) 1/ 2n
j =1
x j
(det G)1/ 2n
i =1
gij
x i
whereG 1 = ( gij )
More succinctly, if
X =
x 1
...
x n
,
X = (det G) 1/ 2
X , (det G)
1/ 2G
1
X
where (, ) is the standard inner product on R n .
38
-
8/8/2019 SAMSItutorial
39/46
One can show that X is an invariant differential operator :
LXL T = X , LGL(m,R )
We now diagonalize X
X = HY H T
, Y = diag( y1 , . . . , ym ), H O(m).The Laplacian X is now expressed in terms of a radial part andan angular part . The radial part of X is the differential operator
m
j =1
y2j 2
y2j+
m
j =1
m
k =1k = j
y2jyj yk
yj
+j
yj
yj
We now let X denote only the radial part.
We can now dene zonal polynomials!
39
-
8/8/2019 SAMSItutorial
40/46
Denition: Let X S m with eigenvalues x1 , . . . , xm and let = ( 1 , . . . , m)k into not more than m parts. Then C (X ) isthe symmetric, homogeneous polynomial of degree k in x j such that
1. The term of highest weight in C (X ) is x
2. C (X ) is an eigenfunction of the Laplacian X .
3. (tr( X )) k = ( x1 + + xm )k = k
( ) m
C (X ).
Remarks:
Must show there is an unique polynomial satisfying theserequirements.
Eigenvalue in (2) equals := j j ( j j ) + k(m + 1) / 2. Program MOPS [5] computes zonal polynomials.
40
-
8/8/2019 SAMSItutorial
41/46
The following theorem is at the core of why zonal polynomialsappear in multivariate statistical analysis.
Theorem: If X, Y S p, then
O ( p) C (XHYH T ) (dH ) = C (X )C (Y )C (I p) (6)where (dH ) is normalized Haar measure.Proof: Let f (Y ) denote the left-hand side of (6) and QO( p).f (QY QT ) = f (Y ) (let H HQ in integral and use invariance of the measure). Thus f is a symmetric function of the eigenvalues of Y . Since C is homogeneous of degree ||, so is f . Now apply the
41
-
8/8/2019 SAMSItutorial
42/46
Laplacian to f .
Y f (Y ) = O ( p) Y C (XHYH T ) (dH )=
Y C (X 1/ 2HY H T X 1/ 2) (dH )
= Y C (LY LT ) (dH ) (L = X 1/ 2H )=
LY L T C (LY LT ) (dH ) invariance of Y
= C (LY LT ) (dH )= f (Y )
By denition of C (Y ) we have f (Y ) = d C (Y ). Sincef (I p) = C (X ), we nd d and the theorem follows.
42
-
8/8/2019 SAMSItutorial
43/46
Using Zonal Polynomials to Evaluate Group Integrals
O ( p) e tr( X H Y H T ) (dH )
=
k =0
k
k! O ( p)
tr( XHYH t ) k (dH )
=
k =0
k
k! k
( ) m O ( p) C (XHYH T ) (dH )
=
k =0
k
k! k
( ) m
C (X )C (Y )C (I p)
43
-
8/8/2019 SAMSItutorial
44/46
References
[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis , third edition, John Wiley & Sons, Inc., 2003.
[2] J. Baik, G. Ben Arous, and S. Peche, Phase transition of the largest
eigenvalue for nonnull complex sample covariance matrices, Ann.Probab. 33 (2005), 16431697.
[3] J. Baik, Painleve formulas of the limiting distributions for nonnullcomplex sample covariance matrices, Duke Math. J. 133 (2006),205235.
[4] M. Dieng and C. A. Tracy, Application of random matrix theory tomultivariate statistics, arXiv: math.PR/0603543.
[5] I. Dumitriu, A. Edelman and G. Shuman, MOPS: MultivariateOrthogonal Polynomials (symbolically), arXiv:math-ph/0409066.
[6] N. El Karoui, On the largest eigenvalue of Wishart matrices withidentity covariance when n , p and p/n tend to innity, arXiv:
44
-
8/8/2019 SAMSItutorial
45/46
math.ST/0309355.
[7] N. El Karoui, Tracy-Widom limit for the largest eigenvalue of alarge class of complex Wishart matrices, arXiv: math.PR/0503109.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations , secondedition, Johns Hopkins University Press, 1989.
[9] H. Hotelling, Analysis of a complex of statistical variables intoprincipal components, J. of Educational Psychology 24 (1933),417441, 498520.
[10] H. Hotelling, Relations between two sets of variates, Biometrika 28(1936), 321377.
[11] I. M. Johnstone, On the distribution of the largest eigenvalue inprincipal component analysis, Ann. Stat. 29 (2001), 295327.
[12] I. G. Macdonald, Symmetric Functions and Hall Polynomials , 2nded., Oxford University Press, 1995.
[13] R. J. Muirhead, Aspects of Multivariate Statistical Theory , John
45
-
8/8/2019 SAMSItutorial
46/46
Wiley & Sons Inc., 1982.
[14] A. Soshnikov, A note on universality of the distribution of thelargest eigenvalue in certain sample covariance matrices, J.Statistical Physics 108 (2002), 10331056.
[15] C. A. Tracy and H. Widom, On orthogonal and symplectic matrixensembles, Commun. Math. Phys. 177 (1996), 727754.
[16] F. V. Waugh, Regression between sets of variables, Econometrica 10 (1942), 290310.
[17] P. Zinn-Justin and J.-B. Zuber, On some integrals over the U (N )unitary group and their large N limit, J. Phys. A: Math. Gen. 36
(2003), 31733193.
46