tests for covariance matrices with high ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfabstract title...
TRANSCRIPT
TESTS FOR COVARIANCE MATRICES
WITH HIGH-DIMENSIONAL DATA
Saowapha Chaipitak
A Dissertation Submitted in Partial
Fulfillment of the Requirements for the Degree of
Doctor of Philosophy (Statistics)
School of Applied Statistics
National Institute of Development Administration
2012
ABSTRACT
Title of Dissertation Tests for Covariance Matrices with High-dimensional Data
Author Ms. Saowapha Chaipitak
Degree Doctor of Philosophy (Statistics)
Year 2012
In multivariate statistical analysis, it is a necessity to know the facts regarding
the covariance matrix of the data in hand before applying any further analysis. This
study focuses on testing hypotheses concerning the covariance matrices of
multivariate normal data having the number of variables larger than or equal to the
sample size, called high-dimensional data. The two objectives of this study were: first,
for one sample data, to develop a test statistic for testing the hypothesis for whether
the covariance matrix equals a specified known matrix, called a partially known
matrix, and second, for two independent sample data, to develop a test statistic for
testing the hypothesis of equality of two covariance matrices of the two independent
populations. For the two hypotheses, a classical method such as the likelihood ratio
test is commonly used and is well defined when the sample size is larger than the
number of variables.
The two proposed test statistics 1T (for one sample data) and 2T (for two
sample data) are proposed for a high-dimensional situation. Both test statistics 1T and
2T are asymptotically normally distributed when the number of variables and the
sample size go towards infinity. A simulation study showed that both proposed test
statistics 1T and 2T approximately control the nominal significance level and have
good powers. The convergences to asymptotic normality of the two statistics were not
greatly affected by the change of covariance structures considered in the study
(Unstructured, Compound Symmetry, Heterogeneous Compound Symmetry, Simple,
Toeplitz, and Variance Component).
iv
Furthermore, in the one sample case, the proposed test statistic 1T performed
comparably to the test statistics JU , proposed by Ledoit and Wolf (2002), and 1ST ,
proposed by Srivastava (2005), for large sample sizes and was more powerful than
these two tests for small or moderate sample sizes with a larger or equal number of
variables. In the two sample case, the proposed test statistic 2T is as good as the
competitive tests ,, 2SJ TT and ,SYT proposed by Schott (2007), Srivastava (2007b), and
Srivastava and Yanagihara (2010), respectively, for large sample sizes and is
markedly superior to these competitive tests for small to moderate sample sizes with a
larger or equal number of variables for all of the covariance matrix structures
considered. Finally, two real datasets regarding human gene expression in colon
tissues were also analyzed to illustrate the application of the theoretical results.
ACKNOWLEDGEMENTS
I would like to express my gratitude to everyone who has given me help in
completing this dissertation. In particular, I am indebted to my advisor, Associate
Professor Dr. Samruam Chongcharoen, for always giving invaluable suggestions,
motivation and encouragement. With his guidance and support, I have completed the
dissertation. I also gratefully acknowledge my committee members: Professor Dr.
Prachoom Suwattee; Associate Professor Dr. Vichit Lorchirachoonkul and Assistant
Professor Dr. Winai Bodhisuwan, for contributing both their time and helpful
comments and suggestions.
I am grateful to the Commission on Higher Education of Thailand for financial
support through a grant fund under the Strategic Scholarships Fellowships Frontier
Research Networks. I am also grateful to Dr. John McMorris for his kindness towards
me by editing my English, which has made the manuscript more readable.
I would like to express my deep thanks to my friends for their help, spirit,
patience and co-operation.
Finally, I am greatly indebted to my parents, my elder brother, and my
younger sister, for their greatest love, encouragement, and support throughout my
graduate study.
Saowapha Chaipitak
January 2013
TABLE OF CONTENTS
Page
ABSTRACT iii
ACKNOWLEDGEMENTS v
TABLE OF CONTENTS vi
LIST OF TABLES viii
CHAPTER 1 INTRODUCTION 1
1.1 Background 1
1.2 Objectives of the Study 4
1.3 Scope of the Study 5
1.4 Usefulness of the Study 5
CHAPTER 2 LITERATURE REVIEW 6
2.1 Testing the Hypothesis for a Partially Known Matrix 6
2.2 Testing the Equality of Two Covariance Matrices for 14
Two Independent Populations
CHAPTER 3 THE PROPOSED TESTS 22
3.1 Testing the Hypothesis for a Partially Known Matrix 22
for One High-dimensional Data
3.2 Testing the Equality of Two Covariance Matrices 26
for Two High-dimensional Data
CHAPTER 4 SIMULATION STUDY 32
4.1 Simulation Study for Testing the Hypothesis for a 33
Partially Known Matrix for One High-dimensional
Data
4.2 Simulation Study for Testing the Equality of Two 42
Covariance Matrices for Two High-dimensional Data
4.3 Application 59
vii
CHAPTER 5 CONCLUSIONS, DISCUSSION AND RECOMMENDATIONS 60
FOR FUTURE RESEARCH
5.1 Conclusions 60
5.2 Discussion 62
5.3 Recommendations for Future Research 63
BIBLIOGRAPHY 64
APPENDICES 67
Appendix A Expected Values and Variances of the Estimators 68
Appendix B Proof of Theorem 3.1.2 80
Appendix C FORTRAN Syntax for One High-dimensional Data 88
Appendix D FORTRAN Syntax for Two High-dimensional Data 101
BIOGRAPHY 123
LIST OF TABLES
Tables Page
4.1 Covariance Matrix Structure Definition 32
4.2 Empirical Type I Error Rates of the Test Statistic 1T under 38
the Four Null Hypotheses applied at 05.0=α
4.3 Empirical Powers of Test Statistic 1T under the Four 39
Alternative Hypotheses applied at 05.0=α
4.4 Empirical Type I Error Rates (under IH =Σ:1*0 ) and 40
Empirical Powers (under F=Σ:1*1H ) of 1, SJ TU and 1T
applied at 05.0=α
4.5 Empirical Type I Error Rates (under IH 2:2*0 =Σ ) and 41
Empirical Powers (under DH 2:2*1 =Σ ) of 1, SJ TU and 1T
applied at 05.0=α
4.6 Empirical Type I Error Rates (under 10H ′ ) of ,,, 2 SYSJ TTT 51
and 2T and Empirical Powers (under 11H ′ ) of JT and 2T
applied at 05.0=α
4.7 Empirical Type I Error Rates (under 20H ′ ) of ,,, 2 SYSJ TTT 52
and 2T and Empirical Powers (under 21H ′ ) of ,,2 SYS TT and 2T
applied at 05.0=α
4.8 Empirical Type I Error Rate (under 30H ′ ) of ,,, 2 SYSJ TTT 53
and 2T and Empirical Powers (under 31H ′ ) of JT and 2T
applied at 05.0=α
ix
4.9 Empirical Type I Error Rates (under 40H ′ ) of ,,, 2 SYSJ TTT 54
and 2T and Empirical Powers (under 41H ′ ) of JT and 2T
applied at 05.0=α
4.10 Empirical Type I Error Rates (under 50H ′ ) of ,,, 2 SYSJ TTT 55
and 2T and Empirical Powers (under 51H ′ ) of JT and 2T
applied at 05.0=α
4.11 Empirical Type I Error Rates (under 60H ′ ) of ,,, 2 SYSJ TTT 56
and 2T and Empirical Powers (under 61H ′ ) of ,,, 2 SYSJ TTT
and 2T applied at 05.0=α
4.12 Empirical Type I Error Rates (under 70H ′ ) of ,,, 2 SYSJ TTT 57
and 2T and Empirical Powers (under 71H ′ ) of JT and 2T
applied at 05.0=α
4.13 Empirical Type I Error Rates (under 80H ′ ) of ,,, 2 SYSJ TTT 58
and 2T and Empirical Powers (under 81H ′ ) of JT and 2T
applied at 05.0=α
CHAPTER 1
INTRODUCTION
1.1 Background
In multivariate statistical analysis, the data consist of more than one variable
on a number of observations, .n Before applying further analyses, testing the
hypotheses concerning a population covariance matrix as to whether it is equal to a
particular matrix and whether two population covariance matrices are equal are very
important. Classical techniques for these two hypotheses are based on the likelihood
ratio criterion which is valid if and only if the sample size (or the number of
observations) is greater than its number of variables p (more details are provided in
the next section). In the present, there are many applications in modern science and
economics, e.g. the analysis of DNA microarrays. Here data typically have thousands
of gene expressions whereas these are obtained on a group of observations which
often numbers much less than 100 (Schott, 2007); for examples with data, see Dudoit,
Fridlyand and Speed (2002); and Ibrahim, Chen and Gray (2002). Data having the
number of variables larger than or equal to the sample size, i.e. np ≥ , are called
“high-dimensional data” (Srivastava, 2010; Fujikoshi, Ulyanov and Shimizu, 2010;
Fisher, Sun and Gallagher, 2010).
As aforementioned, the likelihood ratio criterion is not well defined when the
data fall into a high-dimensional situation. Moreover, it is based on the asymptotic
theory which is restricted to the case that the sample size goes towards infinity
whereas the number of variables is fixed, called the “classical approach”. Details are
provided in many multivariate statistics and mathematical statistical texts: see
Anderson (1984); Johnson and Wichern (2002); Casella and Berger (2002); Rao
(1973); Rohatgi (1984); Lehmann (1999); Lehmann and Romano (2005); and others.
2
A better approach to high-dimensional data sets is known as “(n, p)-
asymptotics”, “general asymptotics”, “concentration asymptotics” (Ledoit and Wolf,
2002), or “increasing dimension asymptotics” (Serdobolskii, 1999), where the
asymptotic theory framework of the test statistic is based upon both the sample size
and the number of variables approaching infinity. This approach is a generalization of
the classical technique (Fisher et al., 2010). Some examples of recent work on the
many problems of statistical inference using high-dimensional datasets include Birke
and Dette (2005); Samruam Chongcharoen (2011); Boonyarit Choopradit and
Samruam Chongcharoen (2011a, 2011b); Ledoit and Wolf (2002); Lin and Xiang
(2008); Fisher et al. (2010); Schott (2007); Srivastava (2005); Srivastava (2006);
Srivastava (2007a, 2007b); and Srivastava and Yanagihara (2010).
Two goals of this dissertation were to develop test statistics for the following
two hypotheses in high-dimensional data:
1) A hypothesis of testing for a covariance matrix equal to a specified known
matrix
2) A hypothesis of testing for the equality between two covariance matrices
Each hypothesis is presented in more detail in Sections 1.1.1 and 1.1.2 as follows:
1.1.1 Testing the Hypothesis for a Covariance Matrix Equal to a Specified
Known Matrix for One Population
Let nXX ,...,1 be a random sample drawn from a p-variate normal population
with mean vector μ and covariance matrix ,Σ denoted by ),,(~ ΣμX pj N for
,,...,1 nj = where both μ and Σ are unknown. The hypothesis for testing that the
population covariance matrix is equal to a specified known matrix, which is called
“a partially known matrix” from now on, can be written as
02
0 : Σ=Σ σH against 02
1 : Σ≠Σ σH , (1.1)
where 02 >σ is an unknown scalar and 0Σ a known positive definite matrix. The
likelihood ratio criterion that is very useful for handling problems where the number
of variables, ,p is less than the sample size, n , (Anderson, 1984: 429) is based on the
sample covariance matrix, ,S which is nonsingular. However, in the high-dimensional
3
case, when ,np ≥ the likelihood ratio criterion is not available because the sample
covariance matrix, ,S becomes singular. Hence, in this dissertation, the focus is on
developing a test statistic for testing hypothesis 0H above for high-dimensional data.
The work for this dissertation begins by exploring several tests introduced in
recent years for testing the sphericity of the covariance matrix, i.e. ,2pIσ=Σ where
pI denotes the pp× identity matrix, from high-dimensional data, such as Ledoit and
Wolf (2002); Birke and Dette (2005); Srivastava (2005); Srivastava (2006); and
Fisher et al. (2010). More details are provided in Section 2.1. A test statistic was then
developed based on the consistent estimators of the second moment of the sample
eigenvalues described in Section 3.1. Under the null hypothesis (1.1), the asymptotic
distribution of the proposed test statistic is standard normal as ),( np together
approach infinity. Its performance was assessed and compared to some recent tests
provided in the literature through a simulation study given in Section 4.1.
1.1.2 Testing the Hypothesis of the Equality of Two Covariance Matrices
for Two Independent Populations
The equality of two covariance matrices is essential in multivariate analysis.
For example, in discriminant analysis (see Fujikoshi et al., 2010: 256; and Srivastava,
2002: 252), different discriminant rules are given depending on whether the
population under consideration has equal covariance matrices or not. In classical
analysis, for ,pn > when testing the hypothesis regarding the equality of the two
mean vectors, if the covariance matrices are equal an exact test for testing this
hypothesis exists. If the two covariance matrices are not equal, only an approximate
test is available (Johnson, 1998: 420). For testing the equality of two mean sub-
vectors, this requirement is also addressed (Srivastava, 2002: 125; Gamage and
Mathew, 2008).
Now let ,2,1;,...,1; == knj kjkX be two random samples drawn from two
independent p-variate normal populations ),,( kkpN Σμ where kμ denotes an unknown
mean vector of the thk population and kΣ denotes an unknown positive definite
4
covariance matrix of the thk population. It is desirable to test the equality of these two
covariance matrices, i.e. testing the hypothesis that
Σ=Σ=Σ′ 210 :H against 211 : Σ≠Σ′H , (1.2)
where Σ denotes the common unknown positive definite covariance matrix of the two
populations under 0H ′ when the sample size from each population is less than or
equal to the number of variables, i.e. .2,1; =≤ kpnk
For pnk > , a classical way of dealing with this hypothesis is the likelihood
ratio test (Anderson, 1984) which is the case of the ratio of the determinants of the
two estimates of the covariance matrices and the determinant of the estimate of the
common covariance matrix under the null hypothesis. Barlett (1937) suggested the
modified likelihood ratio test by replacing the sample numbers appearing in the
likelihood test by the number of degrees of freedom of the sample covariance
matrices. However, these tests are valid if and only if the sample size from each
sample is greater than the number of variables.
In high-dimensional datasets, recent works, such as by Schott (2007);
Srivastava (2007b); and Srivastava and Yanagihara (2010) described in Section 2.2,
have been proposed under the general asymptotic theory framework.
At this point, this dissertation proposes a test statistic for testing the equality of
two covariance matrices for high-dimensional data. The proposed test statistic is
based on the consistent estimator of the ratio of the two second moments of the two
sample eigenvalues described in Section 3.2. Under the null hypothesis, as
,),( ∞→np the proposed statistic is asymptotically distributed as standard normal
when .2,1, =≥ knp k Note that the notation ∞→),( np means that both p and n go
towards infinity. This notation is used throughout this dissertation from here on.
Comparisons of the proposed test statistic to the existing tests given in the literature
are demonstrated using simulation provided in Section 4.2.
1.2 Objectives of the Study
The objectives of the dissertation are as follows:
5
1) To propose a test statistic for testing hypothesis 02
0 : Σ=Σ σH , where
02 >σ is an unknown scalar and 0Σ a known positive definite matrix for one sample
with high-dimensional data
2) To propose a test statistic for testing hypothesis Σ=Σ=Σ′ 210 :H for two
independent samples with high-dimensional data
3) To assess the performance of the proposed test statistics by considering
their Type I error rates and powers via a simulation study and comparing them to
those of the existing tests
1.3 Scope of the Study
The proposed test statistics for high-dimensional data were developed under
the following conditions:
1) The data are assumed to be identically and independently distributed as
multivariate normal with the number of variables being greater than or equal to the
sample size )( np ≥
2) The asymptotic distribution of the proposed test statistic was investigated
when ∞→),( np
1.4 Usefulness of the Study
The newly proposed tests could be beneficial for analyzing multivariate data
in statistical situations where the number of variables is larger than or equal to the
sample size, such as DNA microarray analysis, genetics, astronomy data, etc.
CHAPTER 2
LITERATURE REVIEW
This chapter begins with a review of the literature on testing the hypothesis
that the covariance matrix is a partially known matrix using the classical approach,
where the number of variables is less than the sample size, and the high-dimensional
approach, where the number of variables is larger than or equal to the sample size.
Testing the equality of two covariance matrices in the classical and high-dimensional
approaches is reviewed next.
2.1 Testing the Hypothesis for a Partially Known Matrix
As given in Chapter 1, the hypothesis for a partially known matrix is written
as
02
0 : Σ=Σ σH against 02
1 : Σ≠Σ σH .
Let nXX ,...,1 be a random sample drawn from a p-variate normal population
with unknown mean vector μ and unknown positive definite covariance matrix ,Σ
denoted by .,...,1),,(~ njN pj =ΣμX The variables made on a single observation are
regularly collected into a column vector, i.e. ,),,,( 21′= pjjjj xxx LX where j
represents the ,,...,1, njj th = observation from the random sample. The set of
variables on all observations in a sample set make up a matrix of observations, X ,
such that
( )
pnpnnn
p
p
n
n
xxx
xxxxxx
×⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜
⎝
⎛
=⎟⎟⎟
⎠
⎞
⎜⎜⎜
⎝
⎛
′
′′
=′=
LMMMM
LL
ML
21
22212
12111
2
1
21
X
XX
XXXX .
7
The −p dimensional population is assumed to have a −×1p mean vector μ and
a −× pp covariance matrixΣ , so that
⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜
⎝
⎛
=Σ⎟⎟⎟
⎠
⎞
⎜⎜⎜
⎝
⎛
=
pppp
p
p
p σσσ
σσσσσσ
μ
μμ
LMOMM
L
L
M
21
22212
11211
2
1
andμ ,
where .,...,1;,...,1),( njpixE iji ===μ The diagonal elements ,iiσ ])[( 2iijii xE μσ −=
are the variances of the random variables ,,...,1;,...,1, njpixij == and the off-
diagonal elements ,ilσ )])([( lljiijil μxμxE −−=σ are the covariances between the
random variables ijx and ,ljx for .,...,1;,...,2,1 njpli ==≠ The covariance matrix Σ
can be expressed using the matrix notation
⎥⎦⎤
⎢⎣⎡ ′−−= ))(( μXμXEΣ .
The probability density function for random vector jX from the p-variate normal
distribution is defined as
( ) ,)()(21exp2)( 12/12/
⎟⎠⎞
⎜⎝⎛ −Σ′−−Σ= −−− μxμxx jj
pjf π
where Σ denotes the determinant operation on the matrix Σ and 0≠Σ because Σ is
positive definite. The estimates of the mean vector μ and the covariance matrix Σ are
the sample mean vector X and sample covariance matrix S , typically defined as
,11∑=
=n
jjn
XX (2.1)
and
.))((1
11
1)(1∑=
× ′−−−
=−
==n
jjjppij nn
s XXXXAS (2.2)
2.1.1 The Classical Approach
When pn > , from Anderson (1984: 429), the appropriate test for testing the
hypothesis 02
0 : Σ=Σ σH is the likelihood ratio test (LRT), which is given by
8
,1
2
10
10
n
p
trp
L
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛Σ
Σ=
−
−
S
S (2.3)
where tr denotes the trace notation.
From (2.3), for testing hypothesis pIH 2*0 : σ=Σ , the LRT is given by
( )
2
1
1
/12
1
/11
pn
p
ii
p
i
pi
n
p
lp
l
trp
L⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜
⎝
⎛
=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛=
∑
∏
=
=
S
S, (2.4)
where ,,...,1, pili = are the eigenvalues of S and, from Anderson (1984: 432),
)()]()([)()log)1(( 32242
21
−+ +≤−≤+≤=≤−− nOzPzPzPzLnP fff χχωχρ ,
where O denotes Big-oh notation, 2fχ denotes a Chi-squared random variable with
f degrees of freedom, and
,1)1(21
−+= ppf
,)1(6221
2
−++
−=npppρ
222
23
2 )1(288)2362)(2)(1)(2(
ρω
−+++−−+
=np
pppppp .
Following this, the LRT in (2.4) was been shown to have a monotone power
function by Carter and Srivastava (1977).
John (1971) proposed the test statistic under the null hypothesis *0H ,
1])/1[(
)/1()/1(
12
22
−=⎥⎥⎦
⎤
⎢⎢⎣
⎡⎟⎟⎠
⎞⎜⎜⎝
⎛−=
SS
SS
trptrpI
trptrp
U . (2.5)
The test statistic U is consistent as ∞→n , while p is fixed. John (1971) showed
that U has the asymptotically locally most powerful invariant test for *0H as ∞→n .
9
2.1.2 The High-dimensional Approach
In this section, the considerable body of work completed on statistical testing
in high-dimensional data is built upon for testing the neighboring hypothesis, where
0Σ is restricted to an identity matrix for testing the hypothesis
pIH 2*0 : σ=Σ (sphericity) against .: 2*
1 pIH σ≠Σ
Various tests have been proposed, such as Ledoit and Wolf (2002); Birke and Dette
(2005); Srivastava (2005); Srivastava (2006); and Fisher et al. (2010).
The pioneering work of Ledoit and Wolf (2002) discussed the validity of
testing John’s U statistic above in a high-dimensional situation. Since the asymptotic
distribution of the U test statistic being studied assumes that ∞→n while p remains
fixed, it treats terms of order np / like terms of order ,/1 n which is inappropriate if
p is greater than .n After this, Ledoit and Wolf studied its consistency and
investigated the asymptotic distribution of the U test statistic using a new asymptotic
theoretical framework, such that ∞→),( np and ,/ cnp → for some finite
concentration c , where ( )+∞∈ ,0c . They showed that this test statistic is still
consistent when ∞→),( np and ).,0(/ +∞∈→ cnp Under the null hypothesis
,*0H they provided a test statistic based on John’s U statistic as
.2
1)1( −−−=
pUnUJ (2.6)
They showed that as ∞→),( np and ),,0(/ +∞∈→ cnp the test statistic JU is
asymptotically distributed as standard normal. It can be seen in their simulation study
that this test could control the Type I error rates under the null hypothesis when the
covariance matrix was set as the identity matrix, i.e. ,2Iσ=Σ with .12 =σ In addition,
the simulated power of the test statistic converged to one under the alternative
hypothesis when the covariance matrix was set as the diagonal matrix half of whose
elements equal 1 and other half 0.50.
Birke and Dette (2005) derived the asymptotic distribution of the test statistic
based on John’s U statistic in (2.5) using a new technique which is more applicable,
including the extreme cases of concentration 0=c and ,∞=c i.e. ].,0[/ ∞∈→ cnp
The test statistic under the null hypothesis *0H is given by
10
2
1)1( −−−=
pUnTB . (2.7)
As ∞→),( np and ],,0[/ ∞∈→ cnp the test statistic BT is asymptotically distributed
as standard normal. Note that the test statistic BT in (2.7) is exactly the same as the
test statistic JU in (2.6).
Ledoit and Wolf (2002) did likewise but only required a more general
condition, whereas Srivastava (2005) imposed the condition that ,10),( ≤<= ζζpOn
and under this condition he proposed a test based on the Cauchy-Schwarz inequality
of the eigenvalues of Σ such that
⎟⎟⎠
⎞⎜⎜⎝
⎛≤⎟⎟
⎠
⎞⎜⎜⎝
⎛ ∑∑==
p
i
ri
p
i
ri p
1
22
1λλ ,
where ,1=r and siλ are the thi eigenvalue of Σ . The equality holds if and only if
,...1 λ=λ==λ p for all pi ,...,1= , and a constant .λ
When ,1=r let
2
1
1
2
1
⎟⎟⎠
⎞⎜⎜⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛
=
∑
∑
=
=
p
ii
p
iip
λ
λψ .
He observed that 11 =ψ if and only if *0H holds. Thus the hypothesis can be
considered as 1: 1*0 =′ ψH against 1: 1
*1 >′ ψH . Note that
( ) 21
22
2
2
1
1
2
1 )/1()/1(
/
/
hh
trptrp
p
p
p
ii
p
ii
=ΣΣ
=
⎟⎟⎠
⎞⎜⎜⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛
=
∑
∑
=
=
λ
λψ ,
where .2,1,)/1( =Σ= mtrph mm
The test statistic based on the first and second moments of the eigenvalues of S under
the null hypothesis *0H is given by
,1ˆˆ
2)1(
21
21
⎥⎥⎦
⎤
⎢⎢⎣
⎡−
−=
hhnTS (2.8)
11
where
Strp
h 11 = (2.9)
and
( )( ) ( ) ⎥⎦⎤
⎢⎣⎡
−−
+−−
= 222
2 111
)12)1(ˆ SS tr
ntr
pnnnh . (2.10)
The random variables 1h and 2h are consistent estimators of Σtrp)/1( and
,)/1( 2Σtrp respectively. He showed that, under the null hypothesis ,*0H as
∞→),( np and under the condition ),( ζpOn = 10 ≤< ζ , the test statistic 1ST is
asymptotically distributed as standard normal. The asymptotic distribution of the
statistic under the alternative hypothesis was also given, but the simulation study for
evaluating the Type I error rates and the power of his test statistic 1ST were not reported
in this article.
Srivastava (2006) proposed an adapted version of the likelihood ratio test
when pn > to the case of pn ≤ simply by interchanging n and p. He let
,)1(288
]13)1(6)1(2)[1)(2(2
23
1 −−+−+−+−
=n
nnnnnnc
.2
1
,)1(6
1)1(2
2
1
2
1
−−=
−++−
−=
nng
nnnpm
The test is given by
,log 211 LmQ −=
where
1
1
1
12
11
−
=
−
=
⎟⎠
⎞⎜⎝
⎛−
=
∑
∏nn
ii
n
ii
ln
lL ,
where ,1,...,1, −= nili are the positive eigenvalues of S . This test is applicable under
the assumptions 0/ →pn and n is fixed. He provided the following result:
12
).()]()([)()( 31
224
211
21 111
mOzPzPmczPzQP ggg +≥−≥+≥=≥ +− χχχ
It can be found from his simulation study that this test can control the size (Type I
error rate) of the test when n is medium and p is large. However, this test is not
appropriate when n is large even though p is also large. Perhaps the reason is that the
asymptotic distribution of this test statistic is derived under the assumptions 0/ →pn
and n being fixed, while in practice n can vary together with .p
Motivated by the results of Srivastava (2005), and under similar condition to
Ledoit and Wolf (2002) of ,/ cnp → ),0( +∞∈c , Fisher et al. (2010) proposed an
alternative test based on the Cauchy-Schwarz inequality of the eigenvalues of Σ but
took a look at the case where 2=r , i.e.
⎟⎟⎠
⎞⎜⎜⎝
⎛≤⎟⎟
⎠
⎞⎜⎜⎝
⎛ ∑∑==
p
ii
p
ii p
1
42
1
2 λλ
Now let
2
1
2
1
4
2
⎟⎟⎠
⎞⎜⎜⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛
=
∑
∑
=
=
p
ii
p
iip
λ
λψ
In a similar fashion to Srivastava (2005), he considered that 12 =ψ if and only if
*0H holds. Thus the hypothesis can be considered as 1: 2
**0 =ψH against .1: 2
**1 >ψH
Note that
( ) ,)/1()/1(
/
/
22
422
4
2
1
2
1
4
2
1
2
1
4
2 hh
trptrp
p
pp
p
ii
p
ii
p
ii
p
ii
=Σ
Σ=
⎟⎟⎠
⎞⎜⎜⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛
=
⎟⎟⎠
⎞⎜⎜⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛
=
∑
∑
∑
∑
=
=
=
=
λ
λ
λ
λψ
where 4,...,1,)/1( =Σ= mtrph mm
With the constants
,1
4*
−−=n
b
,)2)(1()3(3)1(2
2
2*
+−−−+−
−=nnnnnc
13
,)2)(1(
)15(22
*
+−−+
=nnn
nd
)2()1(15
22*
+−−+
−=nnn
ne ,
and
,)4)(3)(2)(5)(3)(1(
)2()1( 25
−−−++++−−
=nnnnnnn
nnnτ
the test statistic based on the second and fourth moments of the sample eigenvalues
under the null hypothesis is given by
( ) ⎟⎟⎠
⎞⎜⎜⎝
⎛−
++
−= 1ˆ
ˆ
1288
122
*4
2 hh
cc
nTF , (2.11)
where c is estimated by 1−np , 2h is defined as in (2.10), and
[ ]4*22*22*3*4*4 )()()(ˆ SSSSSSS tretrtrdtrctrtrbtr
ph ⋅+⋅+⋅+⋅+=
τ , (2.12)
which is a consistent estimator of .1 4Σtrp
Under the null hypothesis ,*0H as ∞→),( np and ),,0(/ +∞∈→ cnp he
showed that the test statistic FT is asymptotically distributed as standard normal. The
asymptotic distribution of the statistic FT under the alternative hypothesis was also
provided. His simulation study showed that this test statistic performed well under
IH 2*0 : σ=Σ with 12 =σ (i.e. each )1=iλ .
For the next review, a near spherical matrix definition, as defined by Fisher et
al. (2010), is given in the form
⎟⎠⎞⎜
⎝⎛ ′Θ= I0
0B ,
where Θ is an rr × diagonal matrix, for ,pr < with all elements ,1≠iθ I is a
)()( rprp −×− identity matrix and 0 is a −− )( rp vector of zeros. The number r is
chosen to be small so that the near spherical matrix is the identity matrix with the
exception of a few elements. It can be found in Fisher et al. (2010) that under the near
spherical alternative hypothesis, this test statistic FT is more powerful than the test of
14
Srivastava (2005), as defined in (2.8), and is comparable to that of Ledoit and Wolf
(2002), as defined in (2.6).
2.2 Testing the Equality of Two Covariance Matrices for Two Independent
Populations
This section explores several test statistics for testing the hypothesis of equality
of two covariance matrices, defined as
Σ=Σ=Σ′ 210 :H against .: 211 Σ≠Σ′H
Let ,2,1;,...,1, == knj kjkX be random samples drawn from independently
normally distributed populations ),( kkpN Σμ where kμ denotes an unknown mean
vector of the thk population and kΣ denotes an unknown positive definite covariance
matrix of the thk population.
In this section, a set of two samples, one from each population, is used. The
variables made on a single observation from each sample are regularly collected into a
column vector, i.e. ′= ),,,( 21 pjkjkjkjk xxx LX , where j represents the thj observation
from the thk random sample, for .2,1;,...,1 == knj k The set of variables on all
observations in the thk sample set make up a matrix of observations, kX , such that
.2,1,
21
22212
12111
=⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜
⎝
⎛
=
×
kxxx
xxxxxx
pnkpnknkn
kpkk
kpkk
k
kkkk
LMMMM
LL
X
The −p dimensional population is assumed to have −×1p mean vectors kμ
and −× pp covariance matrices kΣ so that
,2,1,and
21
22212
11211
2
1
=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
=Σ
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
= k
ppkpkpk
pkkk
pkkk
k
pk
k
k
k
σσσ
σσσσσσ
μ
μμ
L
MOMM
L
L
Mμ
15
where .2,1;,...,1),( === kpixE ijkikμ The diagonal elements ,iikσ
,)( 2ikijkiik xE μσ −= of the thk covariance matrix kΣ are the variances of the random
variable ,2,1;,...,2,1;,...,1, === knjpix kijk and the off-diagonal elements ,ilkσ
where )],)([( lkljkikijkilk μxμxE −−=σ are the covariances between the random
variables ,ijkx and .2,1;,...,1;,...,1, ===≠ knjplix kljk
The thk covariance matrix kΣ can be expressed using the matrix notation,
.2,1,))(( =⎥⎦⎤
⎢⎣⎡ ′−−=Σ kE kkkkk μXμX
The probability density function for random vector jkX from the thk p-variate
normal distribution is defined as
( ) ,2,1;,...,1,)()(21exp2)( 12/12/ ==⎟
⎠⎞
⎜⎝⎛ −Σ′−−Σ= −−− knjf kkjkikjki
pjk μxμxx π
where kΣ denotes the determinant operation on the matrix kΣ and 0≠Σk since kΣ
is a positive definite matrix.
Let
,2,1,11
== ∑=
kn
kn
jjk
kk XX (2.13)
( )( ) ,2,1,1
=′−−=∑=
kin
jkjkkjkk XXXXA (2.14)
,2,1,1
1=
−= kn kk
k AS (2.15)
,2,1,11 == ktr
ph kk S (2.16)
,2,1,)(1
1)1)(2(
)1(ˆ 222
2 =⎭⎬⎫
⎩⎨⎧
−−
+−−
= ktrn
trnnp
nh k
kk
kk
kk SS (2.17)
Suppose there are independent estimates ,, 21 SS the sample covariance
matrices of the covariance matrices 1Σ and ,2Σ respectively, with
,2,1),1,(~)1( =−Σ− knWn kkpkk S i.e. kkn S)1( − having a Wishart distribution with
16
kn degrees of freedom and covariance matrix .kΣ The common covariance matrix Σ
is estimated by the pooled sample covariance matrix
( ) ,1
11
1ˆ21 SAAA ≡+
−=
−=Σ
nn
where .121 −+= nnn Note that ),1,(~)1( −Σ− nWn pS i.e. S)1( −n has a Wishart
distribution with 1−n degrees of freedom and covariance matrix .Σ Therefore,
,11 Str
ph = (2.18)
.)(1
1)1)(2(
)1(ˆ 222
2 ⎭⎬⎫
⎩⎨⎧
−−
+−−
= SS trn
trnnp
nh (2.19)
2.2.1 The Classical Approach
For ,pn > to test Σ=Σ=Σ′ 210 :H against ,: 211 Σ≠Σ′H the likelihood ratio
criterion (see Anderson, 1984: 406; and Srivastava, 2002: 490), is given by
.)()2(
)1()1(2/
22/
1
2/)(21
2/)(21
2/22
2/11
3 21
21
21
21p
nn
nn
nn
nn
nnnn
nn
nnL ⎟
⎟⎠
⎞⎜⎜⎝
⎛ +⎟⎟
⎠
⎞
⎜⎜
⎝
⎛
−+
−−=
+
+S
SS
This test is not unbiased unless the degrees of freedom associated with kS , kn , are
changed to 1−kn .
The modified likelihood ratio test suggested by Bartlett (1937) on intuitive
grounds and quoted in Anderson (1984: 406) is based on the statistic
.2/)1(
2/)1(2
2/)1(1
4
21
−
−−
= n
nn
LS
SS
This modified likelihood ratio test is valid only if knp < , for .2,1=k In particular, if
p is fixed then the asymptotic null distribution of this criterion ,log2 4L− as ∞→kn ,
for ,2,1=k is Chi-squared with 2/)1( +pp degrees of freedom. Pearlman (1980)
showed that the test based on 4L is unbiased.
An alternative test based on the Wald statistic quoted in Schott (2007) is
.)()1(
)1)(1()(
1)1(
2)1( 2
1
2
1
112
2
1
11
⎭⎬⎫
⎩⎨⎧
−−−
−−−−
= ∑∑∑= =
−−
=
−−
k llk
lk
kkk
k trnnn
trnnnW SSSSSSSS
17
This test statistic has the same asymptotic null distribution as 4L and is valid as long
as S is nonsingular, i.e. as long as np < (Schott, 2007).
2.2.2 The High-dimensional Approach
Schott (2007) proposed a test for testing the equality of several covariance
matrices. Since the equality of a two covariance matrix test is considered here, then
the test statistic of Schott (2007) based on a consistent estimator of the square of the
Frobenius norm of ,21 Σ−Σ i.e. ( ) ,221 Σ−Σtr is given by
( ) ,2ˆˆˆ)1(2
)1)(1(212221
2
21⎟⎟⎠
⎞⎜⎜⎝
⎛−+
−−−
= SStrp
hhhnnnTJ (2.20)
where ,2,1,ˆ2 =kh k is as defined in (2.17), and 2h as defined in (2.19). Under the null
hypothesis, he validated that the test statistic JT is distributed as standard normal as
,),,( 21 ∞→nnp and .2,1),,0(/ =∞∈→ kcnp kk
In his simulation study, the asymptotic normal distribution of this test statistic
was evaluated under the null hypothesis of Σ=Σ=Σ′ 210 :H after setting the common
unknown positive definite covariance matrix Σ as two different matrices. First, Σ
was set as the identity matrix, and second, as a block diagonal matrix with each block
matrix given by ,151.05.0 444 ′+I where 41 denotes the 14× vector with each of its
elements equal to 1. Two p-variate normal samples with equal sample sizes were
constructed.
First, the results obtained using the common covariance matrix Σ as the
identity matrix reported in his article showed that the empirical Type I error rates of
the test statistic JT were not close to the nominal significance level when the sample
size was small and they converged to the nominal significance level when p and the
sample size increased. He also found, but without tabulating the results, that the
empirical Type I error rates when the sample sizes were equal, i.e., ,21 nn = were not
substantially different from those when 2/12 nn = .
For the second setting of Σ as mentioned above, the empirical Type I error
rates were generally not close to the nominal level for small values of p and sample
18
size. When p and the sample size increased, the empirical Type I error rates seemed
to improve. However, this test statistic yielded empirical Type I error rates which
were not close to the nominal significance level when p and the sample size were very
close; the empirical Type I error rates were generally much higher than the nominal
significance level. Furthermore, he mentioned that the convergence to a standard
normal distribution of this test statistic is somewhat slower when pI≠Σ , which can
be clearly observed in that article.
To estimate the power of this test, a simulation under the alternative
hypothesis with I=Σ1 while 2Σ had a block-diagonal structure with each block
matrix given by )2,1,1,...,2,1,1,1(diag was carried out. The empirical powers converged
to one as both p and the sample size increased and converged at a slower rate for
small values of .p Moreover, from his results, it was surprising that when sample
sizes were fixed and not large enough the empirical powers decreased
when p increased.
Srivastava (2007a) proposed a test based on the statistic
,ˆ))1(])1{[(ˆ
2
22112
1
hnntrhp
GSS −−
=+
where +1S denotes the Moore-Penrose inverse of .1S He noted that when the null
hypothesis, Σ=Σ=Σ′ 210 :H , is true, ).1,(~)1( −Σ− nWn pS For a fixed 1n and
,2n and under the null hypothesis,
.~lim 2)1)(1( 21 −−∞→ nnp
G χ
It was noted by Srivastava and Yanagihara (2010) that this test did not perform well.
It was quoted in Srivastava and Yanagihara (2010) that Srivastava (2007b)
proposed a test based on a consistent estimator of 22
21 Σ−Σ trtr . The test statistic is
given by
,ˆˆ
ˆˆ22
21
22212
ηη +
−=
hhTS (2.21)
19
where ,2,1,ˆ2 =kh k are as defined in (2.17), and are consistent estimators of
,2,1,1 22 =Σ= ktr
ph kk respectively. The statistic 2ˆkη is a consistent estimator of 2
kη
where
,2,1,)1(2
1)1(
422
4222
2 =⎟⎟⎠
⎞⎜⎜⎝
⎛ −+
−= k
phhn
hn
k
kkη
,2,1,ˆˆ)1(2
1ˆ)1(
4ˆ22
4222
2 =⎟⎟⎠
⎞⎜⎜⎝
⎛ −+
−= k
hphn
hn
k
kkη
2h as defined in (2.19) and
.ˆˆˆˆˆ11ˆ 41
32232
212
211
4
04 ⎟⎟
⎠
⎞⎜⎜⎝
⎛−−−−= hnphpchhcphpctr
pch A (2.22)
The constants ,,, 210 ccc and 3c are defined as
],18)1(21)1(6)1)[(1( 230 +−+−+−−= nnnnc
],9)1(6)1(2)[1(2 21 +−+−−= nnnc
],2)1(3)[1(22 +−−= nnc and
].7)1(5)1(2)[1( 23 +−+−−= nnnc
Under the null hypothesis, he showed that the test statistic 2ST is asymptotically
distributed as standard normal as .),( ∞→np
A numerical simulation study was carried out and the results shown in
Srivastava and Yanagihara (2010). In their simulation, they let ),,...,( 1 pdddiagD =
where ),5,1(~,...,...
1 Udddii
p and )2,1( =Δ jj is a pp× matrix whose thba ),( element are
defined by 10/1
)}2(2.0{)1( baba j −+ +×− . The asymptotic normality of this test statistic
was assessed under the null hypothesis, .: 1210 DDH Δ=Σ=Σ′ Two p-variate normal
samples were simulated with equal size. Under this setting, the empirical Type I error
rates of this test statistic were not very close to the nominal significance level when
the sample size and p were not large enough. However, it tended towards the nominal
significance level when p and the sample size increased.
20
It was shown in this article that the empirical Type I error rates of the test
statistic JT , as defined in (2.20) and proposed by Schott (2007), performed quite
badly under this setup. The empirical Type I error rates were substantially greater than
the nominal significance level; they were at least 0.071 for all cases of p and sample
size considered. Moreover, the empirical Type I error rates did not converge to the
nominal significance level as p and the sample size increased. The power of this test
statistic 2ST was measured under the alternative hypothesis, 211 : Σ≠Σ′H , where
DD 11 Δ=Σ while .22 DDΔ=Σ The empirical power of test statistic 2ST tended
towards one more slowly than those of JT as p and the sample size increased.
Srivastava and Yanagihara (2010) proposed an alternative test relying on
a consistent estimator of difference
,)()( 2
2
22
21
21
21 Σ
Σ−
Σ
Σ=−
trtr
trtr
γγ
where .2,1,)( 2
2
=ΣΣ
= ktrtr
k
kkγ Under the null hypothesis ,: 210 Σ=Σ=Σ′H .021 =−γγ
Thus they noted that the hypothesis is equivalent to the following:
0: 210 =−′ γγH against .0: 211 ≠−′ γγH
Consistent estimators of kγ are given by ,ˆkγ where ,2,1,ˆˆ
ˆ2
1
2 == khh
k
kkγ the random
variables ,1kh and ,2,1,ˆ2 =kh k are as defined in (2.16) and (2.17), respectively. The
test statistic is given by
,ˆˆ
ˆˆ22
21
21
ξξ
γγ
+
−=SYT (2.23)
where
,2,1,ˆˆ
ˆˆˆ2
ˆˆ)1(2
ˆˆ
)1(4ˆ
41
451
3261
32
41
22
22 =
⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
⎟⎟⎠
⎞⎜⎜⎝
⎛+−
−+
−= k
hh
hhh
hh
pn
hh
nk
kkξ
where ,ˆ,ˆ21 hh and 4h are as defined in (2.18), (2.19), and (2.22), respectively, and
21
.ˆ)1(ˆˆ)1(3))1((1]4)1(3)1)[(1(
1ˆ 31
212
323 ⎟⎟
⎠
⎞⎜⎜⎝
⎛−−−−−
+−+−−= hpnhhnpnntr
pnnnh S
(2.24) Under the null hypothesis, the test statistic SYT is asymptotically normally distributed
as standard normal as .),( ∞→np
A simulation study was conducted under the null hypothesis,
.: 1210 DDH Δ=Σ=Σ′ The empirical Type I error rates of this test statistic SYT were
poor when the sample size was small and p was not large. The empirical Type I error
rates converged to one as p and the sample size increased. It was noted and can be
observed from Srivastava and Yanagihara (2010) that the convergence of this test
statistic SYT to a standard normal distribution is slower than that of the test statistic
2ST in (2.21), as proposed by Srivastava (2007b). The reason was addressed in
Srivastava and Yanagihara (2010), and is that an estimation of 2kη standardizing 2ST is
easier than that of 2kξ standardizing SYT because 2
kξ depends on more terms than 2kη .
Under the alternative hypothesis, 211 : Σ≠Σ′H , where DD 11 Δ=Σ while DD 22 Δ=Σ ,
the empirical power of this test SYT tended to one faster than those of the two test
statistics JT and 2ST .
CHAPTER 3
THE PROPOSED TESTS
This chapter separately presents the methods of developing the two proposed
tests and their asymptotic distributions in two sections. The proposed test for testing
the hypothesis for a partially known matrix for one high-dimensional data is
introduced first. After that, the proposed test for testing the equality of two covariance
matrices for two high-dimensional data is also explained.
3.1 Testing the Hypothesis for a Partially Known Matrix for One High-
dimensional Data
For testing the hypothesis that
02
0 : Σ=Σ σH against ,: 02
1 Σ≠Σ σH
where 2σ is unknown and 0Σ is a known positive definite matrix, the test statistic is
built by considering a measure of distances between the two matrices, namely the
square of the Frobenius norm,
,)(2)(1)(1 410
221
0221
0 σσσψ +ΣΣ−ΣΣ=−ΣΣ= −−− trp
trp
Itrp
(3.1)
where tr denotes the trace notation. The measurement 0=ψ if and only if the null
hypothesis holds. Thus, testing hypothesis 0:0 =ψH against 0:1 >ψH can be
considered.
The following assumptions are made:
(A1) As ),1[,,),( ∞∈→∞→ ccnpnp
(A2) As ,8,...,1),,0(,, =∞∈→∞→ map mmm αα
23
[ ]( ) .,as0
)1(21
)ˆ(1ˆ
22
1211
∞→→−
=
≤>−
nppn
aaVaraaP
ε
εε
where ∑=
− =ΣΣ=p
i
m
i
imm dp
trp
a1
10 )(1)(1 λ
(see Appendix A, Section A.1, page 68)
The iλ ’s are the eigenvalues of the covariance matrixΣ and id ’s are the eigenvalues
of a given positive definite matrix .0Σ
From (3.1), to find the estimator of the measurement ,ψ 1a and 2a need to be
estimated where their estimators are consistent for large p and n , as presented in the
next theorem.
Theorem 3.1.1 Let
)(1ˆ 101 S−Σ= tr
pa (3.2)
and
( ) ⎥⎦⎤
⎢⎣⎡ Σ
−−Σ
+−−
= −− 210
210
2
2 )(1
1)(1)1)(2(
)1(ˆ SS trn
trpnn
na , (3.3)
then
(i) 1a is an unbiased and consistent estimator of ),(1 101 ΣΣ= −tr
pa and
(ii) 2a is an unbiased and consistent estimator of .)(1 2102 ΣΣ= −tr
pa
Proof As shown in Appendix A (Section A.2, page 69-70), two statistics are
obtained:
),(1ˆ 101 S−Σ= tr
pa and ( ) ⎥⎦
⎤⎢⎣⎡ Σ
−−Σ
+−−
= −− 210
210
2
2 )(1
1)(1)1)(2(
)1(ˆ SS trn
trpnn
na .
(i) Using Lemma A.2 in Appendix A (page 71), it can be shown that
( ) ,)(1ˆ 11
01 atrp
aE =ΣΣ= − (3.4)
so 1a is an unbiased estimator of 1a . Using the variance of 1a in Lemma A.3 in
Appendix A (page 72), and by applying the Chebyshev’s inequality, for any ,0>ε
we obtain
(3.5)
24
[ ]( ) .,as0481
)ˆ(1ˆ
22242
2222
∞→→⎟⎟⎠
⎞⎜⎜⎝
⎛+≈
≤>−
npan
anp
aVaraaP
ε
εε
Thus 1a is a consistent estimator of 1a .
From equations (3.4) and (3.5), it can be concluded that 1a is an unbiased and
consistent estimator of 1a .
(ii) By using Lemma A.2 in Appendix A (page 71), we obtain
( ) ,)(1ˆ 221
02 atrp
aE =ΣΣ= − (3.6)
so 2a is an unbiased estimator of .2a With a similar proof to (i) and using the variance
of 2a in Lemma A.7 in Appendix A (page 75), we get
(3.7)
Thus 2a is a consistent estimator of 2a . Hence, from equations (3.6) and (3.7), it can
be concluded that 2a is an unbiased and consistent estimator of .2a
The proof is completed.
Thus, from Theorem 3.1.1, an unbiased and consistent estimator of ψ in (3.1)
is defined as
.ˆ2ˆˆ 41
22 σσψ +−= aa (3.8)
To find the distribution of ψ , the distributions of 1a and 2a need to be found,
as shown in the next theorem.
Theorem 3.1.2 Under the assumptions (A1) and (A2), as ,),( ∞→np
,484
42
,ˆˆ
2224
3
32
2
12
2
1
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜
⎝
⎛
+⎟⎟⎠
⎞⎜⎜⎝
⎛⎯→⎯⎟⎟
⎠
⎞⎜⎜⎝
⎛
an
anpnp
anpa
npa
aa
Naa D
where yx D⎯→⎯ denotes x converging in distribution to .y
Proof (See the proof in Appendix B (page 80)).
25
The following theorem and corollary provide for an asymptotic distribution of
the estimator ψ under the alternative and null hypotheses by applying the delta
method to a function of two random variables. Next, the lemma taken from Lehmann
and Romano (2005: 436) shows the delta method used to prove the asymptotic
normality of the estimator ψ .
Lemma 3.1.1 (The delta method)
Suppose n21 ,...,, XXX are random vectors in the ℜk Euclidean space and
assume that ),()( Σ⎯→⎯− 0μX kD
nn Nτ , where μ is a constant vector and { }nτ is a
sequence of constants .∞→nτ In addition, presume that (.)g is a function from ℜk
to ℜ which is differentiable at μ with a gradient (vector of first partial derivatives) of
dimension k×1 at μ equal to ),(μg′ then
( )[ ] ( ).)()(,0)( TDnn ggNgg μμμX ′Σ′⎯→⎯−τ
Proof (See proof in Lehmann and Romano (2005: 436)).
Theorem 3.1.3 Under the assumptions (A1) and (A2), as ( ) ,, ∞→np
),,0(ˆ 2βψψ ND⎯→⎯− (3.9)
with .2424 2
243
22
4
22
⎟⎟⎠
⎞⎜⎜⎝
⎛+
+−= a
pnaanan
nσσ
β
Proof Let ,2),( 41
2221 σσ +−= aaaag then the proposed test statistic is
.ˆ2ˆ)ˆ,ˆ(ˆ 41
2221 σσψ +−== aaaag
The first partial derivatives of ),( 21 aag with respect to 1a and 2a are respectively
given by
2
1
2σ−=⎟⎟⎠
⎞⎜⎜⎝
⎛∂∂ag and .1
2
=⎟⎟⎠
⎞⎜⎜⎝
⎛∂∂ag
Thus, by applying the delta method,
),,0(ˆ 2βψψ ND⎯→⎯−
26
where
( )
.2424
12
484
42
12
22
432
24
2
2
2
2243
32
22
⎟⎟⎠
⎞⎜⎜⎝
⎛+
+−=
⎟⎟⎠
⎞⎜⎜⎝
⎛−
⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜
⎝
⎛
+−=
ap
naanann
na
npa
npa
npa
npa
σσ
σσβ
The proof is completed.
Corollary 3.1.1 Under the null hypothesis 02
0 : Σ=Σ σH , ,0=ψ and under the
assumptions (A1) and (A2), as ,),( ∞→np
).1,0(ˆ2 41 NnT D⎯→⎯= ψσ
(3.10)
Proof Under ,0H ,, 63
42 σσ == aa and ,8
4 σ=a then .42
82
nσβ = It follows from
the previous theorem, so therefore the proof is completed.
3.2 Testing the Equality of Two Covariance Matrices for Two High-
dimensional Data
In this section, it is desirable to test the hypothesis
Σ=Σ=Σ′ 210 :H against ,: 211 Σ≠Σ′H
where Σ denotes the common unknown covariance matrix of the two populations
when .2,1, =≥ knp k
Recall from Chapter 2 that the common covariance matrix Σ is estimated by
the pooled sample covariance matrix
,)(1
11
1ˆ21 SAAA ≡+
−=
−=Σ
nn
where 121 −+= nnn and .2,1,))((1
=′−−=∑=
kkn
jkjkkjkk XXXXA
27
A test statistic for 0H ′ is proposed in this work based on the fact that if the null
hypothesis 0H ′ holds, i.e. 21 Σ=Σ , then .22
21 Σ=Σ trtr Thus, under the null hypothesis,
we obtain the quantity
.122
2122
21 ==
Σ
Σ=
hh
trtr
b
Therefore, the hypothesis 1:0 =′ bH can be tested against .1:1 ≠′ bH This is a two-
sided test.
The following assumptions are imposed:
(B1) As ),0(,,),( ∞∈→∞→ ccnpnp
(B2) As ),,1[,,),( ∞∈→∞→ kkk
k ccnpnp 2,1=k
(B3) As ),,0(,, ∞∈→∞→ mmmhp αα 16,...,1=m
(B4) As ),,0(,, ∞∈→∞→ lklklkhp αα ,8,...,1;2,1 == lk
where ,1 mm tr
ph Σ= and l
klk trp
h Σ=1
In order to estimate the quantity ,b the following two lemmas extended from
some of the results of Srivastava (2005) for one population to the case of two
populations (which are presented without proof) are obtained.
Lemma 3.2.1 Let ),1,(~)1( −Σ− kkpkk nWn S and ,4,...,1;2,1,1==Σ= lktr
ph l
klk then,
under the assumptions (B2) and (B4), unbiased and consistent estimators of kh2 , as
∞→),( knp , are given by kh2ˆ , as defined in (2.17) in Chapter 2,
i.e.
.2,1,)(1
1)1)(2(
)1(ˆ 222
2 =⎭⎬⎫
⎩⎨⎧
−−
+−−
= ktrn
trnnp
nh k
kk
kk
ik SS
28
Lemma 3.2.2 Let ),1,(~)1( −Σ− kkpkk nWn S ,2,1,ˆ2 =kh k as defined in (2.17) in
Chapter 2, and ,4,...,1;2,1,1==Σ= lktr
ph l
klk then, under the assumptions (B2) and
(B4),
( ),)ˆ(lim 22
),(xx
hhP
k
kk
np k
Φ=⎥⎥⎦
⎤
⎢⎢⎣
⎡≤
−∞→ η
where ( )xΦ denotes the cumulative distribution function of a standard normal
random variable and .1
2)1(
4 22
42
⎟⎟⎠
⎞⎜⎜⎝
⎛
−+
−=
k
kk
kk n
hh
pnη
Using Lemma 3.2.1, the consistent estimator of b can be estimated by
.)(
11
)(1
1
)1)(2()1()1)(2()1(
ˆˆ
ˆ2
22
22
21
1
21
112
2
222
1
22
21
⎭⎬⎫
⎩⎨⎧
−−
⎭⎬⎫
⎩⎨⎧
−−
+−−+−−
==SS
SS
trn
tr
trn
tr
nnnnnn
hh
b
The following lemma gives the asymptotic distribution of the consistent
estimators 21h and 22h .
Lemma 3.2.3 Let ),1,(~)1( −Σ− kkpkk nWn S ,2,1,ˆ2 =kh k as defined in (2.17), and
,4,...,1;2,1,1==Σ= lktr
ph l
klk then, under the assumptions (B2) and (B4),
.
12
)1(40
01
2)1(
4
,ˆ
ˆ
2
222
422
1
221
411
22
212
22
21
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛−
+−
⎟⎟⎠
⎞⎜⎜⎝
⎛−
+−
⎟⎟⎠
⎞⎜⎜⎝
⎛⎯→⎯
⎟⎟⎠
⎞⎜⎜⎝
⎛
nhh
pn
nhh
pnhh
Nh
h D
Proof Since random samples 2,1,,...,1, == knj kjkX are drawn from two
independent populations and the sample covariance matrices ,2,1, =kkS are
calculated from corresponding independent random samples 1jX and ,2jX then 1S and
2S must be independent of each other. In fact, the statistic 21h is a function of 1S alone,
29
whereas the statistic 22h is also a function of 2S alone. Thus 21h and 22h are also
independent which results in .0)ˆ,ˆ( 2221 =hhCOV
By applying Lemma 3.2.2, 2,1,ˆ2 =kh k are asymptotically normally distributed with
mean kh2 and variance ,2kη and from the fact that the covariance between 21h and
22h is zero, it follows that the joint asymptotic distribution of estimators 21h and 22h is
bi-variate normally distributed with a mean vector and covariance matrix as given
above. The proof is completed.
From Lemma 3.2.3, the statistic b is a ratio of two uncorrelated estimators
21h and 22h . By applying the delta method as given in Lemma 3.1.1, it ensures that a
function of two random variables can be approximated as a normal distribution. The
following theorem gives the asymptotic normality of the statistic .b
Theorem 3.2.1 Let ,b and b be as defined above, then, under the assumptions (B1)-
(B4),
),,(ˆ 2δbNb D⎯→⎯
where
.1
2)1(1
21
14
2
222
422222
221
1
221
411
222
2
⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
⎟⎟⎠
⎞⎜⎜⎝
⎛
−+
−+⎟
⎟⎠
⎞⎜⎜⎝
⎛
−+
−=
nph
hhn
hnph
hnph
δ
Proof. Let ,),(22
212221 h
hhhg = then .ˆˆ
)ˆ,ˆ(ˆ22
212221 h
hhhgb ==
The first partial derivatives of ),( 2221 hhg with respect to 21h and 22h are respectively
given by 2221
2221 1),(hh
hhg=⎟⎟
⎠
⎞⎜⎜⎝
⎛∂
∂ and .
),(222
21
22
2221
hh
hhhg
−=⎟⎟⎠
⎞⎜⎜⎝
⎛∂
∂
Thus, by applying the delta method, ),(ˆ 2δbNb D⎯→⎯ with
30
.1
2)1(1
21
141
24)1(1
24)1(
1
1
12
)1(40
01
2)1(
41
2
222
422222
221
1
221
411
222
2
222
424222
221
1
221
412221
222
21
22
2
222
422
1
221
411
222
21
22
2
⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
⎟⎟⎠
⎞⎜⎜⎝
⎛
−+
−+⎟
⎟⎠
⎞⎜⎜⎝
⎛
−+
−=
⎟⎟⎠
⎞⎜⎜⎝
⎛
−+
−+⎟
⎟⎠
⎞⎜⎜⎝
⎛
−+
−=
⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜
⎝
⎛
−⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
⎟⎟⎠
⎞⎜⎜⎝
⎛
−+
−
⎟⎟⎠
⎞⎜⎜⎝
⎛
−+
−⎟⎟⎠
⎞⎜⎜⎝
⎛−=
nph
hhn
hnph
hnph
nph
hphn
hnph
hphn
hh
h
nph
hpn
nph
hpn
hh
hδ
The proof is completed.
Corollary 3.2.1 Let b be as defined above. Under Σ=Σ=Σ′ 210 :H and the
assumptions (B1)-(B4), it follows that
( ).1,0
)1(11212
1ˆ
21
2
12
2
122
4
* N
np
nhh
p
bT D
k kk k
⎯→⎯
⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
⎥⎥⎦
⎤
⎢⎢⎣
⎡
−+
−
−=
∑∑==
Proof. Under ,0H ′ ,22221 hhh == and .44241 hhh ==
Thus ,)1(1
124 2
12
2
122
42 0
⎭⎬⎫
⎩⎨⎧
−+
−= ∑∑
==
′
k kk k
H
np
nhh
pδ which follows from Theorem 3.2.1, then
the proof is completed.
In order to use *T in practice, it is necessary to estimate 2δ involving the
estimates of 2h and .4h The following lemma states a consistent estimator of 4h taken
from Fisher et al. (2010), which is also presented without proof.
Lemma 3.2.4 Let ),1,(~)1( −Σ− nWn pS and ,16,...,1,1=Σ= ktr
ph k
k then, under the
assumptions (B1) and (B3), an unbiased and consistent estimator of 4h as
∞→),( np is given by *4h , which was defined in (2.12) in Chapter 2,
[ ],)()()(ˆ 4*22*22*3*4*4 SSSSSSS tretrtrdtrctrtrbtr
ph ++++=
τ
where
31
,1
4*
−−=
nb
,)2)(1()3(3)1(2
2
2*
+−−−+−
−=nnnnnc
,)2)(1(
)15(22
*
+−−+
=nnn
nd
)2()1(15
22*
+−−+
−=nnn
ne ,
and
.)4)(3)(2)(5)(3)(1(
)2()1( 25
−−−++++−−
=nnnnnnn
nnnτ
Using Lemmas 3.2.1 and 3.2.4, consistent estimators of 2h and 4h are given
by 2h and *4h , respectively. By substituting 2h and *
4h , we obtain a corresponding
consistent estimator of ,2δ namely 2δ , as
.)1(
11
1ˆˆ2
4)1(1
1ˆˆ24ˆ
2
12
2
122
*4
2
12
2
122
*42
⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
−+
−=
⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
−+
−= ∑∑∑∑
==== k kk kk kk k nnhph
np
nhh
pδ
Thus a test of 0H ′ can be based on the statistic
δ
1ˆ2
−=
bT . (3.11)
In addition, its asymptotic null distribution is standard normal. The proposed test
statistic 2T with an α level of significance rejects 0H if 2/2 || αzT > , where 2/αz
denotes the upper 2/α quantile of the standard normal distribution.
CHAPTER 4
SIMULATION STUDY
This section explains a Monte Carlo method carried out using the Fortran
programming language (FORTRAN) to investigate the performance of the two
proposed tests 1T and ,2T defined in (3.10) and (3.11), respectively. Multivariate
normal vectors were generated using the international mathematics and statistics
library (IMSL) with the multivariate normal random number generator (RNMVN)
subroutine. As will be seen, the performances of the tests were assessed in two
aspects: (1) empirical Type I error rate (under the null hypothesis) and (2) empirical
power (under the alternative hypothesis). Both the empirical Type I error rates and the
empirical powers of the two proposed tests 1T and 2T , as well as those of competitive
tests, were computed under a variety of covariance matrix structures. Some types of
covariance matrix structure used in this study are given in Table 4.1.
Table 4.1 Covariance Matrix Structure Definition
Type of
Covariance Matrix
Structure
Definition
1. Unstructured (UN) ppij ×=Σ )(σ
2. Compound Symmetry (CS) ,112ppp kI ′+=Σ σ where ,02 >σ k is an
appropriate constant, pI denotes the pp× identity
matrix, and p1 denotes the 1×p vector of ones
33
Table 4.1 (Continued)
Type of
Covariance Matrix
Structure
Definition
3. Heterogeneous compound
symmetry (CSH) ppij ×=Σ )(σ where ;,02 jiiij =>=σσ
,, jijiij ≠= ρσσσ where ρ is the correlation
parameter satisfying 1<ρ
4. Simple (SIM) pI2σ=Σ
5. Toeplitz (TOEP). ppjippij ×−× ==Σ )()( ||σσ
6. Variance Component (VC) ),...,,( 2211 ppdiag σσσ=Σ
where piii ,...,1,0 =>σ
4.1 Simulation Study for Testing the Hypothesis for a Partially Known
Matrix for One High-dimensional Data
4.1.1 Simulation Setting
This section deals with a simulation to assess the asymptotic normality of the
proposed test statistic 1T . Under the null hypothesis ,: 02
0 Σ=Σ σH by setting
,12 =σ for each ),( np combination, where }320,160,80,40,20,10{∈p and
},160,80,40,20,10{∈n multivariate normal vectors were generated with 10,000
independent iterations. The test statistic 1T was calculated and then the empirical
Type I error rate of the proposed test was obtained by recording the proportion of
rejections of the test statistic. The nominal significance level )(α of interest was fixed
at 0.05. The empirical Type I error rates under the null hypothesis and empirical
powers under the corresponding alternative hypothesis were computed for four
different hypotheses with different covariance matrix structures:
34
1) Unstructured Structure (UN)
⎪⎩
⎪⎨⎧
<−
=====Σ +×× jifor
ji
jiforH jippjippij
2)1(
1)()(: 0
10 σσU
⎪⎩
⎪⎨⎧
<−
=====Σ +×× .
4)1(
1)()(: 1
11 jifor
ji
jiforH jippjippij σσU
2) Compound Symmetry Structure (CS)
pppIH 11)5.0(5.0: 020 ′+==Σ C
pppIH 11)1.0(9.0: 12
1 ′+=Σ =C
3) Heterogeneous compound symmetry Structure (CSH)
030 : M=ΣH
131 : M=ΣH
where
0M is a matrix having CSH with ,;)3,2(~2 ρσσσσσ jiijiij jiU === when
,,5.0 ji ≠=ρ and
1M is a matrix having CSH with ,;)4,3(~2 ρσσσσσ jiijiij jiU === when
.,5.0 ji ≠=ρ
4) Toeplitz Structure (TOEP)
040 : T=ΣH
14
1 : T=ΣH
where 0T is a Toeplitz matrix with elements 50.0,1 10 −== σσ with the rest of the
elements equal to zero, and 1T is a Toeplitz matrix with elements 45.0,1 10 −== σσ
with the rest of the elements equal to zero.
The results of the empirical Type I error rates are shown in Table 4.2 and
the empirical powers are tabulated in Table 4.3.
To compare the performance of the proposed test statistic 1T to the test
statistics defined in Ledoit and Wolf (2002), denoted JU as in (2.6) in Chapter 2, and
35
Srivastava (2005), denoted 1ST as in (2.8) in Chapter 2, attention to the null hypothesis
was restricted using a simple structure (SIM), i.e. IH 2*0 : σ=Σ (sphericity). The
following two hypotheses were considered under simple and variance component
structures:
1) Simple (SIM) and Variance Component Structure (VC) (as in Ledoit and Wolf,
2002)
pIH =Σ:1*0 (SIM)
F=Σ:1*1H (VC)
where F is a matrix in VC where half of the elements are equal to 1 and the other half
equal to 0.05.
2) Variance Component structure (VC)
pIH 2:2*0 =Σ (SIM)
DH 2:2*1 =Σ (VC)
where );,...,( 1 pdddiagD = .,...,2,1);1,0(~ piUnifdi =
4.1.2 Simulation Results
At this point, according to the four null hypotheses described above, the
empirical Type I error rates are correspondingly exhibited in Table 4.2. It can be
observed that for all four null hypotheses (under the UN, CS, CSH, and TOEP
structures) the proposed test statistic 1T yielded satisfying empirical Type I error rates
in the same pattern. This means that the convergence to asymptotic normality of the
proposed test statistic 1T is not greatly affected by a change in the covariance matrix
structure of the null hypothesis. As expected, the empirical Type I error rates of the
proposed test statistic 1T were reasonably close to the nominal 0.05 significance level
and got better when p and n increase.
Table 4.3 shows that the empirical powers of the proposed test statistic 1T
performed under the four covariance matrix structures (UN, CS, CSH, and TOEP)
rapidly converged to one and remained high as p and n increased. It can be seen in
36
this table that the speed of convergence to normality of the four sets of empirical
powers of the proposed test statistic 1T differed depending on the alternative
covariance matrix setting.
The empirical Type I error rates and empirical powers used to compare the proposed test statistic $T_1$ with the two test statistics $U_J$ and $T_{S1}$, as described above, are displayed in Tables 4.4 and 4.5.
Table 4.4 reports the empirical Type I error rates of the proposed test statistic $T_1$ under the null hypothesis with the covariance matrix equal to the identity matrix, i.e. $\Sigma = \mathbf{I}$, the same setting as in Ledoit and Wolf (2002). It can be observed from this table that the empirical Type I error rates of the proposed test statistic $T_1$ are very similar to those in Table 4.2, where it performed well under the four covariance matrix structures (UN, CS, CSH and TOEP); this suggests that the proposed test statistic $T_1$ is also very appropriate for the SIM structure. The table shows that the empirical Type I error rates of the three test statistics $U_J$, $T_{S1}$ and $T_1$ generally tended to the nominal 0.05 significance level as $p$ and $n$ increased, and tended to 0.05 as $p$ increased for any fixed $n$. Furthermore, the two test statistics $T_{S1}$ and $T_1$ yielded empirical Type I error rates close to 0.05 in the following situations: (1) when $p$ was large, here $p \ge 160$, for any $n$, and (2) when both $p$ and $n$ were at least medium, here $p \ge 40$ and $n \ge 40$, where $p \ge n$. The test statistic $U_J$ gave empirical Type I error rates close to 0.05 in the following situations: (1) when $p$ was very large, here $p \ge 320$, for any $n$, and (2) when both $p$ and $n$ were large, here $p \ge 160$ and $n \ge 160$, where $p \ge n$. This indicates that the proposed test statistic $T_1$ is more useful than the test statistic $U_J$, since $T_1$ can be applied over a wider range of $p$ and $n$.
Turning to the empirical powers of the competitive tests $U_J$, $T_{S1}$ and $T_1$ under the alternative hypothesis in which the covariance matrix was set to the matrix $\mathbf{F}$, half of whose diagonal elements equal 1 and the other half 0.05, Table 4.4 shows that, as expected, the empirical powers of the test statistics converged to one as $p$ and $n$ increased. Moreover, the empirical powers of the proposed test statistic $T_1$ were much higher than those of the two tests $U_J$ and $T_{S1}$ when the sample size was medium, with $n$ around 40.
Table 4.5, reporting empirical Type I error rates when $\Sigma = 2\mathbf{I}$ (SIM), shows that the competitive test statistics $U_J$, $T_{S1}$ and $T_1$ all yielded empirical Type I error rates with the same values as those under $\Sigma = \mathbf{I}$ in Table 4.4, so they need not be discussed again. From this result, it is clear that the unknown scalar $\sigma^2$ did not affect the convergence to asymptotic normality of the test statistics $U_J$, $T_{S1}$ and $T_1$, as should be the case.
As displayed in Table 4.5, the empirical powers of the three test statistics $U_J$, $T_{S1}$ and $T_1$ tended towards one as $p$ and $n$ increased. The empirical power of the proposed test statistic $T_1$ was substantially higher than those of $U_J$ and $T_{S1}$, especially when $n$ was small, 20 here, for all values of $p$ where $p \ge n$. The magnitude of this empirical power difference decreased as $n$ increased.
Consequently, it can be concluded that the performance of the proposed test statistic $T_1$ was generally outstanding when compared to the test statistics $U_J$ and $T_{S1}$, especially when the sample size was small or medium.
Table 4.2 Empirical Type I Error Rates of the Test Statistic $T_1$ under the Four Null Hypotheses at $\alpha = 0.05$

Columns: $p$, $n$, and the empirical Type I error rate of $T_1$ under $H_0^1: \Sigma = \mathbf{U}_0$, $H_0^2: \Sigma = \mathbf{C}_0$, $H_0^3: \Sigma = \mathbf{M}_0$, and $H_0^4: \Sigma = \mathbf{T}_0$.
10 10 0.059 0.058 0.058 0.059 20 10 0.055 0.054 0.054 0.055 20 0.062 0.063 0.063 0.061
40 10 0.055 0.055 0.056 0.055 20 0.056 0.056 0.055 0.055 40 0.055 0.055 0.055 0.056
80 10 0.057 0.056 0.056 0.055 20 0.057 0.057 0.057 0.057 40 0.052 0.052 0.051 0.051 80 0.052 0.052 0.052 0.051
160 10 0.054 0.053 0.054 0.053 20 0.053 0.054 0.054 0.053 40 0.055 0.055 0.055 0.055 80 0.056 0.056 0.055 0.056 160 0.053 0.053 0.053 0.054
320 10 0.052 0.052 0.052 0.052 20 0.051 0.051 0.050 0.050 40 0.052 0.051 0.052 0.051 80 0.050 0.051 0.050 0.050 160 0.051 0.050 0.050 0.050 320 0.053 0.051 0.053 0.053
Table 4.3 Empirical Powers of the Test Statistic $T_1$ under the Four Alternative Hypotheses at $\alpha = 0.05$

Columns: $p$, $n$, and the empirical power of $T_1$ under $H_1^1: \Sigma = \mathbf{U}_1$, $H_1^2: \Sigma = \mathbf{C}_1$, $H_1^3: \Sigma = \mathbf{M}_1$, and $H_1^4: \Sigma = \mathbf{T}_1$.
10 10 0.174 0.560 0.269 0.4802 20 10 0.224 0.597 0.286 0.9328 20 0.362 0.905 0.447 0.999
40 10 0.265 0.617 0.288 1.000 20 0.443 0.918 0.460 1.000 40 0.772 1.000 0.776 1.000
80 10 0.300 0.624 0.292 1.000 20 0.498 0.918 0.461 1.000 40 0.837 1.000 0.776 1.000 80 0.998 1.000 0.991 1.000
160 10 0.319 0.625 0.289 1.000 20 0.537 0.925 0.467 1.000 40 0.866 1.000 0.778 1.000 80 0.999 1.000 0.993 1.000 160 1.000 1.000 1.000 1.000
320 10 0.342 0.629 0.293 1.000 20 0.554 0.925 0.459 1.000 40 0.891 1.000 0.779 1.000 80 1.000 1.000 0.993 1.000 160 1.000 1.000 1.000 1.000 320 1.000 1.000 1.000 1.000
Table 4.4 Empirical Type I Error Rates (under $H_0^{*1}: \Sigma = \mathbf{I}$) and Empirical Powers (under $H_1^{*1}: \Sigma = \mathbf{F}$) of $U_J$, $T_{S1}$ and $T_1$ at $\alpha = 0.05$

Columns: $p$, $n$; empirical Type I error rates of $U_J$, $T_{S1}$, $T_1$; empirical powers of $U_J$, $T_{S1}$, $T_1$.
10 10 0.049 0.048 0.059 0.121 0.118 0.059 20 10 0.050 0.047 0.054 0.130 0.125 0.049 20 0.059 0.057 0.063 0.270 0.265 0.233
40 10 0.054 0.051 0.055 0.136 0.132 0.050 20 0.053 0.056 0.055 0.283 0.288 0.229 40 0.055 0.053 0.056 0.658 0.654 0.891
80 10 0.057 0.053 0.057 0.146 0.141 0.047 20 0.058 0.056 0.057 0.291 0.284 0.226 40 0.052 0.050 0.052 0.676 0.671 0.896 80 0.051 0.050 0.052 0.991 0.991 1.000
160 10 0.056 0.054 0.053 0.144 0.139 0.045 20 0.055 0.053 0.053 0.294 0.287 0.228 40 0.056 0.055 0.055 0.673 0.668 0.896 80 0.057 0.055 0.055 0.994 0.994 1.000 160 0.052 0.052 0.053 1.000 1.000 1.000
320 10 0.055 0.052 0.052 0.143 0.138 0.045 20 0.052 0.050 0.051 0.292 0.286 0.226 40 0.054 0.052 0.052 0.684 0.677 0.897 80 0.050 0.050 0.050 1.000 1.000 1.000 160 0.050 0.050 0.051 1.000 1.000 1.000 320 0.053 0.053 0.053 1.000 1.000 1.000
Table 4.5 Empirical Type I Error Rates (under $H_0^{*2}: \Sigma = 2\mathbf{I}$) and Empirical Powers (under $H_1^{*2}: \Sigma = 2\mathbf{D}$) of $U_J$, $T_{S1}$ and $T_1$ at $\alpha = 0.05$

Columns: $p$, $n$; empirical Type I error rates of $U_J$, $T_{S1}$, $T_1$; empirical powers of $U_J$, $T_{S1}$, $T_1$.
10 10 0.049 0.048 0.059 0.412 0.405 0.453 20 10 0.050 0.047 0.054 0.445 0.437 0.498 20 0.059 0.057 0.063 0.922 0.918 1.000
40 10 0.054 0.051 0.055 0.368 0.360 0.256 20 0.053 0.056 0.055 0.821 0.816 1.000 40 0.055 0.053 0.056 0.999 0.999 1.000
80 10 0.057 0.053 0.057 0.356 0.348 0.202 20 0.058 0.056 0.057 0.789 0.783 0.999 40 0.052 0.050 0.052 0.999 0.999 1.000 80 0.051 0.050 0.052 1.000 1.000 1.000
160 10 0.056 0.054 0.053 0.354 0.346 0.189 20 0.055 0.053 0.053 0.781 0.774 1.000 40 0.056 0.055 0.055 0.999 0.999 1.000 80 0.057 0.055 0.055 1.000 1.000 1.000 160 0.052 0.052 0.053 1.000 1.000 1.000
320 10 0.055 0.052 0.052 0.352 0.343 0.189 20 0.052 0.050 0.051 0.789 0.782 1.000 40 0.054 0.052 0.052 0.999 0.999 1.000 80 0.050 0.050 0.050 1.000 1.000 1.000 160 0.050 0.050 0.051 1.000 1.000 1.000 320 0.053 0.053 0.053 1.000 1.000 1.000
4.2 Simulation Study for Testing the Equality of Two Covariance
Matrices for Two High-dimensional Data
4.2.1 Simulation Setting
In this section, the performance of the proposed test statistic $T_2$ was assessed using a numerical simulation technique for testing the hypothesis $H_0': \Sigma_1 = \Sigma_2 = \Sigma$ and by comparing it to the competitive tests $T_J$, $T_{S2}$ and $T_{SY}$. Under the null hypothesis, controlling $p \ge n_k$, $k = 1, 2$, for each combination $(p, n_1, n_2)$ with $p \in \{20, 40, 80, 160, 240\}$, $n_1 \in \{20, 40, 80, 160, 240\}$ and $n_2 \in \{20, 40, 80, 160, 240\}$, two independent $p$-variate normal samples were simulated with 10,000 independent iterations. The nominal significance level was fixed at $\alpha = 0.05$. The four test statistics $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$ were computed and the proportions of rejections were recorded (a sketch of this rejection-counting procedure is given after the list of structures below). The empirical Type I error rates of the four test statistics were obtained under seven settings with different covariance structures, as shown below. The corresponding empirical powers (under the corresponding alternative hypotheses) were computed only for tests whose Type I error rates controlled the nominal significance level, as follows:
1) Unstructured Structure (UN). Two hypotheses in which the common covariance matrix has a UN structure were considered:

1.1 $H_0^{\prime 1}: \Sigma_1 = \Sigma_2 = \mathbf{U}_0$

$H_1^{\prime 1}: \Sigma_1 = \mathbf{U}_0$ and $\Sigma_2 = \mathbf{U}_1$,
where $\mathbf{U}_0$ and $\mathbf{U}_1$ in UN are defined as $\mathbf{U}_0 = (\sigma_{ij})_{p\times p}$ with $\sigma_{ij} = 1$ for $i = j$ and $\sigma_{ij} = (0.10)^{\,j-i}$ for $i < j$ (and $\sigma_{ji} = \sigma_{ij}$), and $\mathbf{U}_1 = (\sigma_{ij})_{p\times p}$ with $\sigma_{ij} = 1$ for $i = j$ and $\sigma_{ij} = (0.05)^{\,j-i}$ for $i < j$ (and $\sigma_{ji} = \sigma_{ij}$).
1.2 $H_0^{\prime 2}: \Sigma_1 = \Sigma_2 = \mathbf{D}\Delta_1\mathbf{D}$

$H_1^{\prime 2}: \Sigma_1 = \mathbf{D}\Delta_1\mathbf{D}$ and $\Sigma_2 = \mathbf{D}\Delta_2\mathbf{D}$,

where $\mathbf{D} = \mathrm{diag}(d_1,\ldots,d_p)$, $d_i \sim U(1,5)$, and $\Delta_j$ ($j = 1, 2$) is a $p\times p$ matrix whose $(a,b)$th element is $(-1)^{a+b}\{0.2(j+2)\}^{|a-b|^{1/10}}$. Note that this hypothesis follows the idea of Srivastava and Yanagihara (2010) and was previously defined in Chapter 2.
2) Compound Symmetry Structure (CS)

$H_0^{\prime 3}: \Sigma_1 = \Sigma_2 = \mathbf{C}_0 = 0.99\,\mathbf{I}_p + 0.01\,\mathbf{1}_p\mathbf{1}_p'$

$H_1^{\prime 3}: \Sigma_1 = \mathbf{C}_0$ and $\Sigma_2 = \mathbf{C}_1 = 0.95\,\mathbf{I}_p + 0.05\,\mathbf{1}_p\mathbf{1}_p'$.
3) Heterogeneous Compound Symmetry Structure (CSH)

$H_0^{\prime 4}: \Sigma_1 = \Sigma_2 = \mathbf{M}_0$

$H_1^{\prime 4}: \Sigma_1 = \mathbf{M}_0$ and $\Sigma_2 = \mathbf{M}_1$,

where $\mathbf{M}_0$ is a matrix having the CSH structure with $\sigma_{ii} = \sigma_i^2 \sim U(5,6)$ and $\sigma_{ij} = \sigma_i\sigma_j\rho$ when $i \neq j$, with $\rho = 0.5$, and $\mathbf{M}_1$ is a matrix having the CSH structure with $\sigma_{ii} = \sigma_i^2 \sim U(4,5)$ and $\sigma_{ij} = \sigma_i\sigma_j\rho$ when $i \neq j$, with $\rho = 0.4$.
4) Simple Structure (SIM) and Variance Component Structure (VC) (as in Schott, 2007)

$H_0^{\prime 5}: \Sigma_1 = \Sigma_2 = \mathbf{I}_p$ (SIM)

$H_1^{\prime 5}: \Sigma_1 = \mathbf{I}_p$ (SIM) and $\Sigma_2 = \mathbf{V}_2$ (VC),

where $\mathbf{V}_2 = \mathrm{diag}(1,1,1,2,1,1,1,2,\ldots,1,1,1,2)$.
5) Simple Structure (SIM)

$H_0^{\prime 6}: \Sigma_1 = \Sigma_2 = 2\mathbf{I}_p$

$H_1^{\prime 6}: \Sigma_1 = 2\mathbf{I}_p$ and $\Sigma_2 = 1.5\,\mathbf{I}_p$.
6) Toeplitz Structure (TOEP)

$H_0^{\prime 7}: \Sigma_1 = \Sigma_2 = \mathbf{T}_0$

$H_1^{\prime 7}: \Sigma_1 = \mathbf{T}_0$ and $\Sigma_2 = \mathbf{T}_1$,

where $\mathbf{T}_0$ is a Toeplitz matrix with elements $\sigma_0 = 1$, $\sigma_1 = -0.50$ and the rest of the elements equal to zero, and $\mathbf{T}_1$ is a Toeplitz matrix with elements $\sigma_0 = 1$, $\sigma_1 = -0.30$ and the rest of the elements equal to zero.
7) Variance Component Structure (VC)

$H_0^{\prime 8}: \Sigma_1 = \Sigma_2 = \mathbf{V}_0$

$H_1^{\prime 8}: \Sigma_1 = \mathbf{V}_0$ and $\Sigma_2 = \mathbf{V}_1$,

where $\mathbf{V}_0 = \mathrm{diag}(\sigma_{11},\sigma_{22},\ldots,\sigma_{pp})$ with $\sigma_{ii} \sim U(1,2)$, $i = 1,\ldots,p$, and $\mathbf{V}_1 = \mathrm{diag}(\sigma_{11},\sigma_{22},\ldots,\sigma_{pp})$ with $\sigma_{ii} \sim U(1.5,2.5)$, $i = 1,\ldots,p$.
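The rejection-counting procedure described in Section 4.2.1 can be sketched as follows. This is a minimal illustration (assumptions noted in comments), not the dissertation's own code: the `statistic` argument is a placeholder for any of $T_J$, $T_{S2}$, $T_{SY}$ or $T_2$, and the two-sided rejection rule is an illustrative assumption.

import numpy as np
from scipy import stats

def rejection_rate(statistic, Sigma1, Sigma2, n1, n2, reps=10_000,
                   alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    p = Sigma1.shape[0]
    crit = stats.norm.ppf(1 - alpha / 2)          # assumed two-sided N(0,1) cut-off
    mean = np.zeros(p)
    rejections = 0
    for _ in range(reps):
        x1 = rng.multivariate_normal(mean, Sigma1, size=n1)   # first sample
        x2 = rng.multivariate_normal(mean, Sigma2, size=n2)   # second sample
        if abs(statistic(x1, x2)) > crit:
            rejections += 1
    return rejections / reps

# Under H0 (Sigma1 == Sigma2) the returned proportion estimates the Type I error
# rate; under the alternative it estimates the empirical power.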
4.2.2 Simulation Results
4.2.2.1 Empirical Type I Error Rates
In this section, the results are described in the order of the simulation
settings described above.
For the unstructured structure (UN), the empirical Type I error rates of all four tests, obtained by setting the common covariance matrix to the two different UN matrices, are exhibited in Tables 4.6 and 4.7.
Table 4.6 reports that the empirical Type I error rates of the test statistic $T_J$ were much larger than the nominal 0.05 significance level. These rates did not reach 0.05 for any of the $p$ and sample size cases considered, and they moved further away from 0.05 as the sample size increased for any $p$. For instance, when $p = 80$, the empirical Type I error rate of $T_J$ is 0.057 (at $n_1 = n_2 = 20$) and increases to 0.061 (at $n_1 = n_2 = 80$). The empirical Type I error rates of the two test statistics $T_{S2}$ and $T_{SY}$ were also not close to 0.05 for any $p$ and sample sizes considered; these rates were conservative (much smaller than the nominal significance level). The empirical Type I error rates of the proposed test statistic $T_2$ tended reasonably to 0.05 as $p$ and the sample size increased. From Table 4.6 it can be concluded that the three test statistics $T_J$, $T_{S2}$ and $T_{SY}$ did not perform satisfactorily under this common covariance matrix setting, while the proposed test $T_2$ was appropriate provided the sample size was large, say $n_1 \ge 160$, $n_2 \ge 160$ and $p \ge n_k$, $k = 1, 2$.
Table 4.7 presents the empirical Type I error rates of the competitive test statistics obtained by following the idea of Srivastava and Yanagihara (2010), as described in the simulation settings. The empirical Type I error rates of the test statistics $T_J$ and $T_2$ have a pattern similar to those in Table 4.6, so they are not discussed again. It can be observed from Table 4.7 that the empirical Type I error rates of the test statistics $T_{S2}$ and $T_{SY}$ are markedly different from those in Table 4.6: here they converged to 0.05 as $p$ and the sample size increased. This indicates that the convergence to asymptotic normality of the two test statistics $T_{S2}$ and $T_{SY}$ was greatly affected by the change of the covariance matrix appearing in the null hypothesis. These two test statistics performed well with covariance matrices set as in Srivastava and Yanagihara (2010); however, they did not perform well for small sample sizes and were suitable only when the sample sizes were at least medium (at least 40 here) and $p \ge n_k$, $k = 1, 2$. Furthermore, the convergence to an asymptotic normal distribution of the test statistic $T_{S2}$ was faster than that of $T_{SY}$, which agrees with the statement given in Srivastava and Yanagihara (2010).
Under the compound symmetry structure (CS), the empirical Type I error rates are shown in Table 4.8. Both test statistics $T_J$ and $T_2$ gave satisfactory empirical Type I error rates, controlled quite well at 0.05 for all cases of $p$ and sample size considered. The empirical Type I error rates of the test statistics $T_{S2}$ and $T_{SY}$ were not close to 0.05: they were much larger than 0.05 when $p$ and the sample size were small, and then decreased away from 0.05 as $p$ and the sample size increased. For example, at $p = n_1 = n_2 = 40$, the empirical Type I error rates of the test statistics $T_{S2}$ and $T_{SY}$ were 0.053 and 0.077, respectively, and they decreased to 0.037 and 0.024, respectively, when $p = n_1 = n_2 = 240$. Moreover, when $p$ was fixed, the empirical Type I error rates of $T_{S2}$ and $T_{SY}$ dropped as the sample size increased. For example, when $p = 160$, the empirical Type I error rates of $T_{S2}$ and $T_{SY}$ were 0.059 and 0.057 (at $n_1 = n_2 = 20$), respectively, and decreased to 0.043 and 0.031, respectively (at $n_1 = n_2 = 160$). Consequently, the test statistics $T_{S2}$ and $T_{SY}$ are not suitable for this covariance structure, while the test statistic $T_J$ and the proposed test statistic $T_2$ are.
Under the heterogeneous compound symmetry structure (CSH), the empirical Type I error rates are exhibited in Table 4.9. The empirical Type I error rates of the three test statistics $T_J$, $T_{S2}$ and $T_{SY}$ were not close to 0.05, whereas those of the proposed test statistic $T_2$ approached 0.05 well as $p$ and the sample size increased. The empirical Type I error rates of the test statistics $T_{S2}$ and $T_{SY}$ in this table are lower than those in Table 4.6 for all cases considered, indicating that the convergence of $T_{S2}$ and $T_{SY}$ to a standard normal distribution was very slow and was not achieved when the common covariance matrix had the CSH structure. The proposed test statistic $T_2$ gave empirical Type I error rates tending to 0.05 as $p$ and the sample size increased, and these rates were close to 0.05 when the sample size was at least medium, here $n_1 \ge 40$, $n_2 \ge 40$, and $p \ge n_k$, $k = 1, 2$.
Under the simple structure (SIM) with $\sigma^2 = 1$, i.e. $\Sigma_1 = \Sigma_2 = \mathbf{I}$, the empirical Type I error rates are shown in Table 4.10. The empirical Type I error rates of the test statistics $T_J$ and $T_2$ were very close to 0.05 in all cases considered, with those of the proposed test statistic $T_2$ slightly better than those of $T_J$. The empirical Type I error rates of the test statistics $T_{S2}$ and $T_{SY}$ converged to 0.05 more slowly than those of $T_J$ and $T_2$, and were far from 0.05 particularly when the sample size was small, for any $p$.
Under the SIM structure with $\sigma^2 = 2$, i.e. $\Sigma_1 = \Sigma_2 = 2\mathbf{I}$, as displayed in Table 4.11, the empirical Type I error rates of the two test statistics $T_J$ and $T_2$ are the same as those displayed in Table 4.10 (when $\Sigma_1 = \Sigma_2 = \mathbf{I}$), while the test statistics $T_{S2}$ and $T_{SY}$ became far too conservative. This indicates that the convergence to asymptotic normality of the two test statistics $T_J$ and $T_2$ was not affected by changing the unknown scalar $\sigma^2$ in SIM from $\sigma^2 = 1$ to $\sigma^2 = 2$, while that of the two test statistics $T_{S2}$ and $T_{SY}$ was very sensitive to this scalar.
Under the Toeplitz structure (TOEP) and the variance component structure (VC), the empirical Type I error rates are shown in Tables 4.12 and 4.13, respectively. Since these two tables give results in a similar pattern, they can be summarized together: the empirical Type I error rates of the test statistics $T_{S2}$ and $T_{SY}$ were approximately zero, while those of the two test statistics $T_J$ and $T_2$ approached the nominal 0.05 significance level as $p$ and the sample size increased. The test statistic $T_J$ yielded empirical Type I error rates slightly better than those of the proposed test statistic $T_2$ under TOEP, whereas those of $T_2$ were slightly better than those of $T_J$ under the VC structure. Moreover, Tables 4.12 and 4.13 show that the empirical Type I error rates of the proposed test statistic $T_2$ were close to 0.05 when the sample size was at least medium, here $n_1 \ge 40$, $n_2 \ge 40$, and $p \ge n_k$, $k = 1, 2$.
For the case in which the two sample sizes were not equal ($n_1 \neq n_2$), the empirical Type I error rates of the tests $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$ were also computed for all seven null and alternative hypothesis settings. The choice $n_2 = 2n_1$ was used in combination with $(p, n_1, n_2)$, where $p \in \{20, 40, 80, 160, 240\}$, $n_1 \in \{20, 40, 80, 160, 240\}$ and $n_2 \in \{40, 80, 160, 240\}$. The resulting Type I error rates are displayed at the bottom of Tables 4.6-4.13. All tables show that the empirical Type I error rates of the four test statistics $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$ are not substantially different from those obtained when $n_1 = n_2$, for every null hypothesis and covariance matrix structure combination.
As described above, it can be concluded that the proposed test statistic $T_2$ performed well with all of the covariance matrix structures considered here, while the test statistic $T_J$ is appropriate only under the CS, SIM, TOEP and VC structures. The statement that the test statistic $T_J$ is appropriate under the SIM structure corresponds to the results of Schott (2007). However, the convergence of the test statistic $T_J$ to a standard normal distribution was slower when the common covariance matrix was not of the SIM structure. In addition, the two test statistics $T_{S2}$ and $T_{SY}$ were suitable only for a certain covariance matrix under the UN structure and performed reasonably well under the SIM structure only in the case $\sigma^2 = 1$, i.e. when the common covariance matrix is the identity matrix.
4.2.2.2 Empirical Power
In general, it is fair to compare two tests with respect to their powers only if they are level-$\alpha$ tests; that is, for power comparisons, only tests that control the nominal significance level should be considered.
Recall that the proposed test statistic $T_2$ is appropriate under all of the covariance matrix structures (UN, CS, CSH, SIM, TOEP and VC), the test statistic $T_J$ is appropriate under the CS, SIM, TOEP and VC structures, and the test statistics $T_{S2}$ and $T_{SY}$ are appropriate under the UN structure (for a certain null hypothesis) and the SIM structure (when $\sigma^2 = 1$). Accordingly, the empirical powers under the corresponding alternative hypotheses were computed as described in the simulation settings and are presented in Tables 4.6 to 4.13.
For the UN structure, Table 4.6 shows that the empirical powers of the proposed test statistic $T_2$ converged to one as $p$ and the sample size increased. Table 4.7 reports the empirical powers of the proposed test statistic $T_2$ compared with the test statistics $T_{S2}$ and $T_{SY}$ under the alternative hypothesis with covariance matrices $\Sigma_1 = \mathbf{D}\Delta_1\mathbf{D}$ and $\Sigma_2 = \mathbf{D}\Delta_2\mathbf{D}$ (from Srivastava and Yanagihara, 2010). As expected, their empirical powers tended to one as $p$ and the sample size increased, and the empirical powers of the proposed test statistic $T_2$ are generally higher than those of the other two tests.
For the CS structure, Table 4.8 reports that the empirical powers of the proposed test statistic $T_2$ and of the test statistic $T_J$ are quite high and rapidly tended to one. The empirical powers of both tests were quite responsive to increases in $p$ and the sample size, and the empirical powers of the proposed test statistic $T_2$ were higher than those of the test statistic $T_J$ in all cases considered.
For the CSH structure, only the proposed test statistic $T_2$ controlled the 0.05 level well. As displayed in Table 4.9, the empirical powers of the proposed test $T_2$ rapidly converged to one as $p$ and the sample size increased.
For the SIM structure, Table 4.10 shows the empirical powers of all four test statistics $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$ under the alternative hypothesis with $\Sigma_1 = \mathbf{I}$ and $\Sigma_2 = \mathrm{diag}(1,1,1,2,\ldots,1,1,1,2)$. As expected, the empirical powers of these test statistics quickly tended to one as $p$ and the sample size increased. The proposed test statistic $T_2$ generally gave higher power than the competitive tests $T_J$, $T_{S2}$ and $T_{SY}$, and its empirical powers were substantially higher than those of the competitive tests when the sample size was small.
Table 4.11 gives the empirical powers of the test statistics $T_2$ and $T_J$ under the alternative hypothesis with $\Sigma_1 = 2\mathbf{I}$ and $\Sigma_2 = 1.5\,\mathbf{I}$ (the SIM structure). The empirical powers of these two test statistics converged to one as $p$ and the sample size increased. The convergence to one of the empirical powers of the proposed test statistic $T_2$ was much faster than that of the test statistic $T_J$, especially when the sample size was small, here $n_1 = n_2 \le 40$, for all $p$. For example, when $p = n_1 = n_2 = 20$, the empirical powers of $T_2$ and $T_J$ are 0.757 and 0.117, respectively. Consequently, under the SIM structure, $T_2$ is a reasonable test and is more powerful than the $T_J$ test, particularly for small sample sizes.
For the TOEP structure, Table 4.12 shows that the empirical powers of the proposed test statistic $T_2$ approached one at a much faster rate than those of the test statistic $T_J$ as $p$ and the sample size increased.
For the VC structure, the empirical powers of the test statistics $T_2$ and $T_J$ are presented in Table 4.13. The two tests yielded empirical powers that converged to one as $p$ and the sample size increased, with the convergence of the proposed test statistic $T_2$ much faster than that of the test statistic $T_J$.
The empirical powers of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$ when the two sample sizes were not equal, i.e. $n_1 \neq n_2$ and, in particular, $n_2 = 2n_1$, were also obtained. The results are presented at the bottom of Tables 4.6-4.13. As expected, all four test statistics yielded empirical powers higher than those obtained for equal sample sizes at any $p$, because the sample sizes were larger. Moreover, the empirical power of $T_2$ remained substantially higher than those of the test statistics $T_J$, $T_{S2}$ and $T_{SY}$, especially for small sample sizes.
Table 4.6 Empirical Type I Error Rates (under $H_0^{\prime 1}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 1}$) of $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical power of $T_2$.
21 nn = 20 20 20 0.061 0.052 0.048 0.062 0.078 40 20 20 0.055 0.027 0.019 0.058 0.103 40 40 0.090 0.027 0.012 0.053 0.128
80 20 20 0.057 0.006 0.003 0.062 0.167 40 40 0.061 0.013 0.005 0.054 0.241 80 80 0.061 0.025 0.015 0.053 0.428
160 20 20 0.064 0.001 0.000 0.071 0.288 40 40 0.064 0.008 0.003 0.057 0.446 80 80 0.066 0.019 0.013 0.056 0.719 160 160 0.066 0.031 0.024 0.053 0.944
240 20 20 0.063 0.001 0.000 0.082 0.371 40 40 0.066 0.001 0.004 0.067 0.573 80 80 0.068 0.019 0.015 0.056 0.824 160 160 0.069 0.031 0.026 0.055 0.980 240 240 0.089 0.035 0.033 0.051 1.000 12 2nn =
40 20 40 0.058 0.028 0.015 0.056 0.128 80 20 40 0.056 0.014 0.007 0.059 0.220 40 80 0.062 0.023 0.013 0.052 0.346
160 20 40 0.060 0.010 0.005 0.063 0.373 40 80 0.063 0.018 0.013 0.058 0.584 80 160 0.066 0.028 0.022 0.054 0.839
240 20 40 0.067 0.007 0.004 0.063 0.453 40 80 0.074 0.017 0.012 0.060 0.697 80 160 0.065 0.024 0.018 0.051 0.911
Table 4.7 Empirical Type I Error Rates (under $H_0^{\prime 2}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 2}$) of $T_{S2}$, $T_{SY}$ and $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical powers of $T_{S2}$, $T_{SY}$, $T_2$.
$n_1 = n_2$
20 20 20 0.092 0.081 0.087 0.061 0.427 0.029 0.435 40 20 20 0.082 0.064 0.077 0.057 0.438 0.017 0.525 40 40 0.095 0.057 0.065 0.055 0.655 0.190 0.635
80 20 20 0.086 0.074 0.075 0.060 0.590 0.008 0.518 40 40 0.093 0.042 0.065 0.055 0.699 0.106 0.729 80 80 0.100 0.056 0.041 0.055 0.917 0.502 0.926
160 20 20 0.093 0.071 0.073 0.060 0.415 0.001 0.506 40 40 0.095 0.063 0.059 0.058 0.746 0.307 0.878 80 80 0.097 0.048 0.045 0.054 0.897 0.589 0.901 160 160 0.103 0.051 0.054 0.052 1.000 0.999 1.000
240 20 20 0.095 0.064 0.069 0.069 0.491 0.001 0.515 40 40 0.094 0.047 0.040 0.056 0.804 0.032 0.882 80 80 0.104 0.054 0.046 0.047 0.993 0.708 0.999 160 160 0.105 0.053 0.054 0.053 1.000 1.000 1.000 240 240 0.108 0.050 0.051 0.051 1.000 1.000 1.000 12 2nn =
40 20 40 0.081 0.058 0.069 0.056 0.503 0.018 0.677 80 20 40 0.083 0.071 0.074 0.056 0.600 0.010 0.711 40 80 0.096 0.047 0.060 0.054 0.854 0.299 0.864
160 20 40 0.089 0.058 0.061 0.056 0.509 0.406 0.690 40 80 0.090 0.055 0.057 0.054 0.852 0.731 0.953 80 160 0.101 0.047 0.053 0.047 1.000 0.904 1.000
240 20 40 0.096 0.058 0.061 0.055 0.603 0.028 0.645 40 80 0.096 0.046 0.044 0.046 0.927 0.695 0.985 80 160 0.106 0.051 0.050 0.050 1.000 1.000 1.000
Table 4.8 Empirical Type I Error Rates (under $H_0^{\prime 3}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 3}$) of $T_J$ and $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical powers of $T_J$, $T_2$.
$n_1 = n_2$
20 20 20 0.057 0.081 0.084 0.056 0.079 0.080 40 20 20 0.055 0.085 0.081 0.052 0.099 0.114 40 40 0.052 0.053 0.077 0.050 0.169 0.145
80 20 20 0.047 0.082 0.081 0.056 0.165 0.210 40 40 0.051 0.052 0.059 0.052 0.324 0.333 80 80 0.055 0.049 0.045 0.053 0.654 0.585
160 20 20 0.048 0.059 0.057 0.052 0.287 0.387 40 40 0.053 0.041 0.040 0.052 0.589 0.670 80 80 0.050 0.040 0.035 0.051 0.918 0.932 160 160 0.051 0.043 0.031 0.052 0.998 0.998
240 20 20 0.048 0.046 0.045 0.050 0.399 0.534 40 40 0.051 0.030 0.027 0.048 0.739 0.836 80 80 0.048 0.030 0.019 0.051 0.975 0.988 160 160 0.052 0.036 0.022 0.052 1.000 1.000 240 240 0.052 0.037 0.024 0.050 1.000 1.000 12 2nn =
40 20 40 0.054 0.064 0.076 0.054 0.135 0.142 80 20 40 0.051 0.059 0.071 0.054 0.228 0.276 40 80 0.053 0.047 0.045 0.049 0.448 0.469
160 20 40 0.047 0.042 0.044 0.046 0.397 0.508 40 80 0.053 0.039 0.033 0.051 0.725 0.809 80 160 0.056 0.044 0.034 0.052 0.967 0.977
240 20 40 0.050 0.036 0.033 0.049 0.528 0.669 40 80 0.054 0.032 0.025 0.052 0.859 0.925 80 160 0.054 0.033 0.025 0.047 0.993 0.998
Table 4.9 Empirical Type I Error Rates (under $H_0^{\prime 4}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 4}$) of $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical power of $T_2$.
$n_1 = n_2$
20 20 20 0.101 0.033 0.007 0.058 0.0408 40 20 20 0.098 0.032 0.007 0.058 0.551 40 40 0.098 0.039 0.022 0.053 0.851
80 20 20 0.098 0.031 0.007 0.060 0.579 40 40 0.101 0.040 0.023 0.052 0.862 80 80 0.103 0.043 0.031 0.049 0.989
160 20 20 0.100 0.031 0.008 0.057 0.528 40 40 0.101 0.038 0.021 0.052 0.755 80 80 0.099 0.041 0.031 0.050 0.946 160 160 0.103 0.050 0.044 0.051 0.998
240 20 20 0.098 0.029 0.007 0.065 0.471 40 40 0.099 0.038 0.021 0.058 0.668 80 80 0.099 0.041 0.032 0.052 0.861 160 160 0.100 0.044 0.038 0.052 0.987 240 240 0.108 0.050 0.044 0.052 0.999 12 2nn =
40 20 40 0.095 0.039 0.012 0.057 0.645 80 20 40 0.098 0.035 0.011 0.053 0.664 40 80 0.097 0.040 0.027 0.050 0.938
160 20 40 0.097 0.033 0.011 0.052 0.578 40 80 0.098 0.038 0.025 0.048 0.840 80 160 0.100 0.043 0.038 0.051 0.983
240 20 40 0.096 0.035 0.012 0.059 0.497 40 80 0.098 0.038 0.024 0.051 0.727 80 160 0.099 0.042 0.035 0.049 0.934
Table 4.10 Empirical Type I Error Rates (under $H_0^{\prime 5}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 5}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical powers of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$.
$n_1 = n_2$
20 20 20 0.057 0.082 0.084 0.058 0.243 0.422 0.015 0.230 40 20 20 0.054 0.090 0.087 0.050 0.241 0.533 0.009 0.502 40 40 0.054 0.054 0.083 0.052 0.564 0.976 0.051 0.978
80 20 20 0.048 0.093 0.090 0.056 0.223 0.581 0.003 0.708 40 40 0.051 0.061 0.073 0.051 0.573 0.997 0.040 0.998 80 80 0.056 0.054 0.057 0.052 0.969 1.000 0.485 1.000
160 20 20 0.049 0.082 0.075 0.052 0.213 0.538 0.001 0.821 40 40 0.053 0.062 0.071 0.050 0.561 0.999 0.029 1.000 80 80 0.048 0.052 0.055 0.051 0.971 1.000 0.469 1.000 160 160 0.051 0.050 0.050 0.050 1.000 1.000 0.999 1.000
240 20 20 0.046 0.070 0.068 0.052 0.200 0.454 0.001 0.853 40 40 0.052 0.062 0.065 0.049 0.552 1.000 0.021 1.000 80 80 0.048 0.051 0.048 0.048 0.973 1.000 0.456 1.000 160 160 0.052 0.050 0.053 0.050 1.000 1.000 1.000 1.000 240 240 0.052 0.048 0.054 0.048 1.000 1.000 1.000 1.000 12 2nn =
40 20 40 0.054 0.064 0.079 0.053 0.265 0.659 0.004 0.743 80 20 40 0.053 0.067 0.081 0.053 0.260 0.784 0.004 0.907 40 80 0.053 0.051 0.056 0.049 0.751 1.000 0.079 1.000
160 20 40 0.049 0.060 0.067 0.047 0.244 0.846 0.002 0.972 40 80 0.052 0.052 0.056 0.050 0.746 1.000 0.072 1.000 80 160 0.055 0.050 0.054 0.051 0.999 1.000 0.786 1.000
240 20 40 0.048 0.064 0.068 0.048 0.228 0.845 0.001 0.983 40 80 0.057 0.054 0.051 0.051 0.751 1.000 0.067 1.000 80 160 0.050 0.048 0.051 0.049 0.999 1.000 0.768 1.000
Table 4.11 Empirical Type I Error Rates (under $H_0^{\prime 6}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 6}$) of $T_J$ and $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical powers of $T_J$, $T_2$.
$n_1 = n_2$
20 20 20 0.057 0.005 0.000 0.058 0.117 0.757 40 20 20 0.054 0.001 0.000 0.051 0.105 0.870 40 40 0.054 0.003 0.000 0.052 0.205 0.998
80 20 20 0.048 0.000 0.000 0.055 0.096 0.936 40 40 0.051 0.001 0.000 0.051 0.198 1.000 80 80 0.056 0.005 0.000 0.052 0.495 1.000
160 20 20 0.049 0.000 0.000 0.052 0.086 0.964 40 40 0.053 0.001 0.000 0.050 0.185 1.000 80 80 0.048 0.002 0.000 0.051 0.931 1.000 160 160 0.051 0.004 0.000 0.050 1.000 1.000
240 20 20 0.046 0.000 0.000 0.052 0.070 0.974 40 40 0.052 0.000 0.000 0.049 0.177 1.000 80 80 0.048 0.001 0.000 0.048 0.464 1.000 160 160 0.052 0.002 0.000 0.050 0.936 1.000 240 240 0.052 0.004 0.000 0.048 0.999 1.000 12 2nn =
40 20 40 0.055 0.003 0.001 0.052 0.189 0.947 80 20 40 0.053 0.002 0.000 0.053 0.186 0.984 40 80 0.053 0.003 0.001 0.049 0.347 1.000
160 20 40 0.049 0.000 0.000 0.052 0.174 0.994 40 80 0.052 0.001 0.000 0.050 0.342 1.000 80 160 0.055 0.003 0.000 0.051 0.708 1.000
240 20 40 0.048 0.001 0.000 0.048 0.160 0.996 40 80 0.057 0.001 0.000 0.051 0.329 1.000 80 160 0.050 0.002 0.000 0.049 0.714 1.000
Table 4.12 Empirical Type I Error Rates (under $H_0^{\prime 7}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 7}$) of $T_J$ and $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical powers of $T_J$, $T_2$.
$n_1 = n_2$
20 20 20 0.061 0.016 0.001 0.067 0.100 0.215 40 20 20 0.055 0.008 0.001 0.056 0.093 0.262 40 40 0.056 0.017 0.001 0.053 0.152 0.428
80 20 20 0.049 0.004 0.001 0.060 0.089 0.320 40 40 0.055 0.007 0.001 0.052 0.151 0.589 80 80 0.054 0.015 0.001 0.051 0.328 0.907
160 20 20 0.050 0.001 0.000 0.055 0.090 0.373 40 40 0.052 0.004 0.000 0.054 0.152 0.736 80 80 0.053 0.007 0.000 0.053 0.336 0.987 160 160 0.052 0.013 0.000 0.047 0.772 1.000
240 20 20 0.048 0.001 0.001 0.053 0.084 0.398 40 40 0.050 0.002 0.000 0.053 0.147 0.800 80 80 0.049 0.005 0.001 0.053 0.333 0.996 160 160 0.055 0.011 0.000 0.050 0.766 1.000 240 240 0.051 0.015 0.001 0.044 0.973 1.000 12 2nn =
40 20 40 0.056 0.013 0.001 0.052 0.134 0.312 80 20 40 0.050 0.008 0.001 0.054 0.132 0.330 40 80 0.054 0.011 0.001 0.049 0.228 0.718
160 20 40 0.053 0.003 0.001 0.049 0.133 0.474 40 80 0.055 0.006 0.001 0.052 0.224 0.867 80 160 0.052 0.014 0.001 0.052 0.502 0.998
240 20 40 0.051 0.003 0.001 0.047 0.127 0.508 40 80 0.051 0.003 0.001 0.050 0.231 0.919 80 160 0.055 0.011 0.000 0.050 0.506 1.000
Table 4.13 Empirical Type I Error Rates (under $H_0^{\prime 8}$) of $T_J$, $T_{S2}$, $T_{SY}$ and $T_2$, and Empirical Powers (under $H_1^{\prime 8}$) of $T_J$ and $T_2$, at $\alpha = 0.05$

Columns: $p$, $n_1$, $n_2$; empirical Type I error rates of $T_J$, $T_{S2}$, $T_{SY}$, $T_2$; empirical powers of $T_J$, $T_2$.
$n_1 = n_2$
20 20 20 0.059 0.008 0.000 0.059 0.118 0.286 40 20 20 0.054 0.002 0.000 0.052 0.108 0.526 40 40 0.054 0.007 0.000 0.052 0.194 0.981
80 20 20 0.048 0.000 0.000 0.056 0.092 0.717 40 40 0.052 0.002 0.000 0.052 0.184 0.999 80 80 0.055 0.008 0.000 0.054 0.454 1.000
160 20 20 0.049 0.001 0.000 0.052 0.081 0.825 40 40 0.053 0.001 0.000 0.052 0.179 1.000 80 80 0.051 0.003 0.000 0.051 0.450 1.000 160 160 0.050 0.006 0.000 0.049 0.912 1.000
240 20 20 0.047 0.000 0.000 0.049 0.074 0.861 40 40 0.051 0.000 0.000 0.049 0.166 1.000 80 80 0.047 0.000 0.000 0.049 0.430 1.000 160 160 0.054 0.004 0.000 0.048 0.914 1.000 240 240 0.052 0.006 0.000 0.047 0.998 1.000 12 2nn =
40 20 40 0.056 0.007 0.001 0.053 0.084 0.782 80 20 40 0.051 0.002 0.000 0.052 0.073 0.922 40 80 0.055 0.005 0.000 0.052 0.198 1.000
160 20 40 0.051 0.001 0.001 0.045 0.067 0.974 40 80 0.054 0.003 0.000 0.050 0.188 1.000 80 160 0.055 0.005 0.000 0.050 0.594 1.000
240 20 40 0.048 0.002 0.000 0.048 0.060 0.983 40 80 0.055 0.001 0.000 0.051 0.184 1.000 80 160 0.049 0.003 0.000 0.049 0.587 1.000
4.3 Application
Here, the proposed test statistics $T_1$ and $T_2$ were applied to a microarray dataset collected by Notterman et al. (2001), which is available at http://genomics-pubs.princeton.edu/oncology/Data/CarcinomaNormal datasetCancerResearch.xls (last access: October 9, 2012). Two groups of cancerous colon tissues (adenocarcinoma and adenoma) and their paired normal colon tissues were examined using oligonucleotide arrays. The expression levels of about 6,500 human genes were probed in 18 colon adenocarcinomas, 4 colon adenomas, and 22 normal colon tissues.
4.3.1 One High-dimensional Sample
From the dataset, 18 colon adenocarcinomas were probed with oligonucleotide arrays and the expression levels of 6,500 human genes were measured on each. For convenience, attention was restricted to the first 256 measurements of each of the 18 colon adenocarcinomas, so $n = 18$ and $p = 256$. The covariance matrix was tested for sphericity. The data gave the observed test statistic values $U_J = 284.567$, $T_{S1} = 270.582$ and $T_1 = 8.500$, each with p-value $\approx 0$, so the hypothesis of sphericity is rejected at any reasonable significance level.
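The analysis above can be sketched in a few lines of Python. The spreadsheet layout and the column slicing are assumptions for illustration and are not taken from the dissertation; the observed value 8.500 of $T_1$ is the figure reported in the text, and the p-value is computed from its asymptotic standard normal null distribution (an upper-tail rejection region is assumed here).

import numpy as np
import pandas as pd
from scipy import stats

expr = pd.read_excel("CarcinomaNormal datasetCancerResearch.xls")   # assumed layout
X = expr.iloc[:18, :256].to_numpy(dtype=float)   # n = 18 adenocarcinomas, first p = 256 genes
n, p = X.shape                                   # (18, 256), so p >= n

t1_observed = 8.500                              # value reported above
print(1 - stats.norm.cdf(t1_observed))           # p-value ~ 0, as reported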
4.3.2 Two High-dimensional Samples
Two groups of cancerous colon tissues (adenocarcinoma and adenoma) were examined with oligonucleotide arrays. Attention was restricted to a subset of 100 gene expression levels on colon tissues from 4 adenocarcinomas and 4 adenomas, giving $n_1 = 4$, $n_2 = 4$ and $p = 100$. The covariance matrices of the two groups were tested for equality. The data gave the observed test statistic values $T_J = 0.908$ and $T_2 = -0.636$, whose corresponding p-values were 0.182 and 0.524, so the hypothesis of equality of the two covariance matrices was not rejected at any reasonable significance level.
CHAPTER 5
CONCLUSIONS, DISCUSSION AND RECOMMENDATIONS
FOR FUTURE RESEARCH
5.1 Conclusions
In this dissertation, independent $p$-variate normal data are assumed to be high-dimensional, i.e. the number of variables is larger than or equal to the sample size. The two hypotheses of interest are:
1) The hypothesis for testing for a partially known matrix with one high-dimensional sample, i.e.
$$H_0: \Sigma = \sigma^2\Sigma_0 \quad\text{against}\quad H_1: \Sigma \neq \sigma^2\Sigma_0 \qquad (1.1)$$
2) The hypothesis for testing the equality of two covariance matrices with two high-dimensional samples, i.e.
$$H_0': \Sigma_1 = \Sigma_2 = \Sigma \quad\text{against}\quad H_1': \Sigma_1 \neq \Sigma_2 \qquad (1.2)$$
For these two hypotheses, the test statistics $T_1$ for (1.1) and $T_2$ for (1.2) were proposed. The first test statistic $T_1$ is given by
$$T_1 = \frac{n\hat\psi}{2\hat\sigma^4},$$
where $\hat\psi = \hat a_2 - 2\hat a_1\hat\sigma^2 + \hat\sigma^4$, with $\hat a_1$, $\hat a_2$ and $\hat\sigma^2$ the estimators defined in Chapter 3. The second test statistic $T_2$ is formulated as
$$T_2 = \frac{\hat b - 1}{\hat\delta},$$
where
$$\hat b = \frac{\dfrac{(n_1-1)^2}{(n_1-2)(n_1+1)}\left\{\operatorname{tr}\mathbf{S}_1^2 - \dfrac{1}{n_1-1}(\operatorname{tr}\mathbf{S}_1)^2\right\}}{\dfrac{(n_2-1)^2}{(n_2-2)(n_2+1)}\left\{\operatorname{tr}\mathbf{S}_2^2 - \dfrac{1}{n_2-1}(\operatorname{tr}\mathbf{S}_2)^2\right\}}$$
and $\hat\delta$ is the corresponding standard error estimator of $\hat b$ defined in Chapter 3.
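The following is a minimal Python sketch of the building blocks of $T_1$ and $T_2$ described above. The unbiased estimators `a1_hat` and `a2_hat` follow the expressions derived in Appendix A; using $\hat\sigma^2 = \hat a_1$ inside $\hat\psi$, and treating $\hat\delta$ as a user-supplied value, are illustrative assumptions rather than the dissertation's exact definitions (those are given in Chapter 3).

import numpy as np

def a_hats(X, Sigma0):
    """Unbiased estimators of a1 = tr(Sigma0^-1 Sigma)/p and a2 = tr(Sigma0^-1 Sigma)^2/p."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)                  # sample covariance with n-1 divisor
    A = np.linalg.solve(Sigma0, S)               # Sigma0^{-1} S
    t1, t2 = np.trace(A), np.trace(A @ A)
    a1 = t1 / p
    a2 = (n - 1) ** 2 / ((n - 2) * (n + 1) * p) * (t2 - t1 ** 2 / (n - 1))
    return a1, a2

def T1(X, Sigma0):
    n, p = X.shape
    a1, a2 = a_hats(X, Sigma0)
    sigma2_hat = a1                              # assumed estimator of sigma^2
    psi_hat = a2 - 2 * a1 * sigma2_hat + sigma2_hat ** 2
    return n * psi_hat / (2 * sigma2_hat ** 2)

def b_hat(X1, X2):
    """Ratio of unbiased estimators of tr(Sigma_k^2), k = 1, 2."""
    def tr_sigma2(X):
        n = X.shape[0]
        S = np.cov(X, rowvar=False)
        return (n - 1) ** 2 / ((n - 2) * (n + 1)) * (
            np.trace(S @ S) - np.trace(S) ** 2 / (n - 1))
    return tr_sigma2(X1) / tr_sigma2(X2)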
Under the first null hypothesis (1.1), the proposed test statistic $T_1$ is asymptotically standard normal as the number of variables and the sample size go to infinity. Under the second null hypothesis (1.2), the proposed test statistic $T_2$ is likewise asymptotically standard normal as the number of variables and the sample sizes go to infinity.
The properties of the two proposed test statistics $T_1$ and $T_2$ were evaluated with respect to two aspects, the Type I error rate and the power of the tests, via simulation. In each setting, independent $p$-variate normal data were generated with 10,000 independent replications. Empirical Type I error rates of the two test statistics $T_1$ and $T_2$ were computed and compared with those of existing test statistics in the literature under the null hypothesis with a variety of covariance matrix structures (Unstructured (UN), Compound Symmetry (CS), Heterogeneous Compound Symmetry (CSH), Simple (SIM), Toeplitz (TOEP) and Variance Component (VC)). Empirical powers were also investigated under the corresponding alternative hypotheses with a variety of covariance matrix structures.
For testing hypothesis (1.1), the results from the simulation study are as follows. The first proposed test statistic $T_1$ performed well for all covariance matrix structures considered (UN, CS, CSH and TOEP), and its convergence to asymptotic normality was not affected by a change of covariance matrix structure. A comparison of the proposed test statistic $T_1$ with the existing tests $U_J$ and $T_{S1}$, proposed by Ledoit and Wolf (2002) and Srivastava (2005), respectively, was also carried out under the SIM structure. As seen in Chapter 4, the proposed test statistic $T_1$ was generally comparable to the test statistics $U_J$ and $T_{S1}$ when the sample size was large, and it was clearly superior to them for small or medium sample sizes at any $p$ with $p \ge n$.
For testing hypothesis (1.2), the performance of the second proposed test statistic $T_2$ relative to the competitive test statistics $T_J$, $T_{S2}$ and $T_{SY}$ was assessed under six covariance matrix structures (UN, CS, CSH, TOEP, SIM and VC). It was found that the proposed test statistic $T_2$ is an appropriate test for all of the covariance matrix structures considered here, the test statistic $T_J$ is appropriate for some of the structures (CS, SIM, TOEP and VC), and the test statistics $T_{S2}$ and $T_{SY}$ performed reasonably only under the UN structure (only for a certain covariance matrix) and the SIM structure (only in the case $\sigma^2 = 1$). In addition, for all covariance matrix structures, the proposed test statistic $T_2$ was outstanding compared with these competitive tests, especially for small or medium sample sizes at any $p$ with $p \ge n_k$, $k = 1, 2$.
Finally, the two proposed test statistics $T_1$ and $T_2$ were shown to be applicable to real data regarding human gene expression in adenoma, adenocarcinoma, and normal colon tissues.
5.2 Discussion
It can be seen in Chapter 4 that the efficiency of the proposed test statistic $T_2$ seems to be similar to that of the test statistic $T_J$ for some of the covariance matrix structures. Perhaps this is because the two test statistics are constructed from unbiased and consistent estimators of the parameters $\operatorname{tr}\Sigma_i^2$, $i = 1, 2$. It is possible that these estimators converge to the parameters at similar rates, resulting in coinciding empirical Type I error rates in some situations. However, the power of the proposed test statistic $T_2$ was generally better than that of the test statistic $T_J$ in all situations investigated here.
As also seen in Chapter 4, the test statistic $T_{SY}$ did not perform well for many of the covariance matrix structures. The reason could be that the hypothesis $H_0': \Sigma_1 = \Sigma_2$ against $H_1': \Sigma_1 \neq \Sigma_2$ is not equivalent to $H_0': \gamma_1 - \gamma_2 = 0$ against $H_1': \gamma_1 - \gamma_2 \neq 0$, because $\gamma_1 - \gamma_2$ can equal zero even when $H_1': \Sigma_1 \neq \Sigma_2$ is true. To illustrate this, with $p = 3$ and $\Sigma_1 \neq \Sigma_2$, suppose that
$$\Sigma_1 = \begin{bmatrix} 2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2\end{bmatrix} = 2\mathbf{I}_3 \quad\text{and}\quad \Sigma_2 = \begin{bmatrix} 20 & 0 & 0\\ 0 & 20 & 0\\ 0 & 0 & 20\end{bmatrix} = 20\,\mathbf{I}_3;$$
then
$$\gamma_1 = \frac{\operatorname{tr}\Sigma_1^2}{(\operatorname{tr}\Sigma_1)^2} = \frac{12}{36} = \frac{1}{3} \quad\text{and}\quad \gamma_2 = \frac{\operatorname{tr}\Sigma_2^2}{(\operatorname{tr}\Sigma_2)^2} = \frac{1200}{3600} = \frac{1}{3},$$
leading to $\gamma_1 - \gamma_2 = 0$.
In another example, let
$$\Sigma_1 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{bmatrix} \quad\text{and}\quad \Sigma_2 = \begin{bmatrix} 20 & 0 & 0\\ 0 & 30 & 0\\ 0 & 0 & 10\end{bmatrix};$$
then
$$\gamma_1 = \frac{\operatorname{tr}\Sigma_1^2}{(\operatorname{tr}\Sigma_1)^2} = \frac{14}{36} \quad\text{and}\quad \gamma_2 = \frac{\operatorname{tr}\Sigma_2^2}{(\operatorname{tr}\Sigma_2)^2} = \frac{1400}{3600} = \frac{14}{36}.$$
It is again clear that $\gamma_1 - \gamma_2 = 0$ even though $\Sigma_1 \neq \Sigma_2$, as previously seen.
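A quick numerical check of the two counterexamples above can be run with NumPy; it simply confirms that $\gamma_1$ and $\gamma_2$ coincide even though $\Sigma_1 \neq \Sigma_2$.

import numpy as np

def gamma(Sigma):
    return np.trace(Sigma @ Sigma) / np.trace(Sigma) ** 2

print(gamma(2 * np.eye(3)), gamma(20 * np.eye(3)))                     # 0.333..., 0.333...
print(gamma(np.diag([1., 2., 3.])), gamma(np.diag([20., 30., 10.])))   # 0.3888..., 0.3888...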
5.3 Recommendations for Future Research
Possible extensions of this study are recommended as follows:
1) For one and two high-dimensional samples, since both proposed test statistics $T_1$ and $T_2$ are based on the Frobenius norm, proposals based on other norms should be possible.
2) An extension of the proposed test statistic $T_2$ from testing two covariance matrices to testing several high-dimensional covariance matrices could be of interest.
3) In this study, normality of the data is assumed, so developing a test statistic that does not require this assumption is worth considering.
4) In this study, test statistics concerning the parameters of a normal distribution were proposed, so a distribution-free statistic would also be interesting to study.
BIBLIOGRAPHY
Anderson, T. W. 1984. An Introduction to Multivariate Statistical Analysis.
2nd ed. New York: John Wiley & Sons.
Bartlett, M. S. 1937. Properties of Sufficiency and Statistical Tests. In Proceedings
of the Royal Society of London. Series A. Sir Mark Welland. London:
Royal Society. Pp. 268-282.
Billingsley, P. 1995. Probability and Measure. 3rd ed. New York: John Wiley &
Sons.
Birke, M. and Dette, H. 2005. A Note on Testing the Covariance Matrix for Large
Dimension. Statistics and Probability Letters. 74 (October): 281-289.
Boonyarit Choopradit and Samruam Chongcharoen. 2011a. A Test for One-sample
Repeated Measures Designs: Effect of High-dimensional Data. Journal of
Applied Sciences. 11 (October): 3285-3292.
Boonyarit Choopradit and Samruam Chongcharoen. 2011b. A Test for Two-sample
Repeated Measures Designs: Effect of High-dimensional Data. Journal of
Mathematics and Statistics. 7 (October): 332-342.
Carter, E. M. and Srivastava, M. S. 1977. Monotonicity of the Power Functions of
Modified Likelihood Ratio Criterion for the Homogeneity of Variances and
of the Sphericity Test. Journal of Multivariate Analysis. 7 (March): 229-
233.
Casella, G. and Berger, R. L. 2002. Statistical Inferences. California: Duxbury.
Dudoit, S.; Fridlyand, J. and Speed, T. P. 2002. Comparison of Discrimination
Methods for the Classification of Tumors Using Gene Expression Data.
Journal of American Statistical Association. 97 (December): 77-87.
Fisher, T.; Sun, X. and Gallagher, C. M. 2010. A New Test for Sphericity of the
Covariance Matrix for High Dimensional Data. Journal of Multivariate
Analysis. 101 (November): 2554-2570.
Fujikoshi, Y.; Ulyanov, V. and Shimizu, R. 2010. Multivariate Statistics : High-
Dimensional and Large-Sample Approximations. New Jersey: John
Wiley & Sons.
Gamage, J. and Mathew, T. 2008. Inference on Mean Sub-vectors of Two
Multivariate Normal Populations with Unequal Covariance Matrices.
Statistics and Probability Letters. 78 (March): 420-425.
Ibrahim, J. G.; Chen, M. and Gray, R. J. 2002. Bayesian Models for Gene Expression
with DNA Microarray Data. Journal of American Statistical
Association. 97 (December): 88-99.
John, S. 1971. Some Optimal Multivariate Tests. Biometrika. 58 (April): 123-127.
Johnson, D. E. 1998. Applied Multivariate Methods for Data Analysts.
California: Duxbury.
Johnson, R. A. and Wichern, D. W. 2002. Applied Multivariate Statistical Analysis.
5th ed. New Jersey: Prentice Hall.
Ledoit, O. and Wolf, M. 2002. Some Hypothesis Tests for the Covariance Matrix
when the Dimension is Large Compared to the Sample Size. The Annals
of Statistics. 30 (August): 1081-1102.
Lehmann, E. L. 1999. Elements of Large-Sample Theory. New York: Springer.
Lehmann, E. L. and Romano, J. P. 2005. Testing Statistical Hypotheses. 3rd ed.
New York: Springer.
Lin, Z. and Xiang, Y. 2008. A Hypothesis Test for Independence of Sets of Variates
in High Dimensions. Statistics and Probability Letters. 78 (December):
2939-2946.
Notterman, D. A.; Alon, U.; Sierk, A. J. and Levine, A. J. 2001. Transcriptional
Gene Expression Profiles of Colorectal Adenoma, Adenocarcinoma, and
Normal Tissue Examined by Oligonucleotide Arrays. Cancer Research.
61 (April): 3124-3130.
Pearlman, M. D. 1980. Unbiasedness of the Likelihood Ratio Tests for Equality of
Several Covariance Matrices and Equality of Several Multivariate Normal
Populations. The Annals of Statistics. 8 (March): 247-263.
Rao, C. R. 1973. Linear Statistical Inference and Its Application. 2nd ed.
New York: John Wiley & Sons.
Rencher, A. R. 2003. Linear Models in Statistics. New York: John Wiley & Sons.
Rohatgi, V. K. 1984. Statistical Inference. New York: John Wiley & Sons.
Samruam Chongcharoen. 2011. Inversion of Covariance Matrix for High
Dimensional Data. Journal of Mathematics and Statistics. 7 (July):
227-229.
Schott, J. R. 2007. A Test for the Equality of Covariance Matrices when the
Dimension is Large Relative to the Sample Sizes. Computational
Statistics and Data Analysis. 51 (August): 6535-6542.
Serdobolskii, V. I. 1999. Theory of Essentially Multivariate Statistical Analysis.
Uspekhi Matematicheskikh Nauk. 54 (2): 85-112.
Srivastava, M. S. 2002. Methods of Multivariate Statistics. New York: John
Wiley & Sons.
Srivastava, M. S. 2005. Some Tests Concerning the Covariance Matrix in High
Dimensional Data. Journal of Japan Statistical Society. 35 (2): 251-272.
Srivastava, M. S. 2006. Some Tests Criteria for the Covariance Matrix with Fewer
Observations than the Dimension. Acta Et Commentationes
Universitatis Tartuensis De Mathematica. 10 (1): 77-93
Srivastava, M. S. 2007a. Multivariate Theory for Analyzing High-dimensional Data.
Journal of Japan Statistical Society. 37 (1): 53-86.
Srivastava, M. S. 2007b. Testing the Equality of Two Covariance Matrices and
Independence of Two Sub-vectors with Fewer Observations than the
Dimension. In International Conference on Advances in
Interdisciplinary Statistics and Combinatorics. Oct: 12-14. Sat Gupta.
North Carolina: Taylor & Francis.
Srivastava, M. S. 2010. Methods of Multivariate Statistics. New York: John Wiley
& Sons.
Srivastava, M. S. and Yanagihara, H. 2010. Testing the Equality of Several
Covariance Matrices with Fewer Observations than the Dimension.
Journal of Multivariate Analysis. 101 (July): 1319-1329.
APPENDICES
APPENDIX A
Expected Values and Variances of the Estimators
For a symmetric positive definite matrix $\Sigma$, the spectral decomposition gives $\Sigma = \Gamma\Lambda\Gamma'$, where $\Lambda = \mathrm{diag}(\lambda_1,\lambda_2,\ldots,\lambda_p)$ with $\lambda_i$ the $i$th eigenvalue of $\Sigma$, and $\Gamma$ is an orthogonal matrix whose columns are the corresponding normalized eigenvectors $\boldsymbol{\gamma}_1,\ldots,\boldsymbol{\gamma}_p$. Similarly, $\Sigma_0$ can be written as $\Sigma_0 = \mathbf{R}\mathbf{D}\mathbf{R}'$, where $\mathbf{D} = \mathrm{diag}(d_1,d_2,\ldots,d_p)$ with $d_i$ the $i$th eigenvalue of $\Sigma_0$, and $\mathbf{R}$ is an orthogonal matrix whose columns are the corresponding normalized eigenvectors $\mathbf{r}_1,\ldots,\mathbf{r}_p$ (Rencher, 2003: 46).
A.1 Expressions of $a_m = \frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\Sigma)^m$

We can write $a_m = \frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\Sigma)^m$, $m = 1,\ldots,8$, in terms of eigenvalues as
$$a_m = \frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\Sigma)^m = \frac{1}{p}\operatorname{tr}\bigl((\mathbf{R}\mathbf{D}\mathbf{R}')^{-1}\Gamma\Lambda\Gamma'\bigr)^m = \frac{1}{p}\operatorname{tr}(\mathbf{D}^{-1}\Lambda)^m = \frac{1}{p}\sum_{i=1}^{p}\left(\frac{\lambda_i}{d_i}\right)^m.$$
A.2 Expressions of the Estimators of $a_1$ and $a_2$

Let $(n-1)\mathbf{S} = \mathbf{Y}\mathbf{Y}' \sim W_p(\Sigma, n-1)$, where $\mathbf{Y} = (\mathbf{y}_1,\mathbf{y}_2,\ldots,\mathbf{y}_{n-1})$ and the $\mathbf{y}_j$ are independent $N_p(\mathbf{0},\Sigma)$ (Anderson, 1984: Section 3.3). In addition, let $\mathbf{Z} = (\mathbf{Z}_1,\mathbf{Z}_2,\ldots,\mathbf{Z}_{n-1})$, where the $\mathbf{Z}_j$ are independently and identically distributed (iid) $N_p(\mathbf{0},\mathbf{I})$, so that $\mathbf{Y} = \Sigma^{1/2}\mathbf{Z}$ with $\Sigma^{1/2}\Sigma^{1/2} = \Sigma$. Define $\mathbf{W}' = \Gamma'\mathbf{Z}$, $\mathbf{W} = (\mathbf{w}_1,\mathbf{w}_2,\ldots,\mathbf{w}_p)$, where each $\mathbf{w}_i$ is iid $N_{n-1}(\mathbf{0},\mathbf{I})$. Thus $\upsilon_{ii} = \mathbf{w}_i'\mathbf{w}_i$ are iid chi-squared random variables with $n-1$ degrees of freedom.

We can write $\frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})$ and $\frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})^2$ in terms of chi-squared random variables as follows:
$$\hat a_1 = \frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\mathbf{S}) = \frac{1}{p(n-1)}\operatorname{tr}\bigl(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}'\,\mathbf{Y}\mathbf{Y}'\bigr) = \frac{1}{p(n-1)}\operatorname{tr}\bigl(\mathbf{D}^{-1}\Lambda\,\mathbf{W}'\mathbf{W}\bigr) = \frac{1}{p(n-1)}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}\upsilon_{ii}.$$
Similarly,
$$\frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})^2 = \frac{1}{p(n-1)^2}\operatorname{tr}\bigl(\mathbf{D}^{-1}\Lambda\,\mathbf{W}'\mathbf{W}\,\mathbf{D}^{-1}\Lambda\,\mathbf{W}'\mathbf{W}\bigr) = \frac{1}{p(n-1)^2}\left[\sum_{i=1}^{p}\frac{\lambda_i^2}{d_i^2}\upsilon_{ii}^2 + 2\sum_{i<j}^{p}\frac{\lambda_i\lambda_j}{d_i d_j}\upsilon_{ij}^2\right],$$
where $\upsilon_{ij} = \mathbf{w}_i'\mathbf{w}_j$.
Let
$$\hat a_2 = \frac{(n-1)^2}{(n-2)(n+1)\,p}\left[\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})^2 - \frac{1}{n-1}\{\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})\}^2\right].$$
Since $\dfrac{(n-1)^2}{(n-2)(n+1)} \approx 1$,
$$\hat a_2 \approx \frac{1}{p}\left[\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})^2 - \frac{1}{n-1}\{\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})\}^2\right] = b_1 + b_2,$$
where
$$b_1 = \frac{n-2}{p(n-1)^3}\sum_{i=1}^{p}\frac{\lambda_i^2}{d_i^2}\upsilon_{ii}^2, \qquad b_2 = \frac{2}{p(n-1)^2}\sum_{i<j}^{p}\frac{\lambda_i\lambda_j}{d_i d_j}\left(\upsilon_{ij}^2 - \frac{\upsilon_{ii}\upsilon_{jj}}{n-1}\right).$$
A.3 Expected Values of the Estimators

The following lemma, taken from Srivastava (2005) and Fisher et al. (2010), gives results used repeatedly in this work.

Lemma A.1. For $\upsilon_{ii} = \mathbf{w}_i'\mathbf{w}_i$ and $\upsilon_{ij} = \mathbf{w}_i'\mathbf{w}_j$ with $i \neq j$,
$$E(\upsilon_{ii}^r) = (n-1)(n+1)\cdots(n+2r-3), \ r = 1,2,\ldots, \qquad \mathrm{Var}(\upsilon_{ii}) = 2(n-1),$$
$$\mathrm{Var}(\upsilon_{ii}^2) = 8(n-1)(n+1)(n+2), \qquad E[\{\upsilon_{ii}-(n-1)\}^3] = 8(n-1), \qquad E[\{\upsilon_{ii}-(n-1)\}^4] = 12(n-1)(n+3),$$
$$E(\upsilon_{ij}^2) = n-1, \qquad E(\upsilon_{ij}^4) = 3(n-1)(n+1),$$
$$E(\upsilon_{ii}\upsilon_{ij}^2) = (n-1)(n+1), \qquad E(\upsilon_{ii}^2\upsilon_{ij}^2) = (n-1)(n+1)(n+3), \qquad E(\upsilon_{ij}^2\upsilon_{ii}\upsilon_{jj}) = (n-1)(n+1)^2.$$

Proof. The results for $\upsilon_{ii}$ alone can be found in Srivastava (2005) and the remaining results in Fisher et al. (2010).
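As an illustration of how the mixed moments in Lemma A.1 can be obtained, the identity $E(\upsilon_{ii}\upsilon_{ij}^2) = (n-1)(n+1)$ follows from conditioning on $\mathbf{w}_i$; the short derivation below is a sketch added here for clarity and is not part of the cited proofs.

% Worked check of E(u_ii u_ij^2) = (n-1)(n+1), by conditioning on w_i.
\begin{align*}
E(\upsilon_{ii}\upsilon_{ij}^2)
  &= E\bigl[\upsilon_{ii}\,E(\mathbf{w}_i'\mathbf{w}_j\mathbf{w}_j'\mathbf{w}_i \mid \mathbf{w}_i)\bigr]
   = E\bigl[\upsilon_{ii}\,\mathbf{w}_i'\,E(\mathbf{w}_j\mathbf{w}_j')\,\mathbf{w}_i\bigr] \\
  &= E\bigl[\upsilon_{ii}\,\mathbf{w}_i'\mathbf{I}_{n-1}\mathbf{w}_i\bigr]
   = E(\upsilon_{ii}^2) = (n-1)(n+1),
\end{align*}
since $\mathbf{w}_j \sim N_{n-1}(\mathbf{0},\mathbf{I})$ independently of $\mathbf{w}_i$ and $\upsilon_{ii} = \mathbf{w}_i'\mathbf{w}_i \sim \chi^2_{n-1}$.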
A.3.1 Expected Values of $\hat a_1$ and $\hat a_2$

Lemma A.2. For $\hat a_1$ and $\hat a_2$ as defined above, $E(\hat a_1) = a_1$ and $E(\hat a_2) = a_2$. (A.1)

Proof. Using the results from Lemma A.1,
$$E(\hat a_1) = E\left[\frac{1}{p(n-1)}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}\upsilon_{ii}\right] = \frac{1}{p(n-1)}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}(n-1) = \frac{1}{p}\sum_{i=1}^{p}\frac{\lambda_i}{d_i} = \frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\Sigma) = a_1.$$
Since $E\!\left(\upsilon_{ij}^2 - \dfrac{\upsilon_{ii}\upsilon_{jj}}{n-1}\right) = (n-1) - \dfrac{(n-1)^2}{n-1} = 0$, it follows that $E(b_2) = 0$. Then, using $E(\upsilon_{ii}^2) = (n-1)(n+1)$,
$$E(\hat a_2) = \frac{(n-1)^2}{(n-2)(n+1)}E(b_1) = \frac{(n-1)^2}{(n-2)(n+1)}\cdot\frac{n-2}{p(n-1)^3}\sum_{i=1}^{p}\frac{\lambda_i^2}{d_i^2}(n-1)(n+1) = \frac{1}{p}\sum_{i=1}^{p}\frac{\lambda_i^2}{d_i^2} = \frac{1}{p}\operatorname{tr}(\Sigma_0^{-1}\Sigma)^2 = a_2.$$
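A small self-contained Monte Carlo check of Lemma A.2 (illustrative only, not from the source) is given below: the averages of the simulated values of $\hat a_1$ and $\hat a_2$ should be close to $a_1$ and $a_2$. The matrices $\Sigma$ and $\Sigma_0$ used here are arbitrary diagonal choices, which share eigenvectors as in the setup above.

import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 30, 20, 2000
Sigma0 = np.diag(rng.uniform(1.0, 2.0, size=p))
Sigma = np.diag(rng.uniform(1.0, 3.0, size=p))

ratios = np.diag(Sigma) / np.diag(Sigma0)          # eigenvalue ratios lambda_i / d_i
a1_true, a2_true = ratios.mean(), (ratios ** 2).mean()

a1_draws, a2_draws = [], []
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = np.cov(X, rowvar=False)
    B = np.linalg.solve(Sigma0, S)
    t1, t2 = np.trace(B), np.trace(B @ B)
    a1_draws.append(t1 / p)
    a2_draws.append((n - 1) ** 2 / ((n - 2) * (n + 1) * p) * (t2 - t1 ** 2 / (n - 1)))

print(np.mean(a1_draws), a1_true)   # both close to a1
print(np.mean(a2_draws), a2_true)   # both close to a2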
A.4 Variances of the Estimators

A.4.1 Variance of $\hat a_1$

Lemma A.3. $\mathrm{Var}(\hat a_1) = \dfrac{2}{(n-1)p}a_2$.

Proof. By applying Lemma A.1,
$$\mathrm{Var}(\hat a_1) = \mathrm{Var}\left(\frac{1}{p(n-1)}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}\upsilon_{ii}\right) = \frac{1}{p^2(n-1)^2}\sum_{i=1}^{p}\frac{\lambda_i^2}{d_i^2}\,\mathrm{Var}(\upsilon_{ii}) = \frac{2}{p^2(n-1)}\sum_{i=1}^{p}\frac{\lambda_i^2}{d_i^2} = \frac{2}{(n-1)p}a_2.$$
A.4.2 Variance of $\hat a_2$

Lemma A.4. $\mathrm{Var}(b_1) = \dfrac{8(n-2)^2(n+1)(n+2)}{(n-1)^5\,p}a_4 \approx \dfrac{8}{np}a_4$.

Proof. By applying Lemma A.1, the variance of $b_1$ is easily found as follows:
$$\mathrm{Var}(b_1) = \frac{(n-2)^2}{p^2(n-1)^6}\sum_{i=1}^{p}\frac{\lambda_i^4}{d_i^4}\,\mathrm{Var}(\upsilon_{ii}^2) = \frac{8(n-2)^2(n+1)(n+2)}{p^2(n-1)^5}\sum_{i=1}^{p}\frac{\lambda_i^4}{d_i^4} = \frac{8(n-2)^2(n+1)(n+2)}{(n-1)^5\,p}a_4 \approx \frac{8}{np}a_4.$$
Lemma A.5. $\mathrm{Var}(b_2) = \dfrac{4(n-2)(n+1)}{(n-1)^4}\left(a_2^2 - \dfrac{a_4}{p}\right)$.

Proof. By applying Lemma A.1, we compute
$$\mathrm{Var}(\upsilon_{ij}^2) = E(\upsilon_{ij}^4) - \{E(\upsilon_{ij}^2)\}^2 = 3(n-1)(n+1) - (n-1)^2 = 2(n-1)(n+2),$$
$$\mathrm{Var}(\upsilon_{ii}\upsilon_{jj}) = E(\upsilon_{ii}^2\upsilon_{jj}^2) - \{E(\upsilon_{ii}\upsilon_{jj})\}^2 = (n-1)^2(n+1)^2 - (n-1)^4 = 4n(n-1)^2,$$
$$\mathrm{Cov}(\upsilon_{ij}^2, \upsilon_{ii}\upsilon_{jj}) = E(\upsilon_{ij}^2\upsilon_{ii}\upsilon_{jj}) - E(\upsilon_{ij}^2)E(\upsilon_{ii}\upsilon_{jj}) = (n-1)(n+1)^2 - (n-1)^3 = 4n(n-1),$$
so that
$$\mathrm{Var}\!\left(\upsilon_{ij}^2 - \frac{\upsilon_{ii}\upsilon_{jj}}{n-1}\right) = 2(n-1)(n+2) + \frac{4n(n-1)^2}{(n-1)^2} - \frac{8n(n-1)}{n-1} = 2(n-2)(n+1).$$
Since terms corresponding to different index pairs $(i, j)$ are uncorrelated,
$$\mathrm{Var}(b_2) = \frac{4}{p^2(n-1)^4}\,2(n-2)(n+1)\sum_{i<j}^{p}\frac{\lambda_i^2\lambda_j^2}{d_i^2 d_j^2} = \frac{4(n-2)(n+1)}{(n-1)^4}\left(a_2^2 - \frac{a_4}{p}\right),$$
using $\sum_{i<j}\lambda_i^2\lambda_j^2/(d_i^2 d_j^2) = \tfrac{1}{2}\{(p a_2)^2 - p a_4\}$.
Lemma A.6. $\mathrm{Cov}(b_1, b_2) = 0$.

Proof. Since $E(b_2) = 0$, $\mathrm{Cov}(b_1, b_2) = E(b_1 b_2)$, which is a linear combination of terms of the form
$$E\!\left[\upsilon_{ii}^2\left(\upsilon_{jk}^2 - \frac{\upsilon_{jj}\upsilon_{kk}}{n-1}\right)\right], \qquad j < k,$$
each of which equals zero. Note that, using the results in Lemma A.1, $E\!\left[\upsilon_{ii}^m\left(\upsilon_{jk}^2 - \dfrac{\upsilon_{jj}\upsilon_{kk}}{n-1}\right)\right] = 0$ for $m = 1, 2$, whether $i \neq j \neq k$, $i = j \neq k$, or $i = k \neq j$.

Lemma A.7. Let
$$\hat a_2 \approx \frac{1}{p}\left[\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})^2 - \frac{1}{n-1}\{\operatorname{tr}(\Sigma_0^{-1}\mathbf{S})\}^2\right] = b_1 + b_2;$$
then $\mathrm{Var}(\hat a_2) \approx \dfrac{8}{np}a_4 + \dfrac{4}{n^2}a_2^2$ as $(n, p) \to \infty$.

Proof. From Lemmas A.4, A.5 and A.6,
$$\mathrm{Var}(\hat a_2) = \mathrm{Var}(b_1) + \mathrm{Var}(b_2) + 2\,\mathrm{Cov}(b_1, b_2) \approx \frac{8}{np}a_4 + \frac{4}{n^2}\left(a_2^2 - \frac{a_4}{p}\right) \approx \frac{8}{np}a_4 + \frac{4}{n^2}a_2^2.$$
A.5 Covariance Terms

Lemma A.8. $\mathrm{Cov}(\hat a_1, b_1) = \dfrac{4(n-2)(n+1)}{(n-1)^3\,p}a_3 \approx \dfrac{4}{np}a_3$ as $(n, p) \to \infty$.

Proof. Using the results from Lemma A.1, $\mathrm{Cov}(\upsilon_{ii}, \upsilon_{ii}^2) = E(\upsilon_{ii}^3) - E(\upsilon_{ii})E(\upsilon_{ii}^2) = 4(n-1)(n+1)$, and the $\upsilon_{ii}$ are independent across $i$, so
$$\mathrm{Cov}(\hat a_1, b_1) = \frac{n-2}{p^2(n-1)^4}\sum_{i=1}^{p}\frac{\lambda_i^3}{d_i^3}\,\mathrm{Cov}(\upsilon_{ii}, \upsilon_{ii}^2) = \frac{4(n-2)(n+1)}{p^2(n-1)^3}\sum_{i=1}^{p}\frac{\lambda_i^3}{d_i^3} = \frac{4(n-2)(n+1)}{(n-1)^3\,p}a_3 \approx \frac{4}{np}a_3.$$
Lemma A.9. $\mathrm{Cov}(\hat a_1, b_2) = 0$.

Proof. From the fact that $E(b_2) = 0$, $\mathrm{Cov}(\hat a_1, b_2) = E(\hat a_1 b_2)$, which is a linear combination of terms of the form $E\!\left[\upsilon_{ii}\left(\upsilon_{jk}^2 - \dfrac{\upsilon_{jj}\upsilon_{kk}}{n-1}\right)\right]$, $j < k$, each of which equals zero. This follows from the note in Lemma A.6.
Lemma A.10. $\mathrm{Cov}(\hat a_1, \hat a_2) \approx \dfrac{4}{np}a_3$.
Proof. Writing $\hat a_2 \approx b_1 + b_2$ and using Lemmas A.8 and A.9,
$$E(\hat a_1\hat a_2) \approx E(\hat a_1 b_1) + E(\hat a_1 b_2) \approx \frac{4}{np}a_3 + a_1 a_2,$$
so that, by Lemma A.2,
$$\mathrm{Cov}(\hat a_1, \hat a_2) = E(\hat a_1\hat a_2) - E(\hat a_1)E(\hat a_2) \approx \frac{4}{np}a_3 + a_1 a_2 - a_1 a_2 = \frac{4}{np}a_3.$$
A.6 Asymptotic Variance and Covariance Terms
The variance and covariance terms are simplified by finding their asymptotic
values under assumptions (A1) and (A2), as well as \((n,p)\to\infty\).
\[
\mathrm{Var}(\hat a_1)\;\approx\;\frac{2a_2}{np}, \qquad\text{(A.2)}
\]
\[
\mathrm{Var}(b_1)\;\approx\;\frac{8a_4}{np}, \qquad\text{(A.3)}
\]
\[
\mathrm{Var}(b_2)\;\approx\;\frac{4}{n^{2}}\Bigl(a_2^{2}-\frac{a_4}{p}\Bigr), \qquad\text{(A.4)}
\]
\[
\mathrm{Var}(\hat a_2)\;\approx\;\frac{8a_4}{np}+\frac{4a_2^{2}}{n^{2}}, \qquad\text{(A.5)}
\]
\[
\mathrm{COV}(\hat a_1,b_1)\;\approx\;\frac{4a_3}{np}, \qquad\text{(A.6)}
\]
\[
\mathrm{COV}(\hat a_1,\hat a_2)\;\approx\;\frac{4a_3}{np}. \qquad\text{(A.7)}
\]
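For orientation, the terms (A.2)–(A.7) can be specialised to the simplest setting. The display below is only a sketch, assuming that \(a_r=p^{-1}\sum_{i=1}^{p}(\lambda_i/d_i)^{r}\) and that every ratio \(\lambda_i/d_i\) equals one (as it does under the null hypothesis with \(\sigma^{2}=1\)), so that \(a_r=1\) for every \(r\):
% A sketch only: null-case specialisation of (A.2)-(A.7), assuming a_r = 1 for all r
\begin{aligned}
\mathrm{Var}(\hat a_1)&\approx\frac{2}{np}, &
\mathrm{Var}(b_1)&\approx\frac{8}{np}, &
\mathrm{Var}(b_2)&\approx\frac{4}{n^{2}}\Bigl(1-\frac1p\Bigr),\\
\mathrm{Var}(\hat a_2)&\approx\frac{8}{np}+\frac{4}{n^{2}}, &
\mathrm{COV}(\hat a_1,b_1)&\approx\frac{4}{np}, &
\mathrm{COV}(\hat a_1,\hat a_2)&\approx\frac{4}{np}.
\end{aligned}
For example, with \(n=p=20\) this would give a standard deviation of \(\hat a_1\) of roughly \(\sqrt{2/400}\approx 0.07\).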
APPENDIX B
Proof of Theorem 3.1.2
To find the distributions of \(\hat a_1\) and \(\hat a_2\), the Lyapunov-type Central Limit
Theorem from Rao (1973: 147) is used and given in the following theorem.
Theorem B.1. (Central Limit Theorem from Rao (1973: 147))
Let \(\mathbf X_1,\mathbf X_2,\ldots,\mathbf X_n\) be a sequence of independent \(p\)-dimensional random variables, such that \(E(\mathbf X_i)=\mathbf 0\), and let \(\Sigma_i\) be the \(p\times p\) covariance matrix of \(\mathbf X_i\). Suppose that, as \(n\to\infty\),
\[
\frac1n\sum_{i=1}^{n}\Sigma_i\;\longrightarrow\;\Sigma_0\neq 0
\]
and, for every \(\varepsilon>0\),
\[
\frac1n\sum_{i=1}^{n}\int_{\|\mathbf X_i\|>\varepsilon\sqrt n}\|\mathbf X_i\|^{2}\,dF_i\;\longrightarrow\;0,
\]
where \(F_i\) is the distribution function of \(\mathbf X_i\) and \(\|\mathbf X_i\|\) is the Euclidean norm of the vector \(\mathbf X_i\). Following this, the random variable \((\mathbf X_1+\cdots+\mathbf X_n)/\sqrt n\) converges to a \(p\)-variate normal distribution with mean zero and covariance matrix \(\Sigma_0\).
Proof (See proof in Rao (1973: 147)).
Since, from (3.3) in Chapter 3 (page 23),
\[
\hat a_2=\frac{(n-1)^{2}}{(n-2)(n+1)}\,(b_1+b_2),
\]
it is necessary to find the distributions of \(\hat a_1\), \(b_1\) and \(b_2\) and to standardize them as normally distributed. We start by finding the distributions of \(\hat a_1\) and \(b_1\) (both are functions of the \(\upsilon_{ii}\)) and of \(b_2\) (a function of the \(\upsilon_{ij}\), \(i\neq j\)). Subsequently, the distribution of \(\hat a_2\), which is the distribution of a linear function of two normal random variables, is obtained.
First, in order to find the distributions of \(\hat a_1\) and \(b_1\), with the eigenvalues \(\lambda_i\) and \(d_i\) as defined in Appendix A (page 67), we let
\[
u_{i1}=\frac{\lambda_i\bigl(\upsilon_{ii}-(n-1)\bigr)}{d_i\sqrt{n-1}}
\quad\text{and}\quad
u_{i2}=\frac{\lambda_i^{2}\bigl(\upsilon_{ii}^{2}-(n-1)(n+1)\bigr)}{d_i^{2}\sqrt{(n-1)(n+1)(n+2)}},
\]
where
\[
E(u_{i1})=E(u_{i2})=0,
\]
\[
\mathrm{Var}(u_{i1})=\frac{\lambda_i^{2}\,\mathrm{Var}(\upsilon_{ii})}{d_i^{2}(n-1)}
=\frac{2(n-1)\lambda_i^{2}}{d_i^{2}(n-1)}
=\frac{2\lambda_i^{2}}{d_i^{2}},
\]
\[
\mathrm{Var}(u_{i2})=\frac{\lambda_i^{4}\,\mathrm{Var}(\upsilon_{ii}^{2})}{d_i^{4}(n-1)(n+1)(n+2)}
=\frac{8(n-1)(n+1)(n+2)\lambda_i^{4}}{d_i^{4}(n-1)(n+1)(n+2)}
=\frac{8\lambda_i^{4}}{d_i^{4}}.
\]
Since \(E(u_{i1})=E(u_{i2})=0\),
\[
\mathrm{COV}(u_{i1},u_{i2})=E(u_{i1}u_{i2})
=\frac{\lambda_i^{3}\,E\bigl[(\upsilon_{ii}-(n-1))(\upsilon_{ii}^{2}-(n-1)(n+1))\bigr]}
{d_i^{3}(n-1)\sqrt{(n+1)(n+2)}}
=\frac{4(n+1)\lambda_i^{3}}{d_i^{3}\sqrt{(n+1)(n+2)}}
=\frac{4e_n\lambda_i^{3}}{d_i^{3}},
\]
and \(e_n=\sqrt{(n+1)/(n+2)}\approx 1\) as \(n\to\infty\).
Since the \(\upsilon_{ii}\) are independent, the \(\mathbf U_i=(u_{i1},u_{i2})'\) are independently distributed random vectors, for \(i=1,\ldots,p\), with \(E(\mathbf U_i)=\mathbf 0\) and covariance matrix \(\Omega_{in}\) given by
\[
\Omega_{in}=
\begin{bmatrix}
\dfrac{2\lambda_i^{2}}{d_i^{2}} & \dfrac{4e_n\lambda_i^{3}}{d_i^{3}}\\[8pt]
\dfrac{4e_n\lambda_i^{3}}{d_i^{3}} & \dfrac{8\lambda_i^{4}}{d_i^{4}}
\end{bmatrix},
\qquad i=1,\ldots,p .
\]
For any \(n\), as \(p\to\infty\),
\[
\Omega_n=\frac1p\bigl(\Omega_{1n}+\cdots+\Omega_{pn}\bigr)
=\begin{bmatrix}
2a_2 & 4e_na_3\\
4e_na_3 & 8a_4
\end{bmatrix}
\;\longrightarrow\;\Omega_{0n}\neq 0,
\qquad
\Omega_{0n}=\begin{bmatrix}
2\alpha_2 & 4e_n\alpha_3\\
4e_n\alpha_3 & 8\alpha_4
\end{bmatrix},
\]
where \(\alpha_r\) denotes the limiting value of \(a_r\). If \(F_i\) is the distribution function of \(\mathbf U_i\), then
\[
\frac1p\sum_{i=1}^{p}\int_{\|\mathbf U_i\|>\varepsilon\sqrt p}\|\mathbf U_i\|^{2}\,dF_i
\;\le\;\frac{1}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}E\bigl(u_{i1}^{2}+u_{i2}^{2}\bigr)^{2}
\;\le\;\frac{2}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}E\bigl(u_{i1}^{4}\bigr)
      +\frac{2}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}E\bigl(u_{i2}^{4}\bigr),
\]
from the \(C_r\) inequality in Rao (1973: 149). Since, as \(p\to\infty\), and from Lemma B.1,
\[
\frac{2}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}E\bigl(u_{i1}^{4}\bigr)
=\frac{2}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}
\frac{\lambda_i^{4}\,E\bigl[(\upsilon_{ii}-(n-1))^{4}\bigr]}{d_i^{4}(n-1)^{2}}
=\frac{2}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}
\frac{12(n-1)(n+3)\lambda_i^{4}}{(n-1)^{2}d_i^{4}}
\;\longrightarrow\;0,
\]
and, by an analogous derivation, as \(p\to\infty\),
\[
\frac{2}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}E\bigl(u_{i2}^{4}\bigr)\;\longrightarrow\;0 .
\]
Hence \(\dfrac{2}{\varepsilon^{2}p^{2}}\sum_{i=1}^{p}E\bigl(u_{i1}^{4}+u_{i2}^{4}\bigr)\to 0\) as \(p\to\infty\).
By applying the multivariate central limit theorem in Theorem B.1, as \(p\to\infty\), for any \(n\),
\[
\frac{1}{\sqrt p}\,(\mathbf U_1+\cdots+\mathbf U_p)
=\begin{pmatrix}
\dfrac{1}{\sqrt p}\displaystyle\sum_{i=1}^{p}
\dfrac{\lambda_i\bigl(\upsilon_{ii}-(n-1)\bigr)}{d_i\sqrt{n-1}}\\[12pt]
\dfrac{1}{\sqrt p}\displaystyle\sum_{i=1}^{p}
\dfrac{\lambda_i^{2}\bigl(\upsilon_{ii}^{2}-(n-1)(n+1)\bigr)}{d_i^{2}\sqrt{(n-1)(n+1)(n+2)}}
\end{pmatrix}
\xrightarrow{\ D\ }N_2(\mathbf 0,\Omega_{0n}).
\]
Note that, as \(n\to\infty\), and \(e_n\approx 1\),
\[
\Omega_{0n}=\begin{bmatrix}
2\alpha_2 & 4e_n\alpha_3\\
4e_n\alpha_3 & 8\alpha_4
\end{bmatrix}
\;\longrightarrow\;\Omega_0,
\qquad
\Omega_0=\begin{bmatrix}
2\alpha_2 & 4\alpha_3\\
4\alpha_3 & 8\alpha_4
\end{bmatrix}.
\]
Thus, it follows that, as \((n,p)\to\infty\), the above random vector converges in distribution to \(N_2(\mathbf 0,\Omega_0)\). Subsequently, under assumption (A2), which leads to assuming that \(\Omega\to\Omega_0\), where
\[
\Omega=\begin{bmatrix}
2a_2 & 4a_3\\
4a_3 & 8a_4
\end{bmatrix},
\]
it follows that
\[
\begin{pmatrix}
\dfrac{1}{\sqrt p}\displaystyle\sum_{i=1}^{p}
\dfrac{\lambda_i\bigl(\upsilon_{ii}-(n-1)\bigr)}{d_i\sqrt{n-1}}\\[12pt]
\dfrac{1}{\sqrt p}\displaystyle\sum_{i=1}^{p}
\dfrac{\lambda_i^{2}\bigl(\upsilon_{ii}^{2}-(n-1)(n+1)\bigr)}{d_i^{2}\sqrt{(n-1)(n+1)(n+2)}}
\end{pmatrix}
\xrightarrow{\ D\ }N_2(\mathbf 0,\Omega).
\]
For the first element in the previous random vector, recall that
\(\hat a_1=\dfrac{1}{p(n-1)}\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i}{d_i}\,\upsilon_{ii}\)
and \(a_1=\dfrac1p\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i}{d_i}\), since
\[
\frac{1}{\sqrt p}\sum_{i=1}^{p}\frac{\lambda_i\bigl(\upsilon_{ii}-(n-1)\bigr)}{d_i\sqrt{n-1}}
=\sqrt{\frac{n-1}{p}}\left[\frac{1}{n-1}\sum_{i=1}^{p}\frac{\lambda_i}{d_i}\upsilon_{ii}
-\sum_{i=1}^{p}\frac{\lambda_i}{d_i}\right]
=\sqrt{(n-1)p}\;\bigl(\hat a_1-a_1\bigr).
\]
Because, as \(n\to\infty\), \(\sqrt{(n-1)p}\,(\hat a_1-a_1)\approx\sqrt{np}\,(\hat a_1-a_1)\), then
\(\sqrt{np}\,(\hat a_1-a_1)\xrightarrow{\ D\ }N(0,2a_2)\) and, with a simple linear transformation, we have
\[
\hat a_1\xrightarrow{\ D\ }N\!\left(a_1,\ \frac{2a_2}{np}\right). \qquad\text{(B.1)}
\]
Recall that \(b_1\), a function of the \(\upsilon_{ii}\), is as defined in Appendix A and that \(a_2=\dfrac1p\displaystyle\sum_{i=1}^{p}\dfrac{\lambda_i^{2}}{d_i^{2}}\). The second element of the random vector is treated in the same way as the first: writing it in terms of \(b_1\) and \(a_2\) shows that, as \(n\to\infty\), the second element is approximately \(\sqrt{np}\,(b_1-a_2)\). Then \(\sqrt{np}\,(b_1-a_2)\xrightarrow{\ D\ }N(0,8a_4)\) and, with a simple linear transformation, we obtain the result
\[
b_1\xrightarrow{\ D\ }N\!\left(a_2,\ \frac{8a_4}{np}\right). \qquad\text{(B.2)}
\]
To find the distribution of \(b_2\), the Lindeberg Central Limit Theorem from
Billingsley (1995: 359) is also made use of.
Theorem B.2. (Lindeberg Central Limit Theorem from Billingsley (1995: 359))
Let \(X_1,\ldots,X_n\) be a sequence of independent random variables which satisfies
i) \(E(X_i)=0\), and ii) \(\sigma_i^{2}=E(X_i^{2})\).
Let \(S_n^{2}=\sum_{i=1}^{n}\sigma_i^{2}>0\) and let \(P_i\) be the distribution function of \(X_i\).
If
\[
\frac{1}{S_n^{2}}\sum_{i=1}^{n}\int_{|X_i|\ge\varepsilon S_n}X_i^{2}\,dP_i\;\longrightarrow\;0
\qquad\text{for every }\varepsilon>0,
\]
then
\[
M_n=\frac{1}{S_n}\sum_{i=1}^{n}X_i\xrightarrow{\ D\ }N(0,1).
\]
Proof (See proof in Billingsley (1995: 359)).
Srivastava (2005) produced an important result, which is used for the next
proof, as
\[
\frac{\upsilon_{ij}}{\sqrt{n-1}}\;\sim\;N(0,1)
\]
as \(n\to\infty\), leading to \(\upsilon_{ij}^{2}/(n-1)\sim\chi^{2}_{1}\), which are asymptotically independently distributed for all distinct \(i\) and \(j\).
Note that \(b_2\) was defined in (A.4) in Appendix A, and so now let
\(\eta_{ij}\), \(i<j\), denote the \((i,j)\)th summand of \(b_2\), so that \(b_2=\sum_{i<j}\eta_{ij}\), each \(\eta_{ij}\) being a constant multiple of
\[
\frac{\lambda_i\lambda_j}{d_id_j}
\Bigl(\upsilon_{ij}^{2}-\frac{\upsilon_{ii}\upsilon_{jj}}{n-1}\Bigr).
\]
From Lemma A.1., we have \(E(\eta_{ij})=0\), and let
\[
S_p^{2}=\mathrm{Var}\Bigl(\sum_{i<j}\eta_{ij}\Bigr)
=\sum_{i<j}\mathrm{Var}(\eta_{ij})
=\mathrm{Var}(b_2)
\;\approx\;\frac{4}{n^{2}}\Bigl(a_2^{2}-\frac{a_4}{p}\Bigr)
\]
as \((n,p)\to\infty\). Let
\[
M_p=\sum_{i<j}\eta_{ij}=b_2 .
\]
If \(P_{ij}\) is the distribution function of \(\eta_{ij}\), then, for \(\varepsilon>0\),
\[
\frac{1}{S_p^{2}}\sum_{i<j}\int_{|\eta_{ij}|\ge\varepsilon S_p}\eta_{ij}^{2}\,dP_{ij}
\;\le\;\frac{1}{\varepsilon^{2}S_p^{4}}\sum_{i<j}E\bigl(\eta_{ij}^{4}\bigr)
\;\longrightarrow\;0
\]
as \(p\to\infty\), because \(\sum_{i<j}E(\eta_{ij}^{4})\) is of smaller order than \(S_p^{4}\).
Subsequently, it follows from the Lindeberg Central Limit Theorem in Theorem B.2.,
\[
\frac{M_p}{S_p}=\frac{b_2}{S_p}
\approx\frac{n\,b_2}{2\sqrt{a_2^{2}-a_4/p}}
\xrightarrow{\ D\ }N(0,1).
\]
Subsequently, by a linear transformation, we obtain
\[
b_2\xrightarrow{\ D\ }N\!\left(0,\ \frac{4}{n^{2}}\Bigl(a_2^{2}-\frac{a_4}{p}\Bigr)\right). \qquad\text{(B.3)}
\]
By applying Lemma A.6. in Appendix A, \(b_1\) and \(b_2\) are asymptotically independent.
Note that \(\hat a_2\) is a linear function of the two random variables \(b_1\) and \(b_2\), i.e.
\[
\hat a_2=\frac{(n-1)^{2}}{(n-2)(n+1)}\,(b_1+b_2)\;\approx\;b_1+b_2
\]
as \(n\to\infty\). By applying Lemma A.6. and (A.5) in Appendix A, as well as (B.2) and
(B.3), we obtain
\[
\hat a_2\xrightarrow{\ D\ }N\!\left(a_2,\ \frac{8a_4}{np}+\frac{4a_2^{2}}{n^{2}}\right). \qquad\text{(B.4)}
\]
From (A.7) in Appendix A, we have \(\mathrm{COV}(\hat a_1,\hat a_2)\approx 4a_3/(np)\); combining this with (B.1) and (B.4), we obtain the joint distribution of \(\hat a_1\) and \(\hat a_2\) as
\[
\begin{pmatrix}\hat a_1\\[2pt] \hat a_2\end{pmatrix}
\xrightarrow{\ D\ }
N\!\left(
\begin{pmatrix}a_1\\[2pt] a_2\end{pmatrix},
\begin{bmatrix}
\dfrac{2a_2}{np} & \dfrac{4a_3}{np}\\[10pt]
\dfrac{4a_3}{np} & \dfrac{8a_4}{np}+\dfrac{4a_2^{2}}{n^{2}}
\end{bmatrix}
\right).
\]
The proof is completed.
APPENDIX C
FORTRAN Syntax for One High-Dimensional Data
INTEGER IRANK,ISEED,J,LDR,LDRSIG,NOUT,LDCOV,LDINCD,IDO,
& IFRQ,INCD(LDINCD,1), IWT, MOPT, NMISS, NOBS, NROW,
& N, P, ITERATION, LDX, COUNT1, COUNT2,COUNT3,CORCASE,
& ALTER_COV_CASE
PARAMETER (K=10000,N=10,P=10,SIG_SQ=1,LDINCD=1)
C SIG_SQ IS SIGMA SQUARE
REAL V0(P,P),V1(P,P),X(N,P),RSIG(P,P),S(P,P),TR_V0,
& V0_INV(P,P),A(P,P),TR_A,B(P,P),TR_B,
& T1_HAT,ABOVE,BELOW,SS(P,P),TR_SS,D,SUMWT,XMEAN(P),
& EVAL0(P),EVAL1(P),UNI(P),UNI1(P),R(P)
EXTERNAL CHFAC,RNMVN,RNSET,UMACH,LINRG,WRRRN,MRRRR,
& ANORIN,RNUN,SSCAL,SADD,EVLRG,WRCRN,CORVC
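C IMSL ROUTINES USED: CHFAC (CHOLESKY FACTORIZATION), RNMVN
C (MULTIVARIATE NORMAL VARIATES), CORVC (SAMPLE MEAN AND COVARIANCE),
C LINRG (MATRIX INVERSE), MRRRR (MATRIX PRODUCT), EVLRG (EIGENVALUES),
C ANORIN (STANDARD NORMAL QUANTILE), RNUN/SSCAL/SADD (UNIFORM VARIATES
C AND SCALING), RNSET (SEED), WRRRN/WRCRN (PRINTING), UMACH (OUTPUT UNIT)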
C SET A KNOWN POSITIVE DEFINITE MATRIX, V0, UNDER H0
CALL UMACH(2,NOUT)
LDRSIG =P
LDR =N
LDCOV =P
C ALL COVARIANCE MATRICES CONSIDERED
DO 9999 CORCASE=1,1
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V0(I,J)=1.0
ELSE
V0(I,J) = 0.0
ENDIF
END DO
END DO
IF(CORCASE.EQ.1) THEN
WRITE(6,*)'MATRIX V0, UN STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V0(I,I)=1.0
ELSE
V0(I,J) = ((-1)**(I+J))*(I/(2.0*J))
ENDIF
END DO
END DO
ELSEIF(CORCASE.EQ.2) THEN
WRITE(6,*)'MATRIX V0, CS STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V0(I,J)=1.0
ELSE
V0(I,J) = 0.5
ENDIF
END DO
END DO
ELSEIF(CORCASE.EQ.3) THEN
WRITE(6,*)'MATRIX V0, CSH STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
ALO=2.0
BHI=3.0
CALL RNUN(P,UNI)
CALL SSCAL(P,BHI-ALO,UNI,1)
CALL SADD(P,ALO,UNI,1)
CALL WRRRN('UNIFORM',1,P,UNI,1,0)
DO I=1,P
V0(I,I)=UNI(I)
END DO
RHO=0.50
DO I=1,P
DO J=1,P
IF(I.EQ.J) THEN
V0(I,I)=UNI(I)
ELSE
V0(I,J)=((UNI(I)*UNI(J))**0.5)*RHO
ENDIF
END DO
END DO
ELSEIF (CORCASE.EQ.4) THEN
WRITE(6,*)'MATRIX V0, TOEP STRUCTURE'
DO I=1,P-1
V0(I,I+1)=-0.5
IF(I+2.LE.P) THEN
V0(I,I+2) = 0.0
ENDIF
END DO
ELSEIF(CORCASE.EQ.5) THEN
WRITE(6,*)'MATRIX V0 CASE 1,SIMPLE STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V0(I,J)=2.0
ELSE
V0(I,J) = 0.0
ENDIF
END DO
END DO
END IF
DO I=1,P-1
DO J=I+1,P
V0(J,I)=V0(I,J)
END DO
END DO
C REPORT HEADER
CALL WRRRN('V0',P,P,V0,P,0)
C COMPUTE ALL EIGENVALUES OF MATRIX V0
CALL EVLRG(P,V0,P,EVAL0)
CALL WRCRN('EIGEN VALUES OF MATRIX V0',1,P,EVAL0,1,0)
TR_V0 = 0.0
DO I =1,P
TR_V0=V0(I,I)+TR_V0
END DO
C THE INVERSE OF V0, V_INV
CALL LINRG(P,V0,P,V0_INV,P)
CALL WRRRN ('COVARIANCE INVERSE, V0_INV',P,P,V0_INV,P,0)
WRITE(6,*)'TRACE OF V0 =TR_V0=',TR_V0
C SET A GIVEN KNOWN POPULATION COVARIANCE MATRIX, UNDER H1,
C V1
DO 999 ALTER_COV_CASE=6,6
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,J)=1.0
ELSE
V1(I,J) = 0.0
ENDIF
END DO
END DO
IF(ALTER_COV_CASE.EQ.1) THEN
WRITE(6,*)'MATRIX V1, UN STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,I)=1.0
ELSE
V1(I,J) = ((-1)**(I+J))*(I/(4.0*J))
ENDIF
END DO
END DO
ELSEIF(ALTER_COV_CASE.EQ.2) THEN
WRITE(6,*)'MATRIX V1 CASE 1, CS STRUCTURE '
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,J)=1.0
ELSE
V1(I,J) = 0.1
ENDIF
END DO
END DO
ELSEIF(ALTER_COV_CASE.EQ.3) THEN
WRITE(6,*)'MATRIX V1, CSH STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
ALO=3.0
BHI=4.0
CALL RNUN(P,UNI1)
CALL SSCAL(P,BHI-ALO,UNI1,1)
CALL SADD(P,ALO,UNI1,1)
CALL WRRRN('UNIFORM1',1,P,UNI1,1,0)
DO I=1,P
V1(I,I)=UNI1(I)
END DO
RHO=0.50
DO I=1,P
DO J=1,P
IF(I.EQ.J) THEN
V1(I,I)=UNI1(I)
ELSE
V1(I,J)=((UNI1(I)*UNI1(J))**0.5)*RHO
ENDIF
END DO
END DO
ELSEIF(ALTER_COV_CASE.EQ.4) THEN
WRITE(6,*)'MATRIX V1, TOEP STRUCTURE'
DO I=1,P-1
V1(I,I+1)=-0.45
IF(I+2.LE.P) THEN
V1(I,I+2) = 0.0
ENDIF
END DO
ELSEIF(ALTER_COV_CASE.EQ.5) THEN
WRITE(6,*)'MATRIX V1, VC STRUCTURE'
N_HALF=P/2
WRITE(6,*)'P/2 = ', N_HALF
DO I =1,P
DO J =1,P
IF (I.GT.N_HALF)THEN
V1(I,I)=0.5
ENDIF
END DO
END DO
ELSEIF(ALTER_COV_CASE.EQ.6) THEN
WRITE(6,*)'MATRIX V1,VC STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
CALL RNUN(P,R)
CALL WRRRN('UNIFORM VECTOR =',1,P,R,1,0)
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,J)=2.0*R(I)
ELSE
V1(I,J) = 0.0
ENDIF
END DO
END DO
END IF
DO I=1,P-1
DO J=I+1,P
V1(J,I)=V1(I,J)
END DO
END DO
999 END DO
C REPORT HEADER
CALL WRRRN('V1',P,P,V1,P,0)
C COMPUTE ALL EIGENVALUES OF MATRIX V1
CALL EVLRG(P,V1,P,EVAL1)
CALL WRCRN('EIGEN VALUES OF MATRIX V1',1,P,EVAL1,1,0)
C CONSTRUCT DATA X UNDER V1
C OBTAIN THE CHOLESKY FACTORIZATION
CALL CHFAC (P,V1,P,0.00001,IRANK,RSIG,LDRSIG)
C INITIALIZE SEED OF RANDOM NUMBER GENERATOR
ITERATION=0
C INITIALIZE THE NUMBER OF TIMES THAT EACH TEST STATISTIC FALLS IN
C THE CRITICAL REGION
COUNT1 = 0
COUNT2 = 0
COUNT3 = 0
C INITIALIZE SEED OF RANDOM NUMBER GENERATOR, K
ISEED =6250
CALL RNSET(ISEED)
DO 10 ITER = 1,K
ITERATION = ITERATION +1
WRITE(6,*)'ITER',ITERATION
CALL RNMVN(N,P,RSIG,LDRSIG,X,LDR)
CALL WRRRN('NXP MATRIX X',N,P,X,N,0)
C COMPUTE SAMPLE COVARIANCE MATRIX,S
CALL UMACH(2, NOUT)
IDO = 0
NROW = N
LDX = N
IFRQ = 0
IWT = 0
MOPT = 0
ICOPT = 0
CALL CORVC(IDO, NROW, P, X, N, IFRQ, IWT, MOPT, ICOPT,
& XMEAN, S, LDCOV, INCD, LDINCD,NOBS, NMISS, SUMWT)
CALL WRRRN('SAMPLE COV MATRIX, S', P,P, S, LDCOV, 0)
TR_S=0.0
DO I =1,P
DO J=1,P
IF (I.EQ.J) THEN
TR_S=TR_S+S(I,J)
ELSE
ENDIF
END DO
END DO
WRITE(6,*) 'TRACE OF SAMPLE VAR-COV MATRIX= TR_S = ',TR_S
C MULTIPLY POPULATION VARIANCE-COVARIANCE INVERSE
C (V0_INV) WITH SAMPLE COVARIANCE MATRIX (S)
CALL MRRRR (P,P,V0_INV,P,P,P,S,P,P,P,A,P)
CALL WRRRN ('A=V0_INV * S ',P,P,A,P,0)
CALL MRRRR(P,P,A,P,P,P,A,P,P,P,B,P)
CALL WRRRN ('B = (V0_INV * S) ^ 2', P,P,B,P,0)
TR_A=0.0
TR_B = 0.0
DO I =1,P
TR_A=TR_A+A(I,I)
TR_B=TR_B+B(I,I)
END DO
C============================================================
C COMPUTE THE PROPOSED TEST STATISTIC, T1
C============================================================
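C A1 AND A2 BELOW ARE THE SAMPLE ESTIMATES OF a1 AND a2 FROM CHAPTER 3;
C E = (N-1)**2/((N-2)*(N+1)) IS THE SMALL-SAMPLE CORRECTION FACTOR IN A2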
A1=TR_A/P
D = (N-2.0)*(N+1.0)
E=(N-1)**2/D
A2=E*(TR_B-(TR_A**2)/(N-1))/P
T =A2-2.0*(SIG_SQ**2)*A1+SIG_SQ**2
ABOVE=(N-1)*T
BELOW=2*((2*(SIG_SQ**2)*(N-1)-4*SIG_SQ*(N-1)+2*(N-1))/P + 1)**0.5
T1 = ABOVE/BELOW
WRITE(6,*) 'TEST STATISTIC = T1 = ',T1
CRV=ANORIN(0.95)
WRITE(6,*) 'CRITICAL VALUE =',CRV
IF (T1.GT.CRV) THEN
COUNT1 = COUNT1+1
ENDIF
C============================================================
C COMPUTE SRIVASTAVA (2005)'S STATISTIC, TS1
C============================================================
CALL MRRRR (P,P,S,P, P,P,S,P, P,P,SS,P)
CALL WRRRN ('S SQUARE = S * S = ',P,P,SS,P,0)
TR_SS = 0.0
DO I =1,P
TR_SS = TR_SS + SS(I,I)
END DO
WRITE(6,*) 'TRACE OF S SQUARE =',TR_SS
A1_HAT=TR_S/P
A2_HAT=(TR_SS-1.0*TR_S**2/(N-1))/P*(N-1)**2/(N-2)/(N+1)
TS1=((N-1)/2.0)*(A2_HAT/(A1_HAT**2)-1)
WRITE(6,*) 'SRIVASTAVA STATISTIC, TS1 = ',TS1
IF (TS1.GT.CRV) THEN
COUNT2 = COUNT2+1
ENDIF
C============================================================
C COMPUTE LEDOIT AND WOLF (2002)'S STATISTIC, UJ
C============================================================
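C U = (TR(S**2)/P)/((TR(S)/P)**2) - 1 AND UJ IS ITS STANDARDIZED FORM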
U=1.0*(TR_SS/P)/(TR_S/P)**2-1
UJ=((N*1.0-1)*U-P-1.0)/2.0
WRITE(6,*)'UJ=',UJ
IF (UJ.GT.CRV) THEN
COUNT3 = COUNT3+1
ENDIF
10 CONTINUE
C============================================================
C COMPUTE PROPROTIONS OF REJECTIONS OF THE TEST STATISTICS
C=========================================================
PROP_T1=1.0*COUNT1/K*1.0
WRITE(6,60) PROP_T1
60 FORMAT(' PROP1_T1 = (#T1 > Z(ALPHA))/ K = ',T35,F7.4)
PROP_UJ=1.0*COUNT3/K*1.0
WRITE(6,70) PROP_UJ
70 FORMAT(' PROP_UJ = (#UJ > Z(ALPHA))/ K = ',T35,F7.4)
PROP_TS1=1.0*COUNT2/K*1.0
WRITE(6,80) PROP_TS1
80 FORMAT(' PROP_TS1 = (#TS1 > Z(ALPHA))/ K = ',T35,F7.4)
9999 CONTINUE
STOP
END
APPENDIX D
FORTRAN Syntax for Two High-Dimensional Data
INTEGER I,IRANK,ISEED,J,LDR1,LDR2,LDRSIG,NOUT, N1,N2,P,
& ITERATION,COUNT1,COUNT2,COUNT3,COUNT4,n,
& LDCOV,LDINCD
PARAMETER (K=10000,N1=20,N2=20,P=20,LDCOV=P,LDINCD=1)
INTEGER ICOPT,IDO,IFRQ,INCD(LDINCD,1),IWT,MOPT,NMISS,NOBS,
& NROW
REAL V1(P,P),V2(P,P),RSIG(P,P),X1(N1,P),S1(P,P),XMEAN1(P), SS1(P,P),
& X2(N2,P),S2(P,P),SS2(P,P),XMEAN2(P), S(P,P),SS(P,P),SSS(P,P),
& SSSS(P,P), A2,A4,TAU,b,d,c_star,e,T2_SQ,M1(P,P),M2(P,P),
& MM1(P,P),MM2(P,P),M(P,P),MM(P,P), MMM(P,P),MMMM(P,P),
& C0,C1,C2,C3,S1S2(P,P),TJ,TJ_SQ, U1(P),U2(P),DIAG(P,P),
& DT(P,P),DID(P,P),EVAL1(P),EVAL2(P)
EXTERNAL CHFAC,RNMVN,RNSET,UMACH,LINRG,WRRRN,MRRRR,
& ANORIN,CHIIN,CORVC,RNUN,SSCAL,SADD,EVLRG,WRCRN
C SET POPULATION COVARIANCE MATRICES
CALL UMACH(2,NOUT)
LDRSIG =P
LDR1 =N1
LDR2 =N2
C ALL COVARIANCE MATRICES CONSIDERED
DO 9999 COV1=8,8
C SET FIRST POPULATION COVARIANCE MATRIX, SIGMA1=V1
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,J)=1.0
ELSE
V1(I,J) = 0.0
ENDIF
END DO
END DO
IF(COV1.EQ.1) THEN
WRITE(6,*)'MATRIX V1, UN STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,I)=1.0
ELSE
V1(I,J) = ((-1)**(I+J))*(I/(10.0*J))
ENDIF
END DO
END DO
ELSEIF(COV1.EQ.2) THEN
WRITE(6,*)'MATRIX V1, UN STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
AAA=1.0
BBB=5.0
CALL RNUN(P,U1)
CALL SSCAL(P,BBB-AAA,U1,1)
CALL SADD(P,AAA,U1,1)
CALL WRRRN('U1',1,P,U1,1,0)
DO I=1,P
DO J=1,P
IF (I.EQ.J)THEN
DIAG(I,I)=U1(I)
ELSE
DIAG(I,J)=0.0
ENDIF
END DO
END DO
CALL WRRRN('DIAG',P,P,DIAG,P,0)
L=0
WW=0.1
DO I=1,P
DO J=1,P
Y=ABS(I-J)
DT(I,J)=(-1)**(I+J)*(0.2*(L+2.0))**(Y**WW)
END DO
END DO
CALL WRRRN('DELTA MATRIX',P,P,DT,P,0)
CALL MRRRR (P,P,DIAG,P,P,P,DT,P,P,P,DID,P)
CALL WRRRN('DIAG MATRIX * DELTA MATRIX',P,P,DID,P,0)
CALL MRRRR (P,P,DID,P,P,P,DIAG,P,P,P,V1,P)
CALL WRRRN('DIAG *DELTA *DIAG = V1',P,P,V1,P,0)
ELSEIF(COV1.EQ.3) THEN
WRITE(6,*)'MATRIX V1, CS STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,J)=1.0
ELSE
V1(I,J) = 0.01
ENDIF
END DO
END DO
ELSEIF(COV1.EQ.4) THEN
WRITE(6,*)'MATRIX V1, CSH STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
AAA=5.0
BBB=6.0
CALL RNUN(P,U1)
CALL SSCAL(P,BBB-AAA,U1,1)
CALL SADD(P,AAA,U1,1)
CALL WRRRN('U1',1,P,U1,1,0)
DO I=1,P
V1(I,I)=U1(I)
END DO
RHO=0.50
DO I=1,P
DO J=1,P
IF(I.EQ.J) THEN
V1(I,I)=U1(I)
ELSE
V1(I,J)=((U1(I)*U1(J))**0.5)*RHO
ENDIF
END DO
END DO
ELSEIF(COV1.EQ.5) THEN
WRITE(6,*)'MATRIX V1, SIM STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,J)=1.0
ELSE
V1(I,J) = 0.0
ENDIF
END DO
END DO
ELSEIF(COV1.EQ.6) THEN
WRITE(6,*)'MATRIX V1, SIM STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V1(I,J)=2.0
ELSE
V1(I,J) = 0.0
ENDIF
END DO
END DO
ELSEIF(COV1.EQ.7) THEN
WRITE(6,*)'MATRIX V1, TOEP STRUCTURE'
DO I=1,P-1
V1(I,I+1)=-0.5
IF(I+2.LE.P) THEN
V1(I,I+2) = 0.0
ENDIF
END DO
ELSEIF(COV1.EQ.8) THEN
WRITE(6,*)'MATRIX V1, VC STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
AAA=1.0
BBB=2.0
CALL RNUN(P,U1)
CALL SSCAL(P,BBB-AAA,U1,1)
CALL SADD(P,AAA,U1,1)
CALL WRRRN('U1',1,P,U1,1,0)
DO I=1,P
DO J=1,P
V1(I,J)=0.0
END DO
END DO
DO I=1,P
V1(I,I)=U1(I)
END DO
END IF
DO I=1,P-1
DO J=I+1,P
V1(J,I)=V1(I,J)
END DO
END DO
C REPORT HEADER
CALL WRRRN('V1',P,P,V1,P,0)
C ALL COVARIANCE MATRICES CONSIDERED
DO 8888 COV2=8,8
C SET SECOND POPULATION COVARIANCE MATRIX SIGMA2=V2
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V2(I,J)=1.0
ELSE
V2(I,J) = 0.0
ENDIF
END DO
END DO
IF(COV2.EQ.1) THEN
WRITE(6,*)'MATRIX V2, UN STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V2(I,I)=1.0
ELSE
V2(I,J) = ((-1)**(I+J))*(I/(20.0*J))
ENDIF
END DO
END DO
ELSEIF(COV2.EQ.2) THEN
WRITE(6,*)'MATRIX V2, UN STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
AAA=1.0
BBB=5.0
CALL RNUN(P,U1)
CALL SSCAL(P,BBB-AAA,U1,1)
CALL SADD(P,AAA,U1,1)
CALL WRRRN('U1',1,P,U1,1,0)
DO I=1,P
DO J=1,P
IF (I.EQ.J)THEN
DIAG(I,I)=U1(I)
ELSE
DIAG(I,J)=0.0
ENDIF
END DO
END DO
CALL WRRRN('DIAG',P,P,DIAG,P,0)
L=2
WW=0.10
DO I=1,P
DO J=1,P
Y=ABS(I-J)
DT(I,J)=(-1)**(I+J)*(0.2*(L+2.0))**(Y**WW)
END DO
END DO
CALL WRRRN('DELTA MATRIX',P,P,DT,P,0)
CALL MRRRR (P,P,DIAG,P,P,P,DT,P,P,P,DID,P)
CALL WRRRN('DIAG MATRIX * DELTA MATRIX',P,P,DID,P,0)
CALL MRRRR (P,P,DID,P,P,P,DIAG,P,P,P,V2,P)
CALL WRRRN('DIAG *DELTA *DIAG = V2',P,P,V2,P,0)
ELSEIF(COV2.EQ.3) THEN
WRITE(6,*)'MATRIX V2, CS STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V2(I,J)=1.0
ELSE
V2(I,J) = 0.05
ENDIF
END DO
END DO
ELSEIF(COV2.EQ.4) THEN
WRITE(6,*)'MATRIX V2, CSH STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
AAA=4.0
BBB=5.0
CALL RNUN(P,U1)
CALL SSCAL(P,BBB-AAA,U1,1)
CALL SADD(P,AAA,U1,1)
CALL WRRRN('U1',1,P,U1,1,0)
DO I=1,P
V2(I,I)=U1(I)
END DO
RHO=0.40
DO I=1,P
DO J=1,P
IF(I.EQ.J) THEN
V2(I,I)=U1(I)
ELSE
V2(I,J)=((U1(I)*U1(J))**0.5)*RHO
ENDIF
END DO
END DO
ELSEIF(COV2.EQ.5) THEN
WRITE(6,*)'MATRIX V2, VC STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
MMOD=MOD(I,4)
ZERO=0
IF (MMOD.EQ.0) THEN
V2(I,I)=2.0
ELSE
ENDIF
ELSE
V2(I,J)=0.0
ENDIF
END DO
END DO
ELSEIF(COV2.EQ.6) THEN
WRITE(6,*)'MATRIX V2, SIM STRUCTURE'
DO I =1,P
DO J =1,P
IF (I.EQ.J)THEN
V2(I,J)=1.5
ELSE
V2(I,J) = 0.0
ENDIF
END DO
END DO
ELSEIF(COV2.EQ.7) THEN
WRITE(6,*)'MATRIX V2, TOEP STRUCTURE'
DO I=1,P-1
V2(I,I+1)=-0.30
IF(I+2.LE.P) THEN
V2(I,I+2) = 0.0
ENDIF
END DO
ELSEIF(COV2.EQ.8) THEN
WRITE(6,*)'MATRIX V2, VC STRUCTURE'
CALL UMACH(2,NOUT)
ISEED=6250
CALL RNSET(ISEED)
AAA=1.5
BBB=2.5
CALL RNUN(P,U2)
CALL SSCAL(P,BBB-AAA,U2,1)
CALL SADD(P,AAA,U2,1)
C CALL WRRRN('U2',1,P,U2,1,0)
DO I=1,P
DO J=1,P
V2(I,J)=0.0
END DO
END DO
DO I=1,P
V2(I,I)=U2(I)
END DO
ENDIF
DO I=1,P-1
DO J=I+1,P
V2(J,I)=V2(I,J)
END DO
END DO
C REPORT HEADER
CALL WRRRN('V2',P,P,V2,P,0)
C============================================================
C COMPUTE EIGENVALUES OF MATRIX V1 AND V2
C============================================================
CALL EVLRG(P,V1,P,EVAL1)
CALL WRCRN('EIGEN VALUES OF MATRIX V1',1,P,EVAL1,1,0)
CALL EVLRG(P,V2,P,EVAL2)
CALL WRCRN('EIGEN VALUES OF MATRIX V2',1,P,EVAL2,1,0)
C INITIALIZE SEED OF RANDOM NUMBER GENERATOR
ITERATION=0
C INITIALIZE THE NUMBER OF TIMES, COUNT, THAT EACH TEST STATISTIC FALLS
C IN THE CRITICAL REGION
COUNT1= 0
COUNT2= 0
COUNT3= 0
COUNT4= 0
C INITIALIZE SEED OF RANDOM NUMBER GENERATOR, K
ISEED =6250
CALL RNSET(ISEED)
DO 10 ITER = 1,K
ITERATION = ITERATION +1
WRITE(6,*)'#',ITERATION
C============================================================
C PHASE 1 FOR POPULATION 1
C============================================================
C OBTAIN THE CHOLESKY FACTORIZATION
CALL CHFAC (P,V1,P,0.00001,IRANK,RSIG,LDRSIG)
C CONSTRUCT FIRST SAMPLE DATA, X1, BASED ON V1
CALL RNMVN(N1,P,RSIG,LDRSIG,X1,LDR1)
CALL WRRRN ('(N1)x P MATRIX X1',N1,P,X1,N1,0)
C CALCULATE FIRST SAMPLE COVARIANCE MATRIX, S1
CALL UMACH(2, NOUT)
IDO = 0
NROW = N1
IFRQ = 0
IWT = 0
MOPT = 0
ICOPT = 0
CALL CORVC(IDO, NROW, P, X1, N1, IFRQ, IWT, MOPT, ICOPT,
& XMEAN1, S1, LDCOV, INCD, LDINCD,NOBS, NMISS, SUMWT)
CALL WRRRN('1ST SAMPLE COV MATRIX = S1', P,P, S1, LDCOV, 0)
CALL MRRRR (P,P,S1,P,P,P,S1,P,P,P,SS1,P)
CALL WRRRN ('S1^2 ',P,P,SS1,P,0)
TR_S1=0.0
TR_SS1=0.0
DO I=1,P
TR_S1=TR_S1+S1(I,I)
TR_SS1=TR_SS1+SS1(I,I)
END DO
A21_HAT=(N1-1)**2*(TR_SS1-TR_S1**2/(N1-1))/(N1-2)/(N1+1)/P
C WRITE(6,*)'TRACE OF S1= ',TR_S1
C WRITE(6,*)'TRACE OF S1^2 = ',TR_SS1
C WRITE(6,*)'A21_HAT = ',A21_HAT
C============================================================
C PHASE 2 FOR POPULATION 2
C============================================================
C OBTAIN THE CHOLESKY FACTORIZATION
CALL CHFAC (P,V2,P,0.00001,IRANK,RSIG,LDRSIG)
C CONSTRUCT SECOND SAMPLE DATA X2 BASED ON V2
CALL RNMVN(N2,P,RSIG,LDRSIG,X2,LDR2)
CALL WRRRN ('(N2)x P MATRIX X2',N2,P,X2,N2,0)
C CALCULATE SECOND SAMPLE COVARIANCE MATRIX, S2
CALL UMACH(2,NOUT)
NROW = N2
CALL CORVC(IDO, NROW, P, X2, N2, IFRQ, IWT, MOPT, ICOPT,
& XMEAN2, S2, LDCOV, INCD, LDINCD,NOBS, NMISS, SUMWT)
CALL WRRRN('2ND SAMPLE COV MATRIX = S2', P,P, S2, LDCOV, 0)
CALL MRRRR (P,P,S2,P,P,P,S2,P,P,P,SS2,P)
CALL WRRRN ('S2^2 ',P,P,SS2,P,0)
TR_S2=0.0
TR_SS2=0.0
DO I=1,P
TR_S2=TR_S2+S2(I,I)
TR_SS2=TR_SS2+SS2(I,I)
END DO
A22_HAT=(N2-1)**2*(TR_SS2-TR_S2**2/(N2-1))/(N2-2)/(N2+1)/P
WRITE(6,*)'TRACE OF S2= ',TR_S2
WRITE(6,*)'TRACE OF S2^2 = ',TR_SS2
WRITE(6,*)'A22_HAT = ',A22_HAT
C============================================================
C PHASE 3 COMPUTE THE PROPOSED TEST STATISTIC, T2
C============================================================
C ESTIMATE THE COMMON COVARIANCE MATRIX, UNDER H0:
C SIGMA1=SIGMA2=SIGMA
C THE POOLED COVARIANCE MATRIX, S, IS UNBIASED ESTIMATOR OF
C THE SIGMA MATRIX
DO I =1,P
DO J =1,P
S(I,J)=((N1-1)*S1(I,J)+(N2-1)*S2(I,J))/(N1+N2-2)
END DO
END DO
CALL WRRRN('POOLED COV MATRIX= S',P,P,S,P,0)
CALL MRRRR (P,P,S,P,P,P,S,P,P,P,SS,P)
CALL WRRRN ('S^2 ',P,P,SS,P,0)
CALL MRRRR (P,P,S,P,P,P,SS,P,P,P,SSS,P)
CALL WRRRN ('S^3 ',P,P,SSS,P,0)
CALL MRRRR (P,P,SS,P,P,P,SS,P,P,P,SSSS,P)
CALL WRRRN ('S^4 ',P,P,SSSS,P,0)
TR_S=0.0
TR_SS=0.0
TR_SSS=0.0
TR_SSSS=0.0
DO I=1,P
TR_S=TR_S+S(I,I)
TR_SS=TR_SS+SS(I,I)
TR_SSS=TR_SSS+SSS(I,I)
TR_SSSS=TR_SSSS+SSSS(I,I)
END DO
WRITE(6,*)'TRACE OF POOLED-COV MATRIX = TR_S = ',TR_S
WRITE(6,*)'TRACE OF S^2 = TR_SS = ',TR_SS
WRITE(6,*)'TRACE OF S^3 = TR_SSS = ',TR_SSS
WRITE(6,*)'TRACE OF S^4 = TR_SSSS = ',TR_SSSS
n=(N1-1)+(N2-1)
WRITE(6,*)'SMALL n = ',n
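C A2 AND A4 BELOW ESTIMATE a2 AND a4 FROM THE POOLED COVARIANCE MATRIX S
C (COMPUTED UNDER H0: SIGMA1 = SIGMA2)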
A2=(n**2)*(TR_SS-(TR_S**2)/n)/P/(n-1)/(n+2)
WRITE(6,*)'A2 = ',A2
d1=1.0*(n**5.0)
d2=1.0*(n**2.0+1.0*n+2.0)
d3=1.0*(n+1)
d4=1.0*(n+2)
d5=1.0*(n+4)
d6=1.0*(n+6)
d7=1.0*(n-1)
d8=1.0*(n-2)
d9=1.0*(n-3)
TAU=1.0*(d1*d2)/(d3*d4*d5*d6*d7*d8*d9)
b=-4.0/n
c_star=-1.0*(2.0*n**2+3.0*n-6)/n/(n**2+n+2)
d=2.0*(5.0*n+6)/n/(n**2+n+2)
e=-1.0*(5.0*n+6)/n**2/(n**2+n+2)
A4=1.0*TAU*(TR_SSSS+b*TR_SSS*TR_S+c_star*TR_SS**2+d*TR_SS*
& TR_S**2+e*TR_S**4)/P
DELTA_SQ=8.0*A4*(1.0/(N1-1)+1.0/(N2-1))/(A2**2)/P+4.0*
& (1.0/(N1-1)**2+1.0/(N2-1)**2)
DELTA=1.0*SQRT(DELTA_SQ)
T2=(A21_HAT/A22_HAT-1)/DELTA
T2_SQ=T2**2
C WRITE(6,*)'T2 = ',T2
C============================================================
C PHASE 4 COMPUTE SRIVASTAVA(2007)'S STATISTIC,TS2
C============================================================
DO I =1,P
DO J =1,P
M1(I,J)=(N1-1)*S1(I,J)
M2(I,J)=(N2-1)*S2(I,J)
END DO
END DO
CALL WRRRN('(N1-1)*S1= M1',P,P,M1,P,0)
CALL WRRRN('(N2-1)*S2= M2',P,P,M2,P,0)
CALL MRRRR (P,P,M1,P,P,P,M1,P,P,P,MM1,P)
CALL WRRRN ('M1^2 ',P,P,MM1,P,0)
CALL MRRRR (P,P,M2,P,P,P,M2,P,P,P,MM2,P)
CALL WRRRN ('M2^2 ',P,P,MM2,P,0)
DO I =1,P
DO J =1,P
M(I,J)=M1(I,J)+M2(I,J)
END DO
END DO
CALL WRRRN('M1+M2 = M',P,P,M,P,0)
CALL MRRRR (P,P,M,P,P,P,M,P,P,P,MM,P)
CALL WRRRN ('M^2 ',P,P,MM,P,0)
CALL MRRRR (P,P,MM,P,P,P,MM,P,P,P,MMMM,P)
CALL WRRRN ('M^4 ',P,P,MMMM,P,0)
TR_M1=0.0
TR_M2=0.0
TR_MM1=0.0
TR_MM2=0.0
TR_M=0.0
TR_MM=0.0
TR_MMMM=0.0
DO I=1,P
TR_M1=TR_M1+M1(I,I)
TR_M2=TR_M2+M2(I,I)
TR_MM1=TR_MM1+MM1(I,I)
TR_MM2=TR_MM2+MM2(I,I)
TR_M=TR_M+M(I,I)
TR_MM=TR_MM+MM(I,I)
TR_MMMM=TR_MMMM+MMMM(I,I)
END DO
WRITE(6,*)'TR_M1=',TR_M1, ' TR_M2=',TR_M2
WRITE(6,*)'TR_MM1=',TR_MM1, ' TR_MM2',TR_MM2
WRITE(6,*)'TR_M=',TR_M,' TR_MM=',TR_MM
WRITE(6,*)'TR_MMMM=',TR_MMMM
C0=1.0*n*(n**3+6.0*n**2+21.0*n+18)
C1=2.0*n*(2.0*n**2+6.0*n+9)
C2=2.0*n*(3.0*n+2)
C3=1.0*n*(2.0*n**2+5.0*n+7)
A21S=1.0*(TR_MM1-(TR_M1**2)/(N1-1))/(P*(N1-2)*(N1+1))
A22S=1.0*(TR_MM2-(TR_M2**2)/(N2-1))/(P*(N2-2)*(N2+1))
A1S=TR_M/(n*P)
A2S=(TR_MM-TR_M**2/n)/((n-1)*(n+2)*p)
A4S=1.0*(1.0*TR_MMMM/P-1.0*P*C1*A1S-1.0*(P**2)*C2*(A1S**2)*A2S
& -1.0*P*C3*(A2S**2)-1.0*n*(P**3)*(A1S**4))/C0
ET1_SQ=4.0*A2S**2*(1.0+2.0*(N1-1)*A4S/P/A2S**2)/
& (N1-1)**2
ET2_SQ=4.0*A2S**2*(1.0+2.0*(N2-1)*A4S/P/A2S**2)/
& (N2-1)**2
PLUS=1.0*(ET1_SQ+ET2_SQ)
TS2_SQ=1.0*(A21S-A22S)**2/PLUS
WRITE(6,*)'TS2_SQ=',TS2_SQ
C============================================================
C PHASE 5 COMPUTE SRIVASTAVA & YANAGIHARA (2010)'S STATISTIC, TSY
C============================================================
CALL MRRRR (P,P,M,P,P,P,MM,P,P,P,MMM,P)
CALL WRRRN ('M^3 ',P,P,MMM,P,0)
TR_MMM=0.0
DO I=1,P
TR_MMM=TR_MMM+MMM(I,I)
END DO
A11S=1.0*TR_M1/P/(N1-1)
A12S=1.0*TR_M2/P/(N2-1)
A3S=1.0*(TR_MMM/P-3.0*n*(n+1)*P*A2S*A1S-n*P**2
& *A1S**3)/n/(n**2+3*n+4)
GAM1=A21S/A11S**2
GAM2=A22S/A12S**2
R=(1.0*(A2S**3/A1S**6-2.0*A2S*A3S/A1S**5+A4S/A1S**4))/P
SI1_SQ=4.0*(A2S**2/A1S**4+2.0*(N1-1)*R)/(N1-1)**2
SI2_SQ=4.0*(A2S**2/A1S**4+2.0*(N2-1)*R)/(N2-1)**2
SUM=1.0*(SI1_SQ+SI2_SQ)
TSY_SQ=1.0*(GAM1-GAM2)**2/SUM
C WRITE(6,*)'TSY_SQ=',TSY_SQ
C============================================================
C PHASE 6 COMPUTE SCHOTT(2007)'S STATISTIC, TJ,
C============================================================
CALL MRRRR (P,P,S1,P,P,P,S2,P,P,P,S1S2,P)
CALL WRRRN ('S1*S2 = S1S2 ',P,P,S1S2,P,0)
TR_S1S2=0.0
DO I=1,P
TR_S1S2=TR_S1S2+S1S2(I,I)
END DO
WRITE(6,*) 'TR(S1*S2) =',TR_S1S2
TJ=(N1-1)*(N2-1)*(A21_HAT+A22_HAT-2.0*TR_S1S2/P)/2.0/
& (N1+N2-2)/A2
TJ_SQ=TJ**2
WRITE(6,*)'TJ = ',TJ,' TJ_SQ = ',TJ_SQ
C============================================================
C PHASE 7 COMPUTE PROPORTIONS OF REJECTIONS
C============================================================
CRV=ANORIN(0.95)
WRITE(6,*) 'CRITICAL VALUE =Z(ALPHA)=',CRV
CRV1=CHIIN(0.95,1.0)
WRITE(6,*)'95TH PERCENTILE OF A CHI-SQUARED WITH 1 D.F.=',CRV1
IF (T2_SQ.GT.CRV1) THEN
COUNT1 =COUNT1+1
ENDIF
IF (TJ.GT.CRV) THEN
COUNT2 =COUNT2+1
ENDIF
IF (TS2_SQ.GT.CRV1) THEN
COUNT3 =COUNT3+1
ENDIF
IF (TSY_SQ.GT.CRV1) THEN
COUNT4 =COUNT4+1
ENDIF
10 CONTINUE
ASL1=1.0*COUNT1/K*1.0
WRITE(6,11) ASL1
11 FORMAT(' ASL1= (# T2^2 > CHI(ALPHA)))/K = ',T35,F7.4)
ASL2=1.0*COUNT2/K*1.0
WRITE(6,12) ASL2
12 FORMAT(' ASL2 = (# TJ > Z(ALPHA))/ K = ',T35,F7.4)
ASL3=1.0*COUNT3/K*1.0
WRITE(6,13) ASL3
13 FORMAT(' ASL3 = (# TS2^2 > CHI(ALPHA))/ K = ',T35,F7.4)
ASL4=1.0*COUNT4/K*1.0
WRITE(6,14) ASL4
14 FORMAT(' ASL4 = (# TSY^2 > CHI(ALPHA))/ K) = ',T35,F7.4)
8888 CONTINUE
9999 CONTINUE
STOP
END
BIOGRAPHY
NAME Miss Saowapha Chaipitak
ACADEMIC BACKGROUND B.Sc. (Statistics), Naresuan University,
Thailand, 1999.
M.S. (Applied Statistics), National Institute
of Development Administration, Thailand,
2006.
PRESENT POSITION Lecturer,
Faculty of Science and Technology,
Rajamangala University of Technology
Thanyaburi (RMUTT), Thailand.
EXPERIENCE 1999-2000: General Administrative Officer,
Phitsanulok Municipal Court, Thailand.
2000-present: Lecturer,
Faculty of Science and Technology,
Rajamangala University of Technology
Thanyaburi (RMUTT), Thailand.
2007: Received a scholarship from “the
Commission on Higher Education, Thailand”
for enrolling in the doctoral level program at
the School of Applied Statistics, National
Institute of Development Administration
(NIDA), Thailand.
Publication: Saowapha Chaipitak and Samruam
Chongcharoen, 2013. A Test for Testing the
Equality of Two Covariance Matrices for High-
dimensional Data. Journal of Applied
Sciences. 13 (February): 270-277.