

Examensarbete

The unweighted mean estimator in a Growth Curve model

Emil Karlsson

LiTH-MAT-EX--2016/03--SE


The unweighted mean estimator in a Growth Curve model

Mathematical statistics, Linköpings universitet

Emil Karlsson

LiTH-MAT-EX--2016/03--SE

Examensarbete: 30 hp

Level: A

Supervisor: Martin Singull, Mathematical statistics, Linköpings universitet

Examiner: Martin Singull, Mathematical statistics, Linköpings universitet

Linköping: April 2016


Abstract

The field of statistics is becoming increasingly important as the amount of data in the world grows. This thesis studies the Growth Curve model in multivariate statistics, a model that is not widely used. One difference compared with the linear model is that its Maximum Likelihood Estimators are more complicated, which makes the model more difficult to use and to interpret and may be a reason for its limited use.

From this perspective, this thesis compares the traditional mean estimator for the Growth Curve model with the unweighted mean estimator. The unweighted mean estimator is simpler than the regular MLE. It will be proven that the unweighted estimator is in fact the MLE under certain conditions, and examples of when this occurs will be discussed. In a more general setting, this thesis will present conditions under which the unweighted estimator has a smaller covariance matrix than the MLE, and will also present confidence intervals and hypothesis tests based on these inequalities.

Keywords: Growth Curve model, maximum likelihood, eigenvalue inequality, unweighted mean estimator, covariance matrix, circular symmetric Toeplitz, intraclass, generalized intraclass

URL for electronic version:

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-131043


Acknowledgements

First I would like to thank my supervisor Martin Singull for introducing me to the field of multivariate statistics. I would also like to thank him for all the support in my studies overall and in this thesis specifically. It has been a pleasure discussing and working with you. I want to thank my opponent Jonas Granholm for reading my work and giving valuable remarks.

I would like to give my deepest thanks to my Ghazaleh for walking through life together with me and supporting my endeavor to study. I love you.


Nomenclature

The recurring abbreviations and symbols are described here.

Notation

Throughout this thesis matrices and vectors will be denoted in boldface. Matrices will for the most part be denoted by capital letters and vectors by small letters of the Latin or Greek alphabets. Scalars and matrix elements will be denoted by ordinary letters of the Latin or Greek alphabets. Random variables will be denoted by capital letters from the end of the Latin alphabet. The end of a proof is marked by □.

LIST OF NOTATION

Am,n - matrix of size m × n
Mm,n - the set of all matrices of size m × n
aij - matrix element in the i-th row and j-th column
an - vector of size n
c - scalar
A′ - transposed matrix A
In - identity matrix of size n
|A| - determinant of A
rank(A) - rank of A
tr(A) - trace of A
X - random matrix
x - random vector
X - random variable
E[X] - expectation
var(X) - variance
cov(X, Y) - covariance of X and Y
S - sample dispersion matrix
Np(µ, Σ) - multivariate normal distribution
Np,n(M, Σ, Ψ) - matrix normal distribution
MLE - Maximum Likelihood Estimator


List of Figures

2.1 Plot of the reciprocal condition number of V

5.1 Histogram of the first anti-eigenvalue when Σ is symmetric circular Toeplitz

5.2 Histogram of the first anti-eigenvalue when Σ has no structure


Contents

1 Introduction
  1.1 Chapter outline

2 Mathematical background
  2.1 Linear algebra
    2.1.1 General definitions and theorems
    2.1.2 Generalized inverse
    2.1.3 Anti-eigenvalues
    2.1.4 Kronecker product
    2.1.5 Matrix derivatives
  2.2 Statistics
    2.2.1 General concepts
    2.2.2 The Maximum Likelihood Estimator
    2.2.3 The matrix normal distribution
    2.2.4 Growth Curve model
    2.2.5 Wishart distribution

3 Unweighted estimator for Growth Curve model under R(Σ⁻¹B) = R(B)
  3.1 Unweighted vs weighted mean estimator
  3.2 Previous results
  3.3 Maximum likelihood estimator
  3.4 Intraclass model
  3.5 Generalized intraclass model

4 Unweighted estimator for Growth Curve model without assumptions on B
  4.1 Background and previous results
  4.2 Eigenvalues and coefficients
  4.3 Unweighted vs weighted estimator with known and fixed eigenvectors
    4.3.1 Distributions of eigenvalues
    4.3.2 Hypothesis testing and confidence interval
    4.3.3 Symmetric circular Toeplitz structured covariance matrix

5 Simulations
  5.1 Distribution of the first anti-eigenvalue
    5.1.1 Symmetric circular Toeplitz covariance matrix
    5.1.2 Unknown covariance matrix
  5.2 Eigenvalue confidence interval
  5.3 Hypothesis testing

6 Further research


Chapter 1

Introduction

In this thesis the unweighted mean estimator for a Growth Curve model is studied. The Growth Curve model was introduced by [Potthoff and Roy, 1964] for the case when the observation matrix is normally distributed with an unknown covariance matrix. The Growth Curve model belongs to the curved exponential family, since it is a generalized multivariate analysis of variance model (GMANOVA). The structural difference between the Growth Curve model and the ordinary multivariate linear model (MANOVA) is that the Growth Curve model is bilinear. A bilinear model involves two design matrices, in contrast to the MANOVA, where there is one design matrix. For more details and references about the Growth Curve model see Chapter 2.

For the Growth Curve model, [Potthoff and Roy, 1964] originally derived a class of weighted estimators for the mean parameter matrix. These results were later extended by [Khatri, 1966], who showed that the standard Maximum Likelihood Estimators (MLEs) also belong to this class. Under some conditions, which will be studied in detail in Chapter 3, [Reinsel, 1982] has shown that the unweighted estimator for a Growth Curve model is the MLE. The proof uses the fact that the unweighted estimator is the least squares estimator and that, under these conditions, the least squares estimators coincide with the MLE.

The study of patterned covariance matrices started with [Wilks, 1946], when Wilks published a paper dealing with measurements on k-equivalent psychological tests using a MANOVA model. He used the intraclass model, where the variance components are equal and the covariances between them are equal as well. Over the years different directions and special kinds of structures have been studied. A common one is the Toeplitz covariance matrix, which arises in time series analysis, signal and image processing, and Markov chains, among other fields. During the last years the intraclass covariance structure for a Growth Curve model, originating from [Khatri, 1973], has received some extra attention. E.g., [Zezula, 2006] derived some simple explicit estimators of the variance and the correlation given the intraclass covariance structure, and [Ye and Wang, 2009] and [Klein and Zezula, 2010] developed estimators from unbiased estimating equations using the orthogonal complement. Recently [Srivastava and Singull, 2016a, Srivastava and Singull, 2016b] have studied the problem of testing sphericity and intraclass structure.


1.1 Chapter outline

Chapter 2: Introduces required concepts and results in mathematics, mainly in linear algebra and statistics, that are used in this thesis.

Chapter 3: Presents conditions under which the unweighted mean estimator for the Growth Curve model coincides with the MLE. The example cases of intraclass and generalized intraclass structure are studied in detail.

Chapter 4: Compares the unweighted mean estimator with the MLE for a Growth Curve model and presents a test to determine whether it has better properties than the MLE for a given set of observations. The case when the covariance matrix has a circular Toeplitz structure is studied in detail.

Chapter 5: Presents simulations based on the results from the previous chapters.

Chapter 6: Mentions improvements and directions of research that are interesting with regard to the area.


Chapter 2

Mathematical background

This chapter will introduce some of the mathematics needed to understand this thesis. Some of the material may be new to the reader and some not, but the aim of this chapter is to give a student in a master's programme in mathematics enough background to understand this thesis. The theorems will be given without proofs, but the proofs can be accessed through the referenced sources of each section.

2.1 Linear algebra

Since this thesis makes extensive use of linear algebra and matrices, some common definitions and theorems will be introduced here.

2.1.1 General definitions and theorems

The first part of this section presents some general definitions and theorems in real linear algebra, together with notation and expressions that are required later. These results will be given without proofs, but the proofs can be found in [Horn and Johnson, 2012, Kollo and von Rosen, 2011, Bernstein, 2009].

Definition 1. The set of all matrices with m rows and n columns is denoted Mm,n and, similarly, the set of all vectors with m rows is denoted Mm.

Definition 2. A matrix A ∈ Mn,n is called the identity matrix of size n, denoted In, if the diagonal elements are 1 and the off-diagonal elements are 0.

Definition 3. A vector 1n ∈ Mn is called the one-vector of size n, i.e.,

1₅ = (1 1 1 1 1)′,

where 1′ denotes the transpose of 1.

Definition 4. A matrix A ∈ Mm,n is called the zero-matrix, denoted 0m,n, if all matrix elements of A equal 0.

Definition 5. The range of a matrix A ∈ Mm,n, denoted R(A), is defined by

R(A) = {Ax : x ∈ Mn}.


Definition 6. The rank of a matrix A ∈ Mm,n, denoted rank(A), is defined as the number of linearly independent columns (or rows) of the matrix.

Definition 7. A matrix A ∈ Mn,n is called symmetric if A′ = A.

Definition 8. A matrix A ∈ Mn,n is called normal if AA′ = A′A.

Definition 9. A matrix A ∈ Mn,n is called orthogonal if A′A = AA′ = In.

Definition 10. A symmetric square matrix A ∈ Mn,n is positive (semi-)definite if x′Ax > 0 (≥ 0) for every vector x ≠ 0.

Definition 11. The values λi that satisfy Axi = λixi are called eigenvalues of the matrix A. The vector xi corresponding to the eigenvalue λi is called an eigenvector of A corresponding to λi.

Lastly, here follow two theorems regarding matrix decompositions: the spectral decomposition theorem and the singular value decomposition theorem.

Theorem 1 (Spectral decomposition theorem). Any normal matrix A ∈ Mn,n has an orthonormal basis of eigenvectors. In other words, any normal matrix A can be represented as

A = UDU′,

where U ∈ Mn,n is an orthogonal matrix and D ∈ Mn,n is a diagonal matrix with the eigenvalues of A on the diagonal.

Here follows another decomposition theorem, the singular value decomposition theorem.

Theorem 2 (Singular value decomposition). Let A ∈ Mn,m, assume that A is nonzero, let r = rank(A), and define B = diag[σ1(A), . . . , σr(A)], where σi(A) are the singular values of A. Then there exist orthogonal matrices S1 ∈ Mn,n and S2 ∈ Mm,m such that, in block form,

A = S1 [ B, 0_{r,m−r} ; 0_{n−r,r}, 0_{n−r,m−r} ] S2.

Furthermore, each column of S1 is an eigenvector of AA′, while each column of S2′ is an eigenvector of A′A.

2.1.2 Generalized inverse

This section introduces the concept of the generalized inverse, an extension of the regular inverse. These results can be found in [Bernstein, 2009].

Definition 12. Let A ∈ Mn,m. If A is nonzero then, by the singular value decomposition theorem, there exist orthogonal matrices S1 ∈ Mn,n and S2 ∈ Mm,m such that, in block form,

A = S1 [ B, 0_{r,m−r} ; 0_{n−r,r}, 0_{n−r,m−r} ] S2,

where B = diag[σ1(A), . . . , σr(A)], r = rank(A), and σ1(A) ≥ σ2(A) ≥ · · · ≥ σr(A) > 0 are the positive singular values of A. Then the (Moore-Penrose) generalized inverse A+ of A is the m × n matrix

A+ = S2′ [ B⁻¹, 0_{r,n−r} ; 0_{m−r,r}, 0_{m−r,n−r} ] S1′.

If A = 0_{n,m} then A+ = 0_{m,n}, while if m = n and det(A) ≠ 0, then A+ = A⁻¹.
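The construction above can be carried out numerically. The following is a minimal sketch (using NumPy; the rank-deficient test matrix and the rank cut-off 1e-10 are illustrative choices, not from the thesis) that builds A+ from the singular value decomposition and checks it against the defining properties in Lemma 1 below:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank 3, size 5x4

# Build A+ from the SVD, keeping only the positive singular values,
# mirroring the construction in Definition 12.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))            # numerical rank
A_plus = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

# The four Penrose properties (Lemma 1):
assert np.allclose(A @ A_plus @ A, A)
assert np.allclose(A_plus @ A @ A_plus, A_plus)
assert np.allclose((A @ A_plus).T, A @ A_plus)
assert np.allclose((A_plus @ A).T, A_plus @ A)

# Agrees with NumPy's built-in Moore-Penrose inverse.
assert np.allclose(A_plus, np.linalg.pinv(A))
```

Since the Moore-Penrose inverse is unique, the hand-built A+ necessarily matches `np.linalg.pinv`.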


Lemma 1. The matrix A+ in Definition 12 above has the following properties:

(i) AA+A = A,

(ii) A+AA+ = A+,

(iii) (AA+)′ = AA+,

(iv) (A+A)′ = A+A.

The only generalized inverse A+ that satisfies the above properties is the Moore-Penrose generalized inverse from Definition 12.

Here follow some more properties of the Moore-Penrose generalized inverse.

Proposition 1. Let A ∈ Mm,n. Then the following statements hold:

(i) AA+ is the projector onto R(A).

(ii) A+A is the projector onto R(A′).

(iii) A+ = A′(AA′)+ = (A′A)+A′.

(iv) (A+)′ = (AA′)+A = A(A′A)+.

2.1.3 Anti-eigenvalues

The information in this section, and more about anti-eigenvalues, can for example be found in [Rao, 2005].

Let A ∈ Mp,p be a positive definite matrix. Then the cosine of the angle θ between a vector x and Ax is

cos θ = x′Ax / √((x′x)(x′A²x)),

which has the value 1 if x is an eigenvector of A, i.e., Ax = λx for some λ. The concept of anti-eigenvalues arises when the following question is raised: for which vector x does cos θ attain its minimum value, i.e., for which x is the angle of separation between x and Ax a maximum? The minimum is attained at

x = (√λp x1 ± √λ1 xp) / √(λ1 + λp),

and the minimum value equals

µ1 = 2√(λ1 λp) / (λ1 + λp),

as can be seen in [Rao, 2005]. The number µ1 above is called the first anti-eigenvalue of A, and for µ1 there exists a pair of vectors, u1 and u2, called the first anti-eigenvectors of A. It is possible to continue and define further anti-eigenvalues, but only the first one is of interest in this thesis. The above can be summarized in the following definition and theorem.


Definition 13. Let A ∈ Mp,p be a positive definite matrix with eigenvalues λ1 ≥ · · · ≥ λp > 0. The first anti-eigenvalue of A is defined as

µ1 = 2√(λ1 λp) / (λ1 + λp),

with the corresponding first anti-eigenvectors

x = (√λp x1 ± √λ1 xp) / √(λ1 + λp) = (u1, u2),

where x1, xp are the eigenvectors corresponding to λ1 and λp.

Theorem 3. Let A ∈ Mp,p be a positive definite matrix with first anti-eigenvalue µ1. Then

cos θ = x′Ax / √((x′x)(x′A²x)) ≥ µ1.
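Definition 13 and Theorem 3 are easy to verify numerically. The sketch below (NumPy; the random positive definite test matrix is an illustrative choice) computes µ1 and the minimising anti-eigenvector, and checks that the bound of Theorem 3 is attained there and holds for arbitrary vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
G = rng.standard_normal((p, p))
A = G @ G.T + p * np.eye(p)              # a positive definite test matrix

lam, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
lam_p, lam_1 = lam[0], lam[-1]           # smallest and largest eigenvalue
x_p, x_1 = vecs[:, 0], vecs[:, -1]       # corresponding eigenvectors

mu_1 = 2 * np.sqrt(lam_1 * lam_p) / (lam_1 + lam_p)   # Definition 13

# The minimising anti-eigenvector from Definition 13 (taking the '+' sign).
x = (np.sqrt(lam_p) * x_1 + np.sqrt(lam_1) * x_p) / np.sqrt(lam_1 + lam_p)

def cos_theta(A, x):
    """Cosine of the angle between x and Ax."""
    return (x @ A @ x) / np.sqrt((x @ x) * (x @ A @ A @ x))

assert np.isclose(cos_theta(A, x), mu_1)          # the bound is attained ...
for _ in range(100):
    z = rng.standard_normal(p)
    assert cos_theta(A, z) >= mu_1 - 1e-12        # ... and always holds
```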

2.1.4 Kronecker product

Definition 14. Let A = (aij) ∈ Mp,q and B = (bij) ∈ Mr,s. Then the pr × qs matrix A ⊗ B is called the direct product (Kronecker product) of the matrices A and B, if

A ⊗ B = [aij B], i = 1, . . . , p; j = 1, . . . , q,

where each block is the r × s matrix

aij B = [ aij b11 … aij b1s ; ⋮ ⋱ ⋮ ; aij br1 … aij brs ].

Definition 15. Let A = (a1, . . . , aq) ∈ Mp,q, where ai, i = 1, . . . , q, is the i-th column vector. The vectorization operator vec(·) is an operator from R^{p×q} to R^{pq}, with

vec(A) = (a1′, . . . , aq′)′,

i.e., the columns of A stacked on top of each other.

Proposition 2. Let A ∈ Mp,q, B ∈ Mq,r and C ∈ Mr,s. Then

vec(ABC) = (C′ ⊗ A) vec(B).
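Proposition 2 can be illustrated with a quick numerical check. Since vec(·) stacks columns, a column-major flattening must be used (the dimensions below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r, s = 3, 4, 5, 2
A = rng.standard_normal((p, q))
B = rng.standard_normal((q, r))
C = rng.standard_normal((r, s))

# vec() stacks the columns of a matrix; NumPy flattens row-wise by
# default, so order='F' (column-major) is needed to match Definition 15.
def vec(M):
    return M.flatten(order="F")

# Proposition 2: vec(ABC) = (C' ⊗ A) vec(B).
lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
assert np.allclose(lhs, rhs)
```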

2.1.5 Matrix derivatives

In this section some elementary matrix derivatives will be presented. These results can be found in [Kollo and von Rosen, 2011].

Proposition 3 (Chain rule). Let Z ∈ Mm,n be a function of Y, and Y be a function of X. Then

dZ/dX = (dY/dX)(dZ/dY).


Proposition 4. Let A and B be constant matrices of proper sizes. Then

d(AXB)/dX = B ⊗ A′.

Proposition 5. Let all matrices be of proper sizes and the elements of X mathematically independent and variable. Then

d tr(AXBX′)/dX = vec(A′XB′) + vec(AXB).

Remark 1. In order to define matrix derivatives some restrictions are made. When a matrix derivative is a derivative of a matrix Y by another matrix X, a common assumption is that the matrix X is mathematically independent and variable. This means that

1. the elements of X are non-constant,

2. no two or more elements are functionally dependent.

This is restrictive because it excludes certain classes of matrices, such as symmetric matrices, diagonal matrices and triangular matrices, from X.

2.2 Statistics

This section presents some general definitions in statistics as well as some results in multivariate statistics and an introduction to the Growth Curve model. The ambition is to give enough substance to establish a basis for understanding this thesis. The results will be given without proof, but the proofs can be found in the references for each subsection.

2.2.1 General concepts

Here follow some definitions and properties regarding estimation in statistics. These definitions and theorems come from [Casella, 2002].

First we present the definition of an unbiased estimator.

Definition 16. The bias of a point estimator θ̂ of a parameter θ is the difference between the expected value of θ̂ and the true value θ; that is, Bias_θ(θ̂) = E[θ̂] − θ. An estimator whose bias is identically (in θ) equal to 0 is called unbiased and satisfies E[θ̂] = θ for all θ.

Here follows a definition of efficiency between two unbiased estimators.

Definition 17. Let T1 and T2 be two unbiased estimators of the parameter T. Then T1 is multivariate more efficient than T2 if cov(T1) ≤ cov(T2) for all possible values of T.

Note 1. In the definition above cov(T1) ≤ cov(T2) means that the matrix cov(T1) is smaller than cov(T2) with regard to Loewner's partial ordering, i.e., the matrix cov(T2) − cov(T1) is positive semi-definite. If cov(T1) < cov(T2), the matrix cov(T2) − cov(T1) is instead positive definite.
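In practice the Loewner ordering of Note 1 can be checked by testing whether the difference of the two covariance matrices is positive semi-definite, e.g. via its smallest eigenvalue. A minimal sketch (NumPy; the toy matrices and the tolerance are illustrative choices):

```python
import numpy as np

def loewner_leq(S1, S2, tol=1e-10):
    """True if S2 - S1 is positive semi-definite, i.e. S1 <= S2
    in the Loewner partial ordering of Note 1."""
    diff = (S2 - S1 + (S2 - S1).T) / 2   # symmetrise against round-off
    return np.linalg.eigvalsh(diff).min() >= -tol

# A toy check: shrinking a covariance matrix makes it Loewner-smaller.
S2 = np.array([[2.0, 0.5], [0.5, 1.0]])
S1 = 0.5 * S2
assert loewner_leq(S1, S2)
assert not loewner_leq(S2, S1)
```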


2.2.2 The Maximum Likelihood Estimator

In this section the estimation procedure called maximum likelihood is examined more closely and some results regarding the properties of these estimators are presented. These results can be found in [Casella, 2002].

First we present a formal definition of the maximum likelihood estimator (MLE) and the likelihood function.

Definition 18. A likelihood function L(θ) is the probability density for the occurrence of a sample configuration x1, . . . , xn, given that the probability density f(x; θ) with regard to the parameter θ is known:

L(θ) = f(x1; θ) · · · f(xn; θ).

Definition 19. For each sample point x, let θ̂(x) be a parameter value at which the likelihood function L(θ) attains its maximum as a function of θ, with x held fixed. A maximum likelihood estimator (MLE) of the parameter θ based on a sample X is θ̂(X).

The main reasons why the MLEs are popular are the following theorems regardingtheir asymptotic properties.

Theorem 4. Let X1, X2, . . . be independent and identically distributed with density f(x|θ), and let L(θ|x) = ∏_{i=1}^{n} f(xi|θ) be the likelihood function. Let θ̂ denote the MLE of θ and let τ(θ) be a continuous function of θ. Under some regularity conditions on f(x|θ), and hence on L(θ|x), described in the note below, for every ε > 0 and every θ ∈ Θ,

lim_{n→∞} P_θ(|τ(θ̂) − τ(θ)| ≥ ε) = 0.

That is, τ(θ̂) is a consistent estimator of τ(θ).

Theorem 5. Let X1, X2, . . . be independent and identically distributed with density f(x|θ), let θ̂ denote the MLE of θ, and let τ(θ) be a continuous function of θ. Under some regularity conditions on f(x|θ), and hence on L(θ|x), described in the note below,

√n (τ(θ̂) − τ(θ)) → N(0, var_LB(θ)),

where var_LB(θ) is the Cramér-Rao lower bound. That is, τ(θ̂) is a consistent, asymptotically unbiased and asymptotically efficient estimator of τ(θ).

Note 2. The regularity conditions in the two previous theorems are minor restrictions on the probability density function. Since the underlying distribution in this thesis is the normal distribution, the regularity conditions pose no restriction on the development of results in this thesis. The specifics can be found in Miscellanea 10.6.2 in [Casella, 2002].

2.2.3 The matrix normal distribution

This section presents some definitions and estimators for the normal distribution generalized to the multivariate and matrix normal distributions. These results can be found in [Casella, 2002, Kollo and von Rosen, 2011, McCulloch et al., 2008, Srivastava and Khatri, 1979, Muirhead, 2005].

Here follows the definition of the univariate normal distribution.


Definition 20. A random variable X is said to have a univariate normal distribution, i.e., X ∼ N(µ, σ²), if the probability density function of X is

f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

The univariate model above is often generalized to a multivariate setting, which leads to the following definition of a multivariate (vector) normal distribution.

Definition 21. A random vector x ∈ Mm is said to have an m-variate normal distribution if, for every a ∈ Rm, the distribution of a′x is univariate normal.

Theorem 6. If x has an m-variate normal distribution, then both µ = E[x] and Σ = cov(x) exist and the distribution of x is determined by µ and Σ.

The following estimators are frequently used for a multivariate distribution.

Theorem 7. Let x1, . . . , xn be a random sample from a multivariate normal population with mean µ and covariance Σ. Then

µ̂ = x̄ = (1/n) Σ_{i=1}^{n} xi  and  Σ̂ = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)(xi − x̄)′

are the MLE and the corrected MLE of µ and Σ, respectively.

Theorem 8. The estimators in Theorem 7 are sufficient, consistent and unbiased with respect to the multivariate normal distribution.
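The estimators of Theorem 7 are straightforward to compute. A small simulation sketch (NumPy; the chosen µ, Σ, sample size and tolerances are illustrative, not from the thesis) shows them recovering the true parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 5000
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])

X = rng.multivariate_normal(mu, Sigma, size=n)   # n x p sample

mu_hat = X.mean(axis=0)                          # MLE of mu (Theorem 7)
centred = X - mu_hat
Sigma_hat = centred.T @ centred / (n - 1)        # corrected MLE of Sigma

# With n = 5000 both estimates should be close to the truth.
assert np.allclose(mu_hat, mu, atol=0.1)
assert np.allclose(Sigma_hat, Sigma, atol=0.2)
```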

The matrix normal distribution adds another dimension of dependence to the data and is often used to model multivariate datasets. Here follows its definition.

Definition 22. Let Σ = ττ′ and Ψ = γγ′, where τ ∈ Mp,r and γ ∈ Mn,s. A matrix X ∈ Mp,n is said to be matrix normally distributed with parameters M, Σ and Ψ if it has the same distribution as

M + τUγ′,

where M ∈ Mp,n is non-random and U ∈ Mr,s consists of s independent and identically distributed (i.i.d.) Nr(0, I) vectors ui, i = 1, 2, . . . , s. If X ∈ Mp,n is matrix normally distributed, this will be denoted X ∼ Np,n(M, Σ, Ψ).

The matrix normal distribution is nothing more than a multivariate normal distribution with some additional structure on the covariance matrix, i.e., let X ∼ Np,n(M, Σ, Ψ); then vec X = x ∼ Npn(vec M, Ω), where Ω = Ψ ⊗ Σ.
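Definition 22 also gives a direct recipe for simulating a matrix normal distribution, and the vec-relation above can be checked empirically. A sketch (NumPy; τ and γ are taken as Cholesky factors, one valid choice among many, and the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 4
M = np.zeros((p, n))
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])
Psi = np.eye(n)

# Choose tau, gamma with Sigma = tau tau' and Psi = gamma gamma'.
tau = np.linalg.cholesky(Sigma)
gamma = np.linalg.cholesky(Psi)

def draw():
    """One draw of X ~ N_{p,n}(M, Sigma, Psi) via M + tau U gamma'."""
    U = rng.standard_normal((p, n))
    return M + tau @ U @ gamma.T

# vec(X) ~ N_{pn}(vec M, Psi ⊗ Sigma): check the covariance empirically.
vecs = np.array([draw().flatten(order="F") for _ in range(20000)])
emp_cov = np.cov(vecs, rowvar=False)
assert np.allclose(emp_cov, np.kron(Psi, Sigma), atol=0.1)
```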

2.2.4 Growth Curve model

In this subsection the Growth Curve model will be presented. It was introduced by [Potthoff and Roy, 1964] and is defined as follows.

Definition 23. Let X ∈ Mp,N and ξ ∈ Mq,m be the observation and unknown parameter matrices, respectively, and let B ∈ Mp,q and A ∈ Mm,N be the known within- and between-individual design matrices, respectively. Suppose that q ≤ p and r + p ≤ N, where r = rank(A). The Growth Curve model is given by

X = BξA + ε,


where the columns of ε are assumed to be independently p-variate normally distributed with mean zero and an unknown positive definite covariance matrix Σ, i.e.,

ε ∼ Np,N(0, Σ, IN).

The MLEs for the parameters ξ and Σ can be found in multiple places in the literature, but were first presented by [Khatri, 1966]. Here follow the estimators as well as the covariance matrix of ξ̂, which can be found in [Kollo and von Rosen, 2011].

Theorem 9. The MLEs of the Growth Curve model are

ξ̂_MLE = (B′V⁻¹B)⁻¹B′V⁻¹XA′(AA′)⁻¹

and

NΣ̂_MLE = (X − Bξ̂_MLE A)(X − Bξ̂_MLE A)′,

where V = X(I − A′(AA′)⁻¹A)X′ and A and B have full rank. The covariance matrix of ξ̂_MLE is given by

cov(ξ̂_MLE) = ((n − 1)/(n − 1 − (p − q))) (AA′)⁻¹ ⊗ (B′Σ⁻¹B)⁻¹.

In [Potthoff and Roy, 1964] the following example of data is presented, which the authors modeled with a Growth Curve model.

Example 1. A certain measurement in a dental study was made on each of 11 girls and 16 boys at the ages 8, 10, 12 and 14. We will assume that the covariance matrix of the 4 correlated observations is the same for all 27 individuals.

Girls
Individual/Age in years    8      10     12     14
1                          21     20     21.5   23
2                          21     21.5   24     25.5
3                          20.5   24     24.5   26
4                          23.5   24.5   25     26.5
5                          21.5   23     22.5   23.5
6                          20     21     21     22.5
7                          21.5   22.5   23     25
8                          23     23     23.5   24
9                          20     21     22     21.5
10                         16.5   19     19     19.5
11                         24.5   25     28     28
Mean                       21.18  22.23  23.09  24.09


Boys
Individual/Age in years    8      10     12     14
1                          26     25     29     31
2                          21.5   22.5   23     26.5
3                          23     22.5   24     27.5
4                          25.5   27.5   26.5   27
5                          20     23.5   22.5   26
6                          24.5   25.5   27     28.5
7                          22     22     24.5   26.5
8                          24     21.5   24.5   25.5
9                          23     20.5   31     26
10                         27.5   28     31     31.5
11                         23     23     23.5   25
12                         21.5   23.5   24     28
13                         17     24.5   26     29.5
14                         22.5   25.5   25.5   26
15                         23     24.5   26     30
16                         22     21.5   23.5   25
Mean                       22.87  23.81  25.72  27.47

This is a typical application for the Growth Curve model. We have a relation between observations of certain individuals, the difference in age between the observations, as well as properties of the individuals themselves.

Example 2. Since the inverse of V is used when computing the MLE of ξ, problems can arise when p is close to N. When p gets closer to N, the reciprocal condition number of V decreases and the numerical properties of the inverse of V become increasingly worse. This also affects the covariance matrix of the estimator of ξ, which increases when p is close to N, so the estimator loses precision. Figure 2.1 shows a plot of the reciprocal condition number of V for a Growth Curve model. We assume that X = BξA + E ∼ Np,N(BξA, Σ, IN), where N = 100, m = 20, q = 20 and p ranges from 20 to 80. It can be seen that as p gets closer to N the reciprocal condition number decreases. For this experiment Σ and B were generated anew for each sample, since their dimensions change with p. The matrices ξ and A were kept constant and without any structure.
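The degradation described in Example 2 is easy to reproduce in a small experiment of the same flavour (NumPy; unlike the thesis experiment, Σ = I is used and the design matrix is random, so this is only a qualitative illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
N, m = 100, 20
A = rng.standard_normal((m, N))                    # random between-design
P = np.eye(N) - A.T @ np.linalg.solve(A @ A.T, A)  # I - A'(AA')^{-1}A

rconds = []
for p in (20, 40, 60, 80):
    X = rng.standard_normal((p, N))                # data with Sigma = I
    V = X @ P @ X.T
    s = np.linalg.svd(V, compute_uv=False)
    rconds.append(s[-1] / s[0])                    # reciprocal condition number

# V becomes increasingly ill-conditioned as p approaches N - m = 80.
assert all(a > b for a, b in zip(rconds, rconds[1:]))
```

Note that V = XPX′ has rank at most N − m, so p cannot exceed N − rank(A) if V is to be invertible at all.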

Since ξ contains the mean parameters of the Growth Curve model, it is possible to find multiple unbiased estimators. One in particular that will be studied further is the unweighted estimator of ξ. It will be shown that this estimator does not require the inverse of V, and hence its numerical properties are better when p is close to N.

2.2.5 Wishart distribution

In this subsection the Wishart distribution and some useful theorems that will be used in this thesis are introduced. These results can be found in [Kollo and von Rosen, 2011].

Definition 24. A random matrix W ∈ Mp,p is said to be Wishart distributed if and only if W = XX′ for some matrix X, where X ∼ Np,n(M, Σ, I), Σ ≥ 0. If M = 0, it is called a central Wishart distribution, which will be denoted W ∼ Wp(Σ, n),



Figure 2.1: Plot of the reciprocal condition number of V (scale ×10⁻⁵) for p ranging from 20 to 80.

and if M ≠ 0 it is called a non-central Wishart distribution, which will be denoted Wp(Σ, n, Q), where Q = MM′.

The Wishart distribution is a generalization of the χ²-distribution to multiple dimensions. These distributions are useful when estimating covariance matrices in multivariate statistics. A Wishart distributed matrix stays Wishart distributed under linear transformations, a property that is summarized in the following Theorem.

Theorem 10. Let W ∼ Wp(Σ, n, Q) and A ∈ Mq,p. Then

AWA′ ∼ Wq(AΣA′, n, AQA′).

From the Theorem above it is possible to see the connection to a χ²-distribution in the following special case.

Corollary 1. Let W ∼ Wp(Σ, n) and a ∈ Mp,1. Then

a′Wa / (a′Σa) ∼ χ²(n).

A ratio of two independent χ²-distributed variables, each divided by its degrees of freedom, is F-distributed, as can be seen below.

Definition 25. Let X ∼ χ²(m) and Y ∼ χ²(n) be independent random variables. Then

(X/m) / (Y/n) ∼ F(m, n).
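Corollary 1 is easy to check by simulation. The sketch below is our own; it draws central Wishart matrices W = XX′ for an arbitrary positive definite Σ and verifies that a′Wa/(a′Σa) has the first two moments of a χ²(n) variable (mean n, variance 2n).

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 50
L = rng.standard_normal((p, p))
Sigma = L @ L.T + p * np.eye(p)     # an arbitrary positive definite Sigma
a = rng.standard_normal(p)

chol = np.linalg.cholesky(Sigma)
samples = np.empty(20000)
for s in range(samples.size):
    X = chol @ rng.standard_normal((p, n))  # X ~ N_{p,n}(0, Sigma, I)
    W = X @ X.T                             # W ~ W_p(Sigma, n)
    samples[s] = a @ W @ a / (a @ Sigma @ a)

# A chi^2(n) variable has mean n and variance 2n
print(samples.mean(), samples.var())
```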


Chapter 3

Unweighted estimator for Growth Curve model under R(Σ⁻¹B) = R(B)

This chapter studies the unweighted estimator for a Growth Curve model under certain conditions on Σ and B studied by [Srivastava and Singull, 2016a, Srivastava and Singull, 2016b]. It presents a new proof of the MLE under these conditions and introduces two useful cases, intraclass and generalized intraclass covariance structure, where this condition occurs in practice.

3.1 Unweighted vs weighted mean estimator

Characteristic for the Growth Curve model is the post-matrix A, which describes the relation between the columns of the result matrix. This type of modeling assumes that the observed objects have equal covariance between their observations. One common way to use this is to estimate the covariance matrix for the relationship between the results over time. Growth curves for children can for example be represented in this fashion, and a model that estimates the length of a child with certain attributes over time can be presented. The General Linear model only has the relationship between columns; in the previous example that is the relationship between the children's attributes. The design matrix B describes the relationship between rows. This extra ability needs to be addressed when estimating the mean, and this special property is the reason that the MLE is a weighted estimator.

The MLEs for the parameters ξ and Σ can be found in multiple places in the literature, but were first presented by [Khatri, 1966], and are described in Chapter 2.

The estimator ξMLE is called weighted because it contains the matrix V⁻¹. This estimator, and the covariance estimator that follows from it, are complex expressions, since they contain an inverse that is unstable when p is close to n; this makes the covariance matrix of the estimator larger with respect to Loewner's partial ordering. Problems can arise since the variance of the mean can be extremely high, which renders the estimates useless, especially when compared with the mean estimator for a General Linear model. [Srivastava and Singull, 2016a, Srivastava and Singull, 2016b] have shown that an unweighted version of the estimators above can be motivated for tests




and estimation. This estimator has better properties when V is close to singular.

Theorem 11. The unweighted estimator ξUW is an unbiased estimator of ξ and is defined as

ξUW = (B′B)⁻¹B′XA′(AA′)⁻¹

and NΣUW = (X − BξUW A)(X − BξUW A)′.

The covariance matrix for the unweighted mean estimator is given by

cov(ξUW) = (AA′)⁻¹ ⊗ (B′B)⁻¹B′ΣB(B′B)⁻¹.

The difference between the MLE and the unweighted estimator lies in the absence of V and its inverse, which heavily simplifies the expressions.
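As a sketch of how the two estimators are computed in practice, the following simulates Growth Curve data and averages the estimates over repeated samples; both should come out close to the true ξ since the estimators are unbiased. All design matrices and parameter values below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N = 4, 200
B = np.column_stack([np.ones(p), np.arange(p)])      # within-individual design (q = 2)
A = np.vstack([np.ones(N), rng.integers(0, 2, N)])   # between-individual design (m = 2)
xi = np.array([[20.0, 1.5], [2.0, 0.5]])             # true mean parameters (made up)
Sigma = 0.5 * np.eye(p) + 0.5                        # intraclass covariance, rho = 0.5
chol = np.linalg.cholesky(Sigma)

def xi_unweighted(X):
    """Theorem 11: (B'B)^{-1} B' X A' (AA')^{-1}."""
    return np.linalg.solve(B.T @ B, B.T @ X @ A.T) @ np.linalg.inv(A @ A.T)

def xi_weighted(X):
    """Standard weighted MLE, using V = X(I - A'(AA')^{-1}A)X'."""
    Q = np.eye(N) - A.T @ np.linalg.solve(A @ A.T, A)
    Vi = np.linalg.inv(X @ Q @ X.T)
    return np.linalg.solve(B.T @ Vi @ B, B.T @ Vi @ X @ A.T) @ np.linalg.inv(A @ A.T)

reps = [B @ xi @ A + chol @ rng.standard_normal((p, N)) for _ in range(300)]
est_uw = np.mean([xi_unweighted(X) for X in reps], axis=0)
est_w = np.mean([xi_weighted(X) for X in reps], axis=0)
print(np.round(est_uw, 2))
print(np.round(est_w, 2))
```

Note that this particular choice of an intraclass Σ and a 1-vector in B satisfies the condition studied later in this chapter, so the unweighted estimator is here also the MLE.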

This chapter follows the lines of thought in [Srivastava and Singull, 2016a] and views some cases where the unweighted estimator performs at least equally well, before showing, in the next chapter, conditions for when it performs better.

3.2 Previous results

In order to compare the efficiency of the unweighted and the weighted mean estimators, their respective covariance matrices must be studied. In [Baksalary and Puntanen, 1991] the following inequality is presented, which is a generalization of the inequality published in [Khatri, 1966, Gaffke and Krafft, 1977]. This inequality can be used to compare the covariance matrices of the estimators of interest, but also to see when they coincide.

Theorem 12. Let Σ ∈ Mp,p be a positive definite matrix and B ∈ Mp,q be of full column rank q ≤ p. Then

(B′Σ⁻¹B)⁻¹ ≤ (B′B)⁻¹B′ΣB(B′B)⁻¹,

where ≤ is with regard to Loewner's partial ordering, with equality if and only if R(Σ⁻¹B) = R(B).

The inequality above can also be rewritten in the following way using Proposition 1:

(B′Σ⁻¹B)⁻¹ ≤ (B′B)⁻¹B′ΣB(B′B)⁻¹ ⟺ (B′Σ⁻¹B)⁻¹ ≤ B⁺ΣB⁺′.

In order to establish the condition R(Σ⁻¹B) = R(B), assumptions need to be made on Σ and B. In practice this essentially means that the columns spanning Σ, except for the diagonal part of Σ, need to be part of B. This is certainly a restriction, but specific vectors such as 1 often occur in the design matrix B, so there are common applications where this is applicable. Some examples will be presented in the following sections. There have been some comparisons between models where unweighted respectively weighted estimators have been used with different types of models, but in [Srivastava and Singull, 2016a] those ideas were applied to a Growth Curve model.



3.3 Maximum likelihood estimator

The MLEs for a Growth Curve model under some special conditions have been proposed by [Khatri, 1966, Reinsel, 1982]. The proofs were done using least squares estimators, and the requirements were not clearly specified. In this section follows a clear presentation and proof for the condition under which the MLE is the unweighted estimator. The approach of the proof is based on the usual Maximum Likelihood method.

Theorem 13. The MLEs for ξ and Σ in a Growth Curve model with condition R(Σ⁻¹B) = R(B) are

ξMLE = (B′B)⁻¹B′XA′(AA′)⁻¹,

NΣMLE = (X − BξMLE A)(X − BξMLE A)′.

Proof. For the Growth Curve model the likelihood function with respect to ξ and Σ can be obtained using the density of the matrix normal distribution, which yields

L(ξ, Σ) = (2π)^(−pn/2) |Σ|^(−n/2) exp(−(1/2) tr(Σ⁻¹(X − BξA)(X − BξA)′)).

Since exp is a monotone function, the only part of L that depends on ξ is tr(Σ⁻¹(X − BξA)(X − BξA)′). Thus minimizing that expression with regard to ξ will maximize the likelihood function. Differentiating tr(Σ⁻¹(X − BξA)(X − BξA)′) with regard to the matrix ξ using the chain rule, see Proposition 3, one reaches

(d(X − BξA)/dξ) · (d tr(Σ⁻¹(X − BξA)(X − BξA)′)/d(X − BξA)).

Using the rules of matrix derivation of Proposition 4 and Proposition 5, it equals

(A ⊗ B′)(vec(Σ⁻¹(X − BξA)) + vec(Σ⁻¹(X − BξA))) = 2(A ⊗ B′)vec(Σ⁻¹(X − BξA)) = 2vec(B′Σ⁻¹(X − BξA)A′),

where Proposition 2 is used for the last equality. Equating this expression to zero yields the linear equation in ξ

B′Σ⁻¹(X − BξA)A′ = 0 ⟺ B′Σ⁻¹XA′ = B′Σ⁻¹BξAA′.

From this equation the solution for ξ can be simplified using the fact that when R(Σ⁻¹B) = R(B) the inverse satisfies (B′Σ⁻¹B)⁻¹ = B⁺ΣB⁺′:

B′Σ⁻¹XA′ = B′Σ⁻¹BξAA′ ⟺ B⁺ΣB⁺′B′Σ⁻¹XA′ = ξAA′.

The expression B⁺′B′ = BB⁺ is a projector onto C(B), and under the condition it commutes with Σ⁻¹, that is, BB⁺Σ⁻¹ = Σ⁻¹BB⁺. It is therefore possible to rewrite the equation as

B⁺ΣBB⁺Σ⁻¹XA′ = ξAA′ ⟺ B⁺ΣΣ⁻¹BB⁺XA′ = ξAA′ ⟺ B⁺BB⁺XA′ = ξAA′ ⟺ B⁺XA′ = ξAA′,

with the help of the properties of the Moore-Penrose inverse, see Proposition 1. The MLE for ξ can now be reached as

ξMLE = B⁺XA′(AA′)⁺.

The MLE for Σ will then simply be the following:

NΣMLE = (X − BξMLE A)(X − BξMLE A)′ = V + (I − BB⁺)XA′(AA′)⁺AX′(I − BB⁺),

where V = X(I − A′(AA′)⁺A)X′, in accordance with the standard MLE. In order to conclude the proof it needs to be shown that the estimators actually attain a maximum. This can be done in a similar fashion as for the usual MLE in [Kollo and von Rosen, 2011] and will not be repeated here.

3.4 Intraclass model

A common vector in the design matrix B is the 1-vector. This comes from the fact that statistical models often have an offset from the origin as a static parameter. The interpretation of this parameter is the average of all samples when all other factors are set to zero.

In this section we will see that if the covariance matrix of a Growth Curve model has an intraclass structure and the design matrix B contains a 1-vector, then R(Σ⁻¹B) = R(B) holds.

An intraclass structure on the covariance matrix arises in different applications in statistics, e.g., cluster randomized studies, and is defined in the following way.

Definition 26. A covariance matrix Σ is said to be of intraclass structure if it can be written as

ΣIC = σ²((1 − ρ)I + ρ11′),

where −1/(p − 1) < ρ < 1 and σ > 0.

The inverse of a covariance matrix with intraclass structure is also intraclass, which is summarized in the following lemma that will be used in the proof below.

Lemma 2. For an intraclass covariance matrix ΣIC defined as above, the inverse can be written as

ΣIC⁻¹ = (1/(σ²(1 − ρ))) (I − (ρ/(1 + (p − 1)ρ)) 11′),

which is also an intraclass matrix.
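Lemma 2 can be checked numerically. The sketch below is our own; it builds ΣIC from Definition 26 and the closed-form inverse from Lemma 2 and multiplies them together.

```python
import numpy as np

def sigma_ic(p, sigma2, rho):
    """Intraclass covariance matrix of Definition 26."""
    return sigma2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))

def sigma_ic_inv(p, sigma2, rho):
    """Closed-form inverse from Lemma 2."""
    c = rho / (1 + (p - 1) * rho)
    return (np.eye(p) - c * np.ones((p, p))) / (sigma2 * (1 - rho))

p, sigma2, rho = 5, 2.0, 0.3
product = sigma_ic(p, sigma2, rho) @ sigma_ic_inv(p, sigma2, rho)
print(np.allclose(product, np.eye(p)))  # True
```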

The result that the condition R(Σ⁻¹B) = R(B) holds for an intraclass covariance matrix when a 1-vector is included in the design matrix is presented in the following theorem.

Theorem 14. Assume that the matrix B contains a 1-vector and that ΣIC is intraclass. Then R(ΣIC⁻¹B) = R(B).



Proof. Without loss of generality the problem can be rearranged such that the design matrix B can be written as B = [1 B₀]. The product ΣIC⁻¹B can, with the added assumptions on B, be simplified to

ΣIC⁻¹B = (1/(σ²(1 − ρ))) (I − (ρ/(1 + (p − 1)ρ)) 11′)[1 B₀]
       = (1/(σ²(1 − ρ))) [((1 − ρ)/(1 + (p − 1)ρ)) 1   B₀ − (ρ/(1 + (p − 1)ρ)) 11′B₀].

Since C = (ρ/(1 + (p − 1)ρ)) 11′B₀ is of rank at most 1 with every column proportional to 1, there exist αi ∈ ℝ such that ci = αi1 for every column ci of C. Subtracting C from B₀ therefore does not change the column space spanned together with 1, and since (1 − ρ)/(1 + (p − 1)ρ) ≠ 0,

R(ΣIC⁻¹B) = R([((1 − ρ)/(1 + (p − 1)ρ)) 1   B₀ − C]) = R([1 B₀]) = R(B).

That concludes the proof.

These results can be extended to a combination of known matrices and corresponding w-vectors.

3.5 Generalized intraclass model

In this section it will be shown that the results regarding the intraclass structure can be extended to a covariance matrix of a Growth Curve model with a generalized intraclass structure. This means that instead of assuming that the one-vector 1 exists in the design matrix B, a known general vector w in the design matrix B is assumed. Hence this is a generalization of the model in the previous section.

The generalized intraclass covariance matrix can be defined in the following way.

Definition 27. A covariance matrix Σ is said to be of generalized intraclass structure if it can be written as

ΣGIC = σ²((1 − ρ)G + ρww′),

where −1/(p − 1) ≤ ρ < 1, σ > 0, G is a known positive definite matrix and w is a known vector.

The inverse of a generalized intraclass covariance matrix is also generalized intraclass, which is summarized in the following lemma and will be used in the proof below.

Lemma 3. For a generalized intraclass covariance matrix ΣGIC defined as above, its inverse can be written as

ΣGIC⁻¹ = (1/(σ²(1 − ρ))) (G⁻¹ − (ρ/(1 + (p − 1)ρ)) ww′),

which is also a generalized intraclass matrix.

The proof follows directly since the matrix G and the vector w are known. The result that the condition R(Σ⁻¹B) = R(B) holds for a generalized intraclass covariance matrix when w is included in the design matrix is presented in the following theorem.



Theorem 15. Assume that the design matrix B contains a known vector w and that ΣGIC is generalized intraclass with a known positive definite matrix G and vector w. Then R(ΣGIC⁻¹B) = R(B).

Proof. Without loss of generality the problem can be rearranged such that the design matrix B can be written as B = [w B₀]. The product ΣGIC⁻¹B can, with the added assumptions on B (here it is used that G⁻¹w = w and w′w = p, which is implicitly assumed for the generalized intraclass structure), be simplified to

ΣGIC⁻¹B = (1/(σ²(1 − ρ))) (G⁻¹ − (ρ/(1 + (p − 1)ρ)) ww′)[w B₀]
        = (1/(σ²(1 − ρ))) [((1 − ρ)/(1 + (p − 1)ρ)) w   G⁻¹B₀ − (ρ/(1 + (p − 1)ρ)) ww′B₀].

Since C = (ρ/(1 + (p − 1)ρ)) ww′B₀ is of rank at most 1 with every column proportional to w, there exist αi ∈ ℝ such that ci = αiw for every column ci of C. Thus

R(ΣGIC⁻¹B) = R([((1 − ρ)/(1 + (p − 1)ρ)) w   G⁻¹B₀ − C]) = R([w B₀]) = R(B).


Chapter 4

Unweighted estimator for Growth Curve model without assumptions on B

4.1 Background and previous results

In the previous chapter we developed some conditions under which the unweighted estimator is the MLE for a Growth Curve model. It is also possible to compare the unweighted estimator with the weighted estimator of ξ without any assumptions on B. This has been done for a singular linear model in [Liu and Neudecker, 1997, Rao, 2005], but not for a Growth Curve model. It will be shown that the loss of precision depends on the eigenvalues of Σ as well as the dimensions of the problem, and that in plenty of cases outside R(Σ⁻¹B) = R(B) the unweighted estimator of ξ will still be preferred. A hypothesis testing procedure for this is also presented for when the eigenvectors of the covariance matrix are known. This is applicable when the covariance matrix, for example, has a circular Toeplitz structure.

Below follows a linear algebra inequality that is a matrix generalization of the Kantorovich inequality. It is presented in [Liu and Neudecker, 1997], gives an upper bound on the inequality in Theorem 12 with regard to the eigenvalues of Σ, and will serve as a basis for the comparisons between the estimators in this chapter.

Theorem 16. Let Σ ∈ Mp,p be a positive definite matrix with eigenvalues λ1 ≥ ⋯ ≥ λp > 0 and U ∈ Mp,q with full column rank q ≤ p. Then

U⁺ΣU⁺′ ≤ (1/µ1²)(U′Σ⁻¹U)⁻¹,

where µ1 is the first anti-eigenvalue of Σ, given by µ1 = 2√(λ1λp)/(λ1 + λp).

4.2 Eigenvalues and coefficients

As can be seen in the previous section, an important part of the comparison between the estimators is done using the first anti-eigenvalue, or indirectly the smallest and largest eigenvalues. Combining the results of Theorem 9, Theorem 11 and Theorem 16 yields the following theorem for when the unweighted estimator is preferred.

Theorem 17. If

(λ1 + λp)²/(4λ1λp) ≤ (n − 1)/(n − 1 − (p − q)),   (4.1)

then the unweighted estimator of ξ has a smaller covariance matrix, with regard to Loewner's partial ordering, than the MLE.

For a large number of observations N the weighted mean estimator, the MLE, of ξ has a smaller covariance matrix. This is expected since it is the MLE and, with regard to Theorem 5, it should be the unbiased estimator with the smallest covariance matrix when N tends to infinity. However, when p is close to N the factor (n − 1)/(n − 1 − (p − q)) increases, and the covariance matrix can then be larger for the weighted estimator than for the unweighted estimator with regard to Loewner's partial ordering.
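Condition (4.1) is cheap to evaluate once the extreme eigenvalues of Σ and the dimensions are known. A small sketch (the function name and example values are our own):

```python
def unweighted_preferred(lam_max, lam_min, n, p, q):
    """Check condition (4.1) of Theorem 17."""
    lhs = (lam_max + lam_min) ** 2 / (4 * lam_max * lam_min)
    rhs = (n - 1) / (n - 1 - (p - q))
    return lhs <= rhs

# Well-conditioned Sigma and p close to n: the unweighted estimator is preferred.
print(unweighted_preferred(8.0, 6.0, n=30, p=25, q=2))    # True
# Ill-conditioned Sigma and many observations: the MLE is preferred.
print(unweighted_preferred(50.0, 1.0, n=500, p=10, q=2))  # False
```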

4.3 Unweighted vs weighted estimator with known and fixed eigenvectors

There exist many covariance matrices for which the condition R(Σ⁻¹B) = R(B) is not fulfilled. Since the condition heavily restricts the span of Σ, it also restricts its eigenspace. Therefore nontrivial eigenspaces are difficult to handle from the perspective of Theorem 12.

In this section we will see that it is possible to establish distributions for eigenvalues, and ratios of them, when the covariance matrix has fixed eigenvectors. These results can be used together with Theorem 17 in order to determine whether the unweighted or the weighted estimator performs better. This section will use the notation ΣFE for a covariance matrix with known and fixed eigenvectors. We denote the known and fixed eigenvectors by γk, with corresponding eigenvalues λk.

4.3.1 Distributions of eigenvalues

In matrix theory, eigenvalues and their eigenvectors play an important role in the analysis of matrices. In statistics there exist some general results regarding eigenvalues and eigenvectors. The usual problem is that they require asymptotic properties to be useful, or that their exact expressions are complicated. In this section it will be shown that under some common structures for a covariance matrix, useful theorems regarding eigenvalues can be established. These theorems can be used to differentiate between the unweighted and weighted estimators in a Growth Curve model and to make inference about the first anti-eigenvalue.

Some progress can be made if the estimator of a covariance matrix follows a Wishart distribution. Theorem 3.3.12 in [Srivastava and Khatri, 1979] gives the relationship between the Wishart and the χ²-distribution. The following theorem is the special case where the Wishart distribution is central.

Theorem 18. Let W ∼ Wp(D, n), where W = (Wij) and D = (Dii) is a diagonal matrix. Then Wii/Dii are independently distributed, each having the distribution of a χ² random variable with n degrees of freedom.



When the eigenvectors are fixed, the following estimator is proposed for estimatingeigenvalues.

Proposition 6. Let W ∼ Wp(ΣFE , n), where ΣFE is a covariance matrix withknown and fixed eigenvectors. Let λ1 ≥ · · · ≥ λp and v1, . . . ,vp be the eigenvaluesand corresponding eigenvectors of Σ. Then the proposed estimator is λi = v′iWvi.

Since the eigenvalue λi of a matrix Σ can be characterized as v′iΣvi, this is a logical estimator. The usual problem is that the estimators of the eigenvectors vi are themselves random variables, which makes it difficult to derive something useful from the estimator λ̂i above. But if the covariance matrix has known and fixed eigenvectors, the following theorem for the distribution of the proposed estimator can be established.

Theorem 19. The estimators λ̂i from Proposition 6 are independent, and λ̂i/λi follows a χ²-distribution with n degrees of freedom.

Proof. Let v1, …, vp denote the known eigenvectors of the covariance matrix ΣFE. Since every covariance matrix has a spectral decomposition, see Theorem 1, it can be rewritten as Σ = VDV′, where V = (v1, …, vp) and D is a diagonal matrix composed of the eigenvalues of Σ. Consider the matrix L = V′WV, whose diagonal elements are the estimators λ̂i = v′iWvi. According to Theorem 10, L is Wishart distributed with scale matrix V′ΣV = V′VDV′V = D = diag(λ1, …, λp) and n degrees of freedom. By Theorem 18 it then follows that the ratios Lii/Dii = λ̂i/λi ∼ χ²(n) are independently distributed.

Lemma 4. For two estimators λ̂i and λ̂j with i ≠ j from Theorem 19, the random variable (λ̂i/λi)/(λ̂j/λj) is F-distributed with (n, n) degrees of freedom.

Proof. Since λ̂i/λi and λ̂j/λj are independent and follow χ²-distributions with n degrees of freedom, it follows from Definition 25 that (λ̂i/λi)/(λ̂j/λj) is F-distributed with (n, n) degrees of freedom.

4.3.2 Hypothesis testing and confidence interval

Some theorems regarding eigenvalue estimation, as well as the relationships and distributions of the estimators, have now been presented. This leads back to the starting point of this discussion: when does the weighted respectively the unweighted mean estimator do better in a Growth Curve model? By Theorem 16 the answer to this question depends on the first anti-eigenvalue of the covariance matrix, or indirectly on the largest and smallest eigenvalues. In order to make inference regarding which estimator performs best, we need to make inference about the first anti-eigenvalue. However, the theory of anti-eigenvalues, and specifically the first anti-eigenvalue, has not been studied much in statistics; in order to better understand it, some simulations will be done in Chapter 5. From the definition of the first anti-eigenvalue, we can use the smallest and largest eigenvalues to make inference regarding it, and equation (4.1) gives a condition for when the unweighted estimator is preferred. Now the question regarding which estimator performs best can be answered.

The condition (λ1 + λp)²/(4λ1λp) ≤ (n − 1)/(n − 1 − (p − q)) can be rewritten in the following way:

λ1/(4λp) + λp/(4λ1) + 1/2 ≤ (n − 1)/(n − 1 − (p − q)) ⟺ l + 1/l ≤ 4(n − 1)/(n − 1 − (p − q)) − 2,



where l = λ1/λp is the quotient of the largest and smallest eigenvalue. Hence it is possible to determine which estimator performs best with help of the ratio between the largest and smallest eigenvalue. The estimated ratio is F-distributed when the eigenvectors are fixed and known, by Lemma 4. Since the distribution is known, it is possible to calculate confidence intervals as well as to perform hypothesis tests if wanted.

The main relation for this inference is

(λ̂p/λp)/(λ̂1/λ1) = (λ̂p/λ̂1)/(λp/λ1) ∼ F(n, n),

where n is the degrees of freedom of the covariance matrix estimator. An example of a hypothesis test will now be proposed.

Example 3. Construct the test statistic as

T = (λ̂1 + λ̂p)²/(4λ̂1λ̂p) = λ̂1/(4λ̂p) + λ̂p/(4λ̂1) + 1/2,

that is, the reciprocal of the squared first anti-eigenvalue estimate. The hypothesis of whether the unweighted or the weighted mean estimator performs best can then be derived from equation (4.1):

H0: (λ1 + λp)²/(4λ1λp) = (n − k)/(n − k − (p − q))   vs   H1: (λ1 + λp)²/(4λ1λp) ≠ (n − k)/(n − k − (p − q)).

The limits of the test can be derived as follows. Assume that P(a ≤ λ̂p/λ̂1 ≤ b) = P(1/b ≤ λ̂1/λ̂p ≤ 1/a) = 1 − α. Combining these will result in

P(a/4 + 1/(4b) + 1/2 ≤ T ≤ b/4 + 1/(4a) + 1/2) = 1 − α.

Hence if

T ≤ a/4 + 1/(4b) + 1/2,

the unweighted estimator performs better than the weighted. If instead

T ≥ b/4 + 1/(4a) + 1/2,

the weighted estimator performs better.
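One way to operationalize the test in Example 3 is sketched below. The function name and the way the null eigenvalue ratio l0 is solved from (4.1) are our own choices: under H0 the ratio is pinned down by l0 + 1/l0 = 4c − 2, and the estimated ratio divided by l0 is F(n, n)-distributed by Lemma 4. SciPy is assumed for the F quantiles.

```python
import numpy as np
from scipy.stats import f

def anti_eigenvalue_test(l1_hat, lp_hat, n, p, q, k=1, alpha=0.05):
    """Decide between the estimators using the eigenvalue ratio (sketch).

    l1_hat, lp_hat: estimates of the largest/smallest eigenvalue,
    n: degrees of freedom of the covariance estimator.
    """
    c = (n - k) / (n - k - (p - q))    # null value of (l1+lp)^2/(4*l1*lp)
    s = 4 * c - 2
    l0 = (s + np.sqrt(s * s - 4)) / 2  # null ratio lambda_1/lambda_p
    ratio = (l1_hat / lp_hat) / l0     # ~ F(n, n) under H0 (Lemma 4)
    if ratio < f.ppf(alpha / 2, n, n):
        return "unweighted"            # ratio significantly below the null
    if ratio > f.ppf(1 - alpha / 2, n, n):
        return "weighted"              # ratio significantly above the null
    return "inconclusive"

print(anti_eigenvalue_test(10.0, 9.0, n=50, p=10, q=2))  # nearly equal eigenvalues
print(anti_eigenvalue_test(50.0, 1.0, n=50, p=10, q=2))  # very spread eigenvalues
```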

4.3.3 Symmetric circular Toeplitz structured covariance matrix

A test has now been proposed for when the covariance matrix has known and fixed eigenvectors. This property arises in different structures. One of the covariance matrices carrying this structure is the symmetric circular Toeplitz matrix. This kind of matrix has a known structure with eigenvectors that only depend on the size of the matrix, not on its parameters. Therefore it is especially suitable for the conditions given in Theorem 16.

Here follow the definition and the expressions for the eigenvalues and eigenvectors of a symmetric circular Toeplitz matrix, which can be found in, for example, [Olkin and Press, 1969].

Definition 28. A covariance matrix T ∈ Mp,p is called symmetric circular Toeplitz when

T = [ t0 t1 t2 …  t1
      t1 t0 t1 …  t2
      t2 t1 t0 ⋱  ⋮
      ⋮  ⋱  ⋱  ⋱  t1
      t1 t2 …  t1 t0 ],

where tij = t|i−j| and tk = tp−k. The matrix T depends on (p + 1)/2 parameters when p is odd and (p + 2)/2 when p is even.

Theorem 20. A symmetric positive definite circular Toeplitz matrix has eigenvalues

λk = Σ_{i=1}^{p} t_{i−1} cos(2π(k − 1)(i − 1)/p)

and corresponding eigenvectors γk defined elementwise by

γik = p^(−1/2) [cos(2π(i − 1)(k − 1)/p) + sin(2π(i − 1)(k − 1)/p)],

where i = 1, …, p and k = 1, …, p.

Since the eigenvectors γk do not depend on the parameters of the matrix, it is possible to use the established theorems to make statistical inference.
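The formulas of Theorem 20 can be verified numerically. The sketch below is our own and uses the 5 × 5 circular Toeplitz matrix that reappears in Section 5.2; it checks that each γk satisfies Tγk = λkγk.

```python
import numpy as np

p = 5
t = np.array([8.0, 1.0, 2.0, 2.0, 1.0])  # first row: t0, t1, t2, t2, t1
T = np.array([[t[(j - i) % p] for j in range(p)] for i in range(p)])

i = np.arange(p)
for k in range(p):
    lam = np.sum(t * np.cos(2 * np.pi * k * i / p))
    gamma = (np.cos(2 * np.pi * k * i / p) + np.sin(2 * np.pi * k * i / p)) / np.sqrt(p)
    assert np.allclose(T @ gamma, lam * gamma)  # gamma is an eigenvector, eigenvalue lam
print("eigenvalues:", sorted(round(float(np.sum(t * np.cos(2 * np.pi * k * i / p))), 5)
                             for k in range(p)))
```

The printed eigenvalues agree with the vector e given in Section 5.2 for the same matrix.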

Two other covariance matrices that have known and fixed eigenvectors are the intraclass and the generalized intraclass covariance matrices described in the previous chapter. These are in fact specific examples of the symmetric circular Toeplitz covariance matrix.




Chapter 5

Simulations

In this chapter some simulations relevant to this thesis are presented: simulations regarding the distribution of the first anti-eigenvalue, as well as of the most relevant theorems from this thesis. The purpose is to simulate the effects of the statements and give an idea of their performance and significance.

5.1 Distribution of the first anti-eigenvalue

In this section some histograms of the distribution of the first anti-eigenvalue, defined in Definition 13, are presented. First a simulation is made where the covariance matrix has a symmetric circular Toeplitz structure, Definition 28; after that an unstructured covariance matrix is studied.

5.1.1 Symmetric circular Toeplitz covariance matrix

In this simulation it is assumed that x ∼ N(1₈, ΣCT), where

ΣCT = [ 8 0 2 0 0 0 2 0
        0 8 0 2 0 0 0 2
        2 0 8 0 2 0 0 0
        0 2 0 8 0 2 0 0
        0 0 2 0 8 0 2 0
        0 0 0 2 0 8 0 2
        2 0 0 0 2 0 8 0
        0 2 0 0 0 2 0 8 ]

has a symmetric circular Toeplitz structure. A simulation of 10 000 samples of the distribution of the first anti-eigenvalue of the estimated covariance matrix Σ̂CT was made. For each sample, 80 observations were used as a basis for the estimation of the covariance matrix. This yielded the following result.

In Figure 5.1 it can be seen that the first anti-eigenvalue seems to follow a symmetric distribution.

Karlsson, 2016. 25



[Histogram: values of the first anti-eigenvalue (roughly 0.28–0.46) on the x-axis against number of samples on the y-axis.]

Figure 5.1: Histogram of the first anti-eigenvalue when Σ is symmetric circular Toeplitz.

5.1.2 Unknown covariance matrix

In this simulation we assumed that x ∼ N(1₈, Σ), where

Σ = [ 1.70726 1.76325 1.89110 0.91624 1.74651 1.85767 1.45273 0.99676
      1.76325 2.51264 2.12305 1.03537 2.30951 2.20502 1.97452 2.03714
      1.89110 2.12305 3.11924 0.94676 2.11125 2.55778 1.57962 1.77444
      0.91624 1.03537 0.94676 0.63925 1.17584 0.83902 0.75082 0.80187
      1.74651 2.30951 2.11125 1.17584 3.14451 2.52562 1.67662 2.16732
      1.85767 2.20502 2.55778 0.83902 2.52562 3.19486 1.76564 1.68779
      1.45273 1.97452 1.57962 0.75082 1.67662 1.76564 1.78145 1.15548
      0.99676 2.03714 1.77444 0.80187 2.16732 1.68779 1.15548 2.77791 ].

A simulation of 10 000 samples of the distribution of the first anti-eigenvalue of the estimated covariance matrix Σ̂ was made. For each sample, 80 observations were used as a basis for the estimation of the covariance matrix. This yielded the following result.

In Figure 5.2 it seems that the first anti-eigenvalue follows a symmetric distribution, as in Figure 5.1. This is interesting, since it indicates that there might exist more general theorems regarding the distribution of anti-eigenvalues.

5.2 Eigenvalue confidence interval

In this section confidence intervals for the eigenvalues of the covariance matrix are calculated based on the results in Chapter 4, as an example of what can be done. We assume that X ∼ N5,n(µ, ΣCT, In), where ΣCT is a circular Toeplitz covariance matrix and n is the number of observations, with



[Histogram: values of the first anti-eigenvalue (×10⁻³) on the x-axis against number of samples on the y-axis.]

Figure 5.2: Histogram of the first anti-eigenvalue when Σ has no structure.

µ = (1, 2, 3, 4, 5)′

and

ΣCT = [ 8 1 2 2 1
        1 8 1 2 2
        2 1 8 1 2
        2 2 1 8 1
        1 2 2 1 8 ].

The true eigenvalues of ΣCT are

e = (5.38197, 5.38197, 7.61803, 7.61803, 14)′.

In order to get a sense of the performance of this confidence interval, the confidence limits are presented. The results are calculated for different numbers of observations, n = 20, 50, 100 and 500. The mean lower limit, the mean upper limit and the attained significance level of 10 000 confidence intervals have been calculated for every eigenvalue. The simulation yielded the following results:



Information                   n=20      n=50      n=100     n=500
Eigenvalue 1 (lower bound)    3.12691   3.76135   4.15440   4.77486
Eigenvalue 1 (upper bound)    11.53382  8.37052   7.27246   6.12131
Attained significance level   0.9478    0.9509    0.9490    0.9479
Eigenvalue 2 (lower bound)    3.11252   3.76030   4.15451   4.77518
Eigenvalue 2 (upper bound)    11.48076  8.36818   7.27267   6.12172
Attained significance level   0.9539    0.9492    0.9507    0.9485
Eigenvalue 3 (lower bound)    4.40181   5.32882   5.86475   6.75127
Eigenvalue 3 (upper bound)    16.23641  11.85876  10.26651  8.65503
Attained significance level   0.9474    0.9523    0.9520    0.9514
Eigenvalue 4 (lower bound)    4.38061   5.28956   5.84687   6.74837
Eigenvalue 4 (upper bound)    16.15820  11.77140  10.23521  8.65132
Attained significance level   0.9512    0.9507    0.9520    0.9494
Eigenvalue 5 (lower bound)    8.11224   9.76951   10.79543  12.42969
Eigenvalue 5 (upper bound)    29.92258  21.74111  18.89790  15.93469
Attained significance level   0.9454    0.9506    0.9519    0.9515

This confirms the theoretical result from Proposition 6 and shows that the ranges of the confidence intervals are useful in practice.
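The simulation scheme behind the table above can be sketched as follows. The interval form used here, the sample eigenvalue l divided by 1 ± z·sqrt(2/n), rests on the standard large-sample normality of sample covariance eigenvalues; it is an illustrative assumption, not claimed to be the exact interval of Proposition 6, and all function and variable names are our own:

```python
import numpy as np

def eigenvalue_ci_coverage(sigma, n, n_reps=1000, alpha=0.05, seed=0):
    """Monte Carlo estimate of the coverage of large-sample confidence
    intervals for the eigenvalues of a covariance matrix.

    Illustrative sketch: the interval l / (1 +/- z * sqrt(2/n)) uses the
    usual asymptotic normality of sample eigenvalues; it is not claimed
    to be the exact interval of Proposition 6.
    """
    p = sigma.shape[0]
    rng = np.random.default_rng(seed)
    z = 1.959964                                # 97.5% standard normal quantile
    h = z * np.sqrt(2.0 / n)
    true_eigs = np.sort(np.linalg.eigvalsh(sigma))
    covered = np.zeros(p)
    for _ in range(n_reps):
        x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
        s = np.cov(x, rowvar=False)             # sample covariance matrix
        l = np.sort(np.linalg.eigvalsh(s))      # sample eigenvalues
        covered += (l / (1.0 + h) <= true_eigs) & (true_eigs <= l / (1.0 - h))
    return covered / n_reps                     # per-eigenvalue coverage
```

For a covariance with well-separated eigenvalues and moderate n, the estimated coverage of such a sketch is close to the nominal 0.95; for repeated eigenvalues, as in Σ_CT, the standard asymptotics degrade, which is one motivation for studying intervals of the kind tabulated above.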

5.3 Hypothesis testing

In this section we simulate hypothesis testing based on the testing procedure in Chapter 4. The purpose of this simulation is to see how well the test distinguishes between the unweighted and the weighted mean estimator for a Growth Curve model in practice. We assume that X = BξA + E ∼ N_{6,N}(BξA, Σ_CT, I_N), where Σ_CT is a circular Toeplitz covariance matrix and N is the number of observations. We model this as a Growth Curve model and perform 10 000 tests with significance level α = 0.05 for each N to determine the power of the test. The true value under H1 is displayed as a reference point: if it is greater than zero the unweighted estimator is preferred, and if it is below zero the standard MLE is preferred. For this test

B =
    [ 0.24790  0.14523  0.17287  0.30586 ]
    [ 0.39350  0.83045  0.02769  0.38080 ]
    [ 0.88236  0.58860  0.06529  0.13457 ]
    [ 0.32888  0.62643  0.45441  0.18983 ]
    [ 0.82305  0.29646  0.12651  0.99140 ]
    [ 0.47549  0.06442  0.27167  0.62054 ],

ξ =
    [ 1  2    3  4   ]
    [ 2  2.5  3  3.5 ]
    [ 1  3    5  7   ]
    [ 1  4    2  5   ]

and

Σ =
    [ 25  7   10  5   10  7  ]
    [ 7   25  7   10  5   10 ]
    [ 10  7   25  7   10  5  ]
    [ 5   10  7   25  7   10 ]
    [ 10  5   10  7   25  7  ]
    [ 7   10  5   10  7   25 ].


Since the size of A depends on N, it was randomized for each sample. The results are as follows:

N     Power of the test    Value of H1 = (n-1)/(n-1-(p-q)) - (λ1+λp)^2/(4λ1λp)
5     0.9313               0.8750
6     0.8283               0.5417
7     0.7308               0.3750
8     0.6392               0.2750
9     0.5908               0.2083
10    0.5128               0.1607
11    0.4830               0.1250
12    0.4534               0.0972
13    0.4257               0.0750
14    0.3901               0.0568
15    0.3673               0.0417
16    0.3423               0.0288
17    0.3222               0.0179
18    0.3078               0.0083
20    0.2806              -0.0074
22    0.2501              -0.0197
50    0.1981              -0.0824
100   0.5124              -0.1044
200   0.9308              -0.1148
300   0.9961              -0.1183
500   1.0000              -0.1210

This shows that the test performs well in practice, and when N goes to infinity the standard MLE is preferred, as expected. For small values of N, however, the unweighted mean estimator is preferred.
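The power values in the table come from a generic Monte Carlo recipe: simulate data under the alternative, apply the test, and record the rejection frequency. A minimal sketch of that recipe is given below; the simple z-test only stands in for the thesis's likelihood-based test, and all names are illustrative:

```python
import numpy as np
from math import erf, sqrt

def estimate_power(pvalue_fn, sample_fn, n_reps=10_000, alpha=0.05, seed=1):
    """Estimate the power of a test by Monte Carlo: draw data under the
    alternative hypothesis, run the test, report the rejection frequency."""
    rng = np.random.default_rng(seed)
    rejections = sum(pvalue_fn(sample_fn(rng)) < alpha for _ in range(n_reps))
    return rejections / n_reps

def z_test_pvalue(x):
    """Two-sided z-test of H0: mean = 0 with known unit variance
    (a stand-in for the actual test statistic of Chapter 4)."""
    z = x.mean() * sqrt(len(x))
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

# Power against the alternative mean = 0.5 with samples of size 20
power = estimate_power(z_test_pvalue, lambda rng: rng.normal(0.5, 1.0, size=20))
print(round(power, 2))
```

With 10 000 replications the Monte Carlo error of such a power estimate is roughly ±0.01, which matches the resolution of the table above.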


Chapter 6

Further research

Further research in this area could continue the exploration of anti-eigenvalues in statistics. Since the anti-eigenvalue is a measure of the rotation induced by a matrix, it would be interesting to see its place in multivariate statistics. Another area is the unweighted estimators: finding out more about how they compare with MLEs and other commonly used estimators, both for the Growth Curve model and for other statistical models.




Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linkoping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


© 2016, Emil Karlsson
