Chapter 5. Statistical Inference: Estimation and Testing Hypotheses
5.1 Data Sets & Matrix Normal Distribution
A data set of n observations on p variables is arranged as the n×p data matrix

    X = ( x_11  ...  x_1p
          ...
          x_n1  ...  x_np ),

where the rows are observations and the columns are variables, and the n rows X_1, ..., X_n are i.i.d. N_p(μ, Σ). Then Vec(X') is an np×1 random vector with mean vector (μ', ..., μ')' and block-diagonal covariance matrix diag(Σ, ..., Σ) = I_n ⊗ Σ. We write

    X ~ N_{n×p}(1_n μ', I_n ⊗ Σ).

More generally, we can define the matrix normal distribution.
Definition 5.1.1
An n×p random matrix X is said to follow a matrix normal distribution, written X ~ N_{n×p}(M, W ⊗ V), if

    Vec(X') ~ N_{np}( Vec(M'), W ⊗ V ).

In this case X can be represented as

    X = M + B Y A',

where W = BB', V = AA', and Y has i.i.d. elements each following N(0, 1).
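As a quick numerical check of the representation above, the following Python/NumPy sketch (not part of the original notes; all names are illustrative) verifies that X = M + BYA' has Cov(Vec(X')) = W ⊗ V, using the Kronecker identity (B ⊗ A)(B ⊗ A)' = (BB') ⊗ (AA'):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3
B_ = rng.standard_normal((n, n))   # W = B B'
A_ = rng.standard_normal((p, p))   # V = A A'
W = B_ @ B_.T
V = A_ @ A_.T

M = np.zeros((n, p))
Y = rng.standard_normal((n, p))    # i.i.d. N(0,1) entries
X = M + B_ @ Y @ A_.T              # one draw from N_{n×p}(M, W ⊗ V)

# Vec(X') stacks the rows of X.  Since Vec((B Y A')') = (B ⊗ A) Vec(Y'),
# the covariance of Vec(X') is (B⊗A)(B⊗A)' = (BB') ⊗ (AA') = W ⊗ V.
lhs = np.kron(B_, A_) @ np.kron(B_, A_).T
cov_vec = np.kron(W, V)
ok = np.allclose(lhs, cov_vec)
```

The special case of a data matrix with i.i.d. N_p(μ, Σ) rows corresponds to M = 1_n μ' and W = I_n, V = Σ.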
Theorem 5.1.1
The density function of X ~ N_{n×p}(M, W ⊗ V) with W > 0, V > 0 is given by

    f(X) = (2π)^(−np/2) |W|^(−p/2) |V|^(−n/2) etr( −(1/2) W^(−1) (X − M) V^(−1) (X − M)' ),

where etr(A) = exp(tr(A)).
Corollary 1:
Let X be a matrix of n observations from N_p(μ, Σ). Then the density function of X is

    f(X) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{j=1}^n (x_j − μ)(x_j − μ)'.
5.2 Maximum Likelihood Estimation
A. Review
X_1, ..., X_n are i.i.d. N(μ, σ²).

Step 1. The likelihood function

    L(μ, σ²) = Π_{i=1}^n (2πσ²)^(−1/2) exp( −(x_i − μ)²/(2σ²) )
             = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^n (x_i − μ)² ).
Step 2. Domain (parameter space)
    H = { (μ, σ²) : μ ∈ R, σ² > 0 }.

The MLE of (μ, σ²) maximizes L(μ, σ²) over H.
Step 3. Maximization
Since Σ_{i=1}^n (x_i − μ)² = Σ_{i=1}^n (x_i − x̄)² + n(x̄ − μ)², we have

    L(μ, σ²) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^n (x_i − μ)² )
             ≤ (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^n (x_i − x̄)² ) = L(x̄, σ²),

with equality iff μ = x̄. Let a = σ² and

    g(a) = a^(−n/2) exp( −(1/(2a)) Σ_{i=1}^n (x_i − x̄)² ).

Setting dg(a)/da = 0 shows that g(a) attains its maximum at a = (1/n) Σ_{i=1}^n (x_i − x̄)². It implies that

    μ̂ = x̄   and   σ̂² = (1/n) Σ_{i=1}^n (x_i − x̄)².
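As a numerical illustration (a Python sketch, not from the notes; the data values are made up), the log-likelihood evaluated at (μ̂, σ̂²) is no smaller than at nearby parameter values:

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5])  # hypothetical sample
n = len(x)
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()   # note the 1/n divisor, not 1/(n-1)

def loglik(mu, s2):
    # log of L(mu, sigma^2) for the N(mu, sigma^2) sample
    return -n / 2 * np.log(2 * np.pi * s2) - ((x - mu) ** 2).sum() / (2 * s2)

best = loglik(mu_hat, sigma2_hat)
others = [loglik(mu_hat + d, sigma2_hat + e)
          for d in (-0.3, 0.0, 0.3) for e in (-0.1, 0.0, 0.2)
          if (d, e) != (0.0, 0.0)]
ok = all(best >= v for v in others)
```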
Results 4.9 (p168 of textbook)
B. Multivariate population
x_1, ..., x_n are samples from N_p(μ, Σ).
Step 1. The likelihood function
    L(μ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{j=1}^n (x_j − μ)(x_j − μ)'.
Step 2. Domain
    H = { (μ, Σ) : μ ∈ R^p, Σ > 0 }.
Step 3. Maximization
(a) Maximization over μ:

    max_{μ, Σ} L(μ, Σ) = max_{Σ > 0} L(x̄, Σ) = max_{Σ > 0} (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) B ),

where

    B = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)'.
We can prove that P(B > 0) = 1 if n > p .
(b) Let B = CC' with C nonsingular, and put Σ* = C^(−1) Σ (C^(−1))', so Σ = C Σ* C'. As Σ ranges over all positive definite matrices, so does Σ*. We have

    tr(Σ^(−1) B) = tr(Σ^(−1) C C') = tr(C' Σ^(−1) C) = tr((Σ*)^(−1)),
    |Σ| = |C Σ* C'| = |B| |Σ*|.

Hence

    max_{Σ > 0} |Σ|^(−n/2) etr( −(1/2) Σ^(−1) B ) = |B|^(−n/2) max_{Σ* > 0} |Σ*|^(−n/2) etr( −(1/2) (Σ*)^(−1) ).
(c) Let λ_1, ..., λ_p be the eigenvalues of Σ*. Then

    max_{Σ* > 0} |Σ*|^(−n/2) etr( −(1/2) (Σ*)^(−1) ) = max_{λ_1, ..., λ_p > 0} Π_{j=1}^p λ_j^(−n/2) e^(−1/(2λ_j)).

The function g(λ) = λ^(−n/2) e^(−1/(2λ)) attains its maximum at λ = 1/n. The function L(Σ*) therefore attains its maximum at λ_1 = ... = λ_p = 1/n, and

    Σ̂* = (1/n) I_p.
(d) The MLE of Σ is

    Σ̂ = C Σ̂* C' = (1/n) C C' = B/n.
Theorem 5.2.1
Let X_1, ..., X_n be a sample from N_p(μ, Σ) with n > p. Then the MLEs of μ and Σ are

    μ̂ = x̄   and   Σ̂ = (1/n) Σ_{j=1}^n (x_j − x̄)(x_j − x̄)' = B/n,

respectively, and the maximum likelihood is

    L(x̄, B/n) = (2π)^(−np/2) n^(np/2) e^(−np/2) |B|^(−n/2).
Theorem 5.2.2
Under the above notations, we have
a) x̄ and Σ̂ are independent;
b) x̄ ~ N_p(μ, (1/n) Σ);
c) Σ̂ is a biased estimator of Σ:

    E(Σ̂) = ((n − 1)/n) Σ.

An unbiased estimator of Σ is recommended:

    S = (1/(n − 1)) Σ_{j=1}^n (x_j − x̄)(x_j − x̄)',

called the sample covariance matrix.
Matlab code: mean, cov, corrcoef
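The notes point to Matlab's mean, cov, and corrcoef; an equivalent NumPy sketch (illustrative data, not from the notes) contrasts the biased MLE Σ̂ = B/n with the unbiased S = B/(n−1), which is what np.cov (like Matlab's cov) returns by default:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])          # toy data matrix, n = 4, p = 2
n = X.shape[0]
xbar = X.mean(axis=0)
B = (X - xbar).T @ (X - xbar)       # sum of squares and products matrix
Sigma_hat = B / n                   # MLE of Sigma (biased)
S = B / (n - 1)                     # sample covariance matrix (unbiased)

# np.cov with rowvar=False uses the 1/(n-1) divisor, matching S
ok = np.allclose(S, np.cov(X, rowvar=False))
```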
Theorem 5.2.3
Let θ̂ be the MLE of θ and f be a measurable function. Then f(θ̂) is the MLE of f(θ).
Corollary 1
The MLE of the correlation ρ_ij is

    r_ij = b_ij / sqrt(b_ii b_jj),   where B = (b_ij).
5.3 Wishart Distribution
A. Chi-square distribution
Let X_1, ..., X_n be i.i.d. N(0, 1). Then Y = X_1² + ... + X_n² ~ χ²_n, the chi-square distribution with n degrees of freedom.

Definition 5.3.1
If x ~ N_n(0, I_n), then Y = x'x is said to have a chi-square distribution with n degrees of freedom, and we write Y ~ χ²_n.

If x ~ N_n(0, I_n), then Y = x'x ~ χ²_n;
if x ~ N_n(0, Σ) with Σ > 0, then Y = x' Σ^(−1) x ~ χ²_n.
B. Wishart distribution (obtained by Wishart in 1928)
Definition 5.3.2
Let X ~ N_{n×p}(0, I_n ⊗ Σ). Then we say that W = X'X is distributed according to a Wishart distribution, and write W ~ W_p(n, Σ). In particular,

    B = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)' ~ W_p(n − 1, Σ).

The density of W ~ W_p(n, Σ) with n ≥ p and Σ > 0 is

    f(W) = [ 2^(np/2) |Σ|^(n/2) Γ_p(n/2) ]^(−1) |W|^((n−p−1)/2) etr( −(1/2) Σ^(−1) W )   if W > 0,

and 0 otherwise, where

    Γ_p(n/2) = π^(p(p−1)/4) Π_{j=1}^p Γ( (n − j + 1)/2 ).
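The density above can be checked numerically against SciPy's Wishart implementation; the following sketch (my own, with arbitrary p.d. matrices) evaluates the log-density by the formula and compares it with scipy.stats.wishart:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

p, n = 2, 5
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
W = np.array([[6.0, 1.0],
              [1.0, 4.0]])          # arbitrary positive definite argument

def wishart_logpdf(W, n, Sigma):
    # log f(W) = ((n-p-1)/2)log|W| - (1/2)tr(Sigma^{-1}W)
    #            - (np/2)log 2 - (n/2)log|Sigma| - log Gamma_p(n/2)
    p = W.shape[0]
    _, logdet_W = np.linalg.slogdet(W)
    _, logdet_S = np.linalg.slogdet(Sigma)
    return ((n - p - 1) / 2 * logdet_W
            - 0.5 * np.trace(np.linalg.solve(Sigma, W))
            - n * p / 2 * np.log(2.0)
            - n / 2 * logdet_S
            - multigammaln(n / 2, p))

manual = wishart_logpdf(W, n, Sigma)
ref = wishart(df=n, scale=Sigma).logpdf(W)
ok = np.isclose(manual, ref)
```

Here multigammaln(n/2, p) is SciPy's log of the multivariate gamma function Γ_p(n/2).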
5.4 Discussion on Estimation

A. Unbiasedness
Let θ̂ be an estimator of θ. If E(θ̂) = θ, then θ̂ is called an unbiased estimator of θ.
Theorem 5.4.1
Let X_1, ..., X_n be a sample from N_p(μ, Σ). Then

    x̄ = (1/n) Σ_{j=1}^n x_j   and   S = (1/(n − 1)) Σ_{j=1}^n (x_j − x̄)(x_j − x̄)'

are unbiased estimators of μ and Σ, respectively.
Matlab code: mean, cov, corrcoef
B. Decision Theory

    p(x, θ) : the density of the sample X with parameter θ
    t(x)    : an estimator of θ based on the sample
    L(θ, t) : a loss function

Then the average loss is given by

    R(θ, t) = E L(θ, t(X)) = ∫ L(θ, t(x)) p(x, θ) dx,

which is called the risk function.

    max_θ R(θ, t) : the maximum risk if t is employed.
Definition 5.4.2
An estimator t*(X) is called a minimax estimator of θ if

    max_θ R(θ, t*) = min_t max_θ R(θ, t).
Example 1
Under the loss function

    L(θ, t) = (θ − t)'(θ − t),

the sample mean x̄ is a minimax estimator of μ.
C. Admissible estimation

Definition 5.4.3
An estimator t_1(x) is said to be at least as good as another t_2(x) if

    R(θ, t_1) ≤ R(θ, t_2)   for all θ.

And t_1 is said to be better than, or strictly dominate, t_2 if the above inequality holds with strict inequality for at least one θ.
Definition 5.4.4
An estimator t* is said to be inadmissible if there exists another estimator t** that is better than t*. An estimator t* is admissible if it is not inadmissible.
Admissibility is a weak requirement.
Under the loss L(μ, t) = (μ − t)'(μ − t), the sample mean x̄ is inadmissible if the population is N_p(μ, Σ) and p ≥ 3. James & Stein pointed out that when p ≥ 3 the estimator

    μ̂_JS = ( 1 − (p − 2)/(n x̄'x̄) ) x̄

is better than x̄. The estimator μ̂_JS is called the James–Stein estimator.
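A tiny numerical sketch (Python, made-up numbers) of the James–Stein estimator as written in the notes, shrinking x̄ toward the origin:

```python
import numpy as np

n, p = 10, 4                         # hypothetical sample size and dimension
xbar = np.array([1.2, -0.8, 0.5, 1.5])   # hypothetical sample mean, p >= 3

# shrinkage factor 1 - (p-2)/(n * xbar'xbar) from the notes
shrink = 1 - (p - 2) / (n * (xbar @ xbar))
mu_js = shrink * xbar                # James-Stein estimate
```

The factor is below 1, so μ̂_JS always lies strictly between the origin and x̄; the risk improvement over x̄ holds only for p ≥ 3.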
5.5 Inferences about a Mean Vector (Ch. 5, Textbook)
Let X_1, ..., X_n be i.i.d. samples from N_p(μ, Σ). We test

    H_0 : μ = μ_0   vs   H_1 : μ ≠ μ_0.
Case A: Σ is known.

a) p = 1:

    u = sqrt(n) (x̄ − μ_0)/σ ~ N(0, 1)   under H_0.

b) p > 1:

    T_0² = n (x̄ − μ_0)' Σ^(−1) (x̄ − μ_0).

Under the hypothesis H_0, x̄ ~ N_p(μ_0, (1/n) Σ). Then

    y = sqrt(n) Σ^(−1/2) (x̄ − μ_0) ~ N_p(0, I_p),

so that

    T_0² = n (x̄ − μ_0)' Σ^(−1) (x̄ − μ_0) = y'y ~ χ²_p.
Theorem 5.5.1
Let X_1, ..., X_n be a sample from N_p(μ, Σ), where Σ is known. The null distribution of T_0² under H_0 : μ = μ_0 is χ²_p, and the rejection region is

    T_0² ≥ χ²_p(α).
Case B: Σ is unknown.

a) Suggestion: replace Σ by the sample covariance matrix S in T_0², i.e.

    T² = n (x̄ − μ_0)' S^(−1) (x̄ − μ_0) = n(n − 1) (x̄ − μ_0)' B^(−1) (x̄ − μ_0),

where

    S = (1/(n − 1)) B,   B = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)'.

There are many theoretical approaches to finding a suitable statistic. One of the methods is the Likelihood Ratio Criterion.
The Likelihood Ratio Criterion (LRC)

Step 1. The likelihood function

    L(μ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{j=1}^n (x_j − μ)(x_j − μ)'.
Step 2 Domains
0 ,| ,
0 ,| ,
Σμμ Σμ
ΣRμ Σμ
0
p
Step 3. Maximization
We have obtained

    max_H L(μ, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |B|^(−n/2),

attained at μ̂ = x̄, Σ̂ = B/n. By a similar argument we can find

    max_{H_0} L(μ_0, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |A_0|^(−n/2),

where, under H_0,

    A_0 = Σ_{j=1}^n (x_j − μ_0)(x_j − μ_0)'
        = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)' + n (x̄ − μ_0)(x̄ − μ_0)'
        = B + n (x̄ − μ_0)(x̄ − μ_0)'.
Then, the LRC is

    λ = max_{H_0} L / max_H L = ( |B| / |A_0| )^(n/2)
      = ( |B| / |B + n (x̄ − μ_0)(x̄ − μ_0)'| )^(n/2).
Note that

    |B + n (x̄ − μ_0)(x̄ − μ_0)'| = |B| ( 1 + n (x̄ − μ_0)' B^(−1) (x̄ − μ_0) )
                                   = |B| ( 1 + T²/(n − 1) ).

Finally,

    λ^(2/n) = 1 / ( 1 + T²/(n − 1) ).
Remark: Let t(x) be a statistic for the hypothesis and f(u) a strictly monotone function. Then t̃(x) = f(t(x)) is a statistic equivalent to t(x). We write t̃(x) ≅ t(x).
5.6 T²-statistic
Definition 5.6.1
Let W ~ W_p(n, Σ) and u ~ N_p(0, Σ) be independent with n > p. The distribution of

    T² = n u' W^(−1) u

is called the T² distribution.

• The distribution of T² is independent of Σ; we shall write T² ~ T²_{p,n}.
• ( (n − p + 1)/(np) ) T² ~ F_{p, n−p+1}.
Since

    x̄ − μ_0 ~ N_p(0, (1/n) Σ)   and   B ~ W_p(n − 1, Σ)

are independent under H_0, we have

    T² = n(n − 1) (x̄ − μ_0)' B^(−1) (x̄ − μ_0) ~ T²_{p, n−1},

and

    ( (n − p)/((n − 1)p) ) T² ~ F_{p, n−p}.
Theorem 5.6.1
pnpnp FTpn
pnTTH
,, ~~,μμ 2
122
00 1 and :Under
Theorem 5.6.2
The distribution of T² is invariant under all affine transformations of the observations and the hypothesis:

    y_j = G x_j + d,   μ_0* = G μ_0 + d,   G : p×p nonsingular,   d : p×1.
Confidence Region
• A 100(1 − α)% confidence region for the mean μ of a p-dimensional normal distribution is the ellipsoid determined by all μ such that

    n (x̄ − μ)' S^(−1) (x̄ − μ) ≤ ( p(n − 1)/(n − p) ) F_{p, n−p}(α).
Proof:

                          Original            After transformation
    observations          x_1, ..., x_n       y_j = G x_j + d, j = 1, ..., n
    sample mean           x̄                   ȳ = G x̄ + d
    sample covariance     S                   S_y = G S G'
    given mean            μ_0                 μ_0* = G μ_0 + d
    H_0                   μ = μ_0             μ* = μ_0*

Hence

    T_y² = n (ȳ − μ_0*)' S_y^(−1) (ȳ − μ_0*)
         = n (G(x̄ − μ_0))' (G S G')^(−1) (G(x̄ − μ_0))
         = n (x̄ − μ_0)' S^(−1) (x̄ − μ_0) = T².
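The invariance can be confirmed numerically; this Python sketch (seeded synthetic data, my own illustration) computes T² before and after an arbitrary nonsingular affine transformation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3
mu0 = np.array([1.0, 2.0, 3.0])
X = rng.standard_normal((n, p)) + mu0      # sample with true mean mu0

def t2(X, mu0):
    # Hotelling's one-sample T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = xbar - mu0
    return n * d @ np.linalg.solve(S, d)

G = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])             # nonsingular (det = 7)
d = np.array([5.0, -1.0, 2.0])
Y = X @ G.T + d                             # y_j = G x_j + d
t2_x = t2(X, mu0)
t2_y = t2(Y, G @ mu0 + d)                   # transformed hypothesis mean
ok = np.isclose(t2_x, t2_y)
```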
Example 5.6.1 (Example 5.2 in Textbook)
Perspiration from 20 healthy females was analyzed.

SWEAT DATA
    Individual   X1 (Sweat rate)   X2 (Sodium)   X3 (Potassium)
    1            3.7               48.5          9.3
    2            5.7               65.1          8.0
    3            3.8               47.2          10.9
    4            3.2               53.2          12.0
    5            3.1               55.5          9.7
    6            4.6               36.1          7.9
    7            2.4               24.8          14.0
    8            7.2               33.1          7.6
    9            6.7               47.4          8.5
    10           5.4               54.1          11.3
    11           3.9               36.9          12.7
    12           4.5               58.8          12.3
    13           3.5               27.8          9.8
    14           4.5               40.2          8.4
    15           1.5               13.5          10.1
    16           8.5               56.4          7.1
    17           4.5               71.6          8.2
    18           6.5               52.8          10.9
    19           4.1               44.4          11.2
    20           5.5               40.9          9.4

Source: Courtesy of Dr. Gerald Bargman.
    H_0 : μ' = (4, 50, 10)   vs   H_1 : μ' ≠ (4, 50, 10).
Computer calculations provide:

    x̄ = ( 4.640, 45.400, 9.965 )',

    S = (  2.879   10.010   −1.810
          10.010  199.788   −5.640
          −1.810   −5.640    3.628 ),

and

    S^(−1) = (  .586   −.022    .258
               −.022    .006   −.002
                .258   −.002    .402 ).
We evaluate

    T² = 20 (4.640 − 4, 45.400 − 50, 9.965 − 10) S^(−1) (4.640 − 4, 45.400 − 50, 9.965 − 10)'
       = 20 (.640, −4.600, −.035) (.467, −.042, .160)'
       = 9.74.
Comparing the observed T² = 9.74 with the critical value

    ( p(n − 1)/(n − p) ) F_{p, n−p}(.10) = ( 3(19)/17 ) F_{3,17}(.10) = 3.353 × 2.44 = 8.18,

we see that T² = 9.74 > 8.18, and consequently we reject H_0 at the 10% level of significance.
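The example can be reproduced with a few lines of Python (my own sketch using the sweat data tabulated above):

```python
import numpy as np
from scipy.stats import f

sweat = np.array([
    [3.7, 48.5,  9.3], [5.7, 65.1,  8.0], [3.8, 47.2, 10.9], [3.2, 53.2, 12.0],
    [3.1, 55.5,  9.7], [4.6, 36.1,  7.9], [2.4, 24.8, 14.0], [7.2, 33.1,  7.6],
    [6.7, 47.4,  8.5], [5.4, 54.1, 11.3], [3.9, 36.9, 12.7], [4.5, 58.8, 12.3],
    [3.5, 27.8,  9.8], [4.5, 40.2,  8.4], [1.5, 13.5, 10.1], [8.5, 56.4,  7.1],
    [4.5, 71.6,  8.2], [6.5, 52.8, 10.9], [4.1, 44.4, 11.2], [5.5, 40.9,  9.4]])
n, p = sweat.shape
mu0 = np.array([4.0, 50.0, 10.0])

xbar = sweat.mean(axis=0)
S = np.cov(sweat, rowvar=False)
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)                     # Hotelling T^2

crit = p * (n - 1) / (n - p) * f.ppf(0.90, p, n - p)   # 10% critical value
reject = T2 > crit
```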
Definition 5.6.2
Let x and y be samples from a population G with mean μ and covariance matrix Σ > 0. The quadratic forms

    D²_M(x, y) = (x − y)' Σ^(−1) (x − y)   and   D²_M(x, G) = (x − μ)' Σ^(−1) (x − μ)

are called the Mahalanobis distance (M-distance) between x and y, and between x and G, respectively.
Mahalanobis Distance
It can be verified that
• D_M(x, y) ≥ 0, with D_M(x, y) = 0 iff x = y;
• D_M(x, y) = D_M(y, x);
• D_M(x, y) ≤ D_M(x, z) + D_M(z, y);
• T_0² = n (x̄ − μ_0)' Σ^(−1) (x̄ − μ_0) = n D²_M(x̄, G).
5.7 Two-Sample Problems (Section 6.3, Textbook)
We have two samples from two populations:

    G_1 : N_p(μ_1, Σ),   x_1, ..., x_n,
    G_2 : N_p(μ_2, Σ),   y_1, ..., y_m,

where μ_1, μ_2 and Σ are unknown. We test

    H_0 : μ_1 = μ_2   vs   H_1 : μ_1 ≠ μ_2.
The LRC is equivalent to

    T² = ( nm/(n + m) ) (x̄ − ȳ)' S_pooled^(−1) (x̄ − ȳ),

where

    S_pooled = (1/(n + m − 2)) [ Σ_{i=1}^n (x_i − x̄)(x_i − x̄)' + Σ_{j=1}^m (y_j − ȳ)(y_j − ȳ)' ],

    x̄ = (1/n) Σ_{i=1}^n x_i,   ȳ = (1/m) Σ_{j=1}^m y_j.
Under the hypothesis H_0,

    T² ~ T²_{p, n+m−2}   and   ( (n + m − p − 1)/((n + m − 2)p) ) T² ~ F_{p, n+m−p−1}.
The 100(1 − α)% confidence region of a'(μ_1 − μ_2) is

    a'(x̄ − ȳ) − T sqrt( ((n + m)/(nm)) a' S_pooled a )
        ≤ a'(μ_1 − μ_2) ≤
    a'(x̄ − ȳ) + T sqrt( ((n + m)/(nm)) a' S_pooled a ),

where T² = T²_{p, n+m−2}(α).
Example 5.7.1 (pp. 338–339)
Jolicoeur and Mosimann (1960) studied the relationship of size and shape for painted turtles. The following table contains their measurements on the carapaces of 24 female and 24 male turtles.

                  Female                              Male
    Length(x1)  Width(x2)  Height(x3)    Length(x1)  Width(x2)  Height(x3)
    98          81         38            93          74         37
    103         84         38            94          78         35
    103         86         42            96          80         35
    105         86         42            101         84         39
    109         88         44            102         85         38
    123         92         50            103         81         37
    123         95         46            104         83         39
    133         99         51            106         83         39
    133         102        51            107         82         38
    133         102        51            112         89         40
    134         100        48            113         88         40
    136         102        49            114         86         40
    138         98         51            116         90         43
    138         99         51            117         90         41
    141         105        53            117         91         41
    147         108        57            119         93         41
    149         107        55            120         89         40
    153         107        56            120         93         44
    155         115        63            121         95         42
    155         117        60            125         93         45
    158         115        62            127         96         45
    159         118        63            128         95         45
    162         124        61            131         95         46
    177         132        67            135         106        47
From the data,

    x̄ = ( 136.0417, 102.5833, 52.0417 )',   ȳ = ( 113.3750, 88.2917, 40.7083 )',

    S_pooled = ( 295.1431   175.0607   101.6649
                 175.0607   110.8869    61.7491
                 101.6649    61.7491    37.9982 ),

and

    T² = ( 24·24/(24 + 24) ) (x̄ − ȳ)' S_pooled^(−1) (x̄ − ȳ) = 72.3816.

Since

    F = ( (n + m − p − 1)/((n + m − 2)p) ) T² = ( 44/(46·3) ) × 72.3816 ≈ 23.08 > F_{3,44}(.01) ≈ 4.26,

we reject H_0 at the 1% level of significance.
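Using the summary statistics quoted above, the two-sample T² can be reproduced in a few lines of Python (my own sketch; small discrepancies come only from the rounding of the printed statistics):

```python
import numpy as np
from scipy.stats import f

n = m = 24
p = 3
xbar = np.array([136.0417, 102.5833, 52.0417])   # female means
ybar = np.array([113.3750, 88.2917, 40.7083])    # male means
S_pooled = np.array([[295.1431, 175.0607, 101.6649],
                     [175.0607, 110.8869,  61.7491],
                     [101.6649,  61.7491,  37.9982]])

d = xbar - ybar
T2 = (n * m / (n + m)) * d @ np.linalg.solve(S_pooled, d)

# convert to an F statistic and compare with the 1% critical value
F_stat = (n + m - p - 1) / ((n + m - 2) * p) * T2
crit = f.ppf(0.99, p, n + m - p - 1)
reject = F_stat > crit
```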
5.8 Multivariate Analysis of Variance
A. Review
There are k normal populations:

    G_1 : N(μ_1, σ²),   x_11, ..., x_{1 n_1},
    ...
    G_k : N(μ_k, σ²),   x_{k1}, ..., x_{k n_k}.

One wants to test equality of the means μ_1, ..., μ_k:

    H_0 : μ_1 = ... = μ_k   vs   H_1 : μ_i ≠ μ_j for some i, j.
The analysis of variance employs the decomposition of the sum of squares:

    SSTR = Σ_{a=1}^k n_a (x̄_a − x̄)²,                 the sum of squares among treatments,
    SSE  = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄_a)²,   the sum of squares within groups,
    SST  = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄)²,     the total sum of squares,

where

    x̄_a = (1/n_a) Σ_{j=1}^{n_a} x_{aj},   x̄ = (1/n) Σ_{a=1}^k Σ_{j=1}^{n_a} x_{aj},   n = n_1 + ... + n_k.
The testing statistic is

    F = ( SSTR/(k − 1) ) / ( SSE/(n − k) ) ~ F_{k−1, n−k}   under H_0.
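The decomposition and the F-statistic above can be sketched in Python (illustrative data; scipy.stats.f_oneway serves as an independent check):

```python
import numpy as np
from scipy.stats import f_oneway

# three hypothetical groups of observations
groups = [np.array([5.1, 4.8, 6.0, 5.5]),
          np.array([6.8, 7.2, 6.5]),
          np.array([4.0, 3.6, 4.4, 4.1, 3.9])]
k = len(groups)
n = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

sstr = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # among treatments
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)         # within groups
sst = ((np.concatenate(groups) - grand) ** 2).sum()            # total

F = (sstr / (k - 1)) / (sse / (n - k))
F_ref, _ = f_oneway(*groups)        # SciPy's one-way ANOVA F
ok = np.isclose(sst, sstr + sse) and np.isclose(F, F_ref)
```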
B. Multivariate population (pp295-305)
kn
kkpk
knp
kNG
NG
xxΣ μ
xxΣ μ
,,,,:
,,,,:
1
1111 1
Σ is unknown, one wants to test
jiHH jk somefor : : 1110 ,μμ ,μμ
I. The likelihood ratio criterion

Step 1. The likelihood function

    L(μ_1, ..., μ_k, Σ) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − μ_a)(x_{aj} − μ_a)'.
Step 2. The domains

    H = { (μ_1, ..., μ_k, Σ) : μ_j ∈ R^p, j = 1, ..., k, Σ > 0 },
    ω = { (μ_1, ..., μ_k, Σ) : μ_1 = ... = μ_k ∈ R^p, Σ > 0 }.
Step 3. Maximization

    max_H L(μ_1, ..., μ_k, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |E|^(−n/2),
    max_ω L(μ_1, ..., μ_k, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |T|^(−n/2),

where

    T = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄)(x_{aj} − x̄)',
    E = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄_a)(x_{aj} − x̄_a)'
are the total sum of squares and products matrix and the error sum of squares and products matrix, respectively.
Here

    x̄_a = (1/n_a) Σ_{j=1}^{n_a} x_{aj},   x̄ = (1/n) Σ_{a=1}^k Σ_{j=1}^{n_a} x_{aj}.
The treatment sum of squares and products matrix is

    B = T − E = Σ_{a=1}^k n_a (x̄_a − x̄)(x̄_a − x̄)'.
The LRC
.BE
E
T
E
T
E
2
2
n
n
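A short Python sketch (seeded synthetic data, my own illustration) computes T, E, and B for a one-way MANOVA layout and checks the decomposition T = E + B and the Wilks ratio:

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 2, 3
# three hypothetical groups of p-variate observations
groups = [rng.standard_normal((6, p)),
          rng.standard_normal((8, p)) + 0.5,
          rng.standard_normal((5, p))]
allx = np.vstack(groups)
grand = allx.mean(axis=0)

# error SS&P matrix: deviations from group means
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
# total SS&P matrix: deviations from the grand mean
T = (allx - grand).T @ (allx - grand)
# treatment SS&P matrix
B = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
        for g in groups)

ok_decomp = np.allclose(T, E + B)
Lambda = np.linalg.det(E) / np.linalg.det(E + B)   # lambda^(2/n)
```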
Definition 5.8.1
Assume A ~ W_p(n, Σ) and B ~ W_p(m, Σ) are independent, where Σ > 0, n ≥ p. The distribution of

    Λ = |A| / |A + B|

is called the Wilks Λ-distribution, and we write Λ ~ Λ_{p,n,m}.
Theorem 5.8.1
Under H_0 we have
1) T ~ W_p(n − 1, Σ), E ~ W_p(n − k, Σ), B ~ W_p(k − 1, Σ);
2) E and B are independent;
3) the LRC statistic λ^(2/n) = |E|/|E + B| has a Λ_{p, n−k, k−1} distribution.
Special cases of the Wilks Λ_{p,n,m}-distribution:

    m = 1:  ( (n − p + 1)/p ) (1 − Λ)/Λ ~ F_{p, n−p+1};
    m = 2:  ( (n − p + 1)/p ) (1 − sqrt(Λ))/sqrt(Λ) ~ F_{2p, 2(n−p+1)};
    p = 1:  ( n/m ) (1 − Λ)/Λ ~ F_{m, n};
    p = 2:  ( (n − 1)/m ) (1 − sqrt(Λ))/sqrt(Λ) ~ F_{2m, 2(n−1)}.
See pp. 300–305 of the textbook for examples.
2. Union–Intersection Decision Rule

    H_0 : μ_1 = ... = μ_k.

Consider the projection hypothesis

    H_0(a) : a'μ_1 = ... = a'μ_k,   a ∈ R^p, a ≠ 0,   so that   H_0 = ∩_{a ≠ 0} H_0(a).

For each fixed a, the populations become univariate:

    G_j(a) : N(a'μ_j, a'Σa),   a'x_{j1}, ..., a'x_{j n_j},   j = 1, ..., k.
For the projected data, we have

    SSTR = a'Ba,   SSE = a'Ea,   SST = a'Ta,

and the F-statistic

    F_a = ( a'Ba/(k − 1) ) / ( a'Ea/(n − k) ) ~ F_{k−1, n−k}   under H_0(a),
with rejection region F_a ≥ F*. The rejection region for H_0 is therefore

    R = ∪_{a ∈ R^p} { F_a ≥ F* },

which implies that the testing statistic is max_a F_a, or equivalently

    max_{a ≠ 0} a'Ba / a'Ea.
Lemma 1
Let A be a symmetric matrix of order p. Denote by λ_1 ≥ ... ≥ λ_p the eigenvalues of A, and by l_1, ..., l_p the associated eigenvectors. Then

    max_{x ≠ 0} x'Ax / x'x = λ_1,   attained at x = l_1;
    min_{x ≠ 0} x'Ax / x'x = λ_p,   attained at x = l_p.
Lemma 2
Let A and B be two p×p matrices with A' = A and B > 0. Denote by λ_1 ≥ ... ≥ λ_p the eigenvalues of B^(−1/2) A B^(−1/2) (equivalently, of B^(−1)A), and by l_1, ..., l_p the associated eigenvectors. Then

    max_{x ≠ 0} x'Ax / x'Bx = λ_1,   min_{x ≠ 0} x'Ax / x'Bx = λ_p.
Remark 1: λ_1, ..., λ_p are the roots of |A − λB| = 0, i.e., the eigenvalues of B^(−1)A.
Remark 2: The union–intersection statistic is the largest eigenvalue of E^(−1)B.
Remark 3: Let λ_1 ≥ ... ≥ λ_p ≥ 0 be the eigenvalues of E^(−1)B. The Wilks Λ-statistic can be expressed as

    Λ = |E| / |E + B| = Π_{i=1}^p 1/(1 + λ_i).
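Remarks 2 and 3 can be verified numerically; this Python sketch (arbitrary p.d. matrices of my own construction) checks the eigenvalue expression for Λ and that the largest eigenvalue of E^(−1)B bounds the ratio a'Ba/a'Ea:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
# hypothetical E > 0 and B >= 0 of the right shapes
A_ = rng.standard_normal((p, p + 2))
E = A_ @ A_.T                        # positive definite "error" matrix
C_ = rng.standard_normal((p, 2))
B = C_ @ C_.T                        # positive semidefinite "treatment" matrix

eigvals = np.linalg.eigvals(np.linalg.solve(E, B)).real   # eigenvalues of E^{-1}B
Lambda_det = np.linalg.det(E) / np.linalg.det(E + B)
Lambda_eig = np.prod(1.0 / (1.0 + eigvals))               # Remark 3
ok = np.isclose(Lambda_det, Lambda_eig)

# union-intersection: a'Ba/a'Ea never exceeds the largest eigenvalue (Lemma 2)
a = rng.standard_normal(p)
ratio = (a @ B @ a) / (a @ E @ a)
roy = eigvals.max()
ok2 = ratio <= roy + 1e-9
```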