Chapter 5. Statistical Inference: Estimation and Testing Hypotheses
5.1 Data Sets & Matrix Normal Distribution
A data set of n observations on p variables is arranged as the n×p data matrix

    X = ( x_11  ...  x_1p
          ...
          x_n1  ...  x_np ),

where the rows are observations and the columns are variables, and the n rows X_1, ..., X_n are i.i.d. N_p(μ, Σ). Then Vec(X') is an np×1 random vector with mean vector (μ', ..., μ')' and block-diagonal covariance matrix diag(Σ, ..., Σ) = I_n ⊗ Σ. We write

    X ~ N_{n×p}(1_n μ', I_n ⊗ Σ).

More generally, we can define the matrix normal distribution.
Definition 5.1.1
An n×p random matrix X is said to follow a matrix normal distribution, written X ~ N_{n×p}(M, W ⊗ V), if

    Vec(X') ~ N_{np}( Vec(M'), W ⊗ V ).

In this case X can be represented as

    X = M + B Y A',

where W = BB', V = AA', and Y has i.i.d. elements each following N(0, 1).
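As a quick numerical check of the representation above, the following Python/NumPy sketch (not part of the original notes; all names are illustrative) verifies that X = M + BYA' has Cov(Vec(X')) = W ⊗ V, using the Kronecker identity (B ⊗ A)(B ⊗ A)' = (BB') ⊗ (AA'):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3
B_ = rng.standard_normal((n, n))   # W = B B'
A_ = rng.standard_normal((p, p))   # V = A A'
W = B_ @ B_.T
V = A_ @ A_.T

M = np.zeros((n, p))
Y = rng.standard_normal((n, p))    # i.i.d. N(0,1) entries
X = M + B_ @ Y @ A_.T              # one draw from N_{n×p}(M, W ⊗ V)

# Vec(X') stacks the rows of X.  Since Vec((B Y A')') = (B ⊗ A) Vec(Y'),
# the covariance of Vec(X') is (B⊗A)(B⊗A)' = (BB') ⊗ (AA') = W ⊗ V.
lhs = np.kron(B_, A_) @ np.kron(B_, A_).T
cov_vec = np.kron(W, V)
ok = np.allclose(lhs, cov_vec)
```

The special case of a data matrix with i.i.d. N_p(μ, Σ) rows corresponds to M = 1_n μ' and W = I_n, V = Σ.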
Theorem 5.1.1
The density function of X ~ N_{n×p}(M, W ⊗ V) with W > 0, V > 0 is given by

    f(X) = (2π)^(−np/2) |W|^(−p/2) |V|^(−n/2) etr( −(1/2) W^(−1) (X − M) V^(−1) (X − M)' ),

where etr(A) = exp(tr(A)).
Corollary 1:
Let X be a matrix of n observations from N_p(μ, Σ). Then the density function of X is

    f(X) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{j=1}^n (x_j − μ)(x_j − μ)'.
5.2 Maximum Likelihood Estimation
A. Review
X_1, ..., X_n are i.i.d. N(μ, σ²).

Step 1. The likelihood function

    L(μ, σ²) = Π_{i=1}^n (2πσ²)^(−1/2) exp( −(x_i − μ)²/(2σ²) )
             = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^n (x_i − μ)² ).
Step 2. Domain (parameter space)
    H = { (μ, σ²) : μ ∈ R, σ² > 0 }.

The MLE of (μ, σ²) maximizes L(μ, σ²) over H.
Step 3. Maximization
Since Σ_{i=1}^n (x_i − μ)² = Σ_{i=1}^n (x_i − x̄)² + n(x̄ − μ)², we have

    L(μ, σ²) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^n (x_i − μ)² )
             ≤ (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^n (x_i − x̄)² ) = L(x̄, σ²),

with equality iff μ = x̄. Let a = σ² and

    g(a) = a^(−n/2) exp( −(1/(2a)) Σ_{i=1}^n (x_i − x̄)² ).

Setting dg(a)/da = 0 shows that g(a) attains its maximum at a = (1/n) Σ_{i=1}^n (x_i − x̄)². It implies that

    μ̂ = x̄   and   σ̂² = (1/n) Σ_{i=1}^n (x_i − x̄)².
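As a numerical illustration (a Python sketch, not from the notes; the data values are made up), the log-likelihood evaluated at (μ̂, σ̂²) is no smaller than at nearby parameter values:

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5])  # hypothetical sample
n = len(x)
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()   # note the 1/n divisor, not 1/(n-1)

def loglik(mu, s2):
    # log of L(mu, sigma^2) for the N(mu, sigma^2) sample
    return -n / 2 * np.log(2 * np.pi * s2) - ((x - mu) ** 2).sum() / (2 * s2)

best = loglik(mu_hat, sigma2_hat)
others = [loglik(mu_hat + d, sigma2_hat + e)
          for d in (-0.3, 0.0, 0.3) for e in (-0.1, 0.0, 0.2)
          if (d, e) != (0.0, 0.0)]
ok = all(best >= v for v in others)
```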
Results 4.9 (p168 of textbook)
B. Multivariate population
x_1, ..., x_n are samples from N_p(μ, Σ).
Step 1. The likelihood function
    L(μ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{j=1}^n (x_j − μ)(x_j − μ)'.
Step 2. Domain
    H = { (μ, Σ) : μ ∈ R^p, Σ > 0 }.
Step 3. Maximization
(a) Maximization over μ:

    max_{μ, Σ} L(μ, Σ) = max_{Σ > 0} L(x̄, Σ) = max_{Σ > 0} (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) B ),

where

    B = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)'.
We can prove that P(B > 0) = 1 if n > p .
(b) Let B = CC' with C nonsingular, and put Σ* = C^(−1) Σ (C^(−1))', so Σ = C Σ* C'. As Σ ranges over all positive definite matrices, so does Σ*. We have

    tr(Σ^(−1) B) = tr(Σ^(−1) C C') = tr(C' Σ^(−1) C) = tr((Σ*)^(−1)),
    |Σ| = |C Σ* C'| = |B| |Σ*|.

Hence

    max_{Σ > 0} |Σ|^(−n/2) etr( −(1/2) Σ^(−1) B ) = |B|^(−n/2) max_{Σ* > 0} |Σ*|^(−n/2) etr( −(1/2) (Σ*)^(−1) ).
(c) Let λ_1, ..., λ_p be the eigenvalues of Σ*. Then

    max_{Σ* > 0} |Σ*|^(−n/2) etr( −(1/2) (Σ*)^(−1) ) = max_{λ_1, ..., λ_p > 0} Π_{j=1}^p λ_j^(−n/2) e^(−1/(2λ_j)).

The function g(λ) = λ^(−n/2) e^(−1/(2λ)) attains its maximum at λ = 1/n. The function L(Σ*) therefore attains its maximum at λ_1 = ... = λ_p = 1/n, and

    Σ̂* = (1/n) I_p.
(d) The MLE of Σ is

    Σ̂ = C Σ̂* C' = (1/n) C C' = B/n.
Theorem 5.2.1
Let X_1, ..., X_n be a sample from N_p(μ, Σ) with n > p. Then the MLEs of μ and Σ are

    μ̂ = x̄   and   Σ̂ = (1/n) Σ_{j=1}^n (x_j − x̄)(x_j − x̄)' = B/n,

respectively, and the maximum likelihood is

    L(x̄, B/n) = (2π)^(−np/2) n^(np/2) e^(−np/2) |B|^(−n/2).
Theorem 5.2.2
Under the above notations, we have
a) x̄ and Σ̂ are independent;
b) x̄ ~ N_p(μ, (1/n) Σ);
c) Σ̂ is a biased estimator of Σ:

    E(Σ̂) = ((n − 1)/n) Σ.

An unbiased estimator of Σ is recommended:

    S = (1/(n − 1)) Σ_{j=1}^n (x_j − x̄)(x_j − x̄)',

called the sample covariance matrix.
Matlab code: mean, cov, corrcoef
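The notes point to Matlab's mean, cov, and corrcoef; an equivalent NumPy sketch (illustrative data, not from the notes) contrasts the biased MLE Σ̂ = B/n with the unbiased S = B/(n−1), which is what np.cov (like Matlab's cov) returns by default:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])          # toy data matrix, n = 4, p = 2
n = X.shape[0]
xbar = X.mean(axis=0)
B = (X - xbar).T @ (X - xbar)       # sum of squares and products matrix
Sigma_hat = B / n                   # MLE of Sigma (biased)
S = B / (n - 1)                     # sample covariance matrix (unbiased)

# np.cov with rowvar=False uses the 1/(n-1) divisor, matching S
ok = np.allclose(S, np.cov(X, rowvar=False))
```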
Theorem 5.2.3
Let θ̂ be the MLE of θ and f be a measurable function. Then f(θ̂) is the MLE of f(θ).
Corollary 1
The MLE of the correlation ρ_ij is

    r_ij = b_ij / sqrt(b_ii b_jj),   where B = (b_ij).
5.3 Wishart Distribution
A. Chi-square distribution
Let X_1, ..., X_n be i.i.d. N(0, 1). Then Y = X_1² + ... + X_n² ~ χ²_n, the chi-square distribution with n degrees of freedom.

Definition 5.3.1
If x ~ N_n(0, I_n), then Y = x'x is said to have a chi-square distribution with n degrees of freedom, and we write Y ~ χ²_n.

If x ~ N_n(0, I_n), then Y = x'x ~ χ²_n;
if x ~ N_n(0, Σ) with Σ > 0, then Y = x' Σ^(−1) x ~ χ²_n.
B. Wishart distribution (obtained by Wishart in 1928)
Definition 5.3.2
Let X ~ N_{n×p}(0, I_n ⊗ Σ). Then we say that W = X'X is distributed according to a Wishart distribution, and write W ~ W_p(n, Σ). In particular,

    B = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)' ~ W_p(n − 1, Σ).

The density of W ~ W_p(n, Σ) with n ≥ p and Σ > 0 is

    f(W) = [ 2^(np/2) |Σ|^(n/2) Γ_p(n/2) ]^(−1) |W|^((n−p−1)/2) etr( −(1/2) Σ^(−1) W )   if W > 0,

and 0 otherwise, where

    Γ_p(n/2) = π^(p(p−1)/4) Π_{j=1}^p Γ( (n − j + 1)/2 ).
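The density above can be checked numerically against SciPy's Wishart implementation; the following sketch (my own, with arbitrary p.d. matrices) evaluates the log-density by the formula and compares it with scipy.stats.wishart:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

p, n = 2, 5
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
W = np.array([[6.0, 1.0],
              [1.0, 4.0]])          # arbitrary positive definite argument

def wishart_logpdf(W, n, Sigma):
    # log f(W) = ((n-p-1)/2)log|W| - (1/2)tr(Sigma^{-1}W)
    #            - (np/2)log 2 - (n/2)log|Sigma| - log Gamma_p(n/2)
    p = W.shape[0]
    _, logdet_W = np.linalg.slogdet(W)
    _, logdet_S = np.linalg.slogdet(Sigma)
    return ((n - p - 1) / 2 * logdet_W
            - 0.5 * np.trace(np.linalg.solve(Sigma, W))
            - n * p / 2 * np.log(2.0)
            - n / 2 * logdet_S
            - multigammaln(n / 2, p))

manual = wishart_logpdf(W, n, Sigma)
ref = wishart(df=n, scale=Sigma).logpdf(W)
ok = np.isclose(manual, ref)
```

Here multigammaln(n/2, p) is SciPy's log of the multivariate gamma function Γ_p(n/2).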
5.4 Discussion on Estimation

A. Unbiasedness
Let θ̂ be an estimator of θ. If E(θ̂) = θ, then θ̂ is called an unbiased estimator of θ.
Theorem 5.4.1
Let X_1, ..., X_n be a sample from N_p(μ, Σ). Then

    x̄ = (1/n) Σ_{j=1}^n x_j   and   S = (1/(n − 1)) Σ_{j=1}^n (x_j − x̄)(x_j − x̄)'

are unbiased estimators of μ and Σ, respectively.
Matlab code: mean, cov, corrcoef
B. Decision Theory

    p(x, θ) : the density of the sample X with parameter θ
    t(x)    : an estimator of θ based on the sample
    L(θ, t) : a loss function

Then the average loss is given by

    R(θ, t) = E L(θ, t(X)) = ∫ L(θ, t(x)) p(x, θ) dx,

which is called the risk function.

    max_θ R(θ, t) : the maximum risk if t is employed.
Definition 5.4.2
An estimator t*(X) is called a minimax estimator of θ if

    max_θ R(θ, t*) = min_t max_θ R(θ, t).
Example 1
Under the loss function

    L(θ, t) = (θ − t)'(θ − t),

the sample mean x̄ is a minimax estimator of μ.
C. Admissible estimation

Definition 5.4.3
An estimator t_1(x) is said to be at least as good as another t_2(x) if

    R(θ, t_1) ≤ R(θ, t_2)   for all θ.

And t_1 is said to be better than, or strictly dominate, t_2 if the above inequality holds with strict inequality for at least one θ.
Definition 5.4.4
An estimator t* is said to be inadmissible if there exists another estimator t** that is better than t*. An estimator t* is admissible if it is not inadmissible.
Admissibility is a weak requirement.
Under the loss L(μ, t) = (μ − t)'(μ − t), the sample mean x̄ is inadmissible if the population is N_p(μ, Σ) and p ≥ 3. James & Stein pointed out that when p ≥ 3 the estimator

    μ̂_JS = ( 1 − (p − 2)/(n x̄'x̄) ) x̄

is better than x̄. The estimator μ̂_JS is called the James–Stein estimator.
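A tiny numerical sketch (Python, made-up numbers) of the James–Stein estimator as written in the notes, shrinking x̄ toward the origin:

```python
import numpy as np

n, p = 10, 4                         # hypothetical sample size and dimension
xbar = np.array([1.2, -0.8, 0.5, 1.5])   # hypothetical sample mean, p >= 3

# shrinkage factor 1 - (p-2)/(n * xbar'xbar) from the notes
shrink = 1 - (p - 2) / (n * (xbar @ xbar))
mu_js = shrink * xbar                # James-Stein estimate
```

The factor is below 1, so μ̂_JS always lies strictly between the origin and x̄; the risk improvement over x̄ holds only for p ≥ 3.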
5.5 Inferences about a Mean Vector (Ch. 5, Textbook)
Let X_1, ..., X_n be i.i.d. samples from N_p(μ, Σ). We test

    H_0 : μ = μ_0   vs   H_1 : μ ≠ μ_0.
Case A: Σ is known.

a) p = 1:

    u = sqrt(n) (x̄ − μ_0)/σ ~ N(0, 1)   under H_0.

b) p > 1:

    T_0² = n (x̄ − μ_0)' Σ^(−1) (x̄ − μ_0).

Under the hypothesis H_0, x̄ ~ N_p(μ_0, (1/n) Σ). Then

    y = sqrt(n) Σ^(−1/2) (x̄ − μ_0) ~ N_p(0, I_p),

so that

    T_0² = n (x̄ − μ_0)' Σ^(−1) (x̄ − μ_0) = y'y ~ χ²_p.
Theorem 5.5.1
Let X_1, ..., X_n be a sample from N_p(μ, Σ), where Σ is known. The null distribution of T_0² under H_0 : μ = μ_0 is χ²_p, and the rejection region is

    T_0² ≥ χ²_p(α).
Case B: Σ is unknown.

a) Suggestion: replace Σ by the sample covariance matrix S in T_0², i.e.

    T² = n (x̄ − μ_0)' S^(−1) (x̄ − μ_0) = n(n − 1) (x̄ − μ_0)' B^(−1) (x̄ − μ_0),

where

    S = (1/(n − 1)) B,   B = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)'.

There are many theoretical approaches to finding a suitable statistic. One of the methods is the Likelihood Ratio Criterion.
The Likelihood Ratio Criterion (LRC)

Step 1. The likelihood function

    L(μ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{j=1}^n (x_j − μ)(x_j − μ)'.
Step 2 Domains
0 ,| ,
0 ,| ,
Σμμ Σμ
ΣRμ Σμ
0
p
Step 3. Maximization
We have obtained

    max_H L(μ, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |B|^(−n/2),

attained at μ̂ = x̄, Σ̂ = B/n. By a similar argument we can find

    max_{H_0} L(μ_0, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |A_0|^(−n/2),

where, under H_0,

    A_0 = Σ_{j=1}^n (x_j − μ_0)(x_j − μ_0)'
        = Σ_{j=1}^n (x_j − x̄)(x_j − x̄)' + n (x̄ − μ_0)(x̄ − μ_0)'
        = B + n (x̄ − μ_0)(x̄ − μ_0)'.
Then, the LRC is

    λ = max_{H_0} L / max_H L = ( |B| / |A_0| )^(n/2)
      = ( |B| / |B + n (x̄ − μ_0)(x̄ − μ_0)'| )^(n/2).
Note that

    |B + n (x̄ − μ_0)(x̄ − μ_0)'| = |B| ( 1 + n (x̄ − μ_0)' B^(−1) (x̄ − μ_0) )
                                   = |B| ( 1 + T²/(n − 1) ).

Finally,

    λ^(2/n) = 1 / ( 1 + T²/(n − 1) ).
Remark: Let t(x) be a statistic for the hypothesis and f(u) a strictly monotone function. Then t̃(x) = f(t(x)) is a statistic equivalent to t(x). We write t̃(x) ≅ t(x).
5.6 T²-statistic
Definition 5.6.1
Let W ~ W_p(n, Σ) and u ~ N_p(0, Σ) be independent with n > p. The distribution of

    T² = n u' W^(−1) u

is called the T² distribution.

• The distribution of T² is independent of Σ; we shall write T² ~ T²_{p,n}.
• ( (n − p + 1)/(np) ) T² ~ F_{p, n−p+1}.
Since

    x̄ − μ_0 ~ N_p(0, (1/n) Σ)   and   B ~ W_p(n − 1, Σ)

are independent under H_0, we have

    T² = n(n − 1) (x̄ − μ_0)' B^(−1) (x̄ − μ_0) ~ T²_{p, n−1},

and

    ( (n − p)/((n − 1)p) ) T² ~ F_{p, n−p}.
Theorem 5.6.1
pnpnp FTpn
pnTTH
,, ~~,μμ 2
122
00 1 and :Under
Theorem 5.6.2
The distribution of T² is invariant under all affine transformations of the observations and the hypothesis:

    y_j = G x_j + d,   μ_0* = G μ_0 + d,   G : p×p nonsingular,   d : p×1.
Confidence Region
• A 100(1 − α)% confidence region for the mean μ of a p-dimensional normal distribution is the ellipsoid determined by all μ such that

    n (x̄ − μ)' S^(−1) (x̄ − μ) ≤ ( p(n − 1)/(n − p) ) F_{p, n−p}(α).
Proof:

                          Original            After transformation
    observations          x_1, ..., x_n       y_j = G x_j + d, j = 1, ..., n
    sample mean           x̄                   ȳ = G x̄ + d
    sample covariance     S                   S_y = G S G'
    given mean            μ_0                 μ_0* = G μ_0 + d
    H_0                   μ = μ_0             μ* = μ_0*

Hence

    T_y² = n (ȳ − μ_0*)' S_y^(−1) (ȳ − μ_0*)
         = n (G(x̄ − μ_0))' (G S G')^(−1) (G(x̄ − μ_0))
         = n (x̄ − μ_0)' S^(−1) (x̄ − μ_0) = T².
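The invariance can be confirmed numerically; this Python sketch (seeded synthetic data, my own illustration) computes T² before and after an arbitrary nonsingular affine transformation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3
mu0 = np.array([1.0, 2.0, 3.0])
X = rng.standard_normal((n, p)) + mu0      # sample with true mean mu0

def t2(X, mu0):
    # Hotelling's one-sample T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = xbar - mu0
    return n * d @ np.linalg.solve(S, d)

G = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])             # nonsingular (det = 7)
d = np.array([5.0, -1.0, 2.0])
Y = X @ G.T + d                             # y_j = G x_j + d
t2_x = t2(X, mu0)
t2_y = t2(Y, G @ mu0 + d)                   # transformed hypothesis mean
ok = np.isclose(t2_x, t2_y)
```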
Example 5.6.1 (Example 5.2 in Textbook)
Perspiration from 20 healthy females was analyzed.

SWEAT DATA
    Individual   X1 (Sweat rate)   X2 (Sodium)   X3 (Potassium)
    1            3.7               48.5          9.3
    2            5.7               65.1          8.0
    3            3.8               47.2          10.9
    4            3.2               53.2          12.0
    5            3.1               55.5          9.7
    6            4.6               36.1          7.9
    7            2.4               24.8          14.0
    8            7.2               33.1          7.6
    9            6.7               47.4          8.5
    10           5.4               54.1          11.3
    11           3.9               36.9          12.7
    12           4.5               58.8          12.3
    13           3.5               27.8          9.8
    14           4.5               40.2          8.4
    15           1.5               13.5          10.1
    16           8.5               56.4          7.1
    17           4.5               71.6          8.2
    18           6.5               52.8          10.9
    19           4.1               44.4          11.2
    20           5.5               40.9          9.4

Source: Courtesy of Dr. Gerald Bargman.
    H_0 : μ' = (4, 50, 10)   vs   H_1 : μ' ≠ (4, 50, 10).
Computer calculations provide:

    x̄ = ( 4.640, 45.400, 9.965 )',

    S = (  2.879   10.010   −1.810
          10.010  199.788   −5.640
          −1.810   −5.640    3.628 ),

and

    S^(−1) = (  .586   −.022    .258
               −.022    .006   −.002
                .258   −.002    .402 ).
We evaluate

    T² = 20 (4.640 − 4, 45.400 − 50, 9.965 − 10) S^(−1) (4.640 − 4, 45.400 − 50, 9.965 − 10)'
       = 20 (.640, −4.600, −.035) (.467, −.042, .160)'
       = 9.74.
Comparing the observed T² = 9.74 with the critical value

    ( p(n − 1)/(n − p) ) F_{p, n−p}(.10) = ( 3(19)/17 ) F_{3,17}(.10) = 3.353 × 2.44 = 8.18,

we see that T² = 9.74 > 8.18, and consequently we reject H_0 at the 10% level of significance.
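The example can be reproduced with a few lines of Python (my own sketch using the sweat data tabulated above):

```python
import numpy as np
from scipy.stats import f

sweat = np.array([
    [3.7, 48.5,  9.3], [5.7, 65.1,  8.0], [3.8, 47.2, 10.9], [3.2, 53.2, 12.0],
    [3.1, 55.5,  9.7], [4.6, 36.1,  7.9], [2.4, 24.8, 14.0], [7.2, 33.1,  7.6],
    [6.7, 47.4,  8.5], [5.4, 54.1, 11.3], [3.9, 36.9, 12.7], [4.5, 58.8, 12.3],
    [3.5, 27.8,  9.8], [4.5, 40.2,  8.4], [1.5, 13.5, 10.1], [8.5, 56.4,  7.1],
    [4.5, 71.6,  8.2], [6.5, 52.8, 10.9], [4.1, 44.4, 11.2], [5.5, 40.9,  9.4]])
n, p = sweat.shape
mu0 = np.array([4.0, 50.0, 10.0])

xbar = sweat.mean(axis=0)
S = np.cov(sweat, rowvar=False)
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)                     # Hotelling T^2

crit = p * (n - 1) / (n - p) * f.ppf(0.90, p, n - p)   # 10% critical value
reject = T2 > crit
```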
Definition 5.6.2
Let x and y be samples from a population G with mean μ and covariance matrix Σ > 0. The quadratic forms

    D²_M(x, y) = (x − y)' Σ^(−1) (x − y)   and   D²_M(x, G) = (x − μ)' Σ^(−1) (x − μ)

are called the Mahalanobis distance (M-distance) between x and y, and between x and G, respectively.
Mahalanobis Distance
It can be verified that
• D_M(x, y) ≥ 0, with D_M(x, y) = 0 iff x = y;
• D_M(x, y) = D_M(y, x);
• D_M(x, y) ≤ D_M(x, z) + D_M(z, y);
• T_0² = n (x̄ − μ_0)' Σ^(−1) (x̄ − μ_0) = n D²_M(x̄, G).
5.7 Two-Sample Problems (Section 6.3, Textbook)
We have two samples from two populations:

    G_1 : N_p(μ_1, Σ),   x_1, ..., x_n,
    G_2 : N_p(μ_2, Σ),   y_1, ..., y_m,

where μ_1, μ_2 and Σ are unknown. We test

    H_0 : μ_1 = μ_2   vs   H_1 : μ_1 ≠ μ_2.
The LRC is equivalent to

    T² = ( nm/(n + m) ) (x̄ − ȳ)' S_pooled^(−1) (x̄ − ȳ),

where

    S_pooled = (1/(n + m − 2)) [ Σ_{i=1}^n (x_i − x̄)(x_i − x̄)' + Σ_{j=1}^m (y_j − ȳ)(y_j − ȳ)' ],

    x̄ = (1/n) Σ_{i=1}^n x_i,   ȳ = (1/m) Σ_{j=1}^m y_j.
Under the hypothesis H_0,

    T² ~ T²_{p, n+m−2}   and   ( (n + m − p − 1)/((n + m − 2)p) ) T² ~ F_{p, n+m−p−1}.
The 100(1 − α)% confidence region of a'(μ_1 − μ_2) is

    a'(x̄ − ȳ) − T sqrt( ((n + m)/(nm)) a' S_pooled a )
        ≤ a'(μ_1 − μ_2) ≤
    a'(x̄ − ȳ) + T sqrt( ((n + m)/(nm)) a' S_pooled a ),

where T² = T²_{p, n+m−2}(α).
Example 5.7.1 (pp. 338–339)
Jolicoeur and Mosimann (1960) studied the relationship of size and shape for painted turtles. The following table contains their measurements on the carapaces of 24 female and 24 male turtles.

                  Female                              Male
    Length(x1)  Width(x2)  Height(x3)    Length(x1)  Width(x2)  Height(x3)
    98          81         38            93          74         37
    103         84         38            94          78         35
    103         86         42            96          80         35
    105         86         42            101         84         39
    109         88         44            102         85         38
    123         92         50            103         81         37
    123         95         46            104         83         39
    133         99         51            106         83         39
    133         102        51            107         82         38
    133         102        51            112         89         40
    134         100        48            113         88         40
    136         102        49            114         86         40
    138         98         51            116         90         43
    138         99         51            117         90         41
    141         105        53            117         91         41
    147         108        57            119         93         41
    149         107        55            120         89         40
    153         107        56            120         93         44
    155         115        63            121         95         42
    155         117        60            125         93         45
    158         115        62            127         96         45
    159         118        63            128         95         45
    162         124        61            131         95         46
    177         132        67            135         106        47
From the data,

    x̄ = ( 136.0417, 102.5833, 52.0417 )',   ȳ = ( 113.3750, 88.2917, 40.7083 )',

    S_pooled = ( 295.1431   175.0607   101.6649
                 175.0607   110.8869    61.7491
                 101.6649    61.7491    37.9982 ),

and

    T² = ( 24·24/(24 + 24) ) (x̄ − ȳ)' S_pooled^(−1) (x̄ − ȳ) = 72.3816.

Since

    F = ( (n + m − p − 1)/((n + m − 2)p) ) T² = ( 44/(46·3) ) × 72.3816 ≈ 23.08 > F_{3,44}(.01) ≈ 4.26,

we reject H_0 at the 1% level of significance.
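Using the summary statistics quoted above, the two-sample T² can be reproduced in a few lines of Python (my own sketch; small discrepancies come only from the rounding of the printed statistics):

```python
import numpy as np
from scipy.stats import f

n = m = 24
p = 3
xbar = np.array([136.0417, 102.5833, 52.0417])   # female means
ybar = np.array([113.3750, 88.2917, 40.7083])    # male means
S_pooled = np.array([[295.1431, 175.0607, 101.6649],
                     [175.0607, 110.8869,  61.7491],
                     [101.6649,  61.7491,  37.9982]])

d = xbar - ybar
T2 = (n * m / (n + m)) * d @ np.linalg.solve(S_pooled, d)

# convert to an F statistic and compare with the 1% critical value
F_stat = (n + m - p - 1) / ((n + m - 2) * p) * T2
crit = f.ppf(0.99, p, n + m - p - 1)
reject = F_stat > crit
```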
5.8 Multivariate Analysis of Variance
A. Review
There are k normal populations:

    G_1 : N(μ_1, σ²),   x_11, ..., x_{1 n_1},
    ...
    G_k : N(μ_k, σ²),   x_{k1}, ..., x_{k n_k}.

One wants to test equality of the means μ_1, ..., μ_k:

    H_0 : μ_1 = ... = μ_k   vs   H_1 : μ_i ≠ μ_j for some i, j.
The analysis of variance employs the decomposition of the sum of squares:

    SSTR = Σ_{a=1}^k n_a (x̄_a − x̄)²,                 the sum of squares among treatments,
    SSE  = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄_a)²,   the sum of squares within groups,
    SST  = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄)²,     the total sum of squares,

where

    x̄_a = (1/n_a) Σ_{j=1}^{n_a} x_{aj},   x̄ = (1/n) Σ_{a=1}^k Σ_{j=1}^{n_a} x_{aj},   n = n_1 + ... + n_k.
The testing statistic is

    F = ( SSTR/(k − 1) ) / ( SSE/(n − k) ) ~ F_{k−1, n−k}   under H_0.
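The decomposition and the F-statistic above can be sketched in Python (illustrative data; scipy.stats.f_oneway serves as an independent check):

```python
import numpy as np
from scipy.stats import f_oneway

# three hypothetical groups of observations
groups = [np.array([5.1, 4.8, 6.0, 5.5]),
          np.array([6.8, 7.2, 6.5]),
          np.array([4.0, 3.6, 4.4, 4.1, 3.9])]
k = len(groups)
n = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

sstr = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # among treatments
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)         # within groups
sst = ((np.concatenate(groups) - grand) ** 2).sum()            # total

F = (sstr / (k - 1)) / (sse / (n - k))
F_ref, _ = f_oneway(*groups)        # SciPy's one-way ANOVA F
ok = np.isclose(sst, sstr + sse) and np.isclose(F, F_ref)
```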
B. Multivariate population (pp295-305)
kn
kkpk
knp
kNG
NG
xxΣ μ
xxΣ μ
,,,,:
,,,,:
1
1111 1
Σ is unknown, one wants to test
jiHH jk somefor : : 1110 ,μμ ,μμ
I. The likelihood ratio criterion

Step 1. The likelihood function

    L(μ_1, ..., μ_k, Σ) = (2π)^(−np/2) |Σ|^(−n/2) etr( −(1/2) Σ^(−1) A ),

where

    A = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − μ_a)(x_{aj} − μ_a)'.
Step 2. The domains

    H = { (μ_1, ..., μ_k, Σ) : μ_j ∈ R^p, j = 1, ..., k, Σ > 0 },
    ω = { (μ_1, ..., μ_k, Σ) : μ_1 = ... = μ_k ∈ R^p, Σ > 0 }.
Step 3. Maximization

    max_H L(μ_1, ..., μ_k, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |E|^(−n/2),
    max_ω L(μ_1, ..., μ_k, Σ) = (2π)^(−np/2) n^(np/2) e^(−np/2) |T|^(−n/2),

where

    T = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄)(x_{aj} − x̄)',
    E = Σ_{a=1}^k Σ_{j=1}^{n_a} (x_{aj} − x̄_a)(x_{aj} − x̄_a)'
are the total sum of squares and products matrix and the error sum of squares and products matrix, respectively.
Here

    x̄_a = (1/n_a) Σ_{j=1}^{n_a} x_{aj},   x̄ = (1/n) Σ_{a=1}^k Σ_{j=1}^{n_a} x_{aj}.
The treatment sum of squares and products matrix is

    B = T − E = Σ_{a=1}^k n_a (x̄_a − x̄)(x̄_a − x̄)'.
The LRC
.BE
E
T
E
T
E
2
2
n
n
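A short Python sketch (seeded synthetic data, my own illustration) computes T, E, and B for a one-way MANOVA layout and checks the decomposition T = E + B and the Wilks ratio:

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 2, 3
# three hypothetical groups of p-variate observations
groups = [rng.standard_normal((6, p)),
          rng.standard_normal((8, p)) + 0.5,
          rng.standard_normal((5, p))]
allx = np.vstack(groups)
grand = allx.mean(axis=0)

# error SS&P matrix: deviations from group means
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
# total SS&P matrix: deviations from the grand mean
T = (allx - grand).T @ (allx - grand)
# treatment SS&P matrix
B = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
        for g in groups)

ok_decomp = np.allclose(T, E + B)
Lambda = np.linalg.det(E) / np.linalg.det(E + B)   # lambda^(2/n)
```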
Definition 5.8.1
Assume A ~ W_p(n, Σ) and B ~ W_p(m, Σ) are independent, where Σ > 0, n ≥ p. The distribution of

    Λ = |A| / |A + B|

is called the Wilks Λ-distribution, and we write Λ ~ Λ_{p,n,m}.
Theorem 5.8.1
Under H_0 we have
1) T ~ W_p(n − 1, Σ), E ~ W_p(n − k, Σ), B ~ W_p(k − 1, Σ);
2) E and B are independent;
3) the LRC statistic λ^(2/n) = |E|/|E + B| has a Λ_{p, n−k, k−1} distribution.
Special cases of the Wilks Λ_{p,n,m}-distribution:

    m = 1:  ( (n − p + 1)/p ) (1 − Λ)/Λ ~ F_{p, n−p+1};
    m = 2:  ( (n − p + 1)/p ) (1 − sqrt(Λ))/sqrt(Λ) ~ F_{2p, 2(n−p+1)};
    p = 1:  ( n/m ) (1 − Λ)/Λ ~ F_{m, n};
    p = 2:  ( (n − 1)/m ) (1 − sqrt(Λ))/sqrt(Λ) ~ F_{2m, 2(n−1)}.
See pp. 300–305 of the textbook for examples.
2. Union–Intersection Decision Rule

    H_0 : μ_1 = ... = μ_k.

Consider the projection hypothesis

    H_0(a) : a'μ_1 = ... = a'μ_k,   a ∈ R^p, a ≠ 0,   so that   H_0 = ∩_{a ≠ 0} H_0(a).

For each fixed a, the populations become univariate:

    G_j(a) : N(a'μ_j, a'Σa),   a'x_{j1}, ..., a'x_{j n_j},   j = 1, ..., k.
For the projected data, we have

    SSTR = a'Ba,   SSE = a'Ea,   SST = a'Ta,

and the F-statistic

    F_a = ( a'Ba/(k − 1) ) / ( a'Ea/(n − k) ) ~ F_{k−1, n−k}   under H_0(a),
with rejection region F_a ≥ F*. The rejection region for H_0 is therefore

    R = ∪_{a ∈ R^p} { F_a ≥ F* },

which implies that the testing statistic is max_a F_a, or equivalently

    max_{a ≠ 0} a'Ba / a'Ea.
Lemma 1
Let A be a symmetric matrix of order p. Denote by λ_1 ≥ ... ≥ λ_p the eigenvalues of A, and by l_1, ..., l_p the associated eigenvectors. Then

    max_{x ≠ 0} x'Ax / x'x = λ_1,   attained at x = l_1;
    min_{x ≠ 0} x'Ax / x'x = λ_p,   attained at x = l_p.
Lemma 2
Let A and B be two p×p matrices with A' = A and B > 0. Denote by λ_1 ≥ ... ≥ λ_p the eigenvalues of B^(−1/2) A B^(−1/2) (equivalently, of B^(−1)A), and by l_1, ..., l_p the associated eigenvectors. Then

    max_{x ≠ 0} x'Ax / x'Bx = λ_1,   min_{x ≠ 0} x'Ax / x'Bx = λ_p.
Remark 1: λ_1, ..., λ_p are the roots of |A − λB| = 0, i.e., the eigenvalues of B^(−1)A.
Remark 2: The union–intersection statistic is the largest eigenvalue of E^(−1)B.
Remark 3: Let λ_1 ≥ ... ≥ λ_p ≥ 0 be the eigenvalues of E^(−1)B. The Wilks Λ-statistic can be expressed as

    Λ = |E| / |E + B| = Π_{i=1}^p 1/(1 + λ_i).
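Remarks 2 and 3 can be verified numerically; this Python sketch (arbitrary p.d. matrices of my own construction) checks the eigenvalue expression for Λ and that the largest eigenvalue of E^(−1)B bounds the ratio a'Ba/a'Ea:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
# hypothetical E > 0 and B >= 0 of the right shapes
A_ = rng.standard_normal((p, p + 2))
E = A_ @ A_.T                        # positive definite "error" matrix
C_ = rng.standard_normal((p, 2))
B = C_ @ C_.T                        # positive semidefinite "treatment" matrix

eigvals = np.linalg.eigvals(np.linalg.solve(E, B)).real   # eigenvalues of E^{-1}B
Lambda_det = np.linalg.det(E) / np.linalg.det(E + B)
Lambda_eig = np.prod(1.0 / (1.0 + eigvals))               # Remark 3
ok = np.isclose(Lambda_det, Lambda_eig)

# union-intersection: a'Ba/a'Ea never exceeds the largest eigenvalue (Lemma 2)
a = rng.standard_normal(p)
ratio = (a @ B @ a) / (a @ E @ a)
roy = eigvals.max()
ok2 = ratio <= roy + 1e-9
```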