AMS 571, Lecture 7 (Lecture7_571.pdf) · Author: Wei Zhu · Created: 9/26/2017 8:04:09 AM


Lecture 7, Review

Data Reduction

We should think about the following questions carefully before the "simplification" process:

• Is there any loss of information due to summarization?

• How to compare the amount of information about θ in the original data X and in 𝑇(𝑿)?

• Is it sufficient to consider only the "reduced data" T?

1. Sufficient Statistics

A statistic T is called sufficient if the conditional distribution of X given T is free of θ (that is, the conditional is a completely known distribution).

Example. Toss a coin n times, and the probability of head is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?

Sufficiency Principle

If T is sufficient, the "extra information" carried by X is worthless as far as θ is concerned. It is then natural to consider inference procedures that do not use this extra irrelevant information. This leads to the Sufficiency Principle:

Any inference procedure should depend on the data only through sufficient statistics.


Definition: Sufficient Statistic (in terms of Conditional Probability)

(discrete case):

For any x and t, the conditional pmf of X given T is

P(X = x | T(X) = t) = P(X = x, T(X) = t) / P(T(X) = t) = P(X = x) / P(T(X) = t)

(the second equality holds because the event {X = x, T(X) = t} equals {X = x} when t = T(x), and is empty otherwise). If this conditional pmf does not depend on θ, then we say T(X) is a sufficient statistic for θ.

Sufficient Statistic, the general definition (for both discrete and continuous variables):

Let the pdf of the data X be f(x; θ) and the pdf of T be q(t; θ). If

f(x; θ) / q(T(x); θ) is free of θ (it may depend on x)   (∗)

for all x and θ, then T is a sufficient statistic for θ.

Example: Toss a coin n times, and the probability of head is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?

X_1, …, X_n i.i.d. Bernoulli: f(x) = θ^x (1 − θ)^{1−x}, x = 0, 1.

f(x; θ) = f(x_1, …, x_n) = θ^{∑ x_i} (1 − θ)^{n − ∑ x_i}

T = ∑_{i=1}^n X_i ~ Bin(n, θ):

q(t; θ) = (n choose t) θ^t (1 − θ)^{n−t}, where t = ∑ x_i.

Thus

f(x; θ) / q(T(x); θ) = 1 / (n choose t)

is free of θ, so by the definition, ∑_{i=1}^n X_i is a sufficient statistic for θ.
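This definitional check can be carried out numerically. Below is a minimal sketch (function names are ours, not the lecture's) that evaluates f(x; θ)/q(T(x); θ) for one Bernoulli sample at several values of θ and confirms the ratio never changes:

```python
from math import comb

def joint_pmf(x, theta):
    """f(x; theta) = theta^t * (1 - theta)^(n - t), with t = sum(x)."""
    t = sum(x)
    return theta**t * (1 - theta)**(len(x) - t)

def stat_pmf(t, n, theta):
    """q(t; theta): the Bin(n, theta) pmf of T = sum of the X_i."""
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

x = [1, 0, 1, 1, 0]                       # one observed sample: t = 3, n = 5
ratios = [joint_pmf(x, th) / stat_pmf(sum(x), len(x), th)
          for th in (0.1, 0.3, 0.5, 0.9)]
# Every ratio equals 1 / C(5, 3) = 0.1, free of theta.
```

The same check with any other 0/1 sample gives the constant 1 / C(n, t) for that sample's t.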


Example. X_1, …, X_n iid N(θ, 1). T = X̄.

Remarks: The definition (*) is not always easy to apply.

• Need to guess the form of a sufficient statistic.

• Need to figure out the distribution of T.

How to find a sufficient statistic?

2. (Neyman-Fisher) Factorization theorem.

T is sufficient if and only if f(x; θ) can be written as the product g(T(x); θ)h(x), where the first factor depends on x only through T(x) and the second factor is free of θ.

Example. Binomial. X_1, …, X_n iid Bin(1, θ).

Solution 1:

Bernoulli: f(x) = θ^x (1 − θ)^{1−x}, x = 0, 1.

f(x; θ) = f(x_1, …, x_n) = θ^{∑ x_i} (1 − θ)^{n − ∑ x_i}

= [θ^{∑ x_i} (1 − θ)^{n − ∑ x_i}] · [1]

= g(∑ x_i, θ) · h(x_1, …, x_n)

So according to the factorization theorem, T = ∑_{i=1}^n X_i is a sufficient statistic for θ.

Solution 2:

f(x; θ) = f(x_1, x_2, …, x_n | θ)

= ∏_{i=1}^n θ^{x_i} (1 − θ)^{1−x_i}   if x_i = 0, 1 for i = 1, 2, …, n;  0 otherwise

= θ^{∑ x_i} (1 − θ)^{n − ∑ x_i}   if x_i = 0, 1 for i = 1, 2, …, n;  0 otherwise


= θ^t (1 − θ)^{n−t} h(x_1, x_2, …, x_n)

= g(t, θ) h(x_1, …, x_n),

where t = ∑ x_i, g(t, θ) = θ^t (1 − θ)^{n−t}, and

h(x_1, …, x_n) = 1 if x_i = 0, 1 for i = 1, 2, …, n;  0 otherwise.

Hence T is a sufficient statistic for θ.

Example. Exp(λ).

Let X_1, …, X_n be a random sample from an exponential distribution with rate λ. Let T = X_1 + X_2 + ⋯ + X_n and let f be the joint density of X_1, X_2, …, X_n.

f(x; λ) = f(x_1, x_2, …, x_n | λ)

= ∏_{i=1}^n λ e^{−λ x_i}   if x_i > 0, i = 1, 2, …, n;  0 otherwise

= λ^n e^{−λ ∑ x_i}   if x_i > 0, i = 1, 2, …, n;  0 otherwise

= λ^n e^{−λ t} h(x_1, …, x_n)

= g(t, λ) h(x_1, …, x_n),

where g(t, λ) = λ^n e^{−λ t}, and

h(x_1, …, x_n) = 1 if x_i > 0, i = 1, 2, …, n;  0 otherwise.

Hence T is a sufficient statistic for 𝜆.
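As a sanity check, the factorization above can be verified numerically: on the positive orthant h(x) = 1, so the joint density should equal g(t, λ) = λ^n e^{−λt} exactly for every λ. A small sketch (function names are our own):

```python
import math
import random

def joint_density(x, lam):
    """Product of Exp(lam) densities: prod of lam * exp(-lam * x_i)."""
    return math.prod(lam * math.exp(-lam * xi) for xi in x)

def g(t, n, lam):
    """The factor that depends on the data only through t = sum(x)."""
    return lam**n * math.exp(-lam * t)

rng = random.Random(0)
x = [rng.expovariate(2.0) for _ in range(6)]   # a sample with every x_i > 0
agree = all(abs(joint_density(x, lam) - g(sum(x), len(x), lam)) < 1e-12
            for lam in (0.5, 1.0, 3.0))
```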

Example. Normal. iid N(θ,1).

Please derive the sufficient statistic for θ by yourself.


When the range of X depends on θ, one should be more careful about the factorization, and must use indicator functions explicitly.

Example. Uniform. iid 𝑼(𝟎, 𝜽).

Solution 1:

Let X_1, …, X_n be a random sample from a uniform distribution on (0, θ). Let T = X_(n) = max(X_1, …, X_n) and let f be the joint density of X_1, X_2, …, X_n.

Then

f(x; θ) = f(x_1, x_2, …, x_n | θ)

= ∏_{i=1}^n (1/θ)   if θ > x_i > 0, i = 1, 2, …, n;  0 otherwise

= 1/θ^n   if θ > x_i > 0, i = 1, 2, …, n;  0 otherwise

= 1/θ^n   if θ > x_(n) ≥ ⋯ ≥ x_(1) > 0;  0 otherwise

= g(t, θ) h(x_1, …, x_n),

where

g(t, θ) = 1/θ^n if θ > t > 0;  0 otherwise,

and

h(x_1, …, x_n) = 1 if x_(1) > 0;  0 otherwise.

Hence T is a sufficient statistic for 𝜃.

*** I personally prefer this approach because it is most straightforward. Alternatively, one can use the indicator function and simplify the solution as illustrated next.


Definition: Indicator function

I_A(x) = 1 if x ∈ A;  0 if x ∉ A.

Solution 2 (in terms of the indicator function):

Uniform: f(x) = 1/θ, x ∈ (0, θ).

f(x; θ) = f(x_1, …, x_n) = 1/θ^n, x_i ∈ (0, θ), ∀i

= (1/θ^n) ∏_{i=1}^n I_(0,θ)(x_i)

= (1/θ^n) I_(0,θ)(x_(n)) · I_(0,∞)(x_(1))

= [(1/θ^n) I_(0,θ)(x_(n))] · [I_(0,∞)(x_(1))]

= g(x_(n), θ) · h(x_1, …, x_n)

So by the factorization theorem, X_(n) is a sufficient statistic for θ.
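This factorization can also be checked numerically: since every x_i > 0, h(x) = 1, and the joint density must coincide with g(x_(n), θ) for every θ, including values of θ at or below the sample maximum (where both sides vanish). A brief sketch with our own function names:

```python
import random

def joint_density(x, theta):
    """Joint U(0, theta) density: theta^(-n) on (0, theta)^n, else 0."""
    return theta**(-len(x)) if all(0 < xi < theta for xi in x) else 0.0

def g(t, n, theta):
    """g(t, theta) = theta^(-n) * I(0 < t < theta), with t = max(x)."""
    return theta**(-n) if 0 < t < theta else 0.0

rng = random.Random(1)
x = [rng.uniform(0, 4.0) for _ in range(5)]    # the true theta here is 4.0
agree = all(joint_density(x, theta) == g(max(x), len(x), theta)
            for theta in (2.0, 4.0, 10.0))     # h(x) = 1 since min(x) > 0
```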

Example: Please derive a sufficient statistic for θ given a random sample of size n from U(θ, θ + 1).

Solution:

1. Indicator function approach:

Uniform: f(x) = 1, x ∈ (θ, θ + 1).

f(x_1, …, x_n | θ) = 1, x_i ∈ (θ, θ + 1), ∀i

= ∏_{i=1}^n I_(θ,θ+1)(x_i)

= I_(θ,θ+1)(x_(1)) · I_(θ,θ+1)(x_(n))

= [I_(θ,θ+1)(x_(1)) · I_(θ,θ+1)(x_(n))] · [1]

= g(x_(1), x_(n), θ) · h(x_1, …, x_n)

So, T = (X_(1), X_(n)) is a SS for θ.

2. Do not use the indicator function:


f(x_1, x_2, …, x_n | θ)

= 1   if θ + 1 > x_i > θ, i = 1, 2, …, n;  0 otherwise

= 1   if θ + 1 > x_(n) ≥ ⋯ ≥ x_(1) > θ;  0 otherwise

= g(x_(1), x_(n), θ) h(x_1, …, x_n),

where

g(x_(1), x_(n), θ) = 1 if θ + 1 > x_(n) and x_(1) > θ;  0 otherwise,

and

h(x_1, …, x_n) = 1.

So T = (X_(1), X_(n)) is a SS for θ.
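A quick numerical illustration that the U(θ, θ + 1) likelihood depends on the data only through (x_(1), x_(n)): two samples sharing the same minimum and maximum have identical likelihood at every θ (the samples and θ grid below are arbitrary choices of ours):

```python
def joint_density(x, theta):
    """Joint U(theta, theta + 1) density: 1 on the box, else 0."""
    return 1.0 if all(theta < xi < theta + 1 for xi in x) else 0.0

x = [0.2, 0.5, 0.9, 0.3]        # min 0.2, max 0.9
y = [0.2, 0.7, 0.9, 0.6]        # same min and max, different interior points
thetas = [k / 20 for k in range(-10, 11)]     # grid of candidate thetas
same_likelihood = all(joint_density(x, th) == joint_density(y, th)
                      for th in thetas)
```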

Two-dimensional Examples.

Example. Normal. iid 𝑵(𝝁, 𝝈𝟐). 𝜽 = (𝝁, 𝝈𝟐) (both unknown).

Let X_1, …, X_n be a random sample from a normal distribution N(μ, σ²). Let

X̄ = (1/n) ∑_{i=1}^n X_i,

S² = (1/(n − 1)) ∑_{i=1}^n (X_i − X̄)²,

and let f be the joint density of X_1, X_2, …, X_n.

f(x; θ) = f(x_1, x_2, …, x_n | μ, σ²) = ∏_{i=1}^n (1/√(2πσ²)) e^{−(x_i − μ)²/(2σ²)}


= (2πσ²)^{−n/2} e^{−∑_{i=1}^n (x_i − μ)²/(2σ²)}

Now

∑_{i=1}^n (x_i − μ)² = ∑ (x_i − x̄ + x̄ − μ)²

= ∑ (x_i − x̄)² + 2(x̄ − μ) ∑ (x_i − x̄) + n(x̄ − μ)²

= (n − 1)s² + 2(x̄ − μ) ∑ (x_i − x̄) + n(x̄ − μ)²

= (n − 1)s² + n(x̄ − μ)²,  since ∑ (x_i − x̄) = 0.

Thus,

f(x; θ) = f(x_1, x_2, …, x_n | μ, σ²)

= (2πσ²)^{−n/2} exp(−[(n − 1)s² + n(x̄ − μ)²] / (2σ²))

= g(x̄, s²; μ, σ²) h(x_1, …, x_n),

where

g(x̄, s²; μ, σ²) = (2πσ²)^{−n/2} exp(−[(n − 1)s² + n(x̄ − μ)²] / (2σ²)),

and

h(x_1, …, x_n) = 1.

In this case we say (X̄, S²) is sufficient for (μ, σ²).
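The key algebraic step is the decomposition ∑(x_i − μ)² = (n − 1)s² + n(x̄ − μ)². A short numerical check of that identity (the sample and μ below are arbitrary):

```python
import random
import statistics

rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(10)]
mu = 0.7
n = len(x)
xbar = sum(x) / n
s2 = statistics.variance(x)       # sample variance with the (n - 1) denominator
lhs = sum((xi - mu)**2 for xi in x)
rhs = (n - 1) * s2 + n * (xbar - mu)**2
# lhs and rhs agree up to floating-point error
```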


3. (Regular) Exponential Family

The density function of a regular exponential family is:

f(x; θ) = c(θ) h(x) exp[∑_{j=1}^k w_j(θ) t_j(x)],  θ = (θ_1, …, θ_k)

Example. Poisson(θ)

f(x; θ) = e^{−θ} (1/x!) exp[ln(θ) · x]
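To confirm this is the usual Poisson pmf in disguise, one can compare the exponential-family form e^{−θ}(1/x!)exp[x ln θ] with e^{−θ}θ^x/x! over a range of x and θ (a quick sketch, names ours):

```python
import math

def pois_pmf(x, theta):
    """Standard Poisson pmf: exp(-theta) * theta^x / x!."""
    return math.exp(-theta) * theta**x / math.factorial(x)

def expfam_form(x, theta):
    """c(theta) * h(x) * exp(w(theta) t(x)), with w = ln(theta), t(x) = x."""
    return math.exp(-theta) * (1 / math.factorial(x)) * math.exp(x * math.log(theta))

agree = all(abs(pois_pmf(x, th) - expfam_form(x, th)) < 1e-12
            for x in range(8) for th in (0.5, 2.0, 5.0))
```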

Example. Normal. 𝑵(𝝁, 𝝈𝟐). 𝜽 = (𝜇, 𝜎 ) (both unknown).

f(x; μ, σ²) = (1/√(2πσ²)) e^{−(x − μ)²/(2σ²)}

= (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)]

= (1/√(2πσ²)) exp[−(x² − 2xμ + μ²)/(2σ²)]

= (1/√(2πσ²)) exp[−μ²/(2σ²)] exp[−(x² − 2xμ)/(2σ²)]

4. Theorem (Exponential Family & Sufficient Statistic). Let X_1, …, X_n be a random sample from the regular exponential family.

Then

T(X) = (∑_{i=1}^n t_1(X_i), …, ∑_{i=1}^n t_k(X_i))

is sufficient for θ = (θ_1, …, θ_k).


Example. Poisson(θ)

Let X_1, …, X_n be a random sample from Poisson(θ). Then

T(X) = ∑_{i=1}^n X_i

is sufficient for θ.

Example. Normal. 𝑵(𝝁, 𝝈𝟐). 𝜽 = (𝜇, 𝜎 ) (both unknown).

Let X_1, …, X_n be a random sample from N(μ, σ²). Then

T(X) = (∑_{i=1}^n X_i, ∑_{i=1}^n X_i²)

is sufficient for θ = (μ, σ²).

Exercise.

Apply the general exponential family result to all the standard families discussed above such as binomial, Poisson, normal, exponential, gamma.

A Non-Exponential Family Example.

Discrete uniform.

P(X = x) = 1/θ, x = 1, …, θ, where θ is a positive integer.

Another Non-exponential Example.

X_1, …, X_n iid U(0, θ), T = X_(n).


Universal Cases.

X_1, …, X_n are iid with density f(x; θ).

• The original data X_1, …, X_n are always sufficient for θ.

(They are trivial statistics, since they do not lead to any data reduction.)

• The order statistics T = (X_(1), …, X_(n)) are always sufficient for θ.

(The dimension of the order statistics is n, the same as the dimension of the data. Still this is a nontrivial reduction, as n! different values of the data correspond to one value of T.)

5. Theorem (Rao-Blackwell)

Let X_1, …, X_n be a random sample from the population with pdf f(x; θ). Let T(X) be a sufficient statistic for θ, and let U(X) be any unbiased estimator of θ.

Let U*(X) = E[U(X) | T]. Then

(1) U*(X) is an unbiased estimator of θ, (2) U*(X) is a function of T, (3) Var(U*) ≤ Var(U) for every θ, and

Var(U*) < Var(U) for some θ, unless U* = U with probability 1.

The Rao-Blackwell theorem tells us that in searching for an unbiased estimator with the smallest possible variance (i.e., the best estimator, also called the uniformly minimum variance unbiased estimator, UMVUE, or simply the MVUE), we can restrict our search to unbiased functions of the sufficient statistic T(X).
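A Monte Carlo sketch of the theorem for Poisson(θ): U = X_1 is unbiased, and conditioning on the sufficient statistic T = ∑X_i gives U* = E[X_1 | T] = T/n = X̄ by symmetry. The sampler, seed, and settings below are our own choices:

```python
import math
import random
import statistics

def rpois(theta, rng):
    """Poisson sampler by cdf inversion (adequate for moderate theta)."""
    u = rng.random()
    p = math.exp(-theta)
    k, cum = 0, p
    while u > cum:
        k += 1
        p *= theta / k
        cum += p
    return k

rng = random.Random(3)
theta, n, reps = 4.0, 10, 20000
U, U_star = [], []
for _ in range(reps):
    x = [rpois(theta, rng) for _ in range(n)]
    U.append(x[0])                # unbiased, but ignores most of the data
    U_star.append(sum(x) / n)     # the Rao-Blackwellized estimator: Xbar
var_U = statistics.variance(U)            # close to theta = 4
var_U_star = statistics.variance(U_star)  # close to theta / n = 0.4
```

Both estimators average near θ = 4, but the Rao-Blackwellized one has far smaller variance.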


6. Transformation of Sufficient Statistics

1. If 𝑇 is sufficient for 𝜃 and 𝑇 = 𝑐(𝑈), a mathematical function of some other statistic, then 𝑈 is also sufficient.

2. If 𝑇 is sufficient for 𝜃, and 𝑈 = 𝐺(𝑇 ) with 𝐺 being one-to-one, then 𝑈 is also sufficient.

Remark: When one statistic is a function of the other statistic and vice versa, then they carry exactly the same amount of information.

Examples:

• If ∑_{i=1}^n X_i is sufficient, so is X̄.

• If (∑ X_i, ∑ X_i²) is sufficient, so is (X̄, S²).

• If (X̄, S²) is sufficient, so is (∑ X_i, ∑ X_i²), and so is (∑ X_i, ∑ (X_i − X̄)²).

Examples of non-sufficiency.

Ex. X_1, X_2 iid Poisson(λ). T = X_1 − X_2 is not sufficient.

Ex. X_1, …, X_n iid with pmf f(x; θ). T = (X_1, …, X_{n−1}) is not sufficient.
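The first example can be made concrete: if T = X_1 − X_2 were sufficient, the conditional law of the data given T would be free of λ. Computing P(X = (1, 0) | T = 1) numerically (truncating the sum over pairs (k + 1, k); the function names are ours) shows it changes with λ:

```python
import math

def pois(k, lam):
    """Poisson(lam) pmf."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def cond_prob(lam, kmax=60):
    """P(X1 = 1, X2 = 0 | X1 - X2 = 1) under iid Poisson(lam)."""
    joint = pois(1, lam) * pois(0, lam)
    # P(T = 1) = sum over k >= 0 of P(X1 = k + 1) P(X2 = k), truncated at kmax
    p_t = sum(pois(k + 1, lam) * pois(k, lam) for k in range(kmax))
    return joint / p_t

c_small, c_large = cond_prob(0.5), cond_prob(3.0)
# The two conditional probabilities differ, so T cannot be sufficient.
```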


7. Minimal Sufficient Statistics

It is seen that different sufficient statistics are possible. Which one is the "best"? Naturally, the one with the maximum reduction.

• For N(θ, 1), X̄ is a better sufficient statistic for θ than (X̄, S²).

Definition:

𝑇 is a minimal sufficient statistic if, given any other sufficient statistic 𝑇′, there is a function 𝑐(•) such that 𝑇 = 𝑐(𝑇′).

Equivalently, 𝑇 is minimal sufficient if, given any other sufficient statistic 𝑇′, whenever 𝐱 and 𝐲 are two data values such that 𝑇′(𝐱) = 𝑇′(𝐲), then 𝑇 (𝐱) = 𝑇 (𝐲).

Partition Interpretation for Minimal Sufficient Statistics:

• Any sufficient statistic introduces a partition on the sample space.

• The partition of a minimal sufficient statistic is the coarsest.

• A minimal sufficient statistic has the smallest dimension among all sufficient statistics. Often this dimension equals the number of free parameters (though exceptions do exist).

Theorem (How to check minimal sufficiency).

A statistic T is minimal sufficient if the following property holds: for any two sample points x and y, f(x; θ)/f(y; θ) does not depend on θ (i.e., f(x; θ)/f(y; θ) is a constant function of θ) if and only if T(x) = T(y).
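For Bernoulli samples this criterion is easy to see numerically: the likelihood ratio of two samples is constant in θ exactly when their sums agree. A small sketch (the samples and θ grid are arbitrary):

```python
def joint_pmf(x, theta):
    """f(x; theta) = theta^t * (1 - theta)^(n - t), t = sum(x)."""
    t = sum(x)
    return theta**t * (1 - theta)**(len(x) - t)

x = [1, 1, 0, 0]      # sum 2
y = [1, 0, 1, 0]      # sum 2: same T, so the ratio is constant in theta
z = [1, 1, 1, 0]      # sum 3: different T, so the ratio varies with theta
thetas = (0.2, 0.5, 0.8)
ratios_same = {round(joint_pmf(x, th) / joint_pmf(y, th), 12) for th in thetas}
ratios_diff = {round(joint_pmf(x, th) / joint_pmf(z, th), 12) for th in thetas}
# ratios_same collapses to a single value; ratios_diff has one value per theta
```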


8. Exponential Families & Minimal Sufficient Statistic:

For a random sample from the regular exponential family with probability density f(x; θ) = c(θ) h(x) exp[∑_{j=1}^k w_j(θ) t_j(x)],

where θ is k-dimensional, the statistic

T(X) = (∑_{i=1}^n t_1(X_i), …, ∑_{i=1}^n t_k(X_i))

is minimal sufficient for θ.

Example. Poisson(θ)

Let X_1, …, X_n be a random sample from Poisson(θ). Then

T(X) = ∑_{i=1}^n X_i

is minimal sufficient for θ.

Example. Normal. 𝑵(𝝁, 𝝈𝟐). 𝜽 = (𝜇, 𝜎 ) (both unknown).

Let X_1, …, X_n be a random sample from N(μ, σ²). Then

T(X) = (∑_{i=1}^n X_i, ∑_{i=1}^n X_i²)

is minimal sufficient for θ = (μ, σ²).

Remarks:

• The minimal sufficient statistic is not unique. Any two minimal sufficient statistics are in one-to-one correspondence, and hence equivalent.


9. Complete Statistics

Let a parametric family {f(x; θ), θ ∈ Θ} be given, and let T be a statistic with induced family of distributions {f_T(t; θ), θ ∈ Θ}.

A statistic T is complete for the family {f(x; θ), θ ∈ Θ} (equivalently, the induced family {f_T(t; θ), θ ∈ Θ} is called complete) if E_θ[g(T)] = 0 for all θ ∈ Θ implies that g(T) = 0 with probability 1.

Example. Poisson(θ)

Let X_1, …, X_n be a random sample from Poisson(θ). Then

T(X) = ∑_{i=1}^n X_i

is minimal sufficient for θ. Now we show that T is also complete.

We know that T(X) = ∑_{i=1}^n X_i ~ Poisson(λ = nθ).

Consider any function u(T). We have

E[u(T)] = ∑_{t=0}^∞ u(t) e^{−λ} λ^t / t! = e^{−λ} ∑_{t=0}^∞ [u(t)/t!] λ^t.

Because e^{−λ} ≠ 0, setting E[u(T)] = 0 for all λ > 0 requires every coefficient u(t)/t! of the power series in λ to be zero, which implies u(T) = 0.

Example. Let X_1, …, X_n be iid from Bin(1, θ). Show T = ∑_{i=1}^n X_i is a complete statistic. (*Please read our textbook for more examples; the following result on the regular exponential family is the most important.)


10. Exponential Families & Complete Statistics

Theorem. Let X_1, …, X_n be iid observations from the regular exponential family with pdf f(x; θ) = c(θ) h(x) exp[∑_{j=1}^k w_j(θ) t_j(x)], and θ = (θ_1, …, θ_k). Then

T(X) = (∑_{i=1}^n t_1(X_i), …, ∑_{i=1}^n t_k(X_i))

is complete if the parameter space {(w_1(θ), …, w_k(θ)) : θ ∈ Θ} contains an open set in R^k.

(This is only a sufficient condition, not a necessary condition)

Example. X_1, …, X_n ∼ N(θ, 1), −∞ < θ < ∞.

Example. Poisson(θ); 0 < 𝜃 < ∞.

f(x; θ) = e^{−θ} (1/x!) exp[ln(θ) · x]

Example. Normal. 𝑵(𝝁, 𝝈𝟐). 𝜽 = (𝜇, 𝜎 ) (both unknown). −∞ < 𝜇 < ∞, 0 < 𝜎 < ∞.

f(x; μ, σ²) = (1/√(2πσ²)) e^{−(x − μ)²/(2σ²)}

= (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)]

= (1/√(2πσ²)) exp[−(x² − 2xμ + μ²)/(2σ²)]

= (1/√(2πσ²)) exp[−μ²/(2σ²)] exp[−(x² − 2xμ)/(2σ²)]

Example. X_1, …, X_n ∼ N(θ, 1), θ ∈ {1, 2}, is not complete.

Example. X_1, …, X_n ∼ N(θ, 1), −∞ < θ < ∞, is complete.

Example. {Bin(2, p), p = 1/2 or p = 1/4} is not complete.


Example. The family {Bin(2,p), 0 < p < 1} is complete.

Properties of the Complete Statistics

(i) If 𝑇 is complete and 𝑆 = 𝜓(𝑇 ), then 𝑆 is also complete.

(ii) If a statistic T is complete and sufficient, then any minimal sufficient statistic is complete.

(iii) Trivial (constant) statistics are complete for any family.

11. Theorem (Lehmann-Scheffe). (Complete Sufficient Statistic and the Best Estimator)

If T is complete and sufficient, then 𝑼 = 𝒉(𝑻) is the Best Estimator (also called UMVUE or MVUE) of its expectation.

Example. Poisson(θ)

Let X_1, …, X_n be a random sample from Poisson(θ). Then

T(X) = ∑_{i=1}^n X_i

is complete sufficient for θ. Since

U = T(X)/n = (∑_{i=1}^n X_i)/n = X̄

is an unbiased estimator of θ, by the Lehmann-Scheffe theorem we know that U is a best estimator (UMVUE/MVUE) for θ.


12. Theorem (Basu)

A complete sufficient statistic T for the parameter θ is independent of any ancillary statistic, that is, any statistic whose distribution does not depend on θ.

Example. Consider a random sample of size n from a normal distribution N(μ, σ²), θ = (μ, σ²).

Consider the MLEs for μ and σ²:

μ̂ = X̄,  σ̂² = (1/n) ∑_{i=1}^n (X_i − X̄)².

It is easy to verify that X̄ is a complete sufficient statistic for μ, for fixed values of σ². Also,

n σ̂² / σ² ~ χ²(n − 1),

which does not depend on μ. It follows from the Basu theorem that the two MLEs are independent of each other.
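A Monte Carlo sketch of this independence: across many replicated N(μ, σ²) samples, X̄ and the sample variance should be uncorrelated (indeed independent), so their sample correlation over the replications sits near zero. The settings and seed below are arbitrary:

```python
import random

mu, sigma, n, reps = 1.0, 2.0, 5, 20000
rng = random.Random(7)
xbars, s2s = [], []
for _ in range(reps):
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    xbars.append(xbar)
    s2s.append(sum((xi - xbar)**2 for xi in x) / (n - 1))

# Sample correlation between Xbar and S^2 over the replications
mx = sum(xbars) / reps
ms = sum(s2s) / reps
cov = sum((a - mx) * (b - ms) for a, b in zip(xbars, s2s)) / reps
var_x = sum((a - mx)**2 for a in xbars) / reps
var_s = sum((b - ms)**2 for b in s2s) / reps
corr = cov / (var_x * var_s) ** 0.5       # near zero, as Basu predicts
```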