
Hypothesis Testing Appendix

James J. Heckman, University of Chicago

Econ 312, Spring 2019

A Further Results on Bayesian vs. Classical Testing

(The phenomenon shows up even when we do not have large samples.)

Point mass \pi_0 is placed at \theta = \theta_0. For the rest of the parameter space we have \pi_1 = 1 - \pi_0, with g_1 the density of \theta under the alternative; f(y \mid \theta) is the model.

Marginal density of y:

m(y) = \int f(y \mid \theta)\, d\pi(\theta) = f(y \mid \theta_0)\,\pi_0 + (1 - \pi_0)\, m_1(y),

where

m_1(y) = \int f(y \mid \theta)\, g_1(\theta)\, d\theta.

The posterior probability that \theta = \theta_0 is

P(\theta_0 \mid y) = \frac{f(y \mid \theta_0)\,\pi_0}{m(y)} = \frac{f(y \mid \theta_0)\,\pi_0}{f(y \mid \theta_0)\,\pi_0 + (1 - \pi_0)\, m_1(y)} = \left[ 1 + \frac{1 - \pi_0}{\pi_0} \cdot \frac{m_1(y)}{f(y \mid \theta_0)} \right]^{-1}.

The posterior odds in favor of \theta = \theta_0 are therefore

\text{Posterior Odds Ratio} = \left[ \frac{1 - \pi_0}{\pi_0} \cdot \frac{m_1(y)}{f(y \mid \theta_0)} \right]^{-1} = \left( \frac{\pi_0}{1 - \pi_0} \right) \left( \frac{f(y \mid \theta_0)}{m_1(y)} \right).

Bayes factor:

B = \frac{f(y \mid \theta_0)}{m_1(y)}.
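These identities are easy to check numerically. The following Python sketch (mine, not part of the notes) converts a prior mass pi0 and a Bayes factor B into posterior odds and the posterior probability of the null:

    # Posterior odds and posterior probability of H0 from the prior mass
    # pi0 and the Bayes factor B = f(y | theta0) / m1(y).

    def posterior_odds(pi0, B):
        """Posterior odds of H0 versus H1: (pi0 / (1 - pi0)) * B."""
        return (pi0 / (1.0 - pi0)) * B

    def posterior_prob_h0(pi0, B):
        """P(theta0 | y) = [1 + ((1 - pi0) / pi0) / B]^(-1)."""
        return 1.0 / (1.0 + ((1.0 - pi0) / pi0) / B)

    pi0, B = 0.5, 1.0 / 6.83          # equal prior odds; B = 1/6.83 as in a later table
    print(posterior_odds(pi0, B))     # ~0.146
    print(posterior_prob_h0(pi0, B))  # ~0.128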

Example: Let y_i \sim N(\theta, \sigma^2), \sigma^2 known, with sample mean \bar y based on n observations:

L\left( \bar y \mid \theta, \sigma^2 \right) = \frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left[ -\frac{n}{2\sigma^2} \left( \bar y - \theta \right)^2 \right].

Let g_1 = N(\mu, \tau^2) be the prior under the alternative; then m_1 = N(\mu, \tau^2 + \sigma^2/n) and

P(\theta_0 \mid \bar y) = \left[ 1 + \frac{1 - \pi_0}{\pi_0} \cdot \frac{\left[ 2\pi(\tau^2 + \sigma^2/n) \right]^{-1/2} \exp\left( -\frac{(\bar y - \mu)^2}{2(\tau^2 + \sigma^2/n)} \right)}{\left( 2\pi\sigma^2/n \right)^{-1/2} \exp\left( -\frac{(\bar y - \theta_0)^2}{2\sigma^2/n} \right)} \right]^{-1}.

The natural choice is to center the prior at \mu = \theta_0. (This is a judgment about priors.)

With \mu = \theta_0,

P(\theta_0 \mid \bar y) = \left[ 1 + \frac{1 - \pi_0}{\pi_0} \cdot \frac{\exp\left\{ \frac{t^2}{2\left[ 1 + \sigma^2/(n\tau^2) \right]} \right\}}{\left( 1 + n\tau^2/\sigma^2 \right)^{1/2}} \right]^{-1},

where

t = \frac{\sqrt{n}\, \left| \bar y - \theta_0 \right|}{\sigma}

is the usual normal statistic (2-tail test). As n \to \infty we get the "Lindley Paradox": for any fixed t, P(\theta_0 \mid \bar y) \to 1, so data significant at any fixed classical level end up favoring H_0.

Look at the following table of values of P(\theta_0 \mid \bar y), with \mu = \theta_0, \pi_0 = 1/2, \tau = \sigma:

                          n (sample size)
t score   p-value    1       5       10      20      50      100     1000
1.645     .1         .42     .44     .49     .56     .65     .72     .89
1.960     .05        .35     .33     .37     .42     .52     .60     .80
2.576     .01        .21     .13     .14     .16     .22     .27     .53
3.291     .001       .086    .026    .024    .026    .034    .045    .124

Observe: P(\theta_0 \mid \bar y) is not monotone in n.
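The table can be reproduced by evaluating the formula on the previous slide; a short Python sketch of mine, with \mu = \theta_0, \pi_0 = 1/2, \tau = \sigma:

    import math

    def post_prob(t, n, pi0=0.5, tau2_over_sig2=1.0):
        """P(theta0 | ybar) in the normal example, prior centered at theta0."""
        ratio = ((1.0 - pi0) / pi0) \
            * math.exp(t ** 2 / (2.0 * (1.0 + 1.0 / (n * tau2_over_sig2)))) \
            / math.sqrt(1.0 + n * tau2_over_sig2)
        return 1.0 / (1.0 + ratio)

    for t, p in [(1.645, .1), (1.960, .05), (2.576, .01), (3.291, .001)]:
        row = [round(post_prob(t, n), 3) for n in (1, 5, 10, 20, 50, 100, 1000)]
        print(t, p, row)   # reproduces the table up to rounding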

Now, consider the minimum of P(\theta_0 \mid y) over all priors g_1 \in G_1 = \{ \text{all densities} \} on the alternative. Thm (Dickey and Savage):

P(\theta_0 \mid y) \geq \left[ 1 + \frac{1 - \pi_0}{\pi_0} \cdot \frac{r(y)}{f(y \mid \theta_0)} \right]^{-1},

where

r(y) = \sup_{\theta \neq \theta_0} f(y \mid \theta),

usually obtained by substituting in the maximum likelihood estimate.

The bound on the Bayes factor is

B = \frac{f(y \mid \theta_0)}{m_1(y)} \geq \frac{f(y \mid \theta_0)}{r(y)}.

The proof of this is really trivial: m_1(y) = \int f(y \mid \theta)\, g_1(\theta)\, d\theta \leq r(y) for every g_1.

Now, look at the example, where the supremum is attained at the MLE \hat\theta = \bar y:

r(\bar y) = f\left( \bar y \mid \hat\theta \right) = \left( 2\pi\sigma^2/n \right)^{-1/2},

so

P(\theta_0 \mid \bar y) \geq \left[ 1 + \frac{1 - \pi_0}{\pi_0} \cdot \frac{\left( 2\pi\sigma^2/n \right)^{-1/2}}{\left( 2\pi\sigma^2/n \right)^{-1/2} \exp\left( -\frac{(\bar y - \theta_0)^2}{2\sigma^2/n} \right)} \right]^{-1} = \left[ 1 + \frac{1 - \pi_0}{\pi_0} \exp\left( \frac{t^2}{2} \right) \right]^{-1}.

For \pi_0 = \frac{1}{2} we get that, for t \geq 1.68 (Berger and Sellke, JASA, 1987),

P(\theta_0 \mid \bar y) \geq (\text{p-value}) \times (1.25\, t).

t value   p-value   Lower bound on P(\theta_0 \mid \bar y)   Bound on Bayes factor
1.645     .1        .205                                     1/(3.87)
1.960     .05       .127                                     1/(6.83)
2.576     .01       .035                                     1/(27.60)
3.291     .001      .0044                                    1/(224.83)
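These numbers follow directly from the displayed bound with \pi_0 = 1/2: the minimal Bayes factor is \exp(-t^2/2), and the bound on P(\theta_0 \mid \bar y) is B/(1+B). A quick verification sketch (mine):

    import math

    # pi0 = 1/2: minimal Bayes factor exp(-t^2/2), attained by putting g1
    # at the MLE; the bound on P(theta0 | ybar) is then B / (1 + B).
    for t in (1.645, 1.960, 2.576, 3.291):
        B = math.exp(-t ** 2 / 2.0)
        print(t, round(B / (1.0 + B), 4), "1/%.2f" % (1.0 / B))
    # 1.645 0.2054 1/3.87 ... 3.291 0.0044 1/224.84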

Thm. (Berger and Sellke.) Now restrict the alternative priors to

G_S = \left\{ g_1 : g_1 \text{ is symmetric about } \theta_0 \text{ and non-increasing in } |\theta - \theta_0| \right\}.

The bounds become:

t value   p-value   Lower bound on P(\theta_0 \mid \bar y)   Bound on Bayes factor
1.645     .1        .390                                     1/(1.56)
1.960     .05       .290                                     1/(2.45)
2.576     .01       .109                                     1/(8.17)
3.291     .001      .018                                     1/(54.55)
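The symmetric non-increasing case can also be computed numerically: any such density is a mixture of uniform densities U(\theta_0 - K, \theta_0 + K), so it suffices to minimize the Bayes factor over K. A sketch of mine, working in \sigma/\sqrt{n} units, that reproduces the table:

    import math
    from scipy.optimize import minimize_scalar

    def norm_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def bayes_factor_uniform(K, t):
        """B under a U(theta0 - K, theta0 + K) prior, in sigma/sqrt(n) units."""
        f0 = math.exp(-t ** 2 / 2.0) / math.sqrt(2.0 * math.pi)  # f(ybar | theta0)
        m1 = (norm_cdf(t + K) - norm_cdf(t - K)) / (2.0 * K)     # marginal on H1
        return f0 / m1

    for t in (1.645, 1.960, 2.576, 3.291):
        res = minimize_scalar(bayes_factor_uniform, bounds=(1e-6, 20.0),
                              args=(t,), method="bounded")
        B = res.fun
        print(t, round(B / (1.0 + B), 3), "1/%.2f" % (1.0 / B))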

B Bayesian Credibility Sets

Def. A 100(1 - \alpha)\% credible set for \theta is a subset C of \Theta such that

1 - \alpha \leq P(C \mid y) = \int_C dF(\theta \mid y) = \int_C \pi(\theta \mid y)\, d\theta.

What is the "best" such set? A 100(1 - \alpha)\% highest posterior density (HPD) credible set is a subset C of \Theta of the form

C = \{ \theta \in \Theta : \pi(\theta \mid y) \geq k(\alpha) \},

where k(\alpha) is the largest constant such that P(C \mid y) \geq 1 - \alpha.

Example. Let \pi(\theta \mid y) = N(\mu(y), \sigma^2) be the posterior. The 100(1 - \alpha)\% HPD credible set is given by

\left( \mu(y) + \sigma\, \Phi^{-1}(\alpha/2),\ \mu(y) - \sigma\, \Phi^{-1}(\alpha/2) \right),

identical to the classical interval for normal posteriors. Consider, however, what would happen in the Dickey-Leamer case: disconnected credible sets become possible (multimodal posteriors). This can also happen in the classical case.
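A quick numerical check of this interval (my sketch, with hypothetical posterior moments):

    from scipy.stats import norm

    mu, sigma, alpha = 1.3, 0.7, 0.05          # hypothetical posterior moments
    z = norm.ppf(1.0 - alpha / 2.0)            # -Phi^{-1}(alpha/2)
    lo, hi = mu - z * sigma, mu + z * sigma    # HPD = equal-tailed, by symmetry
    print(lo, hi)
    print(norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma))  # 0.95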

Problems. The form of a credible set is not preserved under reparametrization, as the following example shows.

Example. Suppose the posterior is

\pi(\theta \mid y) = K\left( 1 + \theta^2 \right), \quad 0 < \theta < a,

so that

\ln \pi(\theta \mid y) = \ln K + \ln\left( 1 + \theta^2 \right), \qquad \frac{\partial \ln \pi(\theta \mid y)}{\partial \theta} = \frac{2\theta}{1 + \theta^2} > 0.

\frac{\partial^2 \ln \pi(\theta \mid y)}{\partial \theta^2} = \frac{2}{1 + \theta^2} - \frac{(2\theta)(2\theta)}{\left( 1 + \theta^2 \right)^2} = \frac{2}{1 + \theta^2} \left[ 1 - \frac{2\theta^2}{1 + \theta^2} \right] = \frac{2}{1 + \theta^2} \cdot \frac{1 - \theta^2}{1 + \theta^2} \leq 0,

where \theta \geq 1. The density itself is increasing in \theta, so the HPD credible set is an upper tail (c(\alpha), a).

Use a transformation: \eta = \exp(\theta), so \eta \in (1, \exp(a)) with density

\pi(\eta \mid y) = K\left( 1 + (\log \eta)^2 \right) \eta^{-1},

decreasing on (1, \exp(a)). Hence we have a credible set given by (1, d(\alpha)) and, in terms of the original coordinate system, by

(0, \ln d(\alpha))

(upper tail in the original parametrization, lower tail in the new parametrization).
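To see the conflict concretely, here is a numerical sketch of mine using the posterior reconstructed above with a = 2: the HPD set computed in theta is an upper tail, while the HPD set computed in eta = exp(theta) maps back to a lower tail in theta.

    from scipy.integrate import quad
    from scipy.optimize import brentq

    a, alpha = 2.0, 0.05
    dens = lambda th: 1.0 + th ** 2            # unnormalized posterior in theta
    total = quad(dens, 0.0, a)[0]

    # HPD in theta: the density is increasing, so the set is an upper tail (c, a).
    c = brentq(lambda u: quad(dens, u, a)[0] / total - (1.0 - alpha), 0.0, a)

    # HPD in eta = exp(theta): the density (1 + (log eta)^2) / eta is decreasing,
    # so the set is a lower tail (1, d); mapped back to theta it is (0, log d).
    log_d = brentq(lambda u: quad(dens, 0.0, u)[0] / total - (1.0 - alpha), 0.0, a)

    print("HPD computed in theta: (%.3f, %.3f)" % (c, a))   # upper tail
    print("HPD computed in eta:   (0, %.3f)" % log_d)       # lower tail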

C Model Selection in a Bayesian Way

The common classical approach is to use a pretest procedure. Here instead we seek f(y \mid M) for each model M: y = X\beta + \varepsilon, \varepsilon \sim N(0, \sigma^2 I_n). The predictive density under the hypothesis is

f(y \mid M) = \int\!\!\int f(y \mid \beta, \sigma, M)\, \pi(\beta, \sigma \mid M)\, d\beta\, d\sigma,

with posterior

\pi(\beta, \sigma \mid y, M) \propto f(y \mid \beta, \sigma, M)\, \pi(\beta, \sigma \mid M).

Prior: \pi(\beta) = N\left( \bar\beta, \sigma^2 A^{-1} \right), where we assume that \sigma is known.

The posterior for \beta is then

\pi(\beta \mid y) = N\left( \tilde\beta, \Sigma(y) \right), \qquad \Sigma(y) = \sigma^2 \left( A + X'X \right)^{-1},

i.e.,

\pi(\beta \mid y) = (2\pi)^{-k/2} \left| \Sigma(y) \right|^{-1/2} \exp\left( -\frac{1}{2} Q \right), \qquad Q = (\beta - \tilde\beta)'\, \Sigma(y)^{-1}\, (\beta - \tilde\beta).

Let h = 1/\sigma^2. Then

\Sigma(y)^{-1} = h\left( A + X'X \right), \qquad \left| \Sigma(y) \right|^{-1} = h^k \left| A + X'X \right|,

and the posterior mean is

\tilde\beta = \left( A + X'X \right)^{-1} \left( A\bar\beta + X'y \right), \qquad X'y = X'X\hat\beta, \quad \hat\beta = (X'X)^{-1} X'y.

Exercise. Prove that

(y - X\hat\beta)'(y - X\hat\beta) + (\hat\beta - \bar\beta)' \left[ A^{-1} + (X'X)^{-1} \right]^{-1} (\hat\beta - \bar\beta) = (y - X\bar\beta)' \left( I_n + X A^{-1} X' \right)^{-1} (y - X\bar\beta).
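A quick numerical check of the identity (a sketch of mine with randomly generated data):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 12, 3
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)
    beta_bar = rng.normal(size=k)
    R = rng.normal(size=(k, k))
    A = R @ R.T + np.eye(k)                      # a positive definite prior precision

    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)     # OLS estimate
    e = y - X @ beta_hat

    lhs = e @ e + (beta_hat - beta_bar) @ np.linalg.solve(
        np.linalg.inv(A) + np.linalg.inv(XtX), beta_hat - beta_bar)
    rhs = (y - X @ beta_bar) @ np.linalg.solve(
        np.eye(n) + X @ np.linalg.inv(A) @ X.T, y - X @ beta_bar)
    print(np.isclose(lhs, rhs))                  # True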

For the case when \sigma^2 is unknown (gamma-normal case), use the natural conjugate prior

\pi(\beta, h) = f_N\left( \beta \mid \bar\beta, (hA)^{-1} \right) f_\gamma\left( h \mid s_1^2, \nu_1 \right).

Then, as shown, we have the predictive density

f(y \mid M) = \int \cdots \int f(y \mid \beta, h)\, \pi(\beta, h)\, d\beta\, dh = c(\nu_1, s_1) \left( \frac{|A|}{\left| A + X'X \right|} \right)^{1/2} \left( 1 + \frac{Q}{\nu_1 s_1^2} \right)^{-(\nu_1 + n)/2},

where, using the exercise above,

Q = (y - X\bar\beta)' \left( I_n + X A^{-1} X' \right)^{-1} (y - X\bar\beta), \qquad \tilde\beta = \left( A + X'X \right)^{-1} \left( A\bar\beta + X'y \right),

\frac{|A|}{\left| A + X'X \right|} = \left| I_n + X A^{-1} X' \right|^{-1}, \qquad c(\nu_1, s_1) = \frac{\Gamma\left( \frac{\nu_1 + n}{2} \right)}{\Gamma\left( \frac{\nu_1}{2} \right) \left( \pi \nu_1 s_1^2 \right)^{n/2}}.

The Bayes factor for model M_1 relative to M_2 is

B_{12} = \frac{f(y \mid M_1)}{f(y \mid M_2)} = \frac{|A_1|^{1/2} / \left| A_1 + X_1'X_1 \right|^{1/2}}{|A_2|^{1/2} / \left| A_2 + X_2'X_2 \right|^{1/2}} \cdot \frac{c(\nu_1, s_1)}{c(\nu_2, s_2)} \cdot \frac{\left( 1 + Q_1 / (\nu_1 s_1^2) \right)^{-(\nu_1 + n)/2}}{\left( 1 + Q_2 / (\nu_2 s_2^2) \right)^{-(\nu_2 + n)/2}}.
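The predictive density above is a multivariate Student-t: y \mid M \sim t_{\nu_1}(X\bar\beta,\ s_1^2 (I_n + X A^{-1} X')). A sketch of mine that uses this form to compute a Bayes factor on simulated data (all settings hypothetical):

    import numpy as np
    from scipy.stats import multivariate_t

    def log_pred(y, X, beta_bar, A, s2, nu):
        """log f(y | M): multivariate-t predictive of the normal-gamma model."""
        n = len(y)
        shape = s2 * (np.eye(n) + X @ np.linalg.inv(A) @ X.T)
        return multivariate_t(loc=X @ beta_bar, shape=shape, df=nu).logpdf(y)

    rng = np.random.default_rng(1)
    n = 30
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)

    X1 = np.column_stack([np.ones(n)])        # M1: intercept only
    X2 = np.column_stack([np.ones(n), x])     # M2: intercept and slope

    lf1 = log_pred(y, X1, np.zeros(1), np.eye(1), 1.0, 2.0)
    lf2 = log_pred(y, X2, np.zeros(2), np.eye(2), 1.0, 2.0)
    print("B_12 =", np.exp(lf1 - lf2))        # B_12 < 1 means the data favor M2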

D Diffuse Prior Approach

Let the prior precision A shrink toward zero and set \nu_1 = 0, so that the prior contributes essentially no information. What types of diffuse priors? Many exist, and they do not all lead to the same inference: a problem. For example,

A = \varepsilon I_k, \quad \text{or} \quad A = \varepsilon \left( X'X \right), \quad \text{or} \quad A = \varepsilon\, \mathrm{diag}(X'X),

and set \varepsilon \to 0. This is like saying we do not have many a priori observations!

With such a diffuse prior (taking A_i = \varepsilon I_{k_i}),

\frac{f(y \mid M_1)}{f(y \mid M_2)} \to \varepsilon^{(k_1 - k_2)/2}\, \frac{\left| X_2'X_2 \right|^{1/2}}{\left| X_1'X_1 \right|^{1/2}} \left( \frac{ESS_2}{ESS_1} \right)^{n/2},

where ESS_i is the sum of squared residuals from regression i.

This strongly favors a small-parameter model. Apart from the \varepsilon^{(k_1 - k_2)/2} factor, the choices of A above give

B_{12} = \frac{\left| X_2'X_2 \right|^{1/2}}{\left| X_1'X_1 \right|^{1/2}} \left( \frac{ESS_2}{ESS_1} \right)^{n/2} \quad \text{or} \quad B_{12} = \left( \frac{ESS_2}{ESS_1} \right)^{n/2}.

There is no unique diffuse prior.
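The dependence on the choice of diffuse prior is easy to see numerically. A toy sketch of mine with hypothetical sums of squares: as epsilon -> 0 under A_i = epsilon X_i'X_i, the Bayes factor favors the smaller model without bound, whatever the fit (Bartlett's paradox).

    n, k1, k2 = 50, 2, 5
    ess1, ess2 = 40.0, 36.0      # hypothetical; the larger model fits better

    for eps in (1.0, 0.1, 0.01, 0.001):
        # With A_i = eps * X_i'X_i:  |A_i| / |A_i + X_i'X_i| = (eps/(1+eps))**k_i
        det_term = (eps / (1.0 + eps)) ** ((k1 - k2) / 2.0)
        b12 = det_term * (ess2 / ess1) ** (n / 2.0)
        print(eps, b12)          # B_12 grows without bound as eps -> 0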

E Dominated Priors

As n and X'X grow, the likelihood dominates the prior and

\frac{f(y \mid M_1)}{f(y \mid M_2)} \approx \frac{\left| X_2'X_2 \right|^{1/2}}{\left| X_1'X_1 \right|^{1/2}} \left( \frac{ESS_2}{ESS_1} \right)^{n/2}.

To derive the large-n behavior, assume

\frac{1}{n} X'X \to \Sigma_X, \quad \text{positive definite}.

You acquire the n^{(k_2 - k_1)/2} term from the determinants:

\frac{\left| A_2 + X_2'X_2 \right|^{1/2}}{\left| A_1 + X_1'X_1 \right|^{1/2}} \approx n^{(k_2 - k_1)/2}\, \frac{\left| \Sigma_{X_2} \right|^{1/2}}{\left| \Sigma_{X_1} \right|^{1/2}}.

The remaining factor is

\frac{c(\nu_1, s_1)}{c(\nu_2, s_2)} \cdot \frac{\left( 1 + Q_1/(\nu_1 s_1^2) \right)^{-(\nu_1 + n)/2}}{\left( 1 + Q_2/(\nu_2 s_2^2) \right)^{-(\nu_2 + n)/2}} \propto \frac{\left( \nu_1 s_1^2 \right)^{(\nu_1 + n)/2} \left( \nu_1 s_1^2 + Q_1 \right)^{-(\nu_1 + n)/2}}{\left( \nu_2 s_2^2 \right)^{(\nu_2 + n)/2} \left( \nu_2 s_2^2 + Q_2 \right)^{-(\nu_2 + n)/2}}.

Since Q_i \to ESS_i, which grows like n while \nu_i s_i^2 stays fixed,

\frac{\left( \nu_1 s_1^2 + Q_1 \right)^{-(\nu_1 + n)/2}}{\left( \nu_2 s_2^2 + Q_2 \right)^{-(\nu_2 + n)/2}} \approx \frac{Q_1^{-(\nu_1 + n)/2}}{Q_2^{-(\nu_2 + n)/2}},

and with equal prior scales across the models (\nu_1 s_1^2 = \nu_2 s_2^2) the whole factor behaves like

\left( \frac{ESS_2}{ESS_1} \right)^{n/2}

up to terms that are bounded in n.

Now observe: for nested models, the Bayes factor in favor of the restricted model behaves like

B_{12} \approx n^{r/2} \left( \frac{ESS_2}{ESS_1} \right)^{n/2} \times O(1),

where r = k_2 - k_1 is the number of restrictions. These methods easily handle non-nested setups (something lacking in classical approaches), among other merits.

Measures of location:

E(\beta \mid y) = \sum_M E(\beta \mid y, M)\, P(M \mid y).

This allows all the models under consideration to contribute information about the parameters. It also gives a measure of variability.
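A minimal sketch of mine, with made-up numbers, of this model-averaged mean together with the law-of-total-variance measure of variability:

    # Posterior means/variances of a common parameter under three models,
    # and posterior model probabilities (all numbers hypothetical).
    means = [0.9, 1.1, 1.4]
    variances = [0.04, 0.05, 0.09]
    probs = [0.5, 0.3, 0.2]

    e_beta = sum(p * m for p, m in zip(probs, means))
    # Law of total variance: within-model plus between-model spread.
    v_beta = sum(p * (v + (m - e_beta) ** 2)
                 for p, m, v in zip(probs, means, variances))
    print(e_beta, v_beta)   # 1.06, ~0.089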

Issues:

1. Bayesian confidence intervals.
2. Bayesian p-values?
3. Multiple hypothesis testing.
4. Classical model selection, etc.
