PROBABILITY MATCHING PRIORS FOR THE BIVARIATE NORMAL DISTRIBUTION
By
UPASANA SANTRA
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2008
© 2008 Upasana Santra
To my parents who nurtured my academic interests making this milestone possible
ACKNOWLEDGMENTS
First, I offer my sincerest gratitude to my committee chair Dr. Malay Ghosh, who
supported me with his knowledge and patience. I would like to thank my supervisory
committee members Dr. Ramon Littell, Dr. Bhramar Mukherjee who also served as
cochair and Dr. Jonathan Shuster. Special thanks go to Dr. Bhramar Mukherjee for her
continuous guidance, support and help. I acknowledge her and Dr. Dalho Kim for doing
the simulation studies in my dissertation.
Finally I would like to thank my family, especially my husband Swadeshmukul Santra,
who first encouraged me to pursue this degree, and my daughter Laboni Santra. Without
their continuing support and encouragement, I would not have finished this degree.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
CHAPTER
1 LITERATURE REVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Matching Via Posterior Quantiles . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.1 Notation and Differential Equation . . . . . . . . . . . . . . . . . . 15
1.2.2 Special Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.2.1 Case p=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.2.2 Case p=2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2.3 Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . 19
1.2.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Matching Priors for Distribution Functions . . . . . . . . . . . . . . . . . . 23
1.3.1 Notation and Differential Equation . . . . . . . . . . . . . . . . . . 23
1.3.2 Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . 24
1.3.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4 Matching Priors for Highest Posterior Density Regions . . . . . . . . . . . 25
1.4.1 Notation and Differential Equation . . . . . . . . . . . . . . . . . . 26
1.4.2 Special Case: p=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.4.3 Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . 28
1.4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Matching Priors Associated with Other Credible Regions . . . . . . . . . . 29
1.5.1 Matching Priors Associated with the LR Statistic . . . . . . . . . . 30
1.5.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.1.2 Differential equation . . . . . . . . . . . . . . . . . . . . . 30
1.5.1.3 Special case: p=1 . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.1.4 Nuisance parameters and orthogonality . . . . . . . . . . . 31
1.5.2 Matching Priors Associated with Rao’s Score and Wald’s Statistic . 32
1.5.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.5.2.2 Differential equation . . . . . . . . . . . . . . . . . . . . . 33
1.5.2.3 Special case: p=1 . . . . . . . . . . . . . . . . . . . . . . . 34

2 MATCHING PRIORS FOR SOME BIVARIATE NORMAL PARAMETERS . . 35

2.1 The Orthogonal Reparameterization . . . . . . . . . . . . . . . . . . . . . 35
2.2 Quantile Matching Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3 Matching Via Distribution Functions . . . . . . . . . . . . . . . . . . . . . 43
2.4 Highest Posterior Density (HPD) Matching Priors . . . . . . . . . . . . . . 44
2.5 Matching Priors Via Inversion of Test Statistics . . . . . . . . . . . . . . . 46
2.6 Propriety of Posteriors and Simulation Study . . . . . . . . . . . . . . . . . 47
3 THE BIVARIATE NORMAL CORRELATION COEFFICIENT . . . . . . . . . 52
3.1 The Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Quantile Matching Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3 Highest Posterior Density (HPD) Matching Priors . . . . . . . . . . . . . . 57
3.4 Matching Priors Via Inversion of Test Statistics . . . . . . . . . . . . . . . 58
3.5 Propriety of the Posteriors . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6 Likelihood Based Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.7 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4 RATIO OF VARIANCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.1 The Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 Quantile Matching Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.3 Matching Via Distribution Functions . . . . . . . . . . . . . . . . . . . . . 78
4.4 Highest Posterior Density (HPD) Matching Priors . . . . . . . . . . . . . . 79
4.5 Matching Priors Via Inversion of Test Statistics . . . . . . . . . . . . . . . 80
4.6 Propriety of the Posteriors . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.7 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5 SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
LIST OF TABLES
Table page
1-1 Fisher-Von Mises P(µ,λ)(0.05; µ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1-2 Fisher-Von Mises P(µ,λ)(0.95; µ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2-1 Frequentist Coverage Probabilities of 95% HPD Intervals for β, θ and η when σ1² = 1 and σ2² = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3-1 Simulation Comparing Priors for Bivariate Normal Correlation Coefficient . . . 69
4-1 Simulation Comparing Priors Suggested for Bivariate Normal Ratio of Standard Deviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
LIST OF FIGURES
Figure page
3-1 Plot of Gelman-Rubin Diagnostic Statistic for ρ Under Prior III for n=10 Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . 68
3-2 Sample Trace Plot for All the Parameters under Prior III for n=10 Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . 70
3-3 Posterior Distribution for ρ under Prior I for Different Sample Sizes, Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . 71
3-4 Posterior Distribution for ρ under Prior II for Different Sample Sizes, Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . 72
3-5 Sample Posterior Distribution for ρ under Prior III for Different Sample Sizes, Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . 73
4-1 Sample Trace Plot for all Parameters under Prior 3 under the Simulation Setting of Section 4.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4-2 Plot of Gelman-Rubin Diagnostic Statistic for θ1 under Prior 3 . . . . . . . . . . 87
4-3 Posterior Distribution for θ1 under Prior 1 for Different Sample Sizes . . . . . . 88
4-4 Posterior Distribution for θ1 under Prior 2 for Different Sample Sizes . . . . . . 89
4-5 Sample Posterior Distribution for θ1 under Prior 3 for Different Sample Sizes . . 90
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
PROBABILITY MATCHING PRIORS FOR THE BIVARIATE NORMAL DISTRIBUTION
By
Upasana Santra
May 2008
Chair: Malay Ghosh
Cochair: Bhramar Mukherjee
Major: Statistics
In practice, most Bayesian analyses are performed with so-called “non-informative”
priors. This is especially so when there is little or no prior information, and yet the
Bayesian technique can lead to solutions satisfactory from both the Bayesian and the
frequentist perspectives. The study of probability matching priors ensuring, up to the
desired order of asymptotics, the approximate frequentist validity of posterior credible
sets has received significant attention in recent years. In this dissertation we develop
some objective priors for certain parameters of the bivariate normal distribution. The
parameters considered are the regression coefficient, the generalized variance, the ratio
of one of the conditional variances to the marginal variance of the other variable, the
correlation coefficient and the ratio of the standard deviations. The criterion used is
the asymptotic matching of coverage probabilities of Bayesian credible intervals with
the corresponding frequentist coverage probabilities. Various matching criteria, namely,
quantile matching, matching of distribution functions, highest posterior density matching,
and matching via inversion of test statistics are used.
One particular prior is found which meets all the matching criteria individually for the regression coefficient, the generalized variance and the ratio of one of the conditional variances to the marginal variance of the other variable. For the correlation coefficient, though, each matching criterion leads to a different prior; moreover, there does not exist a prior that satisfies the matching via distribution functions criterion in this case. Finally, a general class of priors has been obtained for inference about the ratio of standard deviations.
The propriety of the resultant posteriors is proved in each case under mild conditions
and simulation results suggest that the approximations are valid even for moderate sample
sizes. Further, several likelihood based methods have been considered for the correlation
coefficient. One common feature of all these modified likelihoods is that they are all
dependent on the data only through the sample correlation coefficient r.
CHAPTER 1
LITERATURE REVIEW
1.1 Introduction
The Bayesian paradigm is an attempt to utilize all available information in decision-making.
Prior knowledge coming from experience, expert judgement, or previously collected data is
used with current data to characterize the current state of knowledge. However, even with
little or no prior information, one can often employ noninformative priors to draw reliable
inference, which has led to a remarkable increase in the popularity of Bayesian methods in
the theory and practice of statistics. Thus over the years, a wide range of noninformative
priors have been proposed and studied.
The earliest use of noninformative priors is attributed to Laplace (1812). Laplace’s
rule, or the principle of insufficient reason, assigns a flat prior over the entire parameter
space. A problem with this rule is that it is not invariant under one-to-one reparameterization.
For example, if θ is given a uniform distribution, then φ = exp (θ) will not have a uniform
distribution. Conversely, if we start with a uniform distribution for φ, then θ = log (φ)
will not have a uniform distribution. Since most statistical models do not have a unique
parameterization, this becomes bothersome. For example, a uniform prior for the standard
deviation σ will not transform into a uniform prior for the variance σ2. This lack of
invariance of the uniform prior often translates into significant variation in the resulting
posteriors.
Thus, Jeffreys (1961) proposed a prior which remains invariant under any one-to-one
reparameterization. In the general multiparameter setup, writing the Fisher Information matrix as I(θ), where

I(θ) = E(−∂²l/∂θi∂θj)

and l is the log-likelihood, the rule is to take the prior to be

π(θ) ∝ |I(θ)|^(1/2).
The rule is applicable as long as I(θ) is defined and positive definite. As is easily checked,
it has the invariance property that for any other parameterization γ which is one-to-one
with θ,
π(γ) ∝ |J| |I(θ)|^(1/2),

where J = det(∂θ/∂γ). So the priors defined by the rule on γ and θ transform according to the change-of-variables formula. Thus it does not require the selection of any specific parameterization.
There are many intuitive justifications for using Jeffreys’ prior. One that concerns us is a probability matching property. As an example, if X1, . . . , Xn are iid N(θ, 1), then X̄n = (1/n) Σ Xi is the MLE of θ. With the uniform prior π(θ) ∝ c (a constant), the posterior of θ is N(X̄n, 1/n). Accordingly, writing zα for the upper 100α% point of the N(0, 1) distribution,

P(θ ≤ X̄n + zα n^(−1/2) | X̄n) = 1 − α = P(θ ≤ X̄n + zα n^(−1/2) | θ).

This is an example of exact matching. Other examples of exact matching can be found in Datta, Ghosh, M. and Mukerjee (2000) and Severini, Mukerjee and Ghosh, M. (2002).
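The exact matching in this normal example is easy to verify by direct simulation; a minimal sketch (sample size, θ, and seed chosen arbitrarily, numpy assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, alpha = 2.0, 20, 0.05
z_alpha = 1.6448536269514722      # upper 100*alpha% point of N(0, 1)

# Under the uniform prior the posterior of theta is N(Xbar_n, 1/n), so the
# (1 - alpha) posterior quantile is Xbar_n + z_alpha * n**(-1/2).  Repeatedly
# sampling from the model with theta held fixed estimates its frequentist
# coverage, which the exact matching result says equals 1 - alpha.
reps = 200_000
xbar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
coverage = np.mean(theta <= xbar + z_alpha / np.sqrt(n))
print(round(coverage, 3))   # ~ 0.95, for any n and theta
```

The same experiment repeated at other values of n, θ, and α gives coverage equal to the nominal level up to Monte Carlo noise, since here the matching is exact rather than asymptotic.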
However, in most instances one has to rely on asymptotics rather than exact matching. To see this, suppose θ̂n is the MLE of θ. Then θ̂n | θ is asymptotically N(θ, I^(−1)(θ)), where I(θ) is the Fisher Information number. Using the transformation g(θ) = ∫^θ I^(1/2)(t) dt, g(θ̂n) is asymptotically N(g(θ), 1) by the delta method. Now, intuitively one expects the uniform prior as the asymptotic matching prior for g(θ). Transforming back to the original parameter, Jeffreys’ prior is a probability matching prior for θ. This is discussed in Ghosh, M. (2001). To see this, let φ = g(θ) and π(φ) = 1. Then |∂φ/∂θ| = |g′(θ)| = I^(1/2)(θ).
The above matching property is usually referred to as the quantile matching property.
However, quantile matching is one of several matching criteria available in the literature.
Typically, this matching of posterior coverage probability of a Bayesian credible set with
the corresponding frequentist coverage probability is also accomplished through either (a)
highest posterior density (HPD) regions, or (b) distribution functions, or (c) inversion of certain test statistics.
Matching priors based on posterior quantiles was first investigated by Welch and
Peers (1963) who considered a scalar parameter of interest θ in the absence of any
nuisance parameters. In this case they showed by solving a differential equation that the frequentist coverage probability of a one-sided posterior credible interval for θ matches the nominal level with a remainder of o(n^(−1/2)), where n is the sample size, if and only if one works with Jeffreys’ prior. Such a prior will be referred to as a first order probability matching prior. Welch and Peers proved this result only for continuous distributions.
Ghosh, J.K. (1994) pointed out a suitable modification which would lead to the same
conclusion for discrete distributions. On the other hand, if one requires the remainder to
be of the order o(n−1), then we have a second order probability matching prior. We shall
see later that Jeffreys’ prior is not necessarily a second order matching prior even in the one parameter case. Moreover, Jeffreys’ prior has been criticized in the presence of nuisance parameters. For example, Bernardo (1979) has shown that Jeffreys’ prior can lead to the marginalization paradox (cf. Dawid, Stone and Zidek (1973)) for inference about µ/σ when the model is normal with mean µ and variance σ². A second example, due to Berger and Bernardo (1992a), shows that Jeffreys’ prior can lead to an inconsistent estimator of the error variance in the balanced one-way normal ANOVA model when the number of cells grows to infinity in direct proportion to the sample size. Thus Jeffreys’ prior fails to avoid the Neyman-Scott (1948) phenomenon.
The original idea of Welch and Peers (1963) was pursued in the nuisance parameter
case by Peers (1965), Stein (1985), Tibshirani (1989), Nicolaou (1993), Datta and Ghosh,
J.K. (1995 a, b), Datta and Ghosh, M. (1995, 1996), Ghosh, M., Carlin and Srivastava
(1995), Ghosh, M. and Yang (1996), Datta (1996) and Garvan and Ghosh, M. (1996)
among others. As we shall see, matching is obtained by solving differential equations. The
calculations are highly simplified if the parameter of interest is orthogonal to the nuisance
parameters (Cox and Reid, 1987). If θ is partitioned into two vectors θ1 and θ2 of length p1 and p2 respectively, where p1 + p2 = p, then θ1 is orthogonal to θ2 if the elements of the information matrix satisfy

i_θsθt = (1/n) Eθ(∂l/∂θs · ∂l/∂θt) = (1/n) Eθ(−∂²l/∂θs∂θt) = 0

for s = 1, . . . , p1, t = p1 + 1, . . . , p1 + p2; this is to hold for all θ in the parameter space. Note that l(θ) is the log-likelihood and i refers to information per observation, which will be assumed to be O(1) as n → ∞.
Suppose that we have a scalar parameter of interest, that is, θ = (θ1, . . . , θp)^T, where θ1 is the parameter of interest and the rest are nuisance parameters. Writing I(θ) = ((Ijk)) as the Fisher Information matrix, if θ1 is orthogonal to (θ2, . . . , θp)^T, that is, I1k = 0 for all k = 2, . . . , p, then, extending the previous intuitive argument, π(θ) ∝ I11^(1/2)(θ) is a probability
matching prior. In the presence of nuisance parameters, Jeffreys’ prior may not satisfy the
quantile matching property. As an example, consider the Behrens-Fisher problem (Ghosh, M. and Kim, 2001). The model is represented by the density

(1/(γ1γ2)) φ((x(1) − µ1)/γ1) φ((x(2) − µ2)/γ2), x(1), x(2) ∈ R¹,

where µ1, µ2 (∈ R¹) and γ1, γ2 (> 0) are unknown parameters and φ is the standard normal p.d.f. Interest lies in the difference µ1 − µ2. Reparameterize as

θ1 = µ1 − µ2, θ2 = [(µ1/γ1²) + (µ2/γ2²)] / [(1/γ1²) + (1/γ2²)], θ3 = γ1, θ4 = γ2,

where θ1, θ2 ∈ R¹, and θ3, θ4 > 0. Then θ1 is the parameter of interest and the above is an orthogonal parameterization. In this example, first order matching is achieved if and only if

π(θ) = d(θ(2)) (θ3² + θ4²)^(−1/2),

where θ(2) = (θ2, θ3, θ4)^T. Moreover, it can be seen that

d(θ(2)) ∝ (θ3² + θ4²)^(3/2) / (θ3θ4)³

satisfies the second order matching condition. However, Jeffreys’ prior for this model is proportional to (θ3θ4)^(−2) and is hence not even a first order probability matching prior.
In the following sections we review and characterize the different matching priors.
1.2 Matching Via Posterior Quantiles
A major part of recent research still involves priors centered around those which
ensure approximate frequentist validity of the posterior quantiles. Specifically, we consider
priors π(.) for which the relation
Pθ{θ1 ≤ θ1^(1−α)(π, X)} = 1 − α + o(n^(−r/2)) (1.2.1)
holds for r = 1 or 2 and for each α (0 < α < 1). Here n is the sample size, θ = (θ1, . . . , θp)T
is an unknown parameter vector, θ1 is the one-dimensional parameter of interest, Pθ{.} is the frequentist probability measure under θ, and θ1^(1−α)(π, X) is the (1 − α)th posterior quantile of θ1, under π(.), given the data X. Priors satisfying (1.2.1) for r=1 or 2 are called first or second order matching priors respectively. Clearly, they ensure that one-sided Bayesian credible sets of the form (−∞, θ1^(1−α)(π, X)] for θ1 have correct frequentist
coverage as well up to the order of approximation indicated in (1.2.1). In the presence
of nuisance parameters, a first order matching prior is not unique. The study of second
order matching priors, which ensures correct frequentist coverage to a higher order of
approximation, can help significantly in narrowing down the class of competing first order
matching priors.
1.2.1 Notation and Differential Equation
Consider a sequence {Xi}, i ≥ 1, of i.i.d. possibly vector valued random variables with common density f(x; θ), where the parameter vector θ = (θ1, . . . , θp)^T belongs to R^p or some open subset thereof and θ1 is the parameter of interest. Let X = (X1, . . . , Xn)^T, where n is the sample size, and let θ̂ = (θ̂1, . . . , θ̂p)^T be the MLE of θ based on X. Let θ have a prior density π(.), assumed continuously differentiable over the entire parameter space. Also let l(θ) = n^(−1) Σᵢ log f(Xi; θ) and, with Dj = ∂/∂θj, for 1 ≤ j, r, s, u ≤ p, let

Vj = Dj log f(X1; θ), L_j,r,s = Eθ(VjVrVs), L_j,rs = Eθ(VjVrs), L_jrs = Eθ(Vjrs). (1.2.1.1)

Let I = ((Ijr)) be the per observation Fisher Information matrix at θ. Define I^(−1) = ((I^jr)); then

τ^jr = I^j1 I^r1 / I^11, σ^jr = I^jr − τ^jr. (1.2.1.2)

These are considered to be smooth functions of θ. Also, for 1 ≤ j, r ≤ p, let

πj(θ) = Dj π(θ), πjr(θ) = Dj Dr π(θ), π̂ = π(θ̂), π̂j = πj(θ̂), π̂jr = πjr(θ̂). (1.2.1.3)
Now we give the theorem which characterizes the first and second order probability
matching priors.
Theorem 1.2.1 (a) A prior π(.) is first order probability matching if and only if it satisfies the partial differential equation

Σ_{j=1}^p Dj{(I^11)^(−1/2) I^j1 π(θ)} = 0. (1.2.1.4)

(b) A prior π(.) is second order probability matching if and only if it satisfies, in addition, the partial differential equation

Σj Σr Σs Σu (1/3) Du{π τ^jr L_jrs (3σ^su + τ^su)} − Σ_{j=1}^p Σ_{r=1}^p Dj Dr{π τ^jr} = 0. (1.2.1.5)
Part (a) was proved originally by Peers (1965) and part (b) by Mukerjee and Ghosh,
M. (1997).
1.2.2 Special Cases
We now focus attention on the consequences of Theorem 1.2.1 in some important special cases.
1.2.2.1 Case p=1
First consider matching priors in the one parameter models. For p=1 we have θ = θ1.
In this case, both θ and I are scalars. Then the first order matching equation (1.2.1.4)
reduces to

(d/dθ){π(θ)/I^(1/2)} = 0,

leading to the unique solution

π(θ) ∝ I^(1/2), (1.2.2.1)
which is the Jeffreys’ (1961) prior. Thus in this case, Jeffreys’ prior is the unique first
order matching prior. Furthermore, for p=1, by (1.2.1.1) and (1.2.1.2), (1.2.1.5) reduces to

(1/3){π(θ)L111/I²} − (d/dθ){π(θ)/I} = constant. (1.2.2.2)

Now for Jeffreys’ prior, given by (1.2.2.1), and using the standard regularity conditions, it follows from Bartlett (1953) that

(d/dθ)I = −(L1,11 + L111), (1.2.2.3)

and

L1,1,1 + 3L1,11 + L111 = 0. (1.2.2.4)

Thus, the left hand side of (1.2.2.2) simplifies to

(1/3)I^(−3/2)L111 − (d/dθ)I^(−1/2) = (1/6)I^(−3/2)L1,1,1. (1.2.2.5)
Summarizing the above results we get the following theorem.
Theorem 1.2.2 (a) For p=1, Jeffreys’ prior is the unique first order probability
matching prior.
(b) Furthermore, it is also second order probability matching if and only if I−3/2L1,1,1
is a constant free from θ.
Apart from this early result on matching priors, another result, again due to Welch
and Peers (1963), is presented in the next theorem.
Theorem 1.2.3 Under the one parameter location model
f(x; θ) = f*(x − θ), x ∈ R¹, (1.2.2.6)
Jeffreys’ prior, given by π(θ) = constant, is exact matching.
In the one parameter scale model

f(x; θ) = (1/θ) f*(x/θ), (1.2.2.7)

where θ > 0 and f*(.) is a density with support either R¹ or [0, ∞), Jeffreys’ prior, given by π(θ) ∝ θ^(−1), is second order probability matching. In this case too, it can be shown that
the matching is exact. Even beyond the standard one parameter location or scale models,
Jeffreys’ prior can enjoy the second order matching property. On the other hand there can
be models where the condition in Theorem 1.2.2 (b) does not hold and consequently no
second order matching prior is available.
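The exact matching in the scale model can also be checked by simulation. A minimal sketch for the exponential scale density f*(u) = e^(−u) (parameter values arbitrary; numpy and scipy assumed available): under π(θ) ∝ θ^(−1), the posterior of θ given S = Σ Xi is S/G with G ~ Gamma(n, 1), so the (1 − α) posterior quantile is S divided by the α-quantile of Gamma(n, 1).

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
theta, n, alpha = 3.0, 5, 0.05

# alpha-quantile of Gamma(n, 1); the (1 - alpha) posterior quantile of
# theta is then S / g_alpha.
g_alpha = gamma.ppf(alpha, a=n)

# Frequentist coverage: since S/theta ~ Gamma(n, 1), the coverage of
# (-inf, S/g_alpha] is exactly 1 - alpha, matching the posterior level.
reps = 200_000
S = rng.exponential(theta, size=(reps, n)).sum(axis=1)
coverage = np.mean(theta <= S / g_alpha)
print(round(coverage, 3))   # ~ 0.95
```

Note that even for n = 5 the coverage is exact, not merely asymptotic, which is the content of the scale-model matching result above.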
Example 1.2.1 Consider the bivariate normal model with zero means, unit variances
and correlation coefficient θ, where |θ| < 1. Then
I = (1 + θ²)/(1 − θ²)², L1,1,1 = −2θ(3 + θ²)/(1 − θ²)³.

Here I^(−3/2)L1,1,1 is not free from θ. Hence by Theorem 1.2.2, Jeffreys’ prior is only first
order probability matching and no second order matching prior is available.
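A quick numeric check of this claim, using only the closed forms stated above:

```python
# Closed forms from Example 1.2.1, as functions of the correlation theta.
def info(t):
    return (1 + t**2) / (1 - t**2)**2

def L111(t):   # the third-moment quantity L_{1,1,1}
    return -2 * t * (3 + t**2) / (1 - t**2)**3

def ratio(t):  # I^{-3/2} L_{1,1,1}: constant iff a second order prior exists
    return info(t)**-1.5 * L111(t)

for t in (0.0, 0.2, 0.5, 0.8):
    print(t, ratio(t))
# ratio(0) = 0 while ratio(t) < 0 for t > 0, so the quantity varies with theta
```

Evaluating on a grid is enough here: the ratio moves from 0 at θ = 0 to increasingly negative values, confirming that the condition of Theorem 1.2.2(b) fails.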
1.2.2.2 Case p=2
We now discuss two parameter models. Let p=2 where both the interest and nuisance
parameters are one-dimensional. This situation covers many models of interest such as the
location-scale family. By (1.2.1.2), here

I^11 = Q, I^12 = I^21 = −Qζ, I^22 = Q I11/I22,
τ^11 = Q, τ^12 = τ^21 = −Qζ, τ^22 = Qζ²,
σ^11 = 0, σ^12 = σ^21 = 0, σ^22 = I22^(−1), (1.2.2.8)

where

Q = (I11 − I12²/I22)^(−1), ζ = I12/I22. (1.2.2.9)
Hence using (1.2.2.8), the partial differential equations (1.2.1.4) and (1.2.1.5), for first and
second order probability matching, can be expressed as
D1{π(θ)Q^(1/2)} − D2{π(θ)Q^(1/2)ζ} = 0, (1.2.2.10)
and
(1/3)D1{π(θ)Q²(L111 − 3L112ζ + 3L122ζ² − L222ζ³)}
− (1/3)D2{π(θ)Q²ζ(L111 − 3L112ζ + 3L122ζ² − L222ζ³)}
+ D2{π(θ)Q I22^(−1)(L112 − 2L122ζ + L222ζ²)}
− D1²{π(θ)Q} + 2D1D2{π(θ)Qζ} − D2²{π(θ)Qζ²} = 0, (1.2.2.11)
respectively. The second order matching condition (1.2.2.11) for the case p = 2 is due to
Mukerjee and Dey (1993).
1.2.3 Orthogonal Parameterization
The study of matching priors can get substantially simplified when there is an
orthogonal parameterization. We shall now discuss the implications of this phenomenon in
some detail. Let
I1j = 0 (2 ≤ j ≤ p), (1.2.3.1)

identically in θ. Then I^1j = 0, 2 ≤ j ≤ p, and by (1.2.1.2),

τ^11 = I^11 = I11^(−1);
τ^jr = 0, if (j, r) ≠ (1, 1);
σ^jr = 0, if either j = 1 or r = 1;
σ^jr = I^jr, if j ≥ 2 and r ≥ 2.
Hence the partial differential equations (1.2.1.4) and (1.2.1.5), for first and second order
probability matching, can be expressed as
D1{π(θ) I11^(−1/2)} = 0, (1.2.3.2)
and

Σ_{s=2}^p Σ_{u=2}^p Du{π(θ) I11^(−1) I^su L11s} + (1/3)D1{π(θ) I11^(−2) L111} − D1²{π(θ) I11^(−1)} = 0, (1.2.3.3)
respectively.
A prior π(.) satisfies (1.2.3.2), and is hence first order probability matching, if and only if it is of the form

π(θ) = d(θ(2)) I11^(1/2), (1.2.3.4)

where d(.) (> 0) is any smooth function of θ(2) = (θ2, . . . , θp)^T. This result is due to Tibshirani (1989). Nicolaou (1993) also proved it using another approach. By (1.2.3.3), a prior of the form (1.2.3.4) is second order probability matching if and only if

Σ_{s=2}^p Σ_{u=2}^p Du{d(θ(2)) I11^(−1/2) I^su L11s} + (1/6) d(θ(2)) D1{I11^(−3/2) L1,1,1} = 0. (1.2.3.5)
1.2.4 Examples
Example 1.2.2 (Sun, 1997) Consider the Weibull model given by the density

(µ1/µ2)(x/µ2)^(µ1−1) exp{−(x/µ2)^µ1}, x > 0,

where µ1, µ2 (> 0) are unknown parameters. Interest lies in the shape parameter µ1. Reparameterize (µ1, µ2) as

θ1 = µ1, θ2 = µ2 exp(w/µ1), (1.2.4.1)

where w = ∫₀^∞ (u log u) exp(−u) du and θ1, θ2 > 0. Then θ1 is the parameter of interest and θ2 is orthogonal to θ1. Also,

I11 ∝ θ1^(−2), I22 = θ1²/θ2², L1,1,1 ∝ θ1^(−3), L112 ∝ (θ1θ2)^(−1).

By (1.2.3.4), therefore, first order matching is achieved if and only if π(θ) = d(θ2)/θ1. Moreover, by (1.2.3.5) such a prior is second order matching if and only if d(θ2) ∝ θ2^(−1). Hence π(θ) ∝ (θ1θ2)^(−1) is the unique second order matching prior. This prior becomes proportional to (µ1µ2)^(−1) when reverted back to the original parameterization.
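The orthogonality of (1.2.4.1) can be checked numerically. The sketch below (parameter values, sample size and seed arbitrary; numpy assumed available) computes central-difference scores in (θ1, θ2) at data simulated from the model and verifies that their expected product is near zero; in the original (µ1, µ2) parameterization the corresponding cross-information is w/µ2, of order 0.6 at these settings.

```python
import numpy as np

gamma_e = 0.5772156649015329          # Euler-Mascheroni constant
w = 1.0 - gamma_e                     # w = ∫ u log(u) e^{-u} du = 1 - gamma

def logf(x, t1, t2):
    # Weibull log-density in the (theta1, theta2) parameterization:
    # mu1 = theta1 (shape), mu2 = theta2 * exp(-w/theta1) (scale).
    m1, m2 = t1, t2 * np.exp(-w / t1)
    return np.log(m1 / m2) + (m1 - 1) * np.log(x / m2) - (x / m2) ** m1

rng = np.random.default_rng(2)
t1, t2 = 1.7, 0.9
scale = t2 * np.exp(-w / t1)
x = scale * rng.weibull(t1, size=400_000)   # a large sample from the model

h = 1e-5                                    # central-difference scores
V1 = (logf(x, t1 + h, t2) - logf(x, t1 - h, t2)) / (2 * h)
V2 = (logf(x, t1, t2 + h) - logf(x, t1, t2 - h)) / (2 * h)
print(np.mean(V1 * V2))   # ~ 0, as orthogonality requires
```

The Monte Carlo average of V1·V2 estimates the (1, 2) entry of the information matrix in the new parameterization; up to simulation noise it vanishes at any (θ1, θ2).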
Example 1.2.3 (Mukerjee and Dey, 1993) This concerns the ratio of two independent normal means and corresponds to a simpler version of the Fieller (1954) and Creasy (1954) problem. Let the model be represented by the density

φ(x(1) − µ1) φ(x(2) − µ2), x(1), x(2) ∈ R¹,

where µ1, µ2 (> 0) are unknown parameters and φ(.) represents the standard univariate normal density. Interest lies in the ratio µ1/µ2. Reparameterize (µ1, µ2) as

θ1 = µ1/µ2, θ2 = (µ1² + µ2²)^(1/2), (1.2.4.2)

where θ1, θ2 > 0. Then θ1 is the parameter of interest and one can check that (1.2.4.2) is an orthogonal parameterization. Furthermore,

I11 = θ2²/(θ1² + 1)², I22 = 1, L1,1,1 = 0, L112 = −θ2/(θ1² + 1)².

Hence by (1.2.3.4), first order matching is achieved if and only if π(θ) = d(θ2)θ2/(θ1² + 1), whereas by (1.2.3.5) such a prior is second order matching if and only if, in addition, d(θ2) is a constant. Thus π(θ) ∝ θ2/(θ1² + 1) is the unique second order matching prior. Interestingly, under the original parameterization, this gets transformed to the uniform prior on (µ1, µ2).
Example 1.2.4 (Tibshirani, 1989) Continuing with the setup of the last example, now suppose that interest lies in the product µ1µ2. Reparameterize as

θ1 = µ1µ2, θ2 = µ2² − µ1², (1.2.4.3)

where θ1 > 0, θ2 ∈ R¹. Then θ1 is the parameter of interest and (1.2.4.3) is an orthogonal parameterization. Furthermore,

I11 = 4I22 = (4θ1² + θ2²)^(−1/2), L1,1,1 = 0, L112 = (1/2)θ2(4θ1² + θ2²)^(−3/2). (1.2.4.4)

By (1.2.3.4), first order matching is achieved if and only if

π(θ) = d(θ2)(4θ1² + θ2²)^(−1/4). (1.2.4.5)

Such a prior is also second order matching if and only if d(θ2) satisfies (1.2.3.5) which, in view of (1.2.4.4), reduces to

D2{d(θ2)θ2(4θ1² + θ2²)^(−3/4)} = 0.

Clearly the above equation does not admit any solution for d(θ2). Thus no second order matching prior is available in this example. Taking d(θ2) as constant in (1.2.4.5), one gets the first order matching prior π(θ) ∝ (4θ1² + θ2²)^(−1/4). Under the original (µ1, µ2)-parameterization, this is proportional to (µ1² + µ2²)^(1/2).
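The final transformation step is easy to verify symbolically (sympy assumed available): 4θ1² + θ2² factors as (µ1² + µ2²)², and multiplying (4θ1² + θ2²)^(−1/4) by the Jacobian determinant of (1.2.4.3) gives a prior proportional to (µ1² + µ2²)^(1/2).

```python
import sympy as sp

m1, m2 = sp.symbols('mu1 mu2', positive=True)
t1, t2 = m1 * m2, m2**2 - m1**2                 # the map (1.2.4.3)

# 4*theta1^2 + theta2^2 collapses to (mu1^2 + mu2^2)^2.
base = sp.factor(4 * t1**2 + t2**2)
assert sp.simplify(base - (m1**2 + m2**2)**2) == 0

# Transforming the prior back to (mu1, mu2): multiply by the Jacobian
# determinant of the map, which is 2*(mu1^2 + mu2^2).
J = sp.Matrix([t1, t2]).jacobian([m1, m2]).det()
prior_mu = sp.simplify(base**sp.Rational(-1, 4) * J)
print(prior_mu)   # proportional to sqrt(mu1**2 + mu2**2)
```

The constant factor 2 is irrelevant since priors are specified only up to proportionality.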
Example 1.2.5 (Garvan and Ghosh, M., 1999) Consider the Fisher-Von Mises probability density function

f(y|µ, λ) = [1/(2πI0(λ))] exp{λ cos(y − µ)}.

Let µ be the parameter of interest and λ be the nuisance parameter. Then a second order probability matching prior for µ, denoted as πµ(2)(µ, λ), is obtained as

πµ(2)(µ, λ) = λ[1 − A(λ)/λ − A²(λ)],

where A(λ) = I1(λ)/I0(λ), I1(λ) and I0(λ) being modified Bessel functions. Jeffreys’ prior, however, is

πJ ∝ {λA(λ)}^(1/2) [1 − A(λ)/λ − A²(λ)]^(1/2).

Garvan and Ghosh, M. (1999) have investigated the performance of these priors by calculating the frequentist coverage probabilities of some one-sided posterior credible intervals for µ. Let µα denote the posterior α-quantile of µ given y = (y1, . . . , yn). Also, let

P(µ,λ)(α; µ) = P(µ,λ)(µ ≤ µα | µ, λ) = P(µ,λ)(F(µ) ≤ F(µα) | µ, λ).

If the marginal posterior distribution of µ under the prior π yields quantiles so that P(µ,λ)(α; µ) is close to α, then there is evidence that the chosen prior performs well. In Tables 1-1 and 1-2, results are summarized for simulated tail probabilities of posterior distributions of µ when µ = π, λ = 1, sample sizes n=5 and 10, and α is set at 0.05 and 0.95. It is evident that πµ(2)(µ, λ) is very close to the target.
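Both priors are straightforward to evaluate numerically with scipy's modified Bessel functions (an assumption of this sketch). As a sanity check on the bracketed quantity shared by the two priors, 1 − A(λ)/λ − A²(λ) is exactly the derivative A′(λ), by the standard Bessel recurrences I0′ = I1, I1′ = (I0 + I2)/2 and I2 = I0 − (2/λ)I1:

```python
import numpy as np
from scipy.special import iv   # modified Bessel function I_nu(lam)

def A(lam):
    return iv(1, lam) / iv(0, lam)

lam = np.linspace(0.5, 5.0, 10)
bracket = 1 - A(lam) / lam - A(lam)**2   # the factor common to both priors

# Central-difference check that bracket == dA/dlam on the grid.
h = 1e-6
dA = (A(lam + h) - A(lam - h)) / (2 * h)
print(np.max(np.abs(dA - bracket)))   # small: identity holds to numerical precision
```

The bracket is the Fisher information for λ in this model (a variance, hence positive), which is why it appears under the square root in Jeffreys' prior.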
1.3 Matching Priors for Distribution Functions
Matching priors for posterior quantiles were discussed at length in the previous
section. Since quantiles are intimately linked with the cumulative distribution function
(c.d.f.), one may wonder how far these results hold when matching is done via the c.d.f.
instead of quantiles. The results show that first order matching priors for quantiles remain
so when the analysis is based on a comparison of the posterior and frequentist c.d.f.’s.
In the situation where interest lies in several parameters, posterior quantiles are not
well defined but the joint posterior c.d.f. remains meaningful and provides a route for
finding matching priors.
1.3.1 Notation and Differential Equation
In this section, the setup and notations on both the prior and the model are the same
as in the previous section. We target priors π which achieve matching via distribution
functions of some standardized variables. More specifically, when θ1 is the parameter of
interest, while (θ2, . . . , θp)T is the vector of nuisance parameters, writing θ1 as the MLE of
θ1 with n−12 I11,
(I = ((Ijj′)), I
−1 = ((Ijj′))), as its asymptotic variance, we consider the
random variable y =√
n(θ1 − θ1)/(I11)
1/2. Specifically, if P π denotes the posterior of y
given the data X, what we want to achieve is the asymptotic matching
E[P π(y ≤ w|X)|θ] = P (y ≤ w|θ) + o(n−1). (1.3.1.1)
A prior π(.) ensures first order matching of the posterior and frequentist c.d.f.’s if and
only if it satisfies the partial differential equation
Σ_{j=1}^p Dj{π(θ) I^j1 (I^11)^(−1/2)} = 0. (1.3.1.2)
Further, a prior π(.) ensures matching in the same sense at the second order if and only if
it satisfies the partial differential equations
Σj Σr (DjDr{τ^jr π(θ)} − 2Dr{τ^jr πj(θ)}) − Σ_{j,r,s,v} (Dv{Ljrs τ^jr σ^sv π(θ)} − Dr{Ljsv τ^jr σ^sv π(θ)}) = 0

and

Σ_{j,r,s,v} Dv{Ljrs τ^jr τ^sv π(θ)} = 0. (1.3.1.3)
The above results are due to Mukerjee and Ghosh, M. (1997). The two approaches, based on quantiles (1.2.1.4) and on c.d.f.’s (1.3.1.2), lead to the same first order matching condition. However, the corresponding second order matching conditions (1.2.1.5) and (1.3.1.3) are not identical. The second order conditions (1.3.1.3) are more restrictive than (1.2.1.5) and often do not have a solution.
1.3.2 Orthogonal Parameterization
Under orthogonality of θ1 with (θ2, …, θp), it follows from (3.2.5)–(3.2.7) of Datta and Mukerjee (2004) that such a prior π is of the form $I_{11}^{1/2}\,g(\theta_2,\ldots,\theta_p)$, where in addition one needs to satisfy the two differential equations
$$A_1 = \frac{\partial^2}{\partial\theta_1^2}\bigl(I^{11}\pi(\theta)\bigr) - 2\frac{\partial}{\partial\theta_1}\Bigl(I^{11}\frac{\partial\pi(\theta)}{\partial\theta_1}\Bigr) - \sum_{s=2}^{p}\sum_{v=2}^{p}\frac{\partial}{\partial\theta_v}\Bigl\{E\Bigl(\frac{\partial^3\log f}{\partial\theta_1^2\,\partial\theta_s}\Bigr)I^{11}I^{sv}\pi(\theta)\Bigr\} - \sum_{s=2}^{p}\sum_{v=2}^{p}\frac{\partial}{\partial\theta_1}\Bigl\{E\Bigl(\frac{\partial^3\log f}{\partial\theta_1\,\partial\theta_s\,\partial\theta_v}\Bigr)I^{11}I^{sv}\pi(\theta)\Bigr\} = 0 \qquad (1.3.2.1)$$
and
$$A_2 = \sum_{s=2}^{p}\sum_{v=2}^{p}\frac{\partial}{\partial\theta_v}\Bigl\{E\Bigl(\frac{\partial^3\log f}{\partial\theta_1^2\,\partial\theta_s}\Bigr)I^{11}I^{sv}\pi(\theta)\Bigr\} = 0. \qquad (1.3.2.2)$$
Suppose now that interest lies in finding c.d.f. matching priors for a one dimensional
smooth function g(θ) of the parameter vector θ. Following Datta and Ghosh, J.K. (1995b),
the first order c.d.f. matching condition for g(θ) is presented below.
Let ∇g(θ) = (D1g(θ), …, Dpg(θ))^T be the gradient vector of the parametric function g(θ). Define the vector η(θ) = (η1, …, ηp)^T by
$$\eta = \bigl[\{\nabla g(\theta)\}^T I^{-1}\{\nabla g(\theta)\}\bigr]^{-1/2}\,I^{-1}\nabla g(\theta).$$
Then a prior π(·) ensures first order c.d.f. matching for g(θ) if and only if it satisfies
$$\sum_{j=1}^{p} D_j\{\eta_j\,\pi(\theta)\} = 0. \qquad (1.3.2.3)$$
1.3.3 Examples
Example 1.3.1 (Datta and Ghosh, J.K., 1995b) Consider the log-normal model given by the density
$$f(x;\theta) = (x\theta_2)^{-1}\,\phi\Bigl(\frac{\log x - \theta_1}{\theta_2}\Bigr), \quad x > 0,$$
where θ1 ∈ R and θ2 > 0. Suppose interest lies in g(θ) = exp(θ1 + θ2²/2), the population mean. Here
$$I_{11} = \theta_2^{-2}, \qquad I_{22} = 2\theta_2^{-2}, \qquad I_{12} = 0.$$
Hence, solving (1.3.2.3), the solution obtained is
$$\pi(\theta) = \theta_2^{-2}\Bigl(1 + \tfrac{1}{2}\theta_2^{2}\Bigr)^{1/2}.$$
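The claim can be checked symbolically. The following sketch (an addition, not part of the original text; assumes sympy is available) forms the vector η(θ) from the Fisher information and gradient given above and verifies that the stated prior solves (1.3.2.3):

```python
import sympy as sp

t1 = sp.Symbol('theta1', real=True)
t2 = sp.Symbol('theta2', positive=True)

# Fisher information of the log-normal model and the function of interest
I = sp.diag(t2**-2, 2*t2**-2)
g = sp.exp(t1 + t2**2/2)                      # population mean
grad = sp.Matrix([sp.diff(g, t1), sp.diff(g, t2)])

Iinv = I.inv()
norm = sp.sqrt((grad.T * Iinv * grad)[0])
eta = (Iinv * grad) / norm                    # the vector eta(theta) defined above

pi = t2**-2 * sp.sqrt(1 + t2**2/2)            # the stated solution of (1.3.2.3)

# left-hand side of (1.3.2.3): sum_j D_j{eta_j * pi}
lhs = sp.diff(eta[0]*pi, t1) + sp.diff(eta[1]*pi, t2)
lhs_simplified = sp.simplify(lhs)
```

Here `lhs_simplified` reduces to zero, confirming the example.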
1.4 Matching Priors for Highest Posterior Density Regions
Often a Bayesian analysis culminates in the construction of a posterior or predictive distribution. This distribution constitutes a full description of the information about the parameter or future observation. A useful summary of it, perhaps to augment a graph, is provided by a highest posterior density (HPD) region, or indeed a family of such regions.
With a possibly multidimensional interest parameter θ, such a region is of the form
$$\{\theta : \pi(\theta \mid X) \ge K\},$$
where π(θ|X) is the posterior density of θ, under a prior π(·), given the data X, and K depends on π(·) and X in addition to the chosen posterior credibility level. By the Neyman–Pearson lemma, an HPD region has the smallest possible volume, given X, at a chosen level of credibility. In this section, we consider priors that ensure approximate frequentist validity of HPD regions with margin of error o(n^{-1}), where n is the sample size. Priors of this kind are called HPD matching priors. They can be useful even when the interest parameter is multidimensional, since HPD regions remain well defined in such situations.
1.4.1 Notation and Differential Equation
We continue with the setup and notation of Section 1.2. Suppose interest lies in the entire parameter vector θ = (θ1, …, θp)^T, i.e., there is no nuisance parameter. HPD matching priors are required to ensure
$$P_{\theta}\{\theta \in Q^{(1-\alpha)}(\pi, X)\} = 1 - \alpha + o(n^{-1})$$
for all α and θ. We now give a characterization of HPD matching priors when interest lies in the entire parameter vector θ. The result is due to Ghosh, J.K. and Mukerjee (1993b), who reported it in an equivalent form.

Theorem 1.4.1 A prior π(·) is HPD matching for θ if and only if it satisfies the partial differential equation
$$\sum_{j,r,s,u} D_u\{\pi(\theta)L_{jrs}I^{jr}I^{su}\} - \sum_{j,r} D_jD_r\{\pi(\theta)I^{jr}\} = 0. \qquad (1.4.1.1)$$
1.4.2 Special Case: p=1
Peers (1968) and Severini (1991) explored HPD matching priors for scalar θ. Then p = 1, θ = θ1, I becomes a scalar, and (1.4.1.1) becomes
$$\pi(\theta)L_{111}I^{-2} - \frac{d}{d\theta}\{\pi(\theta)I^{-1}\} = \text{constant}.$$
Using regularity condition (1.2.2.3), the above is equivalent to
$$I^{-1}\,\frac{d\pi(\theta)}{d\theta} + \pi(\theta)L_{1,11}I^{-2} = \text{constant}. \qquad (1.4.2.1)$$
The HPD matching condition (1.4.2.1), arising for p = 1, was first reported in Peers (1968) and is equivalent to the corresponding condition given in Severini (1991).

Continuing with p = 1 and again using (1.2.2.3), a prior of the form π(θ) ∝ I^r, where r is a real number, satisfies (1.4.2.1) if and only if
$$I^{r-2}\{(1-r)L_{1,11} - rL_{111}\} = \text{constant}. \qquad (1.4.2.2)$$
In particular, taking r = 1/2 in the above, (1.4.2.1) holds for Jeffreys' prior if and only if
$$I^{-3/2}(L_{1,11} - L_{111}) = \text{constant}. \qquad (1.4.2.3)$$
The condition (1.4.2.3) holds for the one parameter location and scale models introduced in (1.2.2.6) and (1.2.2.7) respectively. For these models, Jeffreys' prior is HPD matching for θ. However, even with p = 1, Jeffreys' prior does not always enjoy the HPD matching property.

Example 1.4.1 Consider the bivariate normal model with zero means, unit variances and correlation coefficient θ, where |θ| < 1. Then
$$I = \frac{1+\theta^2}{(1-\theta^2)^2}, \qquad L_{1,11} = -\tfrac{1}{2}L_{111} = \frac{2\theta(3+\theta^2)}{(1-\theta^2)^3}.$$
Then (1.4.2.3) does not hold, but (1.4.2.2) is satisfied by r = −1. Hence Jeffreys' prior is not HPD matching for θ, but π(θ) ∝ I^{-1} enjoys this property.
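A symbolic check of this example (an addition, not from the original text; assumes sympy), using the relation L_{1,11} = −(1/2)L_{111} quoted above:

```python
import sympy as sp

th = sp.Symbol('theta')
r = sp.Symbol('r')

I = (1 + th**2) / (1 - th**2)**2
L1_11 = 2*th*(3 + th**2) / (1 - th**2)**3    # L_{1,11}
L111 = -2 * L1_11                            # since L_{1,11} = -(1/2) L_{111}

# left-hand side of (1.4.2.2)
expr = I**(r - 2) * ((1 - r)*L1_11 - r*L111)

at_minus1 = sp.simplify(expr.subs(r, -1))             # identically 0, hence constant
at_half = sp.simplify(expr.subs(r, sp.Rational(1, 2)))  # depends on theta: (1.4.2.3) fails
```

With r = −1 the bracket 2L_{1,11} + L_{111} vanishes identically, while for r = 1/2 (Jeffreys' prior) the expression depends on θ.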
1.4.3 Orthogonal Parameterization

Now suppose interest lies in θ1, with θ2, …, θp as nuisance parameters. As before, an HPD matching prior for θ1 is defined as one that ensures frequentist validity of HPD regions for θ1 with margin of error o(n^{-1}). In order to facilitate the presentation, we work under an orthogonal parameterization. Since θ1 is one dimensional, this can always be achieved by suitably choosing the nuisance parameters (Cox and Reid, 1987). The following theorem is due to Ghosh, J.K. and Mukerjee (1995).

Theorem 1.4.2 Suppose orthogonal parameterization holds. Then a prior π(·) is HPD matching for θ1 if and only if it satisfies the partial differential equation
$$\sum_{s=2}^{p}\sum_{u=2}^{p} D_u\{\pi(\theta)I_{11}^{-1}I^{su}L_{11s}\} + D_1\{\pi(\theta)I_{11}^{-2}L_{111}\} - D_1^2\{\pi(\theta)I_{11}^{-1}\} = 0. \qquad (1.4.3.1)$$
It can be verified that for models where
$$D_1\bigl(I_{11}^{-3/2}L_{111}\bigr) = 0, \qquad (1.4.3.2)$$
under orthogonal parameterization, any second order matching prior for posterior quantiles of θ1 is also HPD matching for θ1. HPD matching priors are invariant under reparameterization as long as the object of interest, viewed either as a parametric function under an original parameterization or as a canonical parameter after reparameterization, remains unaltered.
1.4.4 Examples

Example 1.4.2. (Ghosh, J.K. and Mukerjee, 1995a) We revisit Example (1.2.3), where interest lies in the ratio of independent normal means. Here orthogonal parameterization holds. Furthermore,
$$I_{11} = \frac{\theta_2^2}{(\theta_1^2+1)^2}, \qquad I_{22} = 1, \qquad L_{112} = -\frac{\theta_2}{(\theta_1^2+1)^2}, \qquad L_{111} = -3L_{1,11} = \frac{6\theta_1\theta_2^2}{(\theta_1^2+1)^3}.$$
Condition (1.4.3.2) is not satisfied. In addition, no first order matching prior for posterior quantiles of θ1 is HPD matching for θ1. Solutions to the HPD matching condition (1.4.3.1) for θ1 however exist. For example, any prior of the form
$$\pi(\theta) \propto \frac{\theta_1^{r_1}\theta_2^{r_2}}{(\theta_1^2+1)^{r_3}},$$
where r1, r2 and r3 are real, satisfies (1.4.3.1) if and only if one of the following holds:
(a) r1 = 0, r2 = 6, r3 = 3/2,
(b) r1 = 1, r2 = 13, r3 = 2,
(c) r1 = 0, r2 = 1, r3 = −1,
(d) r1 = 1, r2 = −2, r3 = −1/2.
Example 1.4.3 Consider the gamma model
$$f(x;\theta) = \frac{x^{\theta_1-1}e^{-x/\theta_2}}{\theta_2^{\theta_1}\Gamma(\theta_1)}, \quad x > 0,$$
in a natural parameterization, where θ1, θ2 > 0. Then
$$I_{11} = \frac{d^2}{d\theta_1^2}\log\Gamma(\theta_1), \qquad I_{22} = \theta_1\theta_2^{-2}, \qquad I_{12} = \theta_2^{-1},$$
$$L_{111} = -\frac{d^3}{d\theta_1^3}\log\Gamma(\theta_1), \qquad L_{112} = 0, \qquad L_{122} = \theta_2^{-2}, \qquad L_{222} = 4\theta_1\theta_2^{-3}.$$
Hence the prior π(θ) ∝ (θ1³θ2)^{-1} satisfies (1.4.1.1) and is HPD matching for θ = (θ1, θ2)^T.
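As a numerical sanity check (an addition, not part of the original text; assumes numpy and scipy are available), the information entries quoted above can be recovered by Monte Carlo from the score components of this gamma model:

```python
import numpy as np
from scipy.special import polygamma

rng = np.random.default_rng(0)
t1, t2 = 2.5, 1.7                 # illustrative shape theta1 and scale theta2
x = rng.gamma(t1, t2, size=2_000_000)

# score components of log f(x; theta) = (t1-1)log x - x/t2 - t1 log t2 - log Gamma(t1)
s1 = np.log(x) - np.log(t2) - polygamma(0, t1)   # d log f / d theta1
s2 = x/t2**2 - t1/t2                             # d log f / d theta2

# I_jk = E(s_j s_k); compare with the closed forms above
print(np.mean(s1*s1), polygamma(1, t1))   # I11 = trigamma(theta1)
print(np.mean(s2*s2), t1/t2**2)           # I22
print(np.mean(s1*s2), 1/t2)               # I12
```

The simulated values agree with the closed forms to within Monte Carlo error.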
1.5 Matching Priors Associated with Other Credible Regions

In this section we focus on priors that ensure approximate frequentist validity of posterior credible regions obtained by inverting certain commonly used statistics. The results, when combined with those of the HPD matching case, can help narrow down the choice of matching priors, especially when the interest parameter is multidimensional. We begin with the matching priors associated with the LR statistic, followed by those associated with Rao's score and Wald's statistics.
1.5.1 Matching Priors Associated with the LR Statistic
1.5.1.1 Introduction
Suppose that interest lies in the entire parameter vector θ = (θ1, …, θp)^T, i.e., there is no nuisance parameter. The LR statistic for θ is given by
$$M_{LR}(\theta, X) = 2n\{l(\hat\theta) - l(\theta)\},$$
where X = (X1, …, Xn)^T, $l(\theta) = n^{-1}\sum_{i=1}^{n}\log f(X_i;\theta)$, and $\hat\theta$ is the MLE. Given a prior π(·), the inversion of the above statistic yields a posterior credible region for θ as
$$Q_{LR}^{(1-\alpha)}(\pi, X) = \{\theta : M_{LR}(\theta, X) \le k_{1-\alpha}(\pi, X)\},$$
where k_{1−α}(π, X), which may depend on π(·) and X but not on θ, has to be so chosen that the relation
$$P^{\pi}\{\theta \in Q_{LR}^{(1-\alpha)}(\pi, X) \mid X\} = 1 - \alpha + o(n^{-1})$$
holds.
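To make the inversion concrete, here is a small illustration (an addition, not from the dissertation; assumes numpy) for the scalar model N(θ, 1) with a flat prior, where the posterior of θ is N(x̄, 1/n) and M_LR(θ, X) = n(x̄ − θ)²:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 50, 0.05
x = rng.normal(1.0, 1.0, size=n)       # N(theta, 1) data, illustrative
xbar = x.mean()                        # MLE of theta

# Under the flat prior the posterior of theta is N(xbar, 1/n), so
# M_LR(theta, X) = n*(xbar - theta)^2 is chi^2_1 under the posterior.
theta_post = rng.normal(xbar, 1/np.sqrt(n), size=100_000)
m_lr = n * (xbar - theta_post)**2
k = np.quantile(m_lr, 1 - alpha)       # k_{1-alpha}(pi, X)

# inverting M_LR <= k gives the credible interval
lo, hi = xbar - np.sqrt(k/n), xbar + np.sqrt(k/n)
post_prob = np.mean((theta_post >= lo) & (theta_post <= hi))
```

Here `k` approaches the χ²₁ upper 5% point (≈ 3.84), and the inverted region carries posterior probability ≈ 0.95 by construction.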
1.5.1.2 Differential equation
The matching priors are characterized in the theorem below.

Theorem 1.5.1 A prior π(·) is LR matching for θ if and only if it satisfies the partial differential equation
$$\sum_{j,r,s,u} D_u\{\pi(\theta)L_{jrs}I^{jr}I^{su}\} - \sum_{j,r} D_jD_r\{\pi(\theta)I^{jr}\} + 2\sum_{j,r} D_r\{I^{jr}\pi_j(\theta)\} = 0. \qquad (1.5.1.1)$$
The above result is due to Ghosh, J.K. and Mukerjee (1991).
1.5.1.3 Special case: p=1
If p = 1, then θ = θ1, I is a scalar, and (1.5.1.1) becomes
$$\pi(\theta)L_{111}I^{-2} - \frac{d}{d\theta}\{\pi(\theta)I^{-1}\} + 2I^{-1}\,\frac{d\pi(\theta)}{d\theta} = \text{constant}.$$
Using regularity condition (1.2.2.3), the above is equivalent to
$$I^{-1}\,\frac{d\pi(\theta)}{d\theta} - \pi(\theta)L_{1,11}I^{-2} = \text{constant}. \qquad (1.5.1.2)$$
Equation (1.5.1.2) is in agreement with the findings of Severini (1991), who studied this problem for scalar θ.

Continuing with p = 1 and again using (1.2.2.3), a prior of the form π(θ) ∝ I^r, where r is a real number, satisfies (1.5.1.2) if and only if
$$I^{r-2}\{rL_{111} + (1+r)L_{1,11}\} = \text{constant}. \qquad (1.5.1.3)$$
In particular, taking r = 1/2 in the above and using the regularity condition (1.2.2.4), it follows that Jeffreys' prior satisfies (1.5.1.2) if and only if
$$I^{-3/2}L_{111} = \text{constant}. \qquad (1.5.1.4)$$
The above, by Theorem 1.2.2, is also the condition under which Jeffreys' prior is second order matching for the posterior quantiles of θ.

The condition (1.5.1.4) holds for the one parameter location and scale models introduced in (1.2.2.6) and (1.2.2.7) respectively. For these models, Jeffreys' prior is LR matching for θ. On the other hand, for the bivariate normal model considered in Examples (1.2.1) and (1.4.1), the condition (1.5.1.4) is not met, but (1.5.1.3) holds with r = 1. Thus for this model π(θ) ∝ I is LR matching for θ, though Jeffreys' prior does not enjoy this property.
1.5.1.4 Nuisance parameters and orthogonality
The LR matching condition (1.5.1.1) given above allows θ to be possibly multidimensional but presumes that nuisance parameters are absent. Several characterizations of LR matching priors in the presence of nuisance parameters have been reported in the literature. DiCiccio and Stern (1994) allowed both the interest and the nuisance parameters to be possibly multidimensional and made no assumption of orthogonal parameterization. Earlier, Ghosh, J.K. and Mukerjee (1992b, 1994b) derived special cases of the matching conditions when both the interest and nuisance parameters are one dimensional and orthogonal parameterization holds.

Theorem 1.5.2 Suppose orthogonal parameterization holds. Then a prior π(·) is LR matching for θ1 if and only if it satisfies the partial differential equation
$$\sum_{s=2}^{p}\sum_{u=2}^{p} D_u\{\pi(\theta)I_{11}^{-1}I^{su}L_{11s}\} + D_1\Bigl[I_{11}^{-1}\Bigl\{\frac{\partial\pi(\theta)}{\partial\theta_1} - \pi(\theta)\Bigl(I_{11}^{-1}L_{1,11} - \sum_{s=2}^{p}\sum_{u=2}^{p} I^{su}L_{1su}\Bigr)\Bigr\}\Bigr] = 0. \qquad (1.5.1.5)$$
Comparing the LR and HPD matching conditions (1.5.1.5) and (1.4.3.1) for θ1, the difference of their left hand sides reveals that an HPD matching prior satisfying (1.4.3.1) is also LR matching for θ1 if and only if it satisfies
$$D_1\Bigl[I_{11}^{-1}\Bigl\{2\pi_1(\theta) + \pi(\theta)\sum_{s=2}^{p}\sum_{u=2}^{p} I^{su}L_{1su}\Bigr\}\Bigr] = 0.$$
1.5.2 Matching Priors Associated with Rao’s Score and Wald’s Statistic
1.5.2.1 Introduction
Two other statistics enjoy widespread popularity for constructing matching priors: Rao's score statistic and Wald's statistic. Suppose interest lies in the entire parameter vector θ. Rao's score statistic is based on the score vector
$$\nabla l(\theta) = (D_1 l(\theta), \ldots, D_p l(\theta))^T.$$
In the posterior setup, with h = (h1, …, hp)^T = n^{1/2}(θ − \hatθ),
$$n^{1/2}\nabla l(\theta) = -Ch + o(1),$$
where C = −∇²l(\hatθ), and n^{1/2}∇l(θ) is asymptotically p-variate normal with null mean vector and dispersion matrix C. A posterior version of Rao's score statistic is given by
$$M_{Rao}(\theta, X) = n\{\nabla l(\theta)\}^T C^{-1}\{\nabla l(\theta)\}.$$
Similarly, Wald's statistic is based on h = n^{1/2}(θ − \hatθ), which is asymptotically p-variate normal with null mean vector and dispersion matrix C^{-1}. A posterior version of Wald's statistic is given by
$$M_{Wald}(\theta, X) = n(\theta - \hat\theta)^T C(\theta - \hat\theta).$$
1.5.2.2 Differential equation
We present the characterization of the matching priors via Rao's score statistic below.

Theorem 1.5.3 A prior π(·) ensures frequentist validity, with margin of error o(n^{-1}), of posterior credible regions for θ given by the inversion of Rao's score statistic if and only if it satisfies the partial differential equations
$$\sum_{j}\sum_{r}\bigl[2D_r\{I^{jr}\pi_j(\theta)\} - D_jD_r\{\pi(\theta)I^{jr}\}\bigr] = 0 \qquad (1.5.2.1)$$
and
$$\sum_{j,r,s,u} D_u\{\pi(\theta)L_{jrs}I^{jr}I^{su}\} = 0. \qquad (1.5.2.2)$$
As noted in Rao and Mukerjee (1995), the classes of matching priors based on Rao's score and Wald's statistics are identical. Lee (1989) also studied the matching problem associated with Wald's statistic.

Equations (1.5.2.1) and (1.5.2.2) add up to the matching condition (1.5.1.1) for the LR statistic. Therefore, any matching prior arising from Rao's score or Wald's statistic also enjoys the same property for the LR statistic. The converse is, however, not true in general.
1.5.2.3 Special case: p=1
Using the regularity condition (1.2.2.3), the matching conditions (1.5.2.1) and (1.5.2.2) reduce to
$$I^{-1}\,\frac{d\pi(\theta)}{d\theta} - \pi(\theta)(L_{1,11} + L_{111})I^{-2} = \text{constant} \qquad (1.5.2.3)$$
and
$$\pi(\theta)L_{111}I^{-2} = \text{constant}, \qquad (1.5.2.4)$$
respectively. By (1.2.2.3), these conditions are met by Jeffreys' prior if and only if
$$I^{-3/2}L_{1,11} = \text{constant}; \qquad I^{-3/2}L_{111} = \text{constant}. \qquad (1.5.2.5)$$
(1.5.2.5) entails the corresponding condition for the LR statistic. For the one parameter location and scale models, (1.5.2.5) again holds. Thus in these situations Jeffreys' prior enjoys the matching property for both the score statistic and the Wald statistic. On the other hand, in Example (1.4.1), concerning a bivariate normal model with unknown correlation coefficient, not only does (1.5.2.5) fail to hold, but no solution to the matching conditions (1.5.2.3) and (1.5.2.4) is available.
Table 1-1. Simulated tail probabilities of posterior distributions in Fisher–von Mises, P(µ,λ)(0.05; µ)

  n     πJ        πµ(2)(µ, λ)
  5     0.0605    0.0518
  10    0.0564    0.0538

Table 1-2. Simulated tail probabilities of posterior distributions in Fisher–von Mises, P(µ,λ)(0.95; µ)

  n     πJ        πµ(2)(µ, λ)
  5     0.9489    0.9570
  10    0.9421    0.9475
CHAPTER 2
MATCHING PRIORS FOR SOME BIVARIATE NORMAL PARAMETERS

In the last chapter we have seen that matching is accomplished through either (a) posterior quantiles, (b) distribution functions, (c) highest posterior density (HPD) regions, or (d) inversion of certain test statistics. However, priors based on (a), (b), (c), or (d) need not always be identical. Specifically, it may so happen that there does not exist any prior satisfying all four criteria.

In this chapter, we consider the bivariate normal distribution where the parameter of interest is (i) the regression coefficient, (ii) the generalized variance, i.e., the determinant of the variance-covariance matrix, or (iii) the ratio of the conditional variance of one variable given the other to the marginal variance of the other variable. We have been able to find a prior which meets all four matching criteria for every one of these parameters.
2.1 The Orthogonal Reparameterization
Let (X1i, X2i), i = 1, …, n, be independent and identically distributed random vectors having a bivariate normal distribution with means µ1 and µ2, variances σ1²(>0) and σ2²(>0), and correlation coefficient ρ (|ρ| < 1). We use the transformation
$$\beta = \rho\sigma_2/\sigma_1, \qquad \theta = \sigma_1\sigma_2(1-\rho^2)^{1/2}, \qquad \eta = \sigma_2(1-\rho^2)^{1/2}/\sigma_1. \qquad (2.1.1)$$
With this reparameterization, the bivariate normal density can be rewritten as
$$f(X_1, X_2) = (2\pi\theta)^{-1}\exp\Bigl[-\frac{1}{2}\Bigl\{\frac{\bigl(X_2-\mu_2-\beta(X_1-\mu_1)\bigr)^2}{\theta\eta} + \frac{\eta(X_1-\mu_1)^2}{\theta}\Bigr\}\Bigr]. \qquad (2.1.2)$$
It may be noted that β is the regression coefficient of X2 on X1, while θ² = σ1²σ2²(1−ρ²) is the determinant of the variance-covariance matrix. Also, η² = V(X2|X1)/V(X1).
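A small numerical sketch (an addition, not part of the original text; assumes numpy) confirming the interpretations of β, θ and η under the transformation (2.1.1):

```python
import numpy as np

rng = np.random.default_rng(2)
s1, s2, rho = 1.3, 0.8, 0.6                      # illustrative values

# the transformation (2.1.1)
beta = rho * s2 / s1
theta = s1 * s2 * np.sqrt(1 - rho**2)
eta = s2 * np.sqrt(1 - rho**2) / s1

# theta^2 is the generalized variance; eta^2 = V(X2|X1)/V(X1);
# theta/eta = Var(X1); theta*eta = V(X2|X1)  (checked in the tests below)

# beta is the regression coefficient of X2 on X1: check by simulation
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
x = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)
slope = np.cov(x[:, 0], x[:, 1])[0, 1] / np.var(x[:, 0], ddof=1)
```

The simulated regression slope agrees with β to within Monte Carlo error.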
With the above reparameterization, and using E(X1 − µ1)² = θ/η and E{X2 − µ2 − β(X1 − µ1)}² = θη, the Fisher information matrix reduces to
$$I(\mu_1,\mu_2,\beta,\theta,\eta) = \begin{pmatrix} A & 0 \\ 0 & \mathrm{Diag}(\eta^{-2},\,\theta^{-2},\,\eta^{-2}) \end{pmatrix}, \qquad (2.1.3)$$
where
$$A = \begin{pmatrix} \dfrac{\beta^2}{\theta\eta} + \dfrac{\eta}{\theta} & -\dfrac{\beta}{\theta\eta} \\[1ex] -\dfrac{\beta}{\theta\eta} & \dfrac{1}{\theta\eta} \end{pmatrix}.$$
This establishes immediately the mutual orthogonality of (µ1, µ2), β, θ and η in the sense of Huzurbazar (1950) and Cox and Reid (1987). Such orthogonality is often referred to as "Fisher orthogonality".
The inverse of the information matrix is then
$$I^{-1}(\mu_1,\mu_2,\beta,\theta,\eta) = \begin{pmatrix} A^{-1} & 0 \\ 0 & \mathrm{Diag}(\eta^{2},\,\theta^{2},\,\eta^{2}) \end{pmatrix}, \qquad (2.1.4)$$
where
$$A^{-1} = \begin{pmatrix} \dfrac{\theta}{\eta} & \dfrac{\beta\theta}{\eta} \\[1ex] \dfrac{\beta\theta}{\eta} & \dfrac{\beta^2\theta}{\eta} + \theta\eta \end{pmatrix}.$$
Since the parameters of interest are orthogonal to (µ1, µ2), and it is customary to use a uniform prior on (µ1, µ2) over R², we shall consider only priors of the form
$$\Pi_0(\mu_1,\mu_2,\beta,\theta,\eta) \propto \pi(\beta,\theta,\eta), \qquad (2.1.5)$$
and find π such that the matching criteria given in (a)-(d) are all satisfied for each of β, θ and η individually. This we are going to explore in the next four sections.

Before ending this section we state a lemma which is used repeatedly in the sequel.
Lemma 2.1 For the bivariate normal density given in (2.1.2),
$$E(\partial\log f/\partial\beta)^3 = 0, \qquad E[(\partial\log f/\partial\beta)(\partial^2\log f/\partial\beta^2)] = 0; \qquad (2.1.6)$$
$$E(\partial^3\log f/\partial\beta^3) = 0, \qquad E(\partial^3\log f/\partial\beta^2\partial\theta) = (\theta\eta^2)^{-1}, \qquad E(\partial^3\log f/\partial\beta^2\partial\eta) = \eta^{-3}; \qquad (2.1.7)$$
$$E(\partial^3\log f/\partial\beta\,\partial\theta^2) = 0, \qquad E(\partial^3\log f/\partial\beta\,\partial\eta^2) = 0; \qquad (2.1.8)$$
$$E(\partial\log f/\partial\theta)^3 = 2/\theta^3, \qquad E[(\partial\log f/\partial\theta)(\partial^2\log f/\partial\theta^2)] = -2/\theta^3; \qquad (2.1.9)$$
$$E(\partial^3\log f/\partial\theta^3) = 4/\theta^3, \qquad E(\partial^3\log f/\partial\theta^2\partial\eta) = 0, \qquad E(\partial^3\log f/\partial\theta\,\partial\eta^2) = (\theta\eta^2)^{-1}; \qquad (2.1.10)$$
$$E(\partial\log f/\partial\eta)^3 = 0, \qquad E[(\partial\log f/\partial\eta)(\partial^2\log f/\partial\eta^2)] = -\eta^{-3}, \qquad E(\partial^3\log f/\partial\eta^3) = 3\eta^{-3}. \qquad (2.1.11)$$
Proof. The proofs are based on the independence of X2 − µ2 − β(X1 − µ1) and X1 − µ1, along with the facts that X2 − µ2 − β(X1 − µ1) ∼ N(0, θη) and X1 − µ1 ∼ N(0, θ/η). For brevity, write U = X2 − µ2 − β(X1 − µ1) and V = X1 − µ1. Then
$$E(V^2) = \sigma_1^2 = \theta/\eta$$
and
$$E(U^2) = \sigma_2^2 + \beta^2\sigma_1^2 - 2\beta\rho\sigma_1\sigma_2 = \sigma_2^2 + \rho^2\sigma_2^2 - 2\rho^2\sigma_2^2 = \sigma_2^2(1-\rho^2) = \theta\eta.$$
We begin with (2.1.6):
$$E\Bigl(\frac{\partial\log f}{\partial\beta}\Bigr)^3 = E\Bigl(\frac{U^3V^3}{\theta^3\eta^3}\Bigr) = \frac{E(V^3)\,E(U^3)}{\theta^3\eta^3} = 0,$$
by independence and because odd central moments of a normal distribution vanish; likewise
$$E\Bigl(\frac{\partial\log f}{\partial\beta}\,\frac{\partial^2\log f}{\partial\beta^2}\Bigr) = -E\Bigl(\frac{UV^3}{\theta^2\eta^2}\Bigr) = -\frac{E(V^3)\,E(U)}{\theta^2\eta^2} = 0.$$
Next, since ∂²log f/∂β² = −V²/(θη) is free from β, E(∂³log f/∂β³) = 0 in (2.1.7). Further,
$$E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr) = E\Bigl(\frac{V^2}{\eta\theta^2}\Bigr) = \frac{\theta/\eta}{\eta\theta^2} = (\theta\eta^2)^{-1}$$
and
$$E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr) = E\Bigl(\frac{V^2}{\eta^2\theta}\Bigr) = \frac{\theta/\eta}{\theta\eta^2} = \eta^{-3}.$$
To show (2.1.8), note that
$$E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) = 2E\Bigl(\frac{UV}{\theta^3\eta}\Bigr) = 0 \quad\text{and}\quad E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr) = 2E\Bigl(\frac{UV}{\theta\eta^3}\Bigr) = 0,$$
again by independence and the vanishing of odd central moments. To verify (2.1.9), write a = U²/η + ηV², so that ∂log f/∂θ = −1/θ + a/(2θ²) and ∂²log f/∂θ² = 1/θ² − a/θ³. By independence and the normal moment formulas, E(a) = 2θ, E(a²) = 8θ², and E(a³) = 48θ³. Hence
$$E\Bigl(\frac{\partial\log f}{\partial\theta}\Bigr)^3 = E\Bigl(-\frac{1}{\theta^3} + \frac{3a}{2\theta^4} - \frac{3a^2}{4\theta^5} + \frac{a^3}{8\theta^6}\Bigr) = -\frac{1}{\theta^3} + \frac{3}{\theta^3} - \frac{6}{\theta^3} + \frac{6}{\theta^3} = \frac{2}{\theta^3}$$
and
$$E\Bigl(\frac{\partial\log f}{\partial\theta}\,\frac{\partial^2\log f}{\partial\theta^2}\Bigr) = E\Bigl(-\frac{1}{\theta^3} + \frac{3a}{2\theta^4} - \frac{a^2}{2\theta^5}\Bigr) = -\frac{1}{\theta^3} + \frac{3}{\theta^3} - \frac{4}{\theta^3} = -\frac{2}{\theta^3}.$$
Next, (2.1.10) holds because
$$E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr) = E\Bigl(-\frac{2}{\theta^3} + \frac{3a}{\theta^4}\Bigr) = -\frac{2}{\theta^3} + \frac{6\theta}{\theta^4} = \frac{4}{\theta^3},$$
$$E\Bigl(\frac{\partial^3\log f}{\partial\theta^2\,\partial\eta}\Bigr) = E\Bigl(\frac{U^2}{\theta^3\eta^2} - \frac{V^2}{\theta^3}\Bigr) = \frac{\theta\eta}{\theta^3\eta^2} - \frac{\theta/\eta}{\theta^3} = 0,$$
and
$$E\Bigl(\frac{\partial^3\log f}{\partial\theta\,\partial\eta^2}\Bigr) = E\Bigl(\frac{U^2}{\theta^2\eta^3}\Bigr) = \frac{\theta\eta}{\theta^2\eta^3} = (\theta\eta^2)^{-1}.$$
Finally, to show (2.1.11), note that ∂log f/∂η = U²/(2θη²) − V²/(2θ), so that
$$E\Bigl(\frac{\partial\log f}{\partial\eta}\Bigr)^3 = \frac{E(U^6)}{8\theta^3\eta^6} - \frac{3E(U^4)E(V^2)}{8\theta^3\eta^4} + \frac{3E(U^2)E(V^4)}{8\theta^3\eta^2} - \frac{E(V^6)}{8\theta^3} = \frac{15}{8\eta^3} - \frac{9}{8\eta^3} + \frac{9}{8\eta^3} - \frac{15}{8\eta^3} = 0.$$
Also, since ∂²log f/∂η² = −U²/(θη³),
$$E\Bigl(\frac{\partial\log f}{\partial\eta}\,\frac{\partial^2\log f}{\partial\eta^2}\Bigr) = -\frac{E(U^4)}{2\theta^2\eta^5} + \frac{E(U^2)E(V^2)}{2\theta^2\eta^3} = -\frac{3}{2\eta^3} + \frac{1}{2\eta^3} = -\eta^{-3}$$
and
$$E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr) = E\Bigl(\frac{3U^2}{\theta\eta^4}\Bigr) = \frac{3\theta\eta}{\theta\eta^4} = 3\eta^{-3}.$$
This completes the proof.
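The moment identities in Lemma 2.1 can also be verified mechanically. The following sketch (an addition, not part of the original text; assumes sympy) computes expectations of polynomials in the two independent normal variables and checks (2.1.9):

```python
import sympy as sp

theta, eta = sp.symbols('theta eta', positive=True)
U, V = sp.symbols('U V')   # U = X2 - mu2 - beta*(X1 - mu1), V = X1 - mu1

def gauss_moment(var, k):
    """E(Z^k) for Z ~ N(0, var): zero for odd k, var^(k/2)*(k-1)!! for even k."""
    return sp.Integer(0) if k % 2 else var**(k // 2) * sp.factorial2(k - 1)

def expect(expr):
    """Expectation of a polynomial in the independent normals U and V."""
    poly = sp.Poly(sp.expand(expr), U, V)
    return sp.simplify(sum(
        coeff * gauss_moment(theta*eta, pu) * gauss_moment(theta/eta, pv)
        for (pu, pv), coeff in poly.terms()
    ))

dl = -1/theta + U**2/(2*theta**2*eta) + eta*V**2/(2*theta**2)   # dlogf/dtheta
d2l = 1/theta**2 - (U**2/eta + eta*V**2)/theta**3               # d2logf/dtheta2

m3 = expect(dl**3)        # equals 2/theta^3, as in (2.1.9)
m_cross = expect(dl*d2l)  # equals -2/theta^3
```

The remaining identities of the lemma can be verified the same way.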
2.2 Quantile Matching Priors
Here one is interested in the approximate frequentist validity of the posterior quantiles of a one-dimensional interest parameter. When β is the parameter of interest, from (1.2.3.4) it follows that the class of first order probability matching priors for β is given by
$$\pi(\cdot) \propto \eta^{-1}g_0(\theta,\eta), \qquad (2.2.1)$$
where g0 is an arbitrary smooth function of (θ, η). In order that such a prior satisfies the second order matching property, we need to find g0 by solving (see (1.2.3.5))
$$\frac{\partial}{\partial\theta}\Bigl\{g_0(\theta,\eta)\,\eta\,\theta^2\,E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr)\Bigr\} + \frac{\partial}{\partial\eta}\Bigl\{g_0(\theta,\eta)\,\eta\,\eta^2\,E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr)\Bigr\} + \frac{1}{6}\,g_0(\theta,\eta)\,\frac{\partial}{\partial\beta}\Bigl\{\eta^3 E\Bigl(\frac{\partial\log f}{\partial\beta}\Bigr)^3\Bigr\} = 0. \qquad (2.2.2)$$
From (2.1.6) and (2.1.7) in Lemma 2.1, (2.2.2) reduces to
$$\frac{\partial}{\partial\theta}\bigl[g_0(\theta,\eta)(\theta/\eta)\bigr] + \frac{\partial}{\partial\eta}\,g_0(\theta,\eta) = 0, \qquad (2.2.3)$$
and a solution to (2.2.3) is provided by g0(θ, η) ∝ θ^{-1}. Thus the prior π(β, θ, η) = θ^{-1}η^{-1} satisfies the second order matching property.

Next we proceed towards finding a second order matching prior for θ. First, from (1.2.3.4), we obtain the class of first order matching priors for θ as given by
$$\pi(\beta,\theta,\eta) = \theta^{-1}g_1(\beta,\eta). \qquad (2.2.4)$$
In order to find a second order probability matching prior for θ, we now need to solve
$$\frac{\partial}{\partial\beta}\Bigl\{g_1(\beta,\eta)\,\theta\,\eta^2\,\frac{E[(X_1-\mu_1)(X_2-\mu_2-\beta(X_1-\mu_1))]}{\theta^3\eta}\Bigr\} + \frac{\partial}{\partial\eta}\Bigl\{g_1(\beta,\eta)\,\theta\,\eta^2\,E\Bigl(\frac{3(X_2-\mu_2-\beta(X_1-\mu_1))^2}{\theta^3\eta^2} - \frac{(X_1-\mu_1)^2}{\theta^3}\Bigr)\Bigr\} + \frac{1}{6}\,g_1(\beta,\eta)\,\frac{\partial}{\partial\theta}\Bigl\{\theta^3 E\Bigl(\frac{\partial\log f}{\partial\theta}\Bigr)^3\Bigr\} = 0. \qquad (2.2.5)$$
Since E[(X1 − µ1)(X2 − µ2 − β(X1 − µ1))] = 0, E[(X2 − µ2 − β(X1 − µ1))²] = θη, and E[(X1 − µ1)²] = θ/η, from (2.1.9) of Lemma 2.1 and (2.2.5), one needs to solve 2∂/∂η[g1(β,η)(η/θ)] = 0. Any g1(β,η) ∝ η^{-1}g*(β) provides a solution. In particular, taking g* = 1, π(β,θ,η) ∝ (θη)^{-1} is a second order matching prior for θ.
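Both of the reductions above are easily confirmed symbolically; the following sketch (an addition, not part of the original text; assumes sympy) checks that g0 ∝ θ^{-1} and g1 ∝ η^{-1} solve the respective equations:

```python
import sympy as sp

beta, theta, eta = sp.symbols('beta theta eta', positive=True)

# beta as interest parameter: g0 = 1/theta solves (2.2.3)
g0 = 1/theta
eq_beta = sp.diff(g0 * theta/eta, theta) + sp.diff(g0, eta)

# theta as interest parameter: g1 = 1/eta solves 2*d/deta[g1*eta/theta] = 0
g1 = 1/eta
eq_theta = 2*sp.diff(g1 * eta/theta, eta)
```

Both left-hand sides vanish identically, so π ∝ (θη)^{-1} passes both checks.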
Finally, when η is the parameter of interest, from (1.2.3.4), once again, the class of first order matching priors is given by
$$\pi(\beta,\theta,\eta) = \eta^{-1}g_2(\beta,\theta). \qquad (2.2.6)$$
In order to find a second order matching prior for η, we need to solve
$$\frac{\partial}{\partial\theta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\theta^2\,E\Bigl(\frac{\partial^3\log f}{\partial\theta\,\partial\eta^2}\Bigr)\Bigr\} - \frac{\partial^2}{\partial\eta^2}\bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\bigr\} + \frac{1}{3}\,\frac{\partial}{\partial\eta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^4\,E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr)\Bigr\} = 0. \qquad (2.2.7)$$
From (2.1.10) and (2.1.11) of Lemma 2.1, (2.2.7) reduces to
$$\frac{\partial}{\partial\theta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\theta^2\,\frac{1}{\theta\eta^2}\Bigr\} - \frac{\partial^2}{\partial\eta^2}\bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\bigr\} + \frac{1}{3}\,\frac{\partial}{\partial\eta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^4\,\frac{1}{\eta^3}\Bigr\} = 0. \qquad (2.2.8)$$
Hence a solution to (2.2.8) is provided by g2(β,θ) ∝ θ^{-1}, so the prior π(β,θ,η) = θ^{-1}η^{-1} satisfies the second order matching property in this case too. Thus a second order quantile matching prior which works for each of β, θ and η is given by π(µ1,µ2,β,θ,η) ∝ (θη)^{-1}. Back in the original parameterization (µ1, µ2, σ1, σ2, ρ), this reduces to the prior
$$\pi(\sigma_1,\sigma_2,\rho) = \sigma_1^{-2}(1-\rho^2)^{-1}. \qquad (2.2.9)$$
This prior has been identified by Berger and Sun (2007) as the right-Haar prior and the one-at-a-time reference prior, and indeed it provides exact quantile matching, rather than just asymptotic matching, for a variety of parameters of interest, including the ones considered here. Moreover, as shown by these authors, when β is the parameter of interest, any prior of the form σ1^{−(3−a)}(1−ρ²)^{−1}, a > 0, for (µ1, µ2, σ1, σ2, ρ) achieves exact matching, while when θ is the parameter of interest, both the priors σ1^{−2}(1−ρ²)^{−1} and σ1^{−1}σ2^{−1}(1−ρ²)^{−3/2} achieve exact matching.
2.3 Matching Via Distribution Functions
In this section, we target priors π which achieve matching via distribution functions of some standardized variables. When β is the parameter of interest, distribution function matching priors are obtained by solving the differential equations
$$\frac{\partial}{\partial\beta}\Bigl(\eta^2\Bigl\{\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr) + \eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr)\Bigr\}\pi(\beta,\theta,\eta)\Bigr) = 0 \qquad (2.3.1)$$
and
$$\frac{\partial}{\partial\beta}\Bigl(\eta^2\Bigl\{\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) + \eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr)\Bigr\}\pi(\beta,\theta,\eta)\Bigr) = 0. \qquad (2.3.2)$$
Using (2.1.7) and (2.1.8) of Lemma 2.1, (2.3.1) simplifies to ∂/∂β[(θ + η)π(β,θ,η)] = 0, which holds trivially for any prior π(β,θ,η) which does not depend on β, including the prior π(β,θ,η) ∝ (θη)^{-1} found in the previous section. Again, with β as the parameter of interest, for any prior π(β,θ,η) which does not depend on β, we solve
$$\frac{\partial}{\partial\theta}\bigl\{(\theta\eta^2)^{-1}\eta^2\theta^2\,\pi(\beta,\theta,\eta)\bigr\} + \frac{\partial}{\partial\eta}\bigl\{\eta^{-3}\eta^2\eta^2\,\pi(\beta,\theta,\eta)\bigr\} = \frac{\partial}{\partial\theta}\bigl\{\theta\,\pi(\beta,\theta,\eta)\bigr\} + \frac{\partial}{\partial\eta}\bigl\{\eta\,\pi(\beta,\theta,\eta)\bigr\} = 0.$$
Once again π(β,θ,η) ∝ (θη)^{-1} satisfies this. For this prior, (2.3.2) reduces to
$$\eta^2\theta^2\,\frac{\partial}{\partial\beta}E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) + \eta^4\,\frac{\partial}{\partial\beta}E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr) = 0.$$
From (2.1.8) of Lemma 2.1, E(∂³log f/∂β∂θ²) = E(∂³log f/∂β∂η²) = 0. Hence matching via distribution functions is achieved with any prior of the form π(µ1,µ2,β,θ,η) ∝ h(θ,η), and in particular h(θ,η) ∝ (θη)^{-1}.
Next, when θ is the parameter of interest, one first needs to solve
$$\frac{\partial^2}{\partial\theta^2}(\theta^2\pi) - 2\frac{\partial}{\partial\theta}\Bigl(\theta^2\frac{\partial\pi}{\partial\theta}\Bigr) - 2\frac{\partial}{\partial\theta}\Bigl(\theta^4 E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr)\pi\Bigr) = 0, \qquad (2.3.3)$$
which, since E(∂³log f/∂θ³) = 4/θ³, simplifies to
$$\frac{\partial^2}{\partial\theta^2}(\theta^2\pi) - 2\frac{\partial}{\partial\theta}\Bigl(\theta^2\frac{\partial\pi}{\partial\theta}\Bigr) - 8\frac{\partial}{\partial\theta}(\theta\pi) = 0.$$
Hence any prior π(·) ∝ θ^{-1}g(β,η) satisfies this equation, as each term vanishes separately. Such a prior also satisfies
$$\frac{\partial}{\partial\theta}\Bigl(\theta^4 E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr)\pi\Bigr) = 0. \qquad (2.3.4)$$
Finally, when η is the parameter of interest, one needs to solve
$$\frac{\partial^2}{\partial\eta^2}(\eta^2\pi) - 2\frac{\partial}{\partial\eta}\Bigl(\eta^2\frac{\partial\pi}{\partial\eta}\Bigr) - \frac{\partial}{\partial\theta}\Bigl(E\Bigl(\frac{\partial^3\log f}{\partial\theta\,\partial\eta^2}\Bigr)\eta^2\theta^2\pi\Bigr) - \frac{\partial}{\partial\eta}\Bigl(E\Bigl(\frac{\partial^3\log f}{\partial\eta\,\partial\beta^2}\Bigr)\eta^4\pi\Bigr) = 0 \qquad (2.3.5)$$
and
$$\frac{\partial}{\partial\eta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr)\pi\Bigr) = 0. \qquad (2.3.6)$$
Again, using (2.1.7), (2.1.10) and (2.1.11) of Lemma 2.1, the prior π(β,θ,η) = (θη)^{-1} satisfies the second order matching property.
2.4 Highest Posterior Density (HPD) Matching Priors
We now turn attention to HPD matching priors for each of the parameters β, θ and η. While the quantile matching property is quite desirable for the construction of one-sided credible intervals, HPD matching, or matching via inversion of test statistics, seems more appropriate for the construction of two-sided credible intervals. We first consider the HPD region for β. In view of the orthogonality result derived in Section 2.1, such a prior π0(µ1,µ2,β,θ,η) ∝ π(β,θ,η) must satisfy (see (1.4.3.1) of Chapter 1)
$$\frac{\partial}{\partial\theta}\Bigl(\eta^2\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\beta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\beta^3}\Bigr)\pi\Bigr) - \frac{\partial^2}{\partial\beta^2}(\eta^2\pi) = 0. \qquad (2.4.1)$$
Using (2.1.7) of Lemma 2.1, (2.4.1) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + \frac{\partial}{\partial\eta}(\eta\pi) - \frac{\partial^2}{\partial\beta^2}(\eta^2\pi) = 0. \qquad (2.4.2)$$
Clearly the prior π(β,θ,η) ∝ (θη)^{-1} satisfies (2.4.2).
Next, consider θ as the parameter of interest. Now one needs to solve
$$\frac{\partial}{\partial\beta}\Bigl(\eta^2\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\theta^2\,\partial\beta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\theta^2\eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\theta^2\,\partial\eta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\theta}\Bigl(\theta^4 E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr)\pi\Bigr) - \frac{\partial^2}{\partial\theta^2}(\theta^2\pi) = 0. \qquad (2.4.3)$$
Again from (2.1.8) and (2.1.10) of Lemma 2.1, (2.4.3) simplifies to
$$4\frac{\partial}{\partial\theta}(\theta\pi) - \frac{\partial^2}{\partial\theta^2}(\theta^2\pi) = 0, \qquad (2.4.4)$$
which is satisfied by the prior π(β,θ,η) ∝ (θη)^{-1}.
Finally, when η is the parameter of interest, we need to solve
$$\frac{\partial}{\partial\beta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\eta^2\,\partial\beta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\theta}\Bigl(\eta^2\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\eta^2\,\partial\theta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr)\pi\Bigr) - \frac{\partial^2}{\partial\eta^2}(\eta^2\pi) = 0. \qquad (2.4.5)$$
From (2.1.8), (2.1.10) and (2.1.11) of Lemma 2.1, (2.4.5) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + 3\frac{\partial}{\partial\eta}(\eta\pi) - \frac{\partial^2}{\partial\eta^2}(\eta^2\pi) = 0. \qquad (2.4.6)$$
Again π(β,θ,η) ∝ (θη)^{-1} will do.
2.5 Matching Priors Via Inversion of Test Statistics
One traditional way to derive frequentist confidence intervals is inversion of certain test statistics, the most popular being the likelihood ratio test. When β is the parameter of interest, an LR matching prior π for β is obtained by solving the differential equation
$$\frac{\partial}{\partial\theta}\bigl(\eta^2\theta^2(\theta\eta^2)^{-1}\pi\bigr) + \frac{\partial}{\partial\eta}\bigl(\eta^4\eta^{-3}\pi\bigr) + \frac{\partial}{\partial\beta}\Bigl(\eta^2\Bigl\{\frac{\partial\pi}{\partial\beta} - \pi\Bigl[\eta^2 E\Bigl(\frac{\partial\log f}{\partial\beta}\,\frac{\partial^2\log f}{\partial\beta^2}\Bigr) - \theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) - \eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr)\Bigr]\Bigr\}\Bigr) = 0. \qquad (2.5.1)$$
From (2.1.6) and (2.1.8) of Lemma 2.1, (2.5.1) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + \frac{\partial}{\partial\eta}(\eta\pi) + \eta^2\,\frac{\partial^2\pi}{\partial\beta^2} = 0.$$
Again π ∝ (θη)^{-1} provides a solution.
Next, if θ is the parameter of interest, an LR matching prior π for θ is obtained by solving the differential equation
$$\frac{\partial}{\partial\beta}\bigl(\eta^2\theta^2\cdot 0\cdot\pi\bigr) + \frac{\partial}{\partial\eta}\bigl(\eta^2\theta^2\cdot 0\cdot\pi\bigr) + \frac{\partial}{\partial\theta}\Bigl(\theta^2\Bigl\{\frac{\partial\pi}{\partial\theta} - \pi\Bigl[\theta^2 E\Bigl(\frac{\partial\log f}{\partial\theta}\,\frac{\partial^2\log f}{\partial\theta^2}\Bigr) - \eta^2\Bigl(E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr) + E\Bigl(\frac{\partial^3\log f}{\partial\eta^2\,\partial\theta}\Bigr)\Bigr)\Bigr]\Bigr\}\Bigr) = 0. \qquad (2.5.2)$$
Again from (2.1.7), (2.1.9) and (2.1.10) in Lemma 2.1, (2.5.2) reduces to
$$\frac{\partial}{\partial\theta}\Bigl[\theta^2\Bigl(\frac{\partial\pi}{\partial\theta} + \frac{2}{\theta}\pi + \frac{2}{\theta}\pi\Bigr)\Bigr] = 0,$$
or, upon simplifying,
$$\frac{\partial}{\partial\theta}\Bigl[\theta^2\,\frac{\partial\pi}{\partial\theta} + 4\theta\pi\Bigr] = 0,$$
which holds for π ∝ (θη)^{-1}.
Finally, when η is the parameter of interest, the LR matching prior is obtained by solving
$$\frac{\partial}{\partial\beta}\bigl(\eta^4\eta^{-3}\cdot 0\cdot\pi\bigr) + \frac{\partial}{\partial\theta}\Bigl(\eta^2\theta^2\,\frac{1}{\theta\eta^2}\,\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\eta^2\Bigl\{\frac{\partial\pi}{\partial\eta} - \pi\Bigl[\eta^2 E\Bigl(\frac{\partial\log f}{\partial\eta}\,\frac{\partial^2\log f}{\partial\eta^2}\Bigr) - \eta^2\,\frac{\theta/\eta}{\theta\eta^2} - 0\Bigr]\Bigr\}\Bigr) = 0. \qquad (2.5.3)$$
Once again, using (2.1.11) in Lemma 2.1, (2.5.3) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + \frac{\partial}{\partial\eta}\Bigl[\eta^2\Bigl(\frac{\partial\pi}{\partial\eta} + \frac{2}{\eta}\pi\Bigr)\Bigr] = 0,$$
and the prior π ∝ (θη)^{-1} provides a solution.
2.6 Propriety of Posteriors and Simulation Study
The prior π(µ1, µ2, β, θ, η) ∝ (θη)^{-1} is improper. In this section, we prove the propriety of the posterior under this prior. In addition, we find the marginal posteriors of β, θ and η, and discuss methods for finding the HPD intervals for each of these parameters.

First we derive the posterior pdf of β. It turns out to be a proper t-density. This immediately implies the propriety of the joint posterior as well, because otherwise the marginal posterior of β could not be proper.

To this end, writing Xi = (X1i, X2i)^T, i = 1, …, n, the joint posterior is given by
$$\pi(\mu_1,\mu_2,\beta,\theta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n}\exp\Bigl[-\frac{1}{2}\sum_{i=1}^{n}\Bigl\{\frac{\bigl(X_{2i}-\mu_2-\beta(X_{1i}-\mu_1)\bigr)^2}{\theta\eta} + \frac{\eta(X_{1i}-\mu_1)^2}{\theta}\Bigr\}\Bigr](\theta\eta)^{-1}. \qquad (2.6.1)$$
From the identities
$$\sum_{i=1}^{n}\bigl(X_{2i}-\mu_2-\beta(X_{1i}-\mu_1)\bigr)^2 = \sum_{i=1}^{n}\bigl\{(X_{2i}-\bar X_2)-\beta(X_{1i}-\bar X_1)\bigr\}^2 + n\bigl(\bar X_2-\mu_2-\beta(\bar X_1-\mu_1)\bigr)^2$$
and
$$\sum_{i=1}^{n}(X_{1i}-\mu_1)^2 = \sum_{i=1}^{n}(X_{1i}-\bar X_1)^2 + n(\bar X_1-\mu_1)^2,$$
first integrating out µ2, and then µ1, one gets from (2.6.1) after simplification
$$\pi(\beta,\theta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n}\eta^{-1}\exp\Bigl[-\frac{1}{2\theta}\sum_{i=1}^{n}\Bigl\{\frac{\bigl((X_{2i}-\bar X_2)-\beta(X_{1i}-\bar X_1)\bigr)^2}{\eta} + \eta(X_{1i}-\bar X_1)^2\Bigr\}\Bigr]. \qquad (2.6.2)$$
Next, integrating out θ in (2.6.2) and writing $S_{jk} = \sum_{i=1}^{n}(X_{ji}-\bar X_j)(X_{ki}-\bar X_k)$, j, k = 1, 2, one obtains
$$\pi(\beta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \eta^{n-2}\bigl(\eta^2 S_{11} + S_{22} + \beta^2 S_{11} - 2\beta S_{12}\bigr)^{-(n-1)}. \qquad (2.6.3)$$
From (2.6.3), the marginal posterior of β is given by
$$\pi(\beta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \int_0^{\infty} \eta^{n-2}\Bigl(\eta^2 + \frac{S_{22}+\beta^2 S_{11}-2\beta S_{12}}{S_{11}}\Bigr)^{-(n-1)}\,d\eta.$$
Putting η = z{(S22 + β²S11 − 2βS12)/S11}^{1/2} in the above integral, one gets after simplification
$$\pi(\beta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \Bigl(1 + \frac{S_{11}(\beta - S_{12}/S_{11})^2}{S_{22.1}}\Bigr)^{-\frac{n-1}{2}}, \qquad (2.6.4)$$
where S22.1 = S22 − S12²/S11. This posterior is a t-distribution with location parameter S12/S11, scale parameter {S22.1/((n−2)S11)}^{1/2}, and n−2 degrees of freedom.
Next we find the posterior of θ. Integrating out β in (2.6.2), one gets
$$\pi(\theta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n+\frac{1}{2}}\eta^{-\frac{1}{2}}\exp\Bigl(-\frac{1}{2}\Bigl(\frac{S_{22.1}}{\theta\eta} + \frac{\eta S_{11}}{\theta}\Bigr)\Bigr). \qquad (2.6.5)$$
Now the posterior of θ is given by
$$\pi(\theta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n+\frac{1}{2}}\int_0^{\infty}\eta^{-\frac{1}{2}}\exp\Bigl(-\frac{1}{2\theta}\Bigl(\frac{S_{22.1}}{\eta} + \eta S_{11}\Bigr)\Bigr)\,d\eta. \qquad (2.6.6)$$
Putting η = z^{-1}, (2.6.6) is rewritten as
$$\pi(\theta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n+\frac{1}{2}}\int_0^{\infty} z^{-\frac{3}{2}}\exp\Bigl(-\frac{S_{22.1}}{2\theta z}\Bigl(z - \frac{S_{11}^{1/2}}{S_{22.1}^{1/2}}\Bigr)^2 - \frac{S_{11}^{1/2}S_{22.1}^{1/2}}{\theta}\Bigr)\,dz. \qquad (2.6.7)$$
Recalling that an inverse Gaussian random variable U with mean µ and scale parameter λ has pdf
$$f_{\mu,\lambda}(u) = \Bigl(\frac{\lambda}{2\pi u^3}\Bigr)^{1/2}\exp\Bigl(-\frac{\lambda(u-\mu)^2}{2\mu^2 u}\Bigr), \quad u > 0,\ \mu > 0,\ \lambda > 0,$$
it follows from (2.6.7) that
$$\pi(\theta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-(n-1)}\exp\bigl(-S_{11}^{1/2}S_{22.1}^{1/2}/\theta\bigr)\,I_{[\theta>0]}, \qquad (2.6.8)$$
so that θ^{-1} has a gamma distribution with shape parameter n−2 and scale parameter (S11S22.1)^{-1/2}.
Finally, integrating out θ in (2.6.5), the marginal posterior of η is given by
$$\pi(\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \eta^{-1/2}\Bigl(\frac{S_{22.1}}{\eta} + \eta S_{11}\Bigr)^{-(n-3/2)} \propto \eta^{n-2}\Bigl(\eta^2 + \frac{S_{22.1}}{S_{11}}\Bigr)^{-(n-3/2)}. \qquad (2.6.9)$$
The construction of HPD credible intervals is fairly simple. The posterior of β being a univariate t (thus symmetric and unimodal), from (2.6.4) the 100(1 − α)% HPD credible interval for β is given by S12/S11 ± {S22.1/((n−2)S11)}^{1/2} t_{n−2;α/2}, where t_{n−2;α/2} denotes the upper 100(α/2)% point of a Student's t-distribution with n−2 degrees of freedom.
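A minimal implementation of this interval (an addition, not part of the original text; assumes numpy and scipy), with the scale including the S11 factor from (2.6.4):

```python
import numpy as np
from scipy.stats import t as t_dist

def beta_hpd(x1, x2, alpha=0.05):
    """HPD interval for beta under the matching prior pi ~ 1/(theta*eta):
    the posterior of beta is Student-t (2.6.4), symmetric about S12/S11."""
    n = len(x1)
    d1, d2 = x1 - x1.mean(), x2 - x2.mean()
    S11, S12, S22 = d1 @ d1, d1 @ d2, d2 @ d2
    S221 = S22 - S12**2 / S11
    center = S12 / S11
    scale = np.sqrt(S221 / ((n - 2) * S11))
    q = t_dist.ppf(1 - alpha/2, n - 2)
    return center - q*scale, center + q*scale

rng = np.random.default_rng(3)
x1 = rng.normal(size=30)
x2 = 0.5*x1 + rng.normal(scale=0.7, size=30)   # illustrative data, true beta = 0.5
lo, hi = beta_hpd(x1, x2)
```

The interval is centered at the least squares slope S12/S11, as the symmetry of the t-posterior requires.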
Next, observing that the posterior of θ is log-concave, the 100(1 − α)% HPD region for θ is given by [θ_1, θ_2], where θ_1 and θ_2 satisfy

θ_1^{−(n−1)} exp(−S_{11}^{1/2} S_{22.1}^{1/2}/θ_1) = θ_2^{−(n−1)} exp(−S_{11}^{1/2} S_{22.1}^{1/2}/θ_2)    (2.6.10)

and

∫_{θ_1}^{θ_2} θ^{−(n−1)} exp(−S_{11}^{1/2} S_{22.1}^{1/2}/θ) {(S_{11} S_{22.1})^{(n−2)/2}/Γ(n−2)} dθ = 1 − α.    (2.6.11)
If w = θ^{−1}, then the posterior pdf of w is given by

π(w | X_1, X_2) ∝ w^{n−3} exp(−w S_{11}^{1/2} S_{22.1}^{1/2}).
Noting the log-concavity of this pdf as well, the HPD region [w_1, w_2] for w is obtained by solving

w_1^{n−3} exp(−w_1 S_{11}^{1/2} S_{22.1}^{1/2}) = w_2^{n−3} exp(−w_2 S_{11}^{1/2} S_{22.1}^{1/2})    (2.6.12)

and

∫_{w_1}^{w_2} {w^{n−3}/Γ(n−2)} exp(−w S_{11}^{1/2} S_{22.1}^{1/2}) (S_{11} S_{22.1})^{(n−2)/2} dw = 1 − α.    (2.6.13)

Clearly the solution [w_1, w_2] of (2.6.12) and (2.6.13) is different from the solution [θ_2^{−1}, θ_1^{−1}] of (2.6.10) and (2.6.11).
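Equations (2.6.12) and (2.6.13) have no closed-form solution, but they are easy to solve numerically for the gamma posterior of w. A minimal sketch, assuming SciPy and a shape parameter larger than one (the helper `gamma_hpd` is hypothetical, not from the dissertation):

```python
import numpy as np
from scipy import stats, optimize

def gamma_hpd(shape, rate, alpha=0.05):
    """Solve the equal-density and coverage equations for the HPD
    interval [w1, w2] of a Gamma(shape, rate) posterior (shape > 1)."""
    dist = stats.gamma(shape, scale=1.0 / rate)

    def equations(v):
        w1, w2 = v
        return [dist.pdf(w1) - dist.pdf(w2),                 # equal density
                dist.cdf(w2) - dist.cdf(w1) - (1 - alpha)]   # 1 - alpha mass

    # the equal-tailed interval is a reasonable starting point
    w1, w2 = optimize.fsolve(equations, dist.ppf([alpha / 2, 1 - alpha / 2]))
    return w1, w2
```

For the posterior above one would take shape n − 2 and rate (S_{11} S_{22.1})^{1/2}.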
Finally, observing that the posterior of η in (2.6.9) is log-concave, the 100(1 − α)% HPD interval [η_1, η_2] for η is obtained by solving

η_1^{n−2} (η_1^2 + S_{22.1}/S_{11})^{−(n−3/2)} = η_2^{n−2} (η_2^2 + S_{22.1}/S_{11})^{−(n−3/2)}

and

c ∫_{η_1}^{η_2} η^{n−2} (η^2 + S_{22.1}/S_{11})^{−(n−3/2)} dη = 1 − α,

where c is the normalizing constant.
We now evaluate the frequentist coverage probabilities of the HPD credible intervals based on the marginal posterior densities of β, θ and η under our probability matching prior, for several values of ρ and n. Ideally, the frequentist coverage of a 100(1 − α)% HPD interval should be close to 1 − α. This is done numerically. Table 2-1 gives numerical values of the frequentist coverage probabilities of 95% HPD intervals for β, θ and η.
The computation of these numerical values is based on simulation. In particular, for fixed (µ_1, µ_2, σ_1^2, σ_2^2, ρ) and n, we take 5,000 independent random samples of (X_1, X_2) from the bivariate normal model. In our simulation study, we take µ_1 = µ_2 = 0 without loss of generality. Under the prior π, the frequentist coverage probability can be estimated by the relative frequency of HPD intervals containing the true parameter value. An inspection of Table 2-1 reveals that the agreement between the frequentist and posterior coverage probabilities of HPD intervals is quite good for the probability matching prior even if n is small.
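The coverage computation for β can be sketched as follows. This is a simplified illustration, not the authors' code: it treats β as the regression slope estimated by S_{12}/S_{11} (so that the true β equals ρ when σ_1 = σ_2 = 1) and uses fewer replicates than the 5,000 used in the study.

```python
import numpy as np
from scipy import stats

def coverage_beta(rho, n, sims=2000, alpha=0.05, seed=1):
    """Relative frequency of 95% HPD intervals for beta covering the truth."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)
    hits = 0
    for _ in range(sims):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        d1, d2 = x[:, 0] - x[:, 0].mean(), x[:, 1] - x[:, 1].mean()
        s11, s22, s12 = d1 @ d1, d2 @ d2, d1 @ d2
        s221 = s22 - s12**2 / s11
        half = t * np.sqrt(s221 / ((n - 2) * s11))
        hits += abs(s12 / s11 - rho) <= half
    return hits / sims
```

For example, with ρ = 0.5 and n = 12 the returned estimate is close to the nominal 0.95, in line with Table 2-1.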
Table 2-1. Frequentist Coverage Probabilities of 95% HPD Intervals for β, θ and η when σ_1^2 = 1 and σ_2^2 = 1

ρ     n    β      θ      η
0.25  4    0.952  0.947  0.949
      8    0.946  0.955  0.950
      12   0.954  0.952  0.948
      16   0.952  0.954  0.950
      20   0.945  0.948  0.950
0.50  4    0.950  0.952  0.949
      8    0.944  0.952  0.948
      12   0.954  0.953  0.944
      16   0.946  0.950  0.949
      20   0.952  0.948  0.949
0.75  4    0.955  0.952  0.953
      8    0.953  0.948  0.949
      12   0.950  0.946  0.947
      16   0.948  0.946  0.951
      20   0.956  0.946  0.951
CHAPTER 3
THE BIVARIATE NORMAL CORRELATION COEFFICIENT
One of the classic problems in statistics is inference for the correlation coefficient,
ρ, in a bivariate normal distribution. Beginning with Fisher’s hyperbolic tangent
transformation, there have been many proposals, both frequentist and Bayesian, which
address this problem. Added to this is the fiducial approach as found in Fisher (1930,
1956) and Pratt (1963).
Bayesian inference for ρ began in the early sixties with the work of Brillinger (1962) and Geisser and Cornfield (1963). The main objective of these authors was to find whether the fiducial distribution of ρ could be identified as a Bayesian posterior under some default or objective prior, and the conclusion was that this was most likely not possible.

After a long fallow period, interest in this problem revived with the recent work of Berger and Sun (2006). These authors considered various parametric functions arising from the bivariate normal distribution and derived objective priors which satisfy the quantile matching property. In addition, they showed that the resulting posterior matched the fiducial distribution of ρ as studied by Brillinger (1962) and Geisser and Cornfield (1963).
In this chapter, as before, we first construct an orthogonal parameterization with ρ as the parameter of interest, and then find priors, if any, which meet the various matching criteria, at least asymptotically. In addition, we have considered several likelihood-based methods for similar inferential purposes, based on certain modifications of the profile likelihood, namely the conditional profile likelihood, the adjusted profile likelihood and the integrated likelihood.
3.1 The Orthogonal Parameterization
Let (X_{1i}, X_{2i}), (i = 1, . . . , n) be independent and identically distributed random variables having a bivariate normal distribution with means µ_1 and µ_2, variances σ_1^2 (> 0) and σ_2^2 (> 0), and correlation coefficient ρ (|ρ| < 1). Using the transformation

θ_1 = σ_1/σ_2, θ_2 = σ_1 σ_2 (1 − ρ^2)^{1/2} and ψ = ρ,    (3.1.1)

the bivariate normal pdf can be rewritten as

f(X_1, X_2 | µ_1, µ_2, θ_1, θ_2, ψ) ∝ (1/θ_2) exp{−(1/(2(1 − ψ^2)^{1/2} θ_2))[(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)]}.    (3.1.2)
With this reparameterization, the Fisher information matrix reduces to

I(µ_1, µ_2, θ_1, θ_2, ψ) = [ A  0 ; 0  D ],    (3.1.3)

where

A = [ 1/(θ_1 θ_2 (1 − ψ^2)^{1/2})   −ψ/(θ_2 (1 − ψ^2)^{1/2}) ; −ψ/(θ_2 (1 − ψ^2)^{1/2})   θ_1/(θ_2 (1 − ψ^2)^{1/2}) ]

and

D = Diag(1/(θ_1^2 (1 − ψ^2)), 1/θ_2^2, 1/(1 − ψ^2)^2).

This establishes immediately the mutual orthogonality of (µ_1, µ_2), θ_1, θ_2 and ψ. The inverse of the information matrix is then simply

I^{−1}(µ_1, µ_2, θ_1, θ_2, ψ) = [ A^{−1}  0 ; 0  D^{−1} ],    (3.1.4)

where

A^{−1} = [ θ_1 θ_2/(1 − ψ^2)^{1/2}   ψθ_2/(1 − ψ^2)^{1/2} ; ψθ_2/(1 − ψ^2)^{1/2}   θ_2/(θ_1 (1 − ψ^2)^{1/2}) ]    (3.1.5)

and

D^{−1} = Diag(θ_1^2 (1 − ψ^2), θ_2^2, (1 − ψ^2)^2).    (3.1.6)
For subsequent sections, we also need a few other results, which are collected in the following lemma.

Lemma 3.1 For the bivariate normal density given in (3.1.2),

E(∂^3 log f/∂ψ^2 ∂θ_1) = 0,  E(∂^3 log f/∂ψ^2 ∂θ_2) = 1/(θ_2 (1 − ψ^2)^2);    (3.1.7)

E(∂^3 log f/∂ψ^3) = −6ψ/(1 − ψ^2)^3;    (3.1.8)

E((∂ log f/∂ψ)(∂^2 log f/∂ψ^2)) = 2ψ/(1 − ψ^2)^3;    (3.1.9)

E(∂^3 log f/∂ψ ∂θ_1^2) = −ψ/(θ_1^2 (1 − ψ^2)^2),  E(∂^3 log f/∂ψ ∂θ_2^2) = 0.    (3.1.10)
Proof. To prove the lemma, note that E(X_1 − µ_1)^2 = θ_1 θ_2 (1 − ψ^2)^{−1/2}, E(X_2 − µ_2)^2 = θ_1^{−1} θ_2 (1 − ψ^2)^{−1/2}, and E{(X_1 − µ_1)(X_2 − µ_2)} = ψθ_2 (1 − ψ^2)^{−1/2}. We begin with (3.1.7).

E(∂^3 log f/∂ψ^2 ∂θ_1)
= −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} E(−(X_1 − µ_1)^2/θ_1^2 + (X_2 − µ_2)^2)
= −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} (−θ_2/(θ_1 (1 − ψ^2)^{1/2}) + θ_2/(θ_1 (1 − ψ^2)^{1/2}))
= 0,

and

E(∂^3 log f/∂ψ^2 ∂θ_2)
= E(−2ψ(X_1 − µ_1)(X_2 − µ_2)/(θ_2^2 (1 − ψ^2)^{3/2}) + (1/(2θ_2^2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} {(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)})
= −(2ψ/(θ_2^2 (1 − ψ^2)^{3/2}))(ψθ_2/(1 − ψ^2)^{1/2}) + (1/(2θ_2^2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} (θ_2/(1 − ψ^2)^{1/2} + θ_2/(1 − ψ^2)^{1/2} − 2ψ^2 θ_2/(1 − ψ^2)^{1/2})
= −2ψ^2/(θ_2 (1 − ψ^2)^2) + (1 + 2ψ^2)/(θ_2 (1 − ψ^2)^2)
= 1/(θ_2 (1 − ψ^2)^2).
Next, (3.1.8) holds because

E(∂^3 log f/∂ψ^3)
= E(3(1 + 2ψ^2)(X_1 − µ_1)(X_2 − µ_2)/(θ_2 (1 − ψ^2)^{5/2}) − (1/(2θ_2)) {(9ψ + 6ψ^3)/(1 − ψ^2)^{7/2}} {(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)})
= (3(1 + 2ψ^2)/(θ_2 (1 − ψ^2)^{5/2}))(ψθ_2/(1 − ψ^2)^{1/2}) − (1/(2θ_2)) ((9ψ + 6ψ^3)/(1 − ψ^2)^{7/2}) (θ_2/(1 − ψ^2)^{1/2} + θ_2/(1 − ψ^2)^{1/2} − 2ψ^2 θ_2/(1 − ψ^2)^{1/2})
= 3ψ(1 + 2ψ^2)/(1 − ψ^2)^3 − 3ψ(3 + 2ψ^2)/(1 − ψ^2)^3
= −6ψ/(1 − ψ^2)^3.
To show that (3.1.9) holds, note that, using the Bartlett identity, we get

E((∂ log f/∂ψ)(∂^2 log f/∂ψ^2)) = −(d/dψ I_{ψψ} + E(∂^3 log f/∂ψ^3))
= −(d/dψ (1 − ψ^2)^{−2} − 6ψ/(1 − ψ^2)^3)
= 2ψ/(1 − ψ^2)^3,
and finally, in order to show that (3.1.10) holds, we have

E(∂^3 log f/∂ψ ∂θ_1^2) = E(−ψ(X_1 − µ_1)^2/((1 − ψ^2)^{3/2} θ_2 θ_1^3)) = −ψ/(θ_1^2 (1 − ψ^2)^2),

and

E(∂^3 log f/∂ψ ∂θ_2^2)
= E(2(X_1 − µ_1)(X_2 − µ_2)/(θ_2^3 (1 − ψ^2)^{1/2}) − (1/θ_2^3) {ψ/(1 − ψ^2)^{3/2}} {(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)})
= 2ψ/(θ_2^2 (1 − ψ^2)) − (ψ/(θ_2^3 (1 − ψ^2)^{3/2})) (θ_2/(1 − ψ^2)^{1/2} + θ_2/(1 − ψ^2)^{1/2} − 2ψ^2 θ_2/(1 − ψ^2)^{1/2})
= 0.
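Since log f is linear in the three quadratic forms (X_1 − µ_1)^2, (X_2 − µ_2)^2 and (X_1 − µ_1)(X_2 − µ_2), so is each of its third derivatives, and taking expectations amounts to substituting the three moments listed at the start of the proof. The identities of Lemma 3.1 can therefore be checked symbolically; a sketch assuming SymPy:

```python
import sympy as sp

th1, th2 = sp.symbols('theta1 theta2', positive=True)
psi = sp.symbols('psi', real=True)
Q1, Q2, Q12 = sp.symbols('Q1 Q2 Q12')  # (X1-mu1)^2, (X2-mu2)^2, (X1-mu1)(X2-mu2)

# log f up to an additive constant, as in (3.1.2)
logf = -sp.log(th2) - (Q1/th1 + th1*Q2 - 2*psi*Q12) / (2*sp.sqrt(1 - psi**2)*th2)

# expectations of the quadratic forms, as in the proof of Lemma 3.1
moments = {Q1: th1*th2/sp.sqrt(1 - psi**2),
           Q2: th2/(th1*sp.sqrt(1 - psi**2)),
           Q12: psi*th2/sp.sqrt(1 - psi**2)}

# (3.1.7): E(d^3 log f / d psi^2 d theta1) = 0
assert sp.simplify(sp.diff(logf, psi, 2, th1).subs(moments)) == 0
# (3.1.8): E(d^3 log f / d psi^3) = -6 psi / (1 - psi^2)^3
E = sp.simplify(sp.diff(logf, psi, 3).subs(moments))
assert sp.simplify(E + 6*psi/(1 - psi**2)**3) == 0
```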
We derive the matching priors in the next few sections.
3.2 Quantile Matching Priors

Here, as before, one is interested in the approximate frequentist validity of the posterior quantiles of a one-dimensional interest parameter. Due to the orthogonality of ψ with (µ_1, µ_2, θ_1, θ_2), from (1.2.3.4), the class of first order matching priors is characterized by

π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ (1 − ψ^2)^{−1} g_0(µ_1, µ_2, θ_1, θ_2).    (3.2.1)

As it is often customary to assign a uniform prior to (µ_1, µ_2) on R^2, we will consider only the subclass of priors where g_0(µ_1, µ_2, θ_1, θ_2) = g(θ_1, θ_2).
A prior of the form π ∝ (1 − ψ^2)^{−1} g(θ_1, θ_2) satisfies the second-order probability matching property if and only if (see (1.2.3.5)) g satisfies the relation

∂/∂θ_1 {g(1 − ψ^2) I^{θ_1θ_1} E(∂^3 log f/∂ψ^2 ∂θ_1)} + ∂/∂θ_2 {g(1 − ψ^2) I^{θ_2θ_2} E(∂^3 log f/∂ψ^2 ∂θ_2)}
+ (g/3) ∂/∂ψ {(1 − ψ^2)^3 E(∂^3 log f/∂ψ^3)} − g ∂^2/∂ψ^2 {(1 − ψ^2)} = 0.    (3.2.2)
Now, by (3.1.7) and (3.1.8) from Lemma 3.1 and (3.1.6), (3.2.2) simplifies to (1 − ψ^2)^{−1} ∂/∂θ_2 (gθ_2) − 2g (∂/∂ψ)ψ − g (∂^2/∂ψ^2)(1 − ψ^2) = 0, which reduces to ∂/∂θ_2 (gθ_2) = 0, and a solution is provided by g(θ_1, θ_2) ∝ h(θ_1) θ_2^{−1}. Thus every prior π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ h(θ_1) θ_2^{−1} (1 − ψ^2)^{−1} is a second order probability matching prior for ψ, for an arbitrary smooth function h of θ_1. In particular, if we let h(θ_1) = θ_1^{−1}, then from Theorem 1 of Datta and Ghosh (1995), the one-at-a-time reference or reverse reference prior for ψ is given by θ_1^{−1} θ_2^{−1} (1 − ψ^2)^{−1}. This prior was first proposed in Lindley (1965), and was subsequently shown to be a one-at-a-time reference prior by Bayarri (1981). Due to the invariance property of such a prior, back in the original parameterization, a second order matching prior for ρ is π(µ_1, µ_2, σ_1, σ_2, ρ) ∝ σ_1^{−1} σ_2^{−1} (1 − ρ^2)^{−1}.
The first order quantile matching prior π ∝ (1 − ψ^2)^{−1} g(θ_1, θ_2) is also first order matching via distribution functions. It follows from (1.3.2.1) and (1.3.2.2) of Chapter 1 that in order that this prior is also a second order distribution function matching prior, it needs to satisfy

∂^2/∂ψ^2 {I^{ψψ} π} − 2 ∂/∂ψ {I^{ψψ} ∂π/∂ψ} − ∂/∂θ_2 {E(∂^3 log f/∂ψ^2 ∂θ_2) I^{ψψ} I^{θ_2θ_2} π} − ∂/∂ψ {E(∂^3 log f/∂ψ ∂θ_1^2) I^{ψψ} I^{θ_1θ_1} π} = 0    (3.2.3)

and

∂/∂ψ {E(∂^3 log f/∂ψ^3) (I^{ψψ})^2 π} = 0.    (3.2.4)

It is easily verified that for the prior π ∝ (1 − ψ^2)^{−1} g(θ_1, θ_2), the left hand side of (3.2.4) reduces to −6g. The above prior also fails to satisfy (3.2.3) for any g. Hence we do not have a prior that satisfies the second order distribution function matching criteria.
3.3 Highest Posterior Density (HPD) Matching Priors

We now turn attention to HPD matching priors for ρ. Due to the orthogonality of ψ with (µ_1, µ_2, θ_1, θ_2) (see (1.4.3.1) of Chapter 1), we need a prior π which satisfies the differential equation

∂/∂θ_1 {(1 − ψ^2)^2 I^{θ_1θ_1} E(∂^3 log f/∂ψ^2 ∂θ_1) π} + ∂/∂θ_2 {(1 − ψ^2)^2 I^{θ_2θ_2} E(∂^3 log f/∂ψ^2 ∂θ_2) π}
+ ∂/∂ψ {(1 − ψ^2)^4 E(∂^3 log f/∂ψ^3) π} − ∂^2/∂ψ^2 {(1 − ψ^2)^2 π} = 0.    (3.3.1)
Using (3.1.7) and (3.1.8) from Lemma 3.1 and (3.1.6), (3.3.1) reduces to

∂/∂θ_2 {θ_2 π} − 6 ∂/∂ψ {ψ(1 − ψ^2) π} − ∂^2/∂ψ^2 {(1 − ψ^2)^2 π} = 0.    (3.3.2)
Consider the class of priors π(θ_1, θ_2, ψ) ∝ h(θ_1) θ_2^a (1 − ψ^2)^b. With this prior, (3.3.2) can be written as

h(θ_1) θ_2^a [(a + 1)(1 − ψ^2)^b + 2(b − 1){(1 − ψ^2)^{b+1} − 2(b + 1)ψ^2 (1 − ψ^2)^b}] = 0.

On further simplification, this reduces to

h(θ_1) θ_2^a (1 − ψ^2)^b [a + 1 + 2(b − 1)(1 − ψ^2) − 4(b^2 − 1)ψ^2] = 0.    (3.3.3)

Since this needs to hold for all ψ ∈ (−1, 1), one gets

a + 1 = 4(b^2 − 1) = −2(b − 1).

Hence the two possible solutions are a = −1, b = 1 and a = 4, b = −3/2. This results in π ∝ h(θ_1) θ_2^{−1} (1 − ψ^2) and π ∝ h(θ_1) θ_2^4 (1 − ψ^2)^{−3/2}, which are both HPD matching for ψ. In particular, for h(θ_1) = θ_1^{−1}, back in the original parameterization, we obtain π ∝ σ_1^{−1} σ_2^{−1} (1 − ρ^2) and π ∝ σ_1^4 σ_2^4 (1 − ρ^2) as HPD matching for ρ. In general, HPD matching priors suffer from lack of invariance. However, if the same object of interest is considered over the two parameterizations, then they are invariant to the parameterization adopted. This has been discussed in detail in Datta and Mukerjee (2004, p. 74).
3.4 Matching Priors Via Inversion of Test Statistics

The most popular test obtained by inverting certain test statistics is the likelihood ratio test. But tests based on Rao's score statistic or the Wald statistic are also of importance, and are first order equivalent (i.e., up to o(n^{−1/2})) to the likelihood ratio tests. From (1.5.1.5), a likelihood ratio matching prior π is obtained by solving

Σ_{s=2}^p Σ_{u=2}^p ∂/∂θ_u {π(θ) I_{11}^{−1} I^{su} E(∂^3 log f/∂θ_1^2 ∂θ_s)}
+ ∂/∂θ_1 (I_{11}^{−1} {∂π/∂θ_1 − π(θ)(I_{11}^{−1} E((∂ log f/∂θ_1)(∂^2 log f/∂θ_1^2)) − Σ_{s=2}^p Σ_{u=2}^p I^{su} E(∂^3 log f/∂θ_1 ∂θ_u ∂θ_s))}) = 0.    (3.4.1)
Under the orthogonal parameterization obtained in (3.1.1), (3.4.1) can be rewritten as

∂/∂θ_1 {(1 − ψ^2)^2 I^{θ_1θ_1} E(∂^3 log f/∂ψ^2 ∂θ_1) π} + ∂/∂θ_2 {(1 − ψ^2)^2 I^{θ_2θ_2} E(∂^3 log f/∂ψ^2 ∂θ_2) π}
+ ∂/∂ψ {(1 − ψ^2)^2 {∂π/∂ψ − π((1 − ψ^2)^2 E((∂ log f/∂ψ)(∂^2 log f/∂ψ^2)) − I^{θ_1θ_1} E(∂^3 log f/∂ψ ∂θ_1^2) − I^{θ_2θ_2} E(∂^3 log f/∂ψ ∂θ_2^2))}} = 0.    (3.4.2)
Again, using results (3.1.7), (3.1.9) and (3.1.10) from Lemma 3.1 and (3.1.6), (3.4.2) reduces to

∂/∂θ_2 {θ_2 π} + ∂/∂ψ {(1 − ψ^2)^2 {∂π/∂ψ − π(3ψ/(1 − ψ^2))}} = 0.    (3.4.3)
Consider again the class of priors π = h(θ_1) θ_2^a (1 − ψ^2)^b. Then (3.4.3) further reduces to

a + 1 − (2b + 3){1 − ψ^2 − 2(b + 1)ψ^2} = 0.    (3.4.4)

In order that (3.4.4) holds for all ψ ∈ (−1, 1), a unique solution is obtained with a = −1 and b = −3/2. Hence the unique prior within the considered class that satisfies the likelihood ratio matching property is given by π ∝ h(θ_1) θ_2^{−1} (1 − ψ^2)^{−3/2}. Once again, if we let h(θ_1) = θ_1^{−1}, then back in the original parameterization, π ∝ σ_1^{−1} σ_2^{−1} (1 − ρ^2)^{−3/2} satisfies the likelihood ratio matching property for ρ.
3.5 Propriety of the Posteriors

We now establish the propriety of the posteriors. We choose h(θ_1) = θ_1^{−1}. Then a prior of the form π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ (1 − ψ^2)^a θ_1^{−1} θ_2^{−1} satisfies the various matching properties discussed above for different values of a. Also, the joint posterior of µ_1, µ_2, θ_1, θ_2, ψ given X is

π(µ_1, µ_2, θ_1, θ_2, ψ | X) ∝ θ_2^{−n} exp{−(1/(2(1 − ψ^2)^{1/2} θ_2)) Σ_{i=1}^n [(X_{1i} − µ_1)^2/θ_1 + θ_1 (X_{2i} − µ_2)^2 − 2ψ(X_{1i} − µ_1)(X_{2i} − µ_2)]} × (θ_1 θ_2)^{−1} (1 − ψ^2)^a.    (3.5.1)
Next consider the transformation

ρ = ψ, σ_1^2 = θ_1 θ_2/(1 − ψ^2)^{1/2} and σ_2^2 = θ_2/(θ_1 (1 − ψ^2)^{1/2}).

The Jacobian is given by (1 − ρ^2)^{1/2}/σ_2^2. The posterior under this transformation can be written as

π(µ_1, µ_2, σ_1^2, σ_2^2, ρ | X) ∝ exp{−(1/(2(1 − ρ^2))) Σ_{i=1}^n [(X_{1i} − µ_1)^2/σ_1^2 + (X_{2i} − µ_2)^2/σ_2^2 − 2ρ(X_{1i} − µ_1)(X_{2i} − µ_2)/(σ_1 σ_2)]} × (σ_1^2 σ_2^2)^{−n/2−1} (1 − ρ^2)^{−n/2+a}.    (3.5.2)
Now, integrating out µ_1 and µ_2, we obtain

π(σ_1^2, σ_2^2, ρ | X) ∝ (σ_1^2 σ_2^2)^{−(n+1)/2} (1 − ρ^2)^{−(n−1)/2+a} exp{−(1/(2(1 − ρ^2)))[S_{11}/σ_1^2 + S_{22}/σ_2^2 − 2ρ S_{12}/(σ_1 σ_2)]}.    (3.5.3)
Consider another transformation

z_1 = σ_1^2 (1 − ρ^2), z_2 = σ_2^2 (1 − ρ^2) and z_3 = ρ.

The Jacobian here is given by (1 − z_3^2)^{−2}. Then, using a series expansion, the posterior can be written as

π(z_1, z_2, z_3 | X) ∝ (1 − z_3^2)^{(n−1)/2+a} (z_1 z_2)^{−(n+1)/2} exp{−(1/2)(S_{11}/z_1 + S_{22}/z_2 − 2z_3 S_{12}/(z_1 z_2)^{1/2})}
= (1 − z_3^2)^{(n−1)/2+a} (z_1 z_2)^{−(n+1)/2} exp{−(1/2)(S_{11}/z_1 + S_{22}/z_2)} × Σ_{r=0}^∞ (z_3 S_{12})^r/(z_1^{r/2} z_2^{r/2} r!).
The marginal posterior of z_3 is obtained by integrating out z_1 and z_2. So

π(z_3 | X) ∝ (1 − z_3^2)^{(n−1)/2+a} Σ_{r=0}^∞ ∫_0^∞ ∫_0^∞ (z_1 z_2)^{−(n+r+1)/2} (S_{12}^r/r!) z_3^r exp{−(1/2)(S_{11}/z_1 + S_{22}/z_2)} dz_1 dz_2
∝ (1 − z_3^2)^{(n−1)/2+a} Σ_{r=0}^∞ (S_{12}^r z_3^r/(r! S_{11}^{r/2} S_{22}^{r/2})) ∫_0^∞ ∫_0^∞ (w_1 w_2)^{(n+r−1)/2−1} exp{−(1/2)(w_1 S_{11} + w_2 S_{22})} dw_1 dw_2
= (1 − z_3^2)^{(n−1)/2+a} Σ_{r=0}^∞ (z_3^r R^r/r!) Γ^2((n + r − 1)/2),    (3.5.4)

where R = S_{12}/(S_{11} S_{22})^{1/2}.
Clearly, for r odd, the integral ∫_{−1}^1 z_3^r (1 − z_3^2)^{(n−1)/2+a} dz_3 = 0. So in order to show the propriety, we only need to show that

I = ∫_{−1}^1 (1 − z_3^2)^{(n−1)/2+a} {Σ_{r=0}^∞ (z_3^{2r} R^{2r}/(2r)!) Γ^2((n + 2r − 1)/2)} dz_3 < ∞.
To this end, we first note that

∫_{−1}^1 z_3^{2r} (1 − z_3^2)^{(n−1)/2+a} dz_3 = 2 ∫_0^1 z_3^{2r} (1 − z_3^2)^{(n−1)/2+a} dz_3
= ∫_0^1 u^{r−1/2} (1 − u)^{(n−1)/2+a} du
= Beta(r + 1/2, (n + 1)/2 + a)
= Γ(r + 1/2) Γ((n + 1)/2 + a)/Γ((n + 2r + 2)/2 + a), for a > −(n + 1)/2.
Hence

I = Σ_{r=0}^∞ (R^{2r}/(2r)!) Γ(r + 1/2) Γ^2((n + 2r − 1)/2)/Γ((n + 2r + 2)/2 + a).    (3.5.5)
By the Legendre duplication formula, (2r)! = Γ(2r + 1) = Γ(r + 1/2) Γ(r + 1) 2^{2r}/π^{1/2}. Hence, writing k (> 0) for a generic constant which does not depend on r,

I = k Σ_{r=0}^∞ (R^{2r}/(4^r r!)) Γ^2(r + (n − 1)/2)/Γ(r + (n + 2 + 2a)/2) = Σ_{r=0}^∞ b_r (say).
Note that

b_{r+1}/b_r = (R^2/4)(r + (n − 1)/2)^2/((r + 1)(r + (n + 2 + 2a)/2)) → R^2/4 < 1 as r → ∞.

Hence, by the ratio test, Σ_{r=0}^∞ b_r < ∞, which proves the propriety of the posteriors.
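The behavior of the term ratio is easy to inspect numerically; a small sketch (the function name is ours, and R = S_{12}/(S_{11} S_{22})^{1/2}, so |R| < 1 for non-degenerate data):

```python
def b_ratio(r, n, a, R):
    """Ratio b_{r+1}/b_r of consecutive terms in the series for I."""
    return (R**2 / 4) * (r + (n - 1) / 2)**2 / ((r + 1) * (r + (n + 2 + 2*a) / 2))

# for large r the ratio approaches R^2/4 < 1, so the series converges
print(b_ratio(10**6, n=10, a=-1.0, R=0.9))  # close to 0.81/4 = 0.2025
```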
3.6 Likelihood Based Inference
The objective of this section is to describe methods for likelihood-based inference
for the bivariate normal correlation coefficient. As a general rule, the basic approach for
inference in the presence of nuisance parameters is to replace the nuisance parameters
in the likelihood function by their maximum likelihood estimates and examine the
resulting profile likelihood as a function of the parameter of interest. This procedure
is known to give inconsistent or inefficient estimates for problems where the number of
nuisance parameters grows in direct proportion to the sample size. The conditional profile
likelihood (Cox and Reid, 1987) on the other hand is based on the conditional likelihood
given maximum likelihood estimates of the orthogonalized parameters. It corrects the
inconsistency of the profile likelihood in some problems and also makes “degrees of
freedom” adjustments for estimating the normal variance. An alternative simpler approach
is to adjust the profile log-likelihood so that the mean score is zero and the variance of the
score function equals its negative expected derivative matrix, so that the score function
is unbiased (Godambe, 1960) and information unbiased (Lindsay, 1982). This method,
known as the adjusted profile likelihood, was proposed by McCullagh and Tibshirani
(1990). Also, an integrated likelihood (Kalbfleisch and Sprott, 1970), which is defined as
the integral over the nuisance parameter space of the likelihood times the prior density,
can be used for inference. Here one must be willing to specify a joint prior distribution for
the nuisance parameters conditional on the parameters of interest.
We begin with the profile likelihood for (θ_1, θ_2, ψ), given by

L_p(θ_1, θ_2, ψ) = (2πθ_2^2)^{−n/2} exp{−(1/(2(1 − ψ^2)^{1/2} θ_2))(S_1^2/θ_1 + θ_1 S_2^2 − 2ψ r S_1 S_2)},    (3.6.1)

where S_1^2 = Σ_{i=1}^n (X_{1i} − X̄_1)^2 and S_2^2 = Σ_{i=1}^n (X_{2i} − X̄_2)^2. Let l_p ≡ log L_p.
To obtain the maximizer of the profile likelihood, we first obtain

∂l_p/∂θ_1 = −(1/(2(1 − ψ^2)^{1/2} θ_2))(−S_1^2/θ_1^2 + S_2^2),

which on equating to zero leads to

θ̂_1(ψ) = (S_1^2/S_2^2)^{1/2}.
On differentiating L_p(θ_1, θ_2, ψ) with respect to θ_2 and equating to zero, we obtain

n = (1/(2(1 − ψ^2)^{1/2} θ_2))(S_1^2/θ_1 + θ_1 S_2^2 − 2ψ r S_1 S_2).

This gives

θ̂_2(ψ) = S_1 S_2 (1 − ψr)/(n(1 − ψ^2)^{1/2}).    (3.6.2)
Thus the profile likelihood for ψ is given by

L_p(ψ) ∝ exp(−n)(1 − ψ^2)^{n/2}/(1 − ψr)^n ∝ (1 − ψ^2)^{n/2}/(1 − ψr)^n.    (3.6.3)
Next, from (3.1.3), the determinant of A and that of the (θ_1, θ_2) block of D are 1/θ_2^2 and 1/(θ_1^2 θ_2^2 (1 − ψ^2)) respectively. From Cox and Reid (1987), one now obtains the conditional profile likelihood

L_cp(ψ) ∝ ((1 − ψ^2)^{n/2}/(1 − ψr)^n) θ̂_1(ψ) θ̂_2^2(ψ)(1 − ψ^2)^{1/2}
∝ ((1 − ψ^2)^{n/2}/(1 − ψr)^n)((1 − ψr)^2/(1 − ψ^2))(1 − ψ^2)^{1/2}
= (1 − ψ^2)^{(n−1)/2}/(1 − ψr)^{n−2}.    (3.6.4)
Next we derive the adjusted profile likelihood for ψ. Let λ denote the vector of nuisance parameters and U(ψ) the score function, that is, U(ψ) = d log L_p(ψ)/dψ. Then let m(ψ) = E_{ψ,λ̂_ψ} U(ψ) and w(ψ) = {−E_{ψ,λ̂_ψ} (d^2/dψ^2) l_p(ψ) + (d/dψ) m(ψ)}/var_{ψ,λ̂_ψ}(U(ψ)). Also let Ũ(ψ) = {U(ψ) − m(ψ)} w(ψ). The adjusted profile log likelihood for ψ is obtained as l_ap(ψ) = ∫^ψ Ũ(t) dt. From (3.6.3), the score function is then given by

U(ψ) = d log L_p(ψ)/dψ = n(r − ψ)/((1 − ψr)(1 − ψ^2)).    (3.6.5)
From Kendall and Stuart (Vol. 1, 1968, p. 390) and (3.6.3),

m(ψ) = E[U(ψ)] = ψ/(2(1 − ψ^2)) + O(n^{−1}),    (3.6.6)

and

m′(ψ) = (1 + ψ^2)/(2(1 − ψ^2)^2) + O(n^{−1}).    (3.6.7)
Next, by Taylor expansion we obtain

E{U^2(ψ)} = (n^2/(1 − ψ^2)^2) E{(r − ψ)^2/(1 − ψr)^2}
= (n^2/(1 − ψ^2)^2) E{(r − ψ)^2 {1 − ψ(r − ψ) − ψ^2}^{−2}}
= (n^2/(1 − ψ^2)^4) E{(r − ψ)^2 {1 − ψ(r − ψ)/(1 − ψ^2)}^{−2}}
= (n^2/(1 − ψ^2)^4) E{(r − ψ)^2 {1 + 2ψ(r − ψ)/(1 − ψ^2) + 3ψ^2(r − ψ)^2/(1 − ψ^2)^2 + . . .}}
= (n^2/(1 − ψ^2)^4){(1 − ψ^2)^2/n + O(n^{−2})}
= n/(1 − ψ^2)^2 + O(1).    (3.6.8)
Hence

m(ψ) = O(1), m′(ψ) = O(1), V(U(ψ)) = n/(1 − ψ^2)^2 + O(1).    (3.6.9)
Further,

d^2 l_p(ψ)/dψ^2 = n (d/dψ){(r − ψ)/((1 − ψr)(1 − ψ^2))}
= n (d/dψ){(1/ψ)(1/(1 − ψr) − 1/(1 − ψ^2))}
= n (d/dψ){1/ψ + r/(1 − ψr) − 1/(ψ(1 − ψ^2))}
= n{−1/ψ^2 + r^2/(1 − ψr)^2 + 1/(ψ^2(1 − ψ^2)) − 2/(1 − ψ^2)^2}
= n{(ψ^2 r^2 − (1 − 2ψr + ψ^2 r^2))/(ψ^2(1 − ψr)^2) + (1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2)}
= n{((2ψr − 1)/ψ^2){1 − ψ(r − ψ) − ψ^2}^{−2} + (1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2)}
= n{((2ψ(r − ψ) + 2ψ^2 − 1)/(ψ^2(1 − ψ^2)^2)){1 − ψ(r − ψ)/(1 − ψ^2)}^{−2} + (1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2)}.
Then

E[−d^2 l_p(ψ)/dψ^2] = −n{(2ψ^2 − 1 + 1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2) + O(n^{−1})}
= n/(1 − ψ^2)^2 + O(1).    (3.6.10)
This leads to

w(ψ) = (E[−(d^2/dψ^2) l_p(ψ)] + m′(ψ))/var[U(ψ)]
= (n/(1 − ψ^2)^2 + O(1))/(n/(1 − ψ^2)^2 + O(1))
= 1 + O(n^{−1}).    (3.6.11)
Thus

Ũ(ψ) = [U(ψ) − m(ψ)] w(ψ)
= [U(ψ) − ψ/(2(1 − ψ^2)) + O(n^{−1})][1 + O(n^{−1})]
= U(ψ) − ψ/(2(1 − ψ^2)) + O(n^{−1}).    (3.6.12)

In other words,

d l_ap(ψ)/dψ = U(ψ) − ψ/(2(1 − ψ^2)),

and on integrating we obtain

l_ap(ψ) = l_p(ψ) + (1/4) log(1 − ψ^2).
Therefore

L_ap(ψ) = L_p(ψ)(1 − ψ^2)^{1/4} ∝ (1 − ψ^2)^{n/2+1/4}(1 − ψr)^{−n}.    (3.6.13)
Finally, we wish to find the integrated likelihood. This requires specification of a prior distribution for the nuisance parameters conditional on the parameter of interest. In particular, the conditional reference prior of Berger, Liseo and Wolpert (1999) is given by π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ θ_1^{−1} θ_2^{−2} (1 − ψ^2)^{−1/2}. Then calculations similar to those done for proving the propriety of the posterior of ψ lead to the conditional reference integrated likelihood given by

L_I(ψ) ∝ (1 − ψ^2)^{(n−1)/2} Σ_{a=0}^∞ (ψ^a r^a/a!) Γ^2((n + a)/2).    (3.6.14)

One common feature of all the modified likelihoods L_p(ψ), L_cp(ψ), L_ap(ψ) and L_I(ψ) is that they all depend on the data only through the sample correlation coefficient r.
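The first three modified likelihoods have simple closed forms in (r, n), so their shapes are easy to compare numerically; a sketch assuming NumPy (an illustration of the formulas, not the authors' code):

```python
import numpy as np

def modified_likelihoods(psi, r, n):
    """L_p, L_cp and L_ap of (3.6.3), (3.6.4) and (3.6.13), up to constants."""
    lp = (1 - psi**2)**(n / 2) / (1 - psi * r)**n
    lcp = (1 - psi**2)**((n - 1) / 2) / (1 - psi * r)**(n - 2)
    lap = (1 - psi**2)**(n / 2 + 0.25) / (1 - psi * r)**n
    return lp, lcp, lap

# each curve depends on the data only through r; L_p is maximized at psi = r
psi = np.linspace(-0.95, 0.95, 381)
lp, lcp, lap = modified_likelihoods(psi, r=0.5, n=20)
print(round(psi[np.argmax(lp)], 3))  # 0.5
```

The conditional and adjusted versions peak slightly closer to zero than the profile likelihood, reflecting their extra (1 − ψ^2) and (1 − ψr) factors.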
3.7 Simulation Study

In order to evaluate the three different priors, we undertook a simulation study where data were generated from a bivariate normal distribution with (µ_1, µ_2, σ_1, σ_2) = (0, 0, 1, 1) and varying values of ρ and varying sample sizes n. Throughout our simulation study, we annotate the priors as follows:

Prior 1: π ∝ (1 − ρ^2) σ_1^{−1} σ_2^{−1}
Prior 2: π ∝ (1 − ρ^2)^{−3/2} σ_1^{−1} σ_2^{−1}
Prior 3: π ∝ (1 − ρ^2)^{−1} σ_1^{−1} σ_2^{−1}.

Since the full conditional distributions of the parameters under any of the three priors do not follow a standard distributional form, we used Gibbs sampling with componentwise Metropolis-Hastings updates at each iteration to generate random numbers from the conditional posterior distributions of each parameter (Robert and Casella, 2001). We ran two chains with different initial values and allowed a burn-in of 4,000 each. A random-walk jumping density, with normal noise added to the existing value in the chain, was used for the means and log standard deviations. The correlation was also updated by a random walk, adding a small normal noise to the old value. Each chain was run for 10,000 iterations, and convergence was judged by the Gelman-Rubin (Gelman and Rubin, 1992) diagnostic. The trace plot presenting the time history of all 10,000 iterations for all five parameters is presented for a sample simulated dataset with ρ = 0.3, under Prior 3 and sample size 10, in Figure 3-1. Figure 3-2 presents the plot of the Gelman-Rubin diagnostic for the ρ chain under the same setting, with diagnostic values close to 1 suggesting convergence. Figures 3-3, 3-4 and 3-5 are posterior distributions for ρ under the three different priors for four different sample sizes n = 10, 20, 30, 40. One can immediately make the following
observations. Though there are certain numerical differences, the posterior distribution
of ρ does not seem to vary widely between Priors 1 and 3, even for smaller sample sizes. Prior 2, on the other hand, appears to produce a smaller spread in the posterior distribution than the other two priors. As data information increases with sample size, the posterior distributions become very similar under the three priors. Some skewness can be observed in the posterior distributions for smaller sample sizes, which was often noted during our simulation, but the distribution becomes fairly symmetric as n becomes large. The posterior distribution also becomes more concentrated around the true value of ρ with increasing n.
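A stripped-down, single-chain version of the sampler described above can be sketched as follows. This is our illustration, assuming NumPy, with one common random-walk scale for all components rather than individually tuned scales; the means and log standard deviations are updated by random walks as described, and the σ^{−1} priors are flat on the log-standard-deviation scale.

```python
import numpy as np

def log_post(mu1, mu2, ls1, ls2, rho, x1, x2):
    """Log posterior under Prior 3, pi ∝ (1 - rho^2)^{-1} sigma1^{-1} sigma2^{-1},
    parameterized by (mu1, mu2, log sigma1, log sigma2, rho)."""
    if abs(rho) >= 1:
        return -np.inf
    z1 = (x1 - mu1) / np.exp(ls1)
    z2 = (x2 - mu2) / np.exp(ls2)
    n = len(x1)
    quad = np.sum(z1**2 + z2**2 - 2 * rho * z1 * z2) / (2 * (1 - rho**2))
    loglik = -n * (ls1 + ls2) - (n / 2) * np.log(1 - rho**2) - quad
    return loglik - np.log(1 - rho**2)   # sigma^{-1} priors are flat in log sigma

def metropolis_within_gibbs(x1, x2, iters=10_000, step=0.2, seed=0):
    """Componentwise random-walk Metropolis updates, cycled at each iteration."""
    rng = np.random.default_rng(seed)
    theta = np.array([x1.mean(), x2.mean(),
                      np.log(x1.std()), np.log(x2.std()), 0.0])
    cur = log_post(*theta, x1, x2)
    draws = np.empty((iters, 5))
    for t in range(iters):
        for j in range(5):                     # update one component at a time
            prop = theta.copy()
            prop[j] += step * rng.normal()
            new = log_post(*prop, x1, x2)
            if np.log(rng.uniform()) < new - cur:
                theta, cur = prop, new
        draws[t] = theta
    return draws                               # column 4 holds the rho chain
```

After discarding a burn-in, the last column of `draws` gives posterior draws of ρ from which means, quantile intervals and HPD intervals can be computed.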
We repeated our Gibbs sampling estimation technique for 500 datasets under each configuration of ρ and n. Each time, we computed the posterior mean, the 95% quantile interval (as given by the 2.5th and 97.5th sample percentiles of the randomly generated parameter values after the burn-in period) and the 95% HPD interval. Table 3-1 presents the average of the posterior means, the mean squared error, and the frequentist coverage of the Bayesian credible intervals (as estimated by the proportion of times the true parameter value falls in the corresponding credible interval) across the 500 datasets and under the three different priors. Some interesting differences can be noted in the behavior for smaller sample sizes. Prior 1 appears to perform worse than Priors 2 and 3 as far as point estimation of ρ is concerned, with higher bias, though the MSE is not necessarily larger for all values of ρ. On the other hand, Prior 1 has appreciably better coverage properties for the HPD intervals for smaller sample sizes than Priors 2 and 3, and is in fact the theoretically established HPD matching prior. Priors 1 and 3 are very comparable in terms of coverage of the quantile intervals, with Prior 3 having a slight edge over Prior 1 as it attains nominal coverage for a smaller n in many cases. Prior 2, derived from inversion of the likelihood-ratio statistic, appears to be the least attractive from the frequentist coverage perspective. Based on our simulation results, if one is concerned about both point and interval estimation, Prior 3 appears to have a slight edge over the other two contenders.
Figure 3-1. Plot of Gelman-Rubin Diagnostic Statistic for ρ Under Prior III for n=10 Under the Simulation Setting of Section 3.7.
Table 3-1. Simulation Result to Compare the Three Different Priors Suggested for Bivariate Normal Correlation Parameter. The True Parameter Settings are µ_1 = µ_2 = 0, σ_1 = σ_2 = 1 and Varying Values of ρ as Listed. Prior 1: π ∝ (1 − ρ^2) σ_1^{−1} σ_2^{−1}, Prior 2: π ∝ (1 − ρ^2)^{−3/2} σ_1^{−1} σ_2^{−1}, Prior 3: π ∝ (1 − ρ^2)^{−1} σ_1^{−1} σ_2^{−1}. Results are Based on 500 Simulated Datasets. *: Average value for posterior mean of ρ, averaged across 500 simulated datasets.

                 Prior 1                            Prior 2                            Prior 3
ρ      n    ρ*     MSE  Cov.(Q) Cov.(HPD)     ρ*     MSE  Cov.(Q) Cov.(HPD)     ρ*     MSE  Cov.(Q) Cov.(HPD)
-0.8   10  -0.65  0.05   0.87    0.91        -0.80  0.02   0.86    0.82        -0.77  0.02   0.88    0.86
       20  -0.71  0.02   0.87    0.90        -0.78  0.01   0.90    0.89        -0.77  0.01   0.90    0.90
       30  -0.74  0.01   0.90    0.92        -0.79  0.01   0.92    0.91        -0.78  0.01   0.93    0.92
       40  -0.76  0.01   0.94    0.95        -0.79  0.00   0.93    0.91        -0.79  0.00   0.94    0.92
-0.5   10  -0.35  0.07   0.92    0.92        -0.47  0.08   0.90    0.87        -0.44  0.07   0.92    0.89
       20  -0.41  0.04   0.91    0.91        -0.48  0.04   0.91    0.90        -0.46  0.03   0.91    0.91
       30  -0.44  0.02   0.94    0.93        -0.49  0.02   0.94    0.93        -0.48  0.02   0.95    0.93
       40  -0.44  0.02   0.95    0.96        -0.48  0.01   0.96    0.94        -0.47  0.01   0.97    0.95
-0.2   10  -0.14  0.06   0.94    0.92        -0.18  0.10   0.90    0.85        -0.17  0.09   0.91    0.87
       20  -0.16  0.03   0.94    0.96        -0.20  0.05   0.94    0.90        -0.19  0.04   0.95    0.92
       30  -0.17  0.03   0.94    0.93        -0.19  0.03   0.92    0.91        -0.19  0.03   0.93    0.92
       40  -0.17  0.02   0.95    0.94        -0.19  0.03   0.94    0.92        -0.18  0.02   0.94    0.94
 0     10   0.01  0.06   0.96    0.92         0.01  0.11   0.89    0.86         0.01  0.09   0.91    0.88
       20   0.02  0.04   0.96    0.93         0.02  0.05   0.93    0.90         0.02  0.05   0.93    0.90
       30  -0.01  0.02   0.93    0.92        -0.01  0.03   0.92    0.90        -0.01  0.03   0.91    0.90
       40   0.00  0.02   0.96    0.95         0.00  0.03   0.95    0.92         0.00  0.03   0.94    0.94
 0.2   10   0.15  0.06   0.94    0.93         0.21  0.10   0.90    0.85         0.19  0.09   0.93    0.88
       20   0.15  0.04   0.95    0.93         0.18  0.05   0.93    0.89         0.17  0.05   0.93    0.90
       30   0.17  0.03   0.94    0.93         0.20  0.03   0.93    0.92         0.19  0.03   0.94    0.92
       40   0.17  0.02   0.95    0.94         0.19  0.02   0.93    0.92         0.18  0.02   0.93    0.93
 0.5   10   0.35  0.07   0.93    0.91         0.48  0.08   0.88    0.84         0.45  0.07   0.92    0.87
       20   0.41  0.03   0.93    0.92         0.48  0.03   0.93    0.90         0.47  0.03   0.93    0.92
       30   0.44  0.02   0.93    0.93         0.49  0.02   0.92    0.92         0.48  0.02   0.92    0.92
       40   0.45  0.02   0.95    0.95         0.49  0.02   0.96    0.94         0.48  0.02   0.95    0.95
 0.8   10   0.63  0.06   0.84    0.89         0.78  0.03   0.86    0.82         0.74  0.03   0.89    0.86
       20   0.71  0.02   0.89    0.92         0.79  0.01   0.93    0.91         0.77  0.01   0.93    0.93
       30   0.75  0.01   0.90    0.93         0.80  0.01   0.94    0.92         0.79  0.01   0.95    0.93
       40   0.76  0.01   0.92    0.94         0.79  0.00   0.94    0.93         0.79  0.00   0.94    0.94
Figure 3-2. Sample Trace Plot for All the Parameters under Prior III for n=10 Under the Simulation Setting of Section 3.7

Figure 3-3. Posterior Distribution for ρ under Prior I for Different Sample Sizes, Under the Simulation Setting of Section 3.7

Figure 3-4. Posterior Distribution for ρ under Prior II for Different Sample Sizes, Under the Simulation Setting of Section 3.7

Figure 3-5. Sample Posterior Distribution for ρ under Prior III for Different Sample Sizes, Under the Simulation Setting of Section 3.7
CHAPTER 4
RATIO OF VARIANCES

There are many experimental situations in which an investigator wants to estimate the ratio of variances of two independent normal populations. Study of the ratio of variances dates back to 1920, when Fisher developed the F-statistic for testing the variance ratio. The most widely used example involves testing the hypothesis that the standard deviations of two normally distributed populations are equal. Although ratios of variances have been vigorously studied in the case of two independent normal samples, both in the frequentist and in the Bayesian literature, little study has been done for a possibly correlated bivariate normal population. For testing the equality of variances in a bivariate normal population, Pitman (1939) and Morgan (1939) introduced a variable transformation which reduces the problem to testing that a bivariate normal correlation coefficient equals zero. The same idea can easily be extended to test the null hypothesis that a variance ratio equals a particular value. Inverting this test statistic, Roy and Potthoff (1958) obtained confidence bounds on the ratio of variances in the correlated bivariate normal distribution. Since the test statistic has a Student's t-distribution under the null hypothesis, the resulting confidence bounds involve percentiles of a Student's t-distribution.

The objective of this chapter is to find priors according to the different matching criteria when the ratio of variances in the bivariate normal distribution is the parameter of interest, and to compare their performance for moderate sample sizes. It turns out that there is a general class of priors which satisfies all the matching criteria.
4.1 The Orthogonal Parameterization

We continue in the setup of Section 3.1 of Chapter 3, where (X_{1i}, X_{2i}), (i = 1, . . . , n) are independent and identically distributed random variables having a bivariate normal distribution with means µ_1 and µ_2, variances σ_1^2 (> 0) and σ_2^2 (> 0), and correlation coefficient ρ (|ρ| < 1). We use the same transformation

θ_1 = σ_1/σ_2, θ_2 = σ_1 σ_2 (1 − ρ^2)^{1/2} and ψ = ρ,    (4.1.1)

and obtain the bivariate normal pdf as in (3.1.2).
With this reparameterization, the Fisher information matrix reduces to

I(µ_1, µ_2, θ_1, θ_2, ψ) = [ A  0 ; 0  D ],    (4.1.2)

where

A = [ 1/(θ_1 θ_2 (1 − ψ^2)^{1/2})   −ψ/(θ_2 (1 − ψ^2)^{1/2}) ; −ψ/(θ_2 (1 − ψ^2)^{1/2})   θ_1/(θ_2 (1 − ψ^2)^{1/2}) ]

and

D = Diag(1/(θ_1^2 (1 − ψ^2)), 1/θ_2^2, 1/(1 − ψ^2)^2).
This establishes immediately the mutual orthogonality of (µ_1, µ_2), θ_1, θ_2 and ψ in the sense of Huzurbazar (1950) and Cox and Reid (1987). Such orthogonality is often referred to as "Fisher orthogonality".
The inverse of the information matrix is then simply

I^{−1}(µ_1, µ_2, θ_1, θ_2, ψ) = [ A^{−1}  0 ; 0  D^{−1} ],    (4.1.3)

where

A^{−1} = [ θ_1 θ_2/(1 − ψ^2)^{1/2}   ψθ_2/(1 − ψ^2)^{1/2} ; ψθ_2/(1 − ψ^2)^{1/2}   θ_2/(θ_1 (1 − ψ^2)^{1/2}) ]    (4.1.4)

and

D^{−1} = Diag(θ_1^2 (1 − ψ^2), θ_2^2, (1 − ψ^2)^2).    (4.1.5)
For subsequent sections, we also need a few other results, which are collected in the following lemma.
Lemma 4.1 For the bivariate normal density given in (3.1.2),

E(∂^3 log f/∂θ_1^2 ∂ψ) = −ψ/(θ_1^2 (1 − ψ^2)^2),  E(∂^3 log f/∂θ_1^2 ∂θ_2) = 1/(θ_1^2 θ_2 (1 − ψ^2));    (4.1.6)

E(∂^3 log f/∂θ_1^3) = 3/(θ_1^3 (1 − ψ^2));    (4.1.7)

E((∂ log f/∂θ_1)(∂^2 log f/∂θ_1^2)) = −1/(θ_1^3 (1 − ψ^2));    (4.1.8)

E(∂^3 log f/∂θ_1 ∂ψ^2) = 0,  E(∂^3 log f/∂θ_1 ∂θ_2^2) = 0.    (4.1.9)
Proof. Note that E(X_1 − µ_1)^2 = θ_1 θ_2 (1 − ψ^2)^{−1/2} and E(X_2 − µ_2)^2 = θ_1^{−1} θ_2 (1 − ψ^2)^{−1/2}. We begin with (4.1.6).

E(∂^3 log f/∂θ_1^2 ∂ψ) = E(−ψ(X_1 − µ_1)^2/((1 − ψ^2)^{3/2} θ_1^3 θ_2)) = −ψ/(θ_1^2 (1 − ψ^2)^2)

and

E(∂^3 log f/∂θ_1^2 ∂θ_2) = E((X_1 − µ_1)^2/((1 − ψ^2)^{1/2} θ_1^3 θ_2^2)) = θ_1 θ_2/(θ_1^3 θ_2^2 (1 − ψ^2)) = 1/(θ_1^2 θ_2 (1 − ψ^2)).

To prove (4.1.7), we see that

E(∂^3 log f/∂θ_1^3) = E(3(X_1 − µ_1)^2/((1 − ψ^2)^{1/2} θ_1^4 θ_2)) = 3θ_1 θ_2/(θ_1^4 θ_2 (1 − ψ^2)) = 3/(θ_1^3 (1 − ψ^2)).

Next, (4.1.8) holds because, from the Bartlett identity,

E((∂ log f/∂θ_1)(∂^2 log f/∂θ_1^2)) = −E(∂^3 log f/∂θ_1^3) − ∂/∂θ_1 (1/(θ_1^2 (1 − ψ^2)))
= −3/(θ_1^3 (1 − ψ^2)) + 2/(θ_1^3 (1 − ψ^2))
= −1/(θ_1^3 (1 − ψ^2)).
Finally, we see that (4.1.9) holds because

E(∂^3 log f/∂ψ^2 ∂θ_1) = −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} E(−(X_1 − µ_1)^2/θ_1^2 + (X_2 − µ_2)^2)
= −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} (−θ_2/(θ_1 (1 − ψ^2)^{1/2}) + θ_2/(θ_1 (1 − ψ^2)^{1/2}))
= 0,

while

E(∂^3 log f/∂θ_1 ∂θ_2^2) = −(1/((1 − ψ^2)^{1/2} θ_2^3)) E(−(X_1 − µ_1)^2/θ_1^2 + (X_2 − µ_2)^2)
= −(1/((1 − ψ^2)^{1/2} θ_2^3)) (−θ_1 θ_2/(θ_1^2 (1 − ψ^2)^{1/2}) + θ_2/(θ_1 (1 − ψ^2)^{1/2}))
= 0.
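As with Lemma 3.1, these identities can be verified symbolically, since each derivative of log f is linear in the quadratic forms; a sketch assuming SymPy:

```python
import sympy as sp

th1, th2 = sp.symbols('theta1 theta2', positive=True)
psi = sp.symbols('psi', real=True)
Q1, Q2, Q12 = sp.symbols('Q1 Q2 Q12')  # quadratic forms, as in Lemma 3.1

logf = -sp.log(th2) - (Q1/th1 + th1*Q2 - 2*psi*Q12) / (2*sp.sqrt(1 - psi**2)*th2)
moments = {Q1: th1*th2/sp.sqrt(1 - psi**2),
           Q2: th2/(th1*sp.sqrt(1 - psi**2)),
           Q12: psi*th2/sp.sqrt(1 - psi**2)}

# (4.1.7): E(d^3 log f / d theta1^3) = 3 / (theta1^3 (1 - psi^2))
E7 = sp.simplify(sp.diff(logf, th1, 3).subs(moments))
assert sp.simplify(E7 - 3/(th1**3 * (1 - psi**2))) == 0
# (4.1.9), second part: E(d^3 log f / d theta1 d theta2^2) = 0
assert sp.simplify(sp.diff(logf, th1, 1, th2, 2).subs(moments)) == 0
```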
We derive the matching priors in the next few sections.
4.2 Quantile Matching Priors

Due to the orthogonality of (θ_1, θ_2) with (µ_1, µ_2, ψ), from (1.2.3.4), the class of first order matching priors is characterized by

π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ θ_1^{−1} (1 − ψ^2)^{−1/2} g_0(µ_1, µ_2, ψ, θ_2).    (4.2.1)

As it is often customary to assign a uniform prior to (µ_1, µ_2) on R^2, we will consider only the subclass of priors where g_0(µ_1, µ_2, ψ, θ_2) = g(ψ, θ_2).
A prior of the form π ∝ θ1^{-1}(1 − ψ^2)^{-1/2} g(ψ, θ2) satisfies the second-order quantile
matching property if and only if (see (1.2.3.5) of Chapter 1) g satisfies the relation

    (∂/∂θ2){ θ1^{-1}(1 − ψ^2)^{1/2} g θ1^2 θ2^2 E(∂^3 log f/∂θ1^2∂θ2) }
    + (∂/∂ψ){ θ1^{-1}(1 − ψ^2)^{1/2} g θ1^2 (1 − ψ^2)^2 E(∂^3 log f/∂θ1^2∂ψ) }
    + (1/6)(1 − ψ^2)^{-1/2} g (∂/∂θ1){ θ1^3(1 − ψ^2)^{3/2} E(∂ log f/∂θ1)^3 } = 0.    (4.2.2)

From (4.1.5)-(4.1.7), (4.2.2) simplifies to

    θ1^{-1}(1 − ψ^2)^{-1/2} (∂/∂θ2){ g θ2 } − θ1^{-1} (∂/∂ψ){ g ψ(1 − ψ^2)^{1/2} } = 0.    (4.2.3)
Now let g be the class of functions given by g(θ2, ψ) = θ2^a |ψ|^a (1 − ψ^2)^{-(a+2)/2}. With this
choice of g, the left hand side of the above equation reduces to

    θ1^{-1}|ψ|^a(1 − ψ^2)^{-(a+3)/2} (∂/∂θ2) θ2^{a+1} − θ1^{-1}θ2^a (∂/∂ψ){ |ψ|^{a+1}(1 − ψ^2)^{-(a+3)/2+1} sgn(ψ) }
    = (a + 1)θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}
      − θ1^{-1}θ2^a{ |ψ|^{a+1}(−(a + 1)/2)(1 − ψ^2)^{-(a+3)/2}(−2ψ)sgn(ψ) + (1 − ψ^2)^{-(a+3)/2+1}(a + 1)|ψ|^a sgn^2(ψ) }
    = (a + 1)θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}{ 1 − (ψ^2 + (1 − ψ^2)) }
    = 0.    (4.2.4)

Thus every prior π(µ1, µ2, θ1, θ2, ψ) ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2} is a second order probability
matching prior for θ1. Due to the invariance property of such a prior, back in the original
parameterization, a second order matching prior for σ1/σ2 is given by

    π(µ1, µ2, σ1, σ2, ρ) ∝ σ1^a σ2^a |ρ|^a (1 − ρ^2)^{-1}.
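The cancellation establishing (4.2.4) can also be confirmed symbolically. The sketch below (illustrative, not part of the dissertation) checks that g(θ2, ψ) = θ2^a ψ^a (1 − ψ^2)^{-(a+2)/2} solves (4.2.3) on ψ > 0, where |ψ| = ψ and the sgn terms drop out:

```python
import sympy as sp

t1, t2, psi, a = sp.symbols('theta1 theta2 psi a', positive=True)  # psi > 0 so |psi| = psi

# g(theta2, psi) = theta2^a * psi^a * (1 - psi^2)^{-(a+2)/2}
g = t2**a * psi**a * (1 - psi**2)**(-(a + 2)/2)

# Left-hand side of (4.2.3)
lhs = (t1**-1 * (1 - psi**2)**sp.Rational(-1, 2) * sp.diff(g * t2, t2)
       - t1**-1 * sp.diff(g * psi * sp.sqrt(1 - psi**2), psi))

# Exact symbolic check at a = 2, plus a numeric spot check for symbolic a
assert sp.simplify(lhs.subs(a, 2)) == 0
assert abs(float(lhs.subs({t1: 1.3, t2: 0.7, psi: 0.4, a: 5.0}))) < 1e-9
```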
4.3 Matching Via Distribution Functions

The class of first order quantile matching priors is also first order matching via
distribution functions. Under orthogonality of θ1 with (θ2, . . . , θp), it follows from (1.3.2.1)
and (1.3.2.2) of Chapter 1 that, in order for this class of priors also to satisfy the second
order distribution function matching criterion, it needs to satisfy the two differential
equations

    A1 = (∂^2/∂θ1^2)( I^{11}π(θ) ) − 2(∂/∂θ1)( I^{11}(∂/∂θ1)π )
         − Σ_{s=2}^p Σ_{v=2}^p (∂/∂θs){ E(∂^3 log f/∂θ1^2∂θs) I^{11}I^{sv}π(θ) }
         − Σ_{s=2}^p Σ_{v=2}^p (∂/∂θ1){ E(∂^3 log f/∂θ1∂θs∂θv) I^{11}I^{sv}π(θ) } = 0    (4.3.1)

and

    A2 = (∂/∂θ1){ E(∂^3 log f/∂θ1^3)(I^{11})^2 π(θ) } = 0.    (4.3.2)
In our context, when θ1 = σ1/σ2 is the parameter of interest, any class of priors of the form
π ∝ θ1^{-1}(1 − ψ^2)^{-1/2} g(ψ, θ2) ensures matching of the posterior and frequentist cumulative
distribution functions at the second order if, from (4.3.1), (4.3.2) and (4.1.5),

    g (∂^2/∂θ1^2){ θ1^{-1}θ1^2(1 − ψ^2)^{1/2} } − 2g (∂/∂θ1){ θ1^2(1 − ψ^2)^{1/2} (∂/∂θ1)θ1^{-1} }
    − (∂/∂θ2){ θ1^{-1} g θ1^2(1 − ψ^2)^{1/2} θ2^2 E(∂^3 log f/∂θ1^2∂θ2) }
    − (∂/∂ψ){ θ1^{-1} g θ1^2(1 − ψ^2)^{1/2} (1 − ψ^2)^2 E(∂^3 log f/∂θ1^2∂ψ) }
    − g (∂/∂θ1){ E(∂^3 log f/∂θ1∂θ2^2) θ1^2(1 − ψ^2)^{1/2} θ2^2 θ1^{-1} }
    − g (∂/∂θ1){ E(∂^3 log f/∂θ1∂ψ^2) θ1^2(1 − ψ^2)^{1/2} (1 − ψ^2)^2 θ1^{-1} } = 0    (4.3.3)

and

    g(1 − ψ^2)^{-1/2} (∂/∂θ1){ E(∂^3 log f/∂θ1^3) θ1^{-1} θ1^4 (1 − ψ^2)^2 } = 0.    (4.3.4)
From (4.1.6) and (4.1.9) of Lemma 4.1, (4.3.3) reduces to

    −θ1^{-1}(1 − ψ^2)^{-1/2} (∂/∂θ2){ g θ2 } + θ1^{-1} (∂/∂ψ){ g ψ(1 − ψ^2)^{1/2} } = 0,    (4.3.5)

the negative of the left hand side of (4.2.3) set equal to zero, while the left hand side of
(4.3.4) reduces to 3g(1 − ψ^2)^{-1/2} (∂/∂θ1){ (1 − ψ^2) }, which is clearly 0 for any g. So we
need to find g such that (4.3.5) is satisfied. In particular, (4.3.5) is satisfied if we once
again let g be the class of functions g(θ2, ψ) = θ2^a |ψ|^a (1 − ψ^2)^{-(a+2)/2}. In other words,
the same class of priors enjoys second order matching both for quantiles and for
distribution functions.
4.4 Highest Posterior Density (HPD) Matching Priors

We now turn attention to HPD matching priors for θ1. We will consider priors which
ensure that HPD regions with credibility level 1 − α also have asymptotically the same
frequentist coverage probability, the error of approximation being o(n^{-1}). From (3.1.1)
of Chapter 1, any second order matching prior for posterior quantiles of θ1 is also HPD
matching for θ1 in the special case of models satisfying

    (∂/∂θ1)( (I^{θ1θ1})^{3/2} E(∂^3 log f/∂θ1^3) ) = 0.    (4.5.1)

It is easy to check that when θ1 = σ1/σ2 is the parameter of interest, from (4.1.5)
and (4.1.7), (4.5.1) holds, and hence the second order quantile matching prior
π ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2} is also HPD matching.
4.5 Matching Priors Via Inversion of Test Statistics

In this section, once again, we focus on priors that ensure approximate frequentist
validity of posterior credible regions, this time obtained by inverting the likelihood ratio
test statistic. From (1.5.1.5), a likelihood ratio matching prior π is obtained by solving

    (∂/∂θ2){ π θ1^2(1 − ψ^2) θ2^2 E(∂^3 log f/∂θ1^2∂θ2) } + (∂/∂ψ){ π θ1^2(1 − ψ^2)(1 − ψ^2)^2 E(∂^3 log f/∂θ1^2∂ψ) }
    + (∂/∂θ1){ θ1^2(1 − ψ^2){ (∂/∂θ1)π − π( θ1^2(1 − ψ^2) E((∂ log f/∂θ1)(∂^2 log f/∂θ1^2))
      − θ2^2 E(∂^3 log f/∂θ1∂θ2^2) − (1 − ψ^2)^2 E(∂^3 log f/∂θ1∂ψ^2) ) } } = 0.    (4.6.1)
Then, from (4.1.6), (4.1.8) and (4.1.9) of Lemma 4.1, (4.6.1) reduces to

    (∂/∂θ2){ π θ2 } − (∂/∂ψ){ π ψ(1 − ψ^2) } + (∂/∂θ1){ θ1^2(1 − ψ^2)( (∂/∂θ1)π + π θ1^{-1} ) } = 0.    (4.6.2)
Consider once again π(µ1, µ2, θ1, θ2, ψ) ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}. Then
(∂/∂θ1)π + π θ1^{-1} = 0, and the left hand side of (4.6.2) simplifies to

    θ1^{-1}|ψ|^a(1 − ψ^2)^{-(a+3)/2} (∂/∂θ2) θ2^{a+1} − θ1^{-1}θ2^a (∂/∂ψ){ |ψ|^{a+1}(1 − ψ^2)^{-(a+3)/2+1} sgn(ψ) },

which is exactly the same as the left hand side of (4.2.4), and leads to the same class of
matching priors as before. With this we conclude that we have been able to find a class
of priors π(µ1, µ2, σ1, σ2, ρ) ∝ σ1^a σ2^a |ρ|^a (1 − ρ^2)^{-1} which satisfies all the different
matching criteria.
4.6 Propriety of the Posteriors

We now establish the propriety of the posteriors. A prior of the form
π ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2} satisfies the various matching properties discussed
above. The joint posterior of µ1, µ2, θ1, θ2, ψ given X is

    π(µ1, µ2, θ1, θ2, ψ|X)
    ∝ θ2^{-n} exp{ −(1/(2(1 − ψ^2)^{1/2}θ2)) Σ_{i=1}^n [ (X1i − µ1)^2/θ1 + θ1(X2i − µ2)^2 − 2ψ(X1i − µ1)(X2i − µ2) ] }
    × θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}.    (4.7.1)
Next consider the transformation

    ρ = ψ,   σ1 = (θ1θ2)^{1/2}/(1 − ψ^2)^{1/4}   and   σ2 = θ2^{1/2}/(θ1^{1/2}(1 − ψ^2)^{1/4}).

Under this transformation the posterior can be written as

    π(µ1, µ2, σ1, σ2, ρ|X)
    ∝ exp{ −(1/(2(1 − ρ^2))) Σ_{i=1}^n [ (X1i − µ1)^2/σ1^2 + (X2i − µ2)^2/σ2^2 − 2ρ(X1i − µ1)(X2i − µ2)/(σ1σ2) ] }
    × (σ1σ2)^{a−n}|ρ|^a(1 − ρ^2)^{-n/2−1}.    (4.7.2)
Now, integrating out µ1 and µ2, we obtain

    π(σ1, σ2, ρ|X) ∝ (σ1σ2)^{-n+a+1}|ρ|^a(1 − ρ^2)^{-(n+1)/2} exp{ −(1/(2(1 − ρ^2))){ S11/σ1^2 + S22/σ2^2 − 2ρS12/(σ1σ2) } },    (4.7.3)

where S11 = Σ_{i=1}^n (X1i − X̄1)^2, S22 = Σ_{i=1}^n (X2i − X̄2)^2 and S12 = Σ_{i=1}^n (X1i − X̄1)(X2i − X̄2).
Consider another transformation

    z1 = σ1^2(1 − ρ^2),   z2 = σ2^2(1 − ρ^2)   and   z3 = ρ.
Then, expanding the cross-product term of the exponent in a power series, the posterior
can be written as

    π(z1, z2, z3|X) ∝ (1 − z3^2)^{-(n+1)/2+n−a}|z3|^a(z1z2)^{-(n−a)/2} exp{ −(1/2){ S11/z1 + S22/z2 − 2z3S12/(z1z2)^{1/2} } }
    ∝ (1 − z3^2)^{(n−1)/2−a}(z1z2)^{-(n−a)/2} exp{ −(1/2)( S11/z1 + S22/z2 ) } × Σ_{r=0}^∞ (z3S12)^r/( z1^{r/2} z2^{r/2} r! ).
On integrating out z1 and z2, the marginal posterior of z3 is

    π(z3|X) ∝ |z3|^a(1 − z3^2)^{(n−1)/2−a} Σ_{r=0}^∞ ( z3^r(2R)^r/r! ) Γ^2( (n + r − a − 2)/2 )   for a < n − 2,    (4.7.4)

where R = S12/(S11S22)^{1/2}. Clearly, for r odd, the integral
∫_{−1}^1 z3^r|z3|^a(1 − z3^2)^{(n−1)/2−a} dz3 = 0. So, in order to show the propriety, we need
only show that

    I = ∫_{−1}^1 |z3|^a(1 − z3^2)^{(n−1)/2−a} { Σ_{r=0}^∞ z3^{2r}( (2R)^{2r}/(2r)! ) Γ^2( (n − a + 2r − 2)/2 ) } dz3 < ∞.
To this end, we first note that

    ∫_{−1}^1 z3^{2r}|z3|^a(1 − z3^2)^{(n−1)/2−a} dz3 = 2 ∫_0^1 z3^{2r+a}(1 − z3^2)^{(n−1−2a)/2} dz3
    = ∫_0^1 u^{r+(a+1)/2−1}(1 − u)^{(n+1−2a)/2−1} du
    = Beta( r + (a+1)/2, (n+1−2a)/2 )
    = Γ(r + (a+1)/2)Γ((n+1−2a)/2) / Γ((n + 2r − a − 2)/2).
So the integral now reduces to

    I = Σ_{r=0}^∞ ( (2R)^{2r}/(2r)! ) Γ(r + (a+1)/2) Γ((n − a + 2r − 2)/2).    (4.7.5)
By the Legendre duplication formula, (2r)! = Γ(2r + 1) = Γ(r + 1/2)Γ(r + 1)2^{2r}/π^{1/2}.
Hence, writing k (> 0) for a generic constant which does not depend on r,

    I = k Σ_{r=0}^∞ ( R^{2r}/r! ) Γ(r + (n − a − 2)/2) Γ(r + (a+1)/2) / Γ(r + 1/2).

Writing the sum as Σ_{r=0}^∞ a_r, it follows that

    a_{r+1}/a_r = R^2 (r + (a+1)/2)(r + (n − a − 2)/2) / ((r + 1)(r + 1/2)) → R^2 < 1   as r → ∞.

Hence the summation converges by the ratio test, so that the posterior π(z3|X) and,
accordingly, the joint posterior π(µ1, µ2, σ1, σ2, ρ|X) are proper.
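The ratio-test argument can be illustrated numerically. The sketch below (the values of R, n and a are arbitrary choices, not from the dissertation) evaluates the terms a_r of the series (4.7.5) on the log scale via lgamma and checks that successive ratios approach R^2 < 1:

```python
import math

def log_a(r, R, n, a):
    # log of a_r = (2R)^{2r}/(2r)! * Gamma(r + (a+1)/2) * Gamma((n - a + 2r - 2)/2)
    return (2 * r * math.log(2 * R) - math.lgamma(2 * r + 1)
            + math.lgamma(r + (a + 1) / 2) + math.lgamma((n - a + 2 * r - 2) / 2))

R, n, a = 0.6, 20, 1.0   # any 0 < R < 1 and a < n - 2
ratios = [math.exp(log_a(r + 1, R, n, a) - log_a(r, R, n, a)) for r in range(500)]

# Successive term ratios tend to R^2 < 1, so the series converges (ratio test)
assert abs(ratios[-1] - R**2) < 2e-2

# The partial sums are accordingly finite
partial = sum(math.exp(log_a(r, R, n, a)) for r in range(200))
assert math.isfinite(partial) and partial > 0
```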
4.7 Simulation Study

Using the parameterization θ1 = σ1/σ2, θ2 = σ1σ2(1 − ρ^2)^{1/2} and ψ = ρ, as before, our
parameter of interest is θ1. A general class of priors was obtained as
π ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}. This prior satisfies the quantile matching, matching
via distribution functions, HPD matching and likelihood ratio matching properties.
There are three priors that we wish to compare. The first is π ∝ θ1^{-1}. This was
recommended by Staicu (2007) in her PhD dissertation, where she showed that this prior
achieves matching up to O(n^{-3/2}). The second prior is π ∝ θ1^{-1}(1 − ψ^2)^{-3/2}, suggested
by Mukerjee and Reid (2001); it is the special case (a = 0) of the class of priors that we
obtained satisfying all the matching criteria. Finally, the prior π ∝ θ1^{-1}θ2^{-1}(1 − ψ^2)^{-1}
was recommended by Berger and Sun (2007). This is also the one-at-a-time reference prior
for each of the parameters θ1, θ2 and ψ, and it satisfies the first order matching property.
In order to evaluate the three different priors, we undertook a simulation study in which
data were generated from a bivariate normal distribution with (µ1, µ2, σ2, ρ) = (0, 0, 1, 0.5),
varying values of σ1 and varying sample sizes n. The values of θ1 varied from 0.5 to 2.0.
Since the full conditional distributions of the parameters under any of the three priors
do not follow a standard distributional form, we used Gibbs sampling with componentwise
Metropolis-Hastings updates at each iteration to generate random numbers from the
conditional posterior distributions of each parameter (Robert and Casella, 2001). We
ran two chains with different initial values and allowed a burn-in of 10,000 iterations
each. For the means and the log standard deviations we used a random-walk jumping
density, adding normal noise to the existing value in the chain; the correlation was
likewise updated by a random walk, adding a small normal noise to the old value. Each
chain was run for 40,000 iterations, and convergence was judged by the Gelman-Rubin
diagnostic (Gelman and Rubin, 1992). The trace plot presenting the time history of the
last 8,000 iterations for all five parameters is presented in Figure 4-1 for a sample
simulated dataset with θ1 = 0.7, under Prior 3 and sample size 20. Figure 4-2 presents
the plot of the Gelman-Rubin diagnostic for the θ1 chain under the same setting, with
diagnostic values close to 1 suggesting convergence. Figures 4-3, 4-4 and 4-5 are posterior
distributions for θ1 under the three different priors for four different sample sizes,
n = 10, 20, 30, 40. One can immediately make the following observations. Though there
are certain numerical differences, the posterior distribution of θ1 does not seem to vary
widely among Priors 1, 2 and 3, even for smaller sample sizes, though Prior 2 typically
gave smaller posterior standard deviations. As data information increases with sample
size, the posterior distributions become very similar under the three priors. Some
skewness can be observed in the posterior distributions for smaller sample sizes, which
was often noted during our simulation, but the distribution becomes fairly symmetric as
n becomes large. The posterior distribution also becomes more concentrated around the
true value of θ1 with increasing n, as expected.
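A minimal, self-contained sketch of the componentwise random-walk Metropolis sampler described above, written here for Prior 2, which in the (µ1, µ2, σ1, σ2, ρ) parameterization is the a = 0 member π ∝ (1 − ρ^2)^{-1} of the matching class of Section 4.2. The function names, step sizes and chain lengths are illustrative assumptions, not the settings used for Table 4-1:

```python
import math
import random

def log_post(p, xs, ys):
    """Unnormalized log posterior of (mu1, mu2, log s1, log s2, rho) under
    Prior 2; sampling sigma1, sigma2 on the log scale adds a Jacobian s1*s2."""
    mu1, mu2, ls1, ls2, rho = p
    if abs(rho) >= 1.0:
        return -math.inf
    s1, s2 = math.exp(ls1), math.exp(ls2)
    n, om = len(xs), 1.0 - rho * rho
    q = sum((x - mu1)**2 / s1**2 + (y - mu2)**2 / s2**2
            - 2.0 * rho * (x - mu1) * (y - mu2) / (s1 * s2)
            for x, y in zip(xs, ys))
    loglik = -n * (ls1 + ls2 + 0.5 * math.log(om)) - q / (2.0 * om)
    return loglik - math.log(om) + ls1 + ls2  # prior (1 - rho^2)^{-1} plus Jacobian

def sample_theta1(xs, ys, iters=4000, burn=1000, step=0.1, seed=7):
    """Componentwise random-walk Metropolis; returns draws of theta1 = s1/s2."""
    rng = random.Random(seed)
    n = len(xs)
    m1, m2 = sum(xs) / n, sum(ys) / n
    v1 = sum((x - m1)**2 for x in xs) / n
    v2 = sum((y - m2)**2 for y in ys) / n
    c = sum((x - m1) * (y - m2) for x, y in zip(xs, ys)) / n
    # start at moment-based estimates
    p = [m1, m2, 0.5 * math.log(v1), 0.5 * math.log(v2), c / math.sqrt(v1 * v2)]
    lp = log_post(p, xs, ys)
    draws = []
    for it in range(iters):
        for j in range(5):  # update one coordinate at a time
            prop = list(p)
            prop[j] += rng.gauss(0.0, step)
            lp_new = log_post(prop, xs, ys)
            if math.log(rng.random()) < lp_new - lp:
                p, lp = prop, lp_new
        if it >= burn:
            draws.append(math.exp(p[2] - p[3]))  # theta1 = sigma1 / sigma2
    return draws
```

For data simulated with σ1 = σ2 = 1 and ρ = 0.5, the draws of θ1 settle around the true ratio 1; in practice the step sizes would be tuned and convergence monitored with the Gelman-Rubin diagnostic, as described above.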
We repeated our Gibbs sampling estimation technique for 500 datasets under each
configuration of θ1 and n. Each time, we computed the posterior mean, the 95% quantile
interval (given by the 2.5th and 97.5th sample percentiles of the randomly generated
parameter values after the burn-in period) and the 95% HPD interval. Table 4-1 presents
the average of the posterior means, the mean squared error, and the frequentist coverage
of the Bayesian credible intervals (estimated by the proportion of times the true parameter
value falls in the corresponding credible interval) across the 500 datasets under the three
different priors. Some interesting differences can be noted in the behavior for smaller
sample sizes. Prior 2 appears to perform best in terms of coverage of both quantile and
HPD intervals, and also has excellent point estimation properties in terms of average
posterior mean and MSE for smaller sample sizes. For larger sample sizes, all three priors
become almost indistinguishable in terms of their performance.
Table 4-1. Simulation Result to Compare the Three Different Priors Suggested for the
Bivariate Normal Ratio of Standard Deviations Parameter θ1. The True Parameter
Settings are µ1 = µ2 = 0, σ2 = 1 and Varying Values of θ1 = σ1 as Listed. Prior 1:
π ∝ θ1^{-1}; Prior 2: π ∝ (1 − ρ^2)^{-3/2}θ1^{-1}; Prior 3: π ∝ (1 − ψ^2)^{-1}θ1^{-1}θ2^{-1}.
Results are based on 500 simulated datasets. *: Average value for the posterior mean of
θ1, averaged across the 500 simulated datasets.

              Prior 1                        Prior 2                        Prior 3
 θ1    n   θ1*   MSE  Cov.   Cov.     θ1*   MSE  Cov.   Cov.     θ1*   MSE  Cov.   Cov.
                      (Qu.)  (HPD)                (Qu.)  (HPD)                (Qu.)  (HPD)
 0.5  10   0.52  0.02  0.93   0.94    0.51  0.02  0.95   0.95    0.52  0.02  0.93   0.93
 0.5  20   0.51  0.01  0.94   0.94    0.50  0.01  0.95   0.94    0.51  0.01  0.94   0.94
 0.5  30   0.50  0.01  0.94   0.94    0.50  0.01  0.95   0.96    0.50  0.01  0.94   0.94
 0.5  40   0.51  0.01  0.96   0.95    0.51  0.01  0.96   0.95    0.51  0.01  0.95   0.95
 1.0  10   1.06  0.14  0.93   0.94    1.06  0.12  0.95   0.95    1.07  0.14  0.93   0.93
 1.0  20   1.01  0.04  0.96   0.94    1.01  0.03  0.95   0.95    1.01  0.04  0.95   0.94
 1.0  30   1.01  0.03  0.96   0.95    1.01  0.02  0.96   0.95    1.01  0.03  0.95   0.95
 1.0  40   1.01  0.02  0.95   0.94    1.01  0.02  0.95   0.95    1.01  0.02  0.95   0.94
 1.5  10   1.61  0.25  0.92   0.94    1.61  0.23  0.95   0.95    1.61  0.26  0.93   0.94
 1.5  20   1.54  0.12  0.94   0.92    1.54  0.11  0.93   0.93    1.54  0.12  0.93   0.93
 1.5  30   1.51  0.06  0.94   0.94    1.51  0.05  0.93   0.92    1.51  0.06  0.93   0.93
 1.5  40   1.52  0.05  0.94   0.94    1.52  0.05  0.95   0.95    1.52  0.05  0.95   0.95
 2.0  10   2.11  0.44  0.94   0.94    2.11  0.42  0.95   0.94    2.11  0.42  0.96   0.94
 2.0  20   2.04  0.18  0.95   0.95    2.04  0.18  0.95   0.95    2.04  0.18  0.95   0.95
 2.0  30   2.04  0.11  0.95   0.96    2.04  0.10  0.95   0.95    2.04  0.11  0.94   0.95
 2.0  40   2.03  0.08  0.95   0.95    2.03  0.07  0.95   0.95    2.03  0.08  0.95   0.94

Cov. (Qu.) and Cov. (HPD) denote the coverage of the 95% quantile and 95% HPD
intervals, respectively.
Figure 4-1. Sample trace plot for all the parameters under Prior 3 for n = 20 under the
simulation setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-2. Plot of Gelman-Rubin Diagnostic Statistic for θ1 under Prior 3 for n = 20
under the simulation setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-3. Posterior Distribution for θ1 under Prior 1 for Different Sample Sizes, under
the Simulation Setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-4. Posterior Distribution for θ1 under Prior 2 for Different Sample Sizes, under
the Simulation Setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-5. Sample Posterior Distribution for θ1 under Prior 3 for Different Sample Sizes,
under the Simulation Setting of Section 4.7. True value of θ1 = 0.7.
CHAPTER 5
SUMMARY
The study of probability matching priors, which ensure approximate frequentist validity of
posterior credible sets, has received much attention in recent years. In this dissertation,
we developed such priors for parameters, and some functions of the parameters, of a
bivariate normal distribution. The criterion used is the asymptotic matching of coverage
probabilities of Bayesian credible intervals with the corresponding frequentist coverage
probabilities. The dissertation uses various matching criteria, namely, quantile matching,
matching of distribution functions, highest posterior density matching, and matching via
inversion of test statistics. Orthogonal parameterizations were obtained which simplified
the differential equations that needed to be solved to obtain these matching priors.

First, we considered as parameters of interest (i) the regression coefficient, (ii) the
generalized variance, i.e. the determinant of the variance-covariance matrix, and (iii) the
ratio of the conditional variance of one variable given the other to the marginal variance
of the other variable. Here we have been able to find a single prior which meets all four
matching criteria for every one of these parameters. The agreement between the
frequentist and posterior coverage probabilities of HPD intervals is quite good for the
probability matching priors, even for small sample sizes.

Next, we considered the bivariate normal correlation coefficient as the parameter of
interest. Here we obtained different priors satisfying the different matching criteria and
compared their performance for moderate sample sizes. There does not, however, exist
a prior that satisfies the matching via distribution functions criterion. In addition,
we developed inference based on certain modifications of the profile likelihood, namely
the conditional profile likelihood, the adjusted profile likelihood and the integrated
likelihood. One common feature of all the modified likelihoods is that they depend on the
data only through the sample correlation coefficient r.
Finally, we considered the ratio of the standard deviations of the bivariate normal
distribution as our parameter of interest. A general class of priors was obtained in this
case which satisfied all the matching criteria. A specific prior from this class was chosen,
and its performance was compared with some other commonly used priors. The chosen
prior appears to perform the best in terms of coverage of both quantile and HPD
intervals, and also has excellent point estimation properties in terms of average posterior
mean and MSE for smaller sample sizes.

Recently, Sun and Berger (2006) illustrated objective Bayesian inference for the
multivariate normal distribution using different types of formal objective priors, different
modes of inference and different criteria for selecting optimal objective priors. They focus,
in particular, on reference priors, and show that the right-Haar prior is a one-at-a-time
reference prior for many parameters and functions of parameters. Our future research
will concentrate on finding probability matching priors for the multivariate analogs of
the bivariate normal parameters. Here interest lies in several parameters or parametric
functions; for instance, we may be interested in the generalized variance, the regression
matrix or the correlation matrix. Then posterior quantiles are not well-defined, but HPD
regions and credible regions via the LR statistic remain meaningful and of much interest,
and can be used to find matching priors. The joint posterior c.d.f. also remains meaningful
and provides a viable route for finding matching priors. Orthogonal parameterizations are
not guaranteed; however, if found, they will simplify the computations.
REFERENCES
Bayarri, M. J. (1981), "Inferencia bayesiana sobre el coeficiente de correlación de una población normal bivariante," Trabajos de Estadistica e Investigacion Operativa, 32, 18-31.

Bartlett, M. S. (1937), "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of London, Ser. A, 160, 268-282.

Berger, J., and Bernardo, J. M. (1992a), "On the Development of Reference Priors" (with discussion), in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, U.K.: Oxford University Press, pp. 35-60.

Berger, J., Liseo, B., and Wolpert, R. L. (1999), "Integrated Likelihood Methods for Eliminating Nuisance Parameters," Statistical Science, 14, 1-22.

Berger, J., and Sun, D. (2007), "Objective Priors for the Bivariate Normal Model," to appear in the Annals of Statistics.

Berger, J., and Sun, D. (2006), "Objective Priors for a Bivariate Normal Model with Multivariate Generalizations," ISDS Technical Report, Duke University.

Bernardo, J. M. (1979), "Reference Posterior Distributions for Bayesian Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 41, 113-147.

Cox, D. R., and Reid, N. (1987), "Orthogonal Parameters and Approximate Conditional Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 49, 1-39.

Datta, G. S., and Ghosh, J. K. (1995a), "Noninformative Priors for Maximal Invariant Parameter in Group Models," Test, 4, 95-114.

Datta, G. S., and Ghosh, J. K. (1995b), "On Priors Providing Frequentist Validity for Bayesian Inference," Biometrika, 82, 37-45.

Datta, G. S., and Ghosh, M. (1995a), "Some Remarks on Noninformative Priors," Journal of the American Statistical Association, 90, 1357-1363.

Datta, G. S., and Ghosh, M. (1996), "On the Invariance of Noninformative Priors," Annals of Statistics, 24, 141-159.

Datta, G. S., Ghosh, M., and Mukerjee, R. (2000), "Some New Results on Probability Matching Priors," Calcutta Statistical Association Bulletin, 50, 179-192.

Datta, G. S., and Mukerjee, R. (2004), Probability Matching Priors: Higher Order Asymptotics, Lecture Notes in Statistics, New York: Springer.

Datta, G. S., and Sweeting, T. J. (2005), "Probability Matching Priors," in Handbook of Statistics, Vol. 25: Bayesian Thinking: Modeling and Computation, eds. D. Dey and C. R. Rao, Elsevier, pp. 91-114.

Dawid, A. P., Stone, M., and Zidek, J. V. (1973), "Marginalization Paradoxes in Bayesian and Structural Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 35, 189-233.

DiCiccio, T. J., and Stern, S. E. (1994), "Frequentist and Bayesian Bartlett Correction of Test Statistics Based on Adjusted Profile Likelihoods," Journal of the Royal Statistical Society, Ser. B, 56, 397-408.

Fisher, R. A. (1956), Statistical Methods and Scientific Inference, Edinburgh: Oliver and Boyd.

Garvan, C. W., and Ghosh, M. (1997), "Noninformative Priors for Dispersion Models," Biometrika, 84, 976-982.

Garvan, C. W., and Ghosh, M. (1999), "On the Property of Posteriors for Dispersion Models," Journal of Statistical Planning and Inference, 78, 229-241.

Gelman, A., and Rubin, D. B. (1992), "Inference from Iterative Simulation Using Multiple Sequences," Statistical Science, 7, 457-472.

Ghosh, J. K. (1994), Higher Order Asymptotics, Hayward, CA: Institute of Mathematical Statistics and American Statistical Association.

Ghosh, J. K., and Mukerjee, R. (1991), "Characterization of Priors under which Bayesian and Frequentist Bartlett Corrections are Equivalent in the Multiparameter Case," Journal of Multivariate Analysis, 38, 385-393.

Ghosh, J. K., and Mukerjee, R. (1992b), "Bayesian and Frequentist Bartlett Corrections for Likelihood Ratio and Conditional Likelihood Ratio Tests," Journal of the Royal Statistical Society, Ser. B, 54, 867-875.

Ghosh, J. K., and Mukerjee, R. (1993a), "On Priors that Match Posterior and Frequentist Distribution Functions," Canadian Journal of Statistics, 21, 89-96.

Ghosh, J. K., and Mukerjee, R. (1993b), "Frequentist Validity of Highest Posterior Density Regions in the Multiparameter Case," Annals of the Institute of Statistical Mathematics, 45, 293-302.

Ghosh, J. K., and Mukerjee, R. (1994b), "Adjusted Versus Conditional Likelihood: Power Properties and Bartlett-type Adjustment," Journal of the Royal Statistical Society, Ser. B, 56, 185-188.

Ghosh, J. K., and Mukerjee, R. (1995), "Frequentist Validity of Highest Posterior Density Regions in the Presence of Nuisance Parameters," Statistics and Decisions, 13, 131-139.

Ghosh, M., Carlin, B. P., and Srivastava, M. S. (1995), "Probability Matching Priors for Linear Calibration," Test, 4, 333-357.

Ghosh, M., and Kim, Y.-H. (2001), "The Behrens-Fisher Problem Revisited: A Bayes-Frequentist Synthesis," Canadian Journal of Statistics, 29, 5-17.

Ghosh, M., and Mukerjee, R. (1998), "Recent Developments on Probability Matching Priors," in Applied Statistical Science, III, eds. S. E. Ahmed, M. Ahsanullah, and B. K. Sinha, New York: Nova Science Publishers, pp. 227-252.

Ghosh, M., and Yang, M. C. (1996), "Noninformative Priors for the Two-Sample Normal Problem," Test, 5, 145-157.

Ghosh, M. (2001), "Interval Estimation for a Binomial Proportion: Comment," Statistical Science, 16, 124-125.

Godambe, V. P. (1960), "An Optimum Property of Regular Maximum Likelihood Estimation," Annals of Mathematical Statistics, 31, 1208-1211.

Huzurbazar, V. S. (1950), "Probability Distributions and Orthogonal Parameters," Proceedings of the Cambridge Philosophical Society, 46, 281-284.

Jeffreys, H. (1961), Theory of Probability, Oxford, U.K.: Oxford University Press.

Kalbfleisch, J. D., and Sprott, D. A. (1970), "Application of Likelihood Methods to Models Involving Large Numbers of Parameters" (with discussion), Journal of the Royal Statistical Society, Ser. B, 32, 175-208.

Kass, R. E., and Wasserman, L. (1996), "The Selection of Prior Distributions by Formal Rules," Journal of the American Statistical Association, 91, 1343-1370.

Kendall, M. G., and Stuart, A. (1969), The Advanced Theory of Statistics, Vol. 1, p. 390, New York: Hafner Publishing Company.

Lee, C. B. (1989), Comparison of Frequentist Coverage Probability and Bayesian Posterior Coverage Probability, and Applications, unpublished Ph.D. dissertation, Purdue University, Indiana.

Lindsay, B. (1982), "Conditional Score Functions: Some Optimality Results," Biometrika, 69, 503-512.

Lindley, D. V. (1965), Introduction to Probability and Statistics from a Bayesian Viewpoint, Cambridge: Cambridge University Press.

McCullagh, P., and Tibshirani, R. (1990), "A Simple Method for the Adjustment of Profile Likelihoods," Journal of the Royal Statistical Society, Ser. B, 52, 325-344.

Morgan, W. A. (1939), "A Test for the Significance of the Difference Between the Two Variances in a Sample from a Normal Bivariate Population," Biometrika, 31, 13-19.

Mukerjee, R., and Dey, D. K. (1993), "Frequentist Validity of Posterior Quantiles in the Presence of Nuisance Parameters: Higher Order Asymptotics," Biometrika, 80, 499-505.

Mukerjee, R., and Ghosh, M. (1997), "Second Order Probability Matching Priors," Biometrika, 84, 970-975.

Mukerjee, R., and Reid, N. (2001), "Second-Order Probability Matching Priors for a Parametric Function with Application to Bayesian Tolerance Limits," Biometrika, 88, 587-592.

Nicolaou, A. (1993), "Bayesian Intervals with Good Frequency Behavior in the Presence of Nuisance Parameters," Journal of the Royal Statistical Society, Ser. B, 55, 377-390.

Peers, H. W. (1965), "Confidence Properties of Bayesian Interval Estimates," Journal of the Royal Statistical Society, Ser. B, 30, 535-544.

Pitman, E. J. G. (1939), "A Note on Normal Correlation," Biometrika, 31, 9-12.

Rao, C. R., and Mukerjee, R. (1995), "On Posterior Credible Sets Based on the Score Statistic," Statistica Sinica, 5, 781-791.

Roy, S. N., and Potthoff, R. F. (1958), "Confidence Bounds on Vector Analogues of the 'Ratio of Means' and the 'Ratio of Variances' for Two Correlated Normal Variates and Some Associated Tests," Annals of Mathematical Statistics, 29, 829-841.

Severini, T. A. (1991), "On the Relationship between Bayesian and Non-Bayesian Interval Estimates," Journal of the Royal Statistical Society, Ser. B, 53, 611-618.

Severini, T. A., Mukerjee, R., and Ghosh, M. (2002), "On an Exact Probability Matching Property of Right-Invariant Priors," Biometrika, 89, 952-957.

Staicu, A. (2007), "On Some Aspects of Likelihood Methods with Applications in Biostatistics," unpublished Ph.D. dissertation, University of Toronto, Toronto.

Stein, C. (1985), "On the Coverage Probability of Confidence Sets Based on a Prior Distribution," in Sequential Methods in Statistics, Banach Center Publications, 16, Warsaw: Polish Scientific Publishers, pp. 485-514.

Sun, D., and Berger, J. (2006), "Objective Bayesian Analysis for the Multivariate Normal Model," ISDS Technical Report, Duke University; to appear in Bayesian Statistics 8, eds. J. M. Bernardo et al., Oxford, U.K.: Oxford University Press.

Tibshirani, R. (1989), "Noninformative Priors for One Parameter of Many," Biometrika, 76, 604-608.

Welch, B. L., and Peers, H. W. (1963), "On Formulae for Confidence Points Based on Integrals of Weighted Likelihoods," Journal of the Royal Statistical Society, Ser. B, 25, 318-329.

Yin, M., and Ghosh, M. (1997), "A Note on the Probability Difference Between Matching Priors Based on Posterior Quantiles and on Inversion of Conditional Likelihood Ratio Statistics," Calcutta Statistical Association Bulletin, 47, 59-65.
BIOGRAPHICAL SKETCH
Upasana Santra was born on March 4, 1977 in Kanpur, India. She graduated from St.
Mary’s Convent High School, Kanpur in 1995. She earned her B.Sc. from Banaras Hindu
University, Varanasi and her M.Sc. from Indian Institute of Technology, Kanpur in 1998
and 2000, respectively, majoring in Statistics.
Upon arriving in the United States with her husband, Swadeshmukul Santra, she
worked as a statistical consultant in the Statistics Unit of IFAS at the University of
Florida. She earned her M.S. in statistics in 2003 from the University of Florida and
continued for her Ph.D. degree thereafter.