PROBABILITY MATCHING PRIORS FOR THE BIVARIATE NORMAL DISTRIBUTION
By
UPASANA SANTRA
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2008
© 2008 Upasana Santra
To my parents who nurtured my academic interests making this milestone possible
ACKNOWLEDGMENTS
First, I offer my sincerest gratitude to my committee chair Dr. Malay Ghosh, who
supported me with his knowledge and patience. I would like to thank my supervisory
committee members Dr. Ramon Littell, Dr. Bhramar Mukherjee who also served as
cochair and Dr. Jonathan Shuster. Special thanks go to Dr. Bhramar Mukherjee for her
continuous guidance, support and help. I acknowledge her and Dr. Dalho Kim for doing
the simulation studies in my dissertation.
Finally I would like to thank my family, especially my husband Swadeshmukul Santra,
who first encouraged me to pursue this degree, and my daughter Laboni Santra. Without
their continuing support and encouragement, I would not have finished this degree.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
CHAPTER
1 LITERATURE REVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Matching Via Posterior Quantiles . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.1 Notation and Differential Equation . . . . . . . . . . . . . . . . . . 15
1.2.2 Special Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.2.1 Case p=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.2.2 Case p=2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2.3 Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . 19
1.2.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Matching Priors for Distribution Functions . . . . . . . . . . . . . . . . . . 23
1.3.1 Notation and Differential Equation . . . . . . . . . . . . . . . . . . 23
1.3.2 Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . 24
1.3.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4 Matching Priors for Highest Posterior Density Regions . . . . . . . . . . . 25
1.4.1 Notation and Differential Equation . . . . . . . . . . . . . . . . . . 26
1.4.2 Special Case: p=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.4.3 Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . 28
1.4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Matching Priors Associated with Other Credible Regions . . . . . . . . . . 29
1.5.1 Matching Priors Associated with the LR Statistic . . . . . . . . . . 30
1.5.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.1.2 Differential equation . . . . . . . . . . . . . . . . . . . . . 30
1.5.1.3 Special case: p=1 . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.1.4 Nuisance parameters and orthogonality . . . . . . . . . . . 31
1.5.2 Matching Priors Associated with Rao’s Score and Wald’s Statistic . 32
1.5.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.5.2.2 Differential equation . . . . . . . . . . . . . . . . . . . . . 33
1.5.2.3 Special case: p=1 . . . . . . . . . . . . . . . . . . . . . . . 34

2 MATCHING PRIORS FOR SOME BIVARIATE NORMAL PARAMETERS . . 35

2.1 The Orthogonal Reparameterization . . . . . . . . . . . . . . . . . . . . . 35
2.2 Quantile Matching Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3 Matching Via Distribution Functions . . . . . . . . . . . . . . . . . . . . . 43
2.4 Highest Posterior Density (HPD) Matching Priors . . . . . . . . . . . . . . 44
2.5 Matching Priors Via Inversion of Test Statistics . . . . . . . . . . . . . . . 46
2.6 Propriety of Posteriors and Simulation Study . . . . . . . . . . . . . . . . . 47
3 THE BIVARIATE NORMAL CORRELATION COEFFICIENT . . . . . . . . . 52
3.1 The Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Quantile Matching Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3 Highest Posterior Density (HPD) Matching Priors . . . . . . . . . . . . . . 57
3.4 Matching Priors Via Inversion of Test Statistics . . . . . . . . . . . . . . . 58
3.5 Propriety of the Posteriors . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6 Likelihood Based Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.7 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4 RATIO OF VARIANCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.1 The Orthogonal Parameterization . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 Quantile Matching Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.3 Matching Via Distribution Functions . . . . . . . . . . . . . . . . . . . . . 78
4.4 Highest Posterior Density (HPD) Matching Priors . . . . . . . . . . . . . . 79
4.5 Matching Priors Via Inversion of Test Statistics . . . . . . . . . . . . . . . 80
4.6 Propriety of the Posteriors . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.7 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5 SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
LIST OF TABLES
Table page
1-1 Fisher-Von Mises P(µ,λ)(0.05; µ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1-2 Fisher-Von Mises P(µ,λ)(0.95; µ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2-1 Frequentist Coverage Probabilities of 95% HPD Intervals for β, θ and η when σ1² = 1 and σ2² = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3-1 Simulation Comparing Priors for Bivariate Normal Correlation Coefficient . . . 69
4-1 Simulation Comparing Priors Suggested for Bivariate Normal Ratio of Standard Deviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
LIST OF FIGURES
Figure page
3-1 Plot of Gelman-Rubin Diagnostic Statistic for ρ Under Prior III for n=10 Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . 68
3-2 Sample Trace Plot for All the Parameters under Prior III for n=10 Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . 70
3-3 Posterior Distribution for ρ under Prior I for Different Sample Sizes, Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . 71
3-4 Posterior Distribution for ρ under Prior II for Different Sample Sizes, Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . 72
3-5 Sample Posterior Distribution for ρ under Prior III for Different Sample Sizes, Under the Simulation Setting of Section 3.7 . . . . . . . . . . . . . . . . . . . 73
4-1 Sample Trace Plot for all Parameters under Prior 3 under the Simulation Setting of Section 4.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4-2 Plot of Gelman-Rubin Diagnostic Statistic for θ1 under Prior 3 . . . . . . . . . . 87
4-3 Posterior Distribution for θ1 under Prior 1 for Different Sample Sizes . . . . . . 88
4-4 Posterior Distribution for θ1 under Prior 2 for Different Sample Sizes . . . . . . 89
4-5 Sample Posterior Distribution for θ1 under Prior 3 for Different Sample Sizes . . 90
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
PROBABILITY MATCHING PRIORS FOR THE BIVARIATE NORMAL DISTRIBUTION
By
Upasana Santra
May 2008
Chair: Malay Ghosh
Cochair: Bhramar Mukherjee
Major: Statistics
In practice, most Bayesian analyses are performed with so-called “non-informative”
priors. This is especially so when there is little or no prior information, and yet the
Bayesian technique can lead to solutions satisfactory from both the Bayesian and the
frequentist perspectives. The study of probability matching priors ensuring, up to the
desired order of asymptotics, the approximate frequentist validity of posterior credible
sets has received significant attention in recent years. In this dissertation we develop
some objective priors for certain parameters of the bivariate normal distribution. The
parameters considered are the regression coefficient, the generalized variance, the ratio
of one of the conditional variances to the marginal variance of the other variable, the
correlation coefficient and the ratio of the standard deviations. The criterion used is
the asymptotic matching of coverage probabilities of Bayesian credible intervals with
the corresponding frequentist coverage probabilities. Various matching criteria, namely,
quantile matching, matching of distribution functions, highest posterior density matching,
and matching via inversion of test statistics are used.
One particular prior is found which meets all the matching criteria individually for the regression coefficient, the generalized variance and the ratio of one of the conditional variances to the marginal variance of the other variable. For the correlation coefficient, though, each matching criterion leads to a different prior; moreover, there does not exist a prior that satisfies the matching via distribution functions criterion in this case. Finally, a general class of priors has been obtained for inference about the ratio of standard deviations.
The propriety of the resultant posteriors is proved in each case under mild conditions
and simulation results suggest that the approximations are valid even for moderate sample
sizes. Further, several likelihood based methods have been considered for the correlation
coefficient. One common feature of all these modified likelihoods is that they are all
dependent on the data only through the sample correlation coefficient r.
CHAPTER 1
LITERATURE REVIEW
1.1 Introduction
The Bayesian paradigm is an attempt to utilize all available information in decision-making.
Prior knowledge coming from experience, expert judgement, or previously collected data is
used with current data to characterize the current state of knowledge. However, even with
little or no prior information, one can often employ noninformative priors to draw reliable
inference, which has led to a remarkable increase in the popularity of Bayesian methods in
the theory and practice of statistics. Thus over the years, a wide range of noninformative
priors have been proposed and studied.
The earliest use of noninformative priors is attributed to Laplace (1812). Laplace’s
rule, or the principle of insufficient reason, assigns a flat prior over the entire parameter
space. A problem with this rule is that it is not invariant under one-to-one reparameterization.
For example, if θ is given a uniform distribution, then φ = exp (θ) will not have a uniform
distribution. Conversely, if we start with a uniform distribution for φ, then θ = log (φ)
will not have a uniform distribution. Since most statistical models do not have a unique
parameterization, this becomes bothersome. For example, a uniform prior for the standard
deviation σ will not transform into a uniform prior for the variance σ2. This lack of
invariance of the uniform prior often translates into significant variation in the resulting
posteriors.
Thus, Jeffreys (1961) proposed a prior which remains invariant under any one-to-one
reparameterization. In the general multiparameter setup, writing the Fisher Information matrix as I(θ), where

I(θ) = E(−∂²l/∂θi∂θj)

and l is the log-likelihood, the rule is to take the prior to be

π(θ) ∝ |I(θ)|^(1/2).
The rule is applicable as long as I(θ) is defined and positive definite. As is easily checked,
it has the invariance property that for any other parameterization γ which is one-to-one
with θ,
π(γ) ∝ |J| |I(θ)|^(1/2),

where J = det(∂θ/∂γ). So the priors defined by the rule on γ and θ transform according to the change-of-variables formula. Thus it does not require the selection of any specific parameterization.
There are many intuitive justifications for using Jeffreys’ prior. One that concerns us is a probability matching property. As an example, if X1, . . . , Xn are iid N(θ, 1), then X̄n = (1/n) Σ Xi is the MLE of θ. With the uniform prior π(θ) ∝ c (a constant), the posterior of θ is N(X̄n, 1/n). Accordingly, writing zα for the upper 100α% point of the N(0, 1) distribution,

P(θ ≤ X̄n + zα n^(−1/2) | X̄n) = 1 − α = P(θ ≤ X̄n + zα n^(−1/2) | θ).

This is an example of exact matching. Other examples of exact matching can be found in Datta, Ghosh, M. and Mukerjee (2000) and Severini, Mukerjee and Ghosh, M. (2002).
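The exact matching in this normal example is easy to verify by direct simulation; a minimal sketch (sample size, θ, and seed chosen arbitrarily, numpy assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, alpha = 2.0, 20, 0.05
z_alpha = 1.6448536269514722      # upper 100*alpha% point of N(0, 1)

# Under the uniform prior the posterior of theta is N(Xbar_n, 1/n), so the
# (1 - alpha) posterior quantile is Xbar_n + z_alpha * n**(-1/2).  Repeatedly
# sampling from the model with theta held fixed estimates its frequentist
# coverage, which the exact matching result says equals 1 - alpha.
reps = 200_000
xbar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
coverage = np.mean(theta <= xbar + z_alpha / np.sqrt(n))
print(round(coverage, 3))   # ~ 0.95, for any n and theta
```

The same experiment repeated at other values of n, θ, and α gives coverage equal to the nominal level up to Monte Carlo noise, since here the matching is exact rather than asymptotic.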
However, in most instances one has to rely on asymptotics rather than exact matching. To see this, suppose θ̂n is the MLE of θ. Then θ̂n | θ is asymptotically N(θ, I^(−1)(θ)), where I(θ) is the Fisher Information number. Using the transformation g(θ) = ∫^θ I^(1/2)(t) dt, g(θ̂n) is asymptotically N(g(θ), 1) by the delta method. Now, intuitively one expects the uniform prior as the asymptotic matching prior for g(θ). Transforming back to the original parameter, Jeffreys’ prior is a probability matching prior for θ. This is discussed in Ghosh, M. (2001). To see this, let φ = g(θ) and π(φ) = 1. Then |∂φ/∂θ| = |g′(θ)| = I^(1/2)(θ).
The above matching property is usually referred to as the quantile matching property.
However, quantile matching is one of several matching criteria available in the literature.
Typically, this matching of posterior coverage probability of a Bayesian credible set with
the corresponding frequentist coverage probability is also accomplished through either (a)
highest posterior density (HPD) regions, or (b) distribution functions, or (c) inversion of certain test statistics.
Matching priors based on posterior quantiles was first investigated by Welch and
Peers (1963) who considered a scalar parameter of interest θ in the absence of any
nuisance parameters. In this case they showed by solving a differential equation that the frequentist coverage probability of a one-sided posterior credible interval for θ matches the nominal level with a remainder of o(n^(−1/2)), where n is the sample size, if and only if one works with Jeffreys’ prior. Such a prior will be referred to as a first order probability matching prior. Welch and Peers proved this result only for continuous distributions.
Ghosh, J.K. (1994) pointed out a suitable modification which would lead to the same
conclusion for discrete distributions. On the other hand, if one requires the remainder to
be of the order o(n−1), then we have a second order probability matching prior. We shall
see later that Jeffreys’ prior is not necessarily a second order matching prior even in the one parameter case. Moreover, Jeffreys’ prior has been criticized in the presence of nuisance parameters. For example, Bernardo (1979) has shown that Jeffreys’ prior can lead to the marginalization paradox (cf. Dawid, Stone and Zidek (1973)) for inference about µ/σ when the model is normal with mean µ and variance σ². A second example, due to Berger and Bernardo (1992a), shows that Jeffreys’ prior can lead to an inconsistent estimator of the error variance in the balanced one-way normal ANOVA model when the number of cells grows to infinity in direct proportion to the sample size. Thus Jeffreys’ prior fails to avoid the Neyman-Scott (1948) phenomenon.
The original idea of Welch and Peers (1963) was pursued in the nuisance parameter
case by Peers (1965), Stein (1985), Tibshirani (1989), Nicolaou (1993), Datta and Ghosh,
J.K. (1995 a, b), Datta and Ghosh, M. (1995, 1996), Ghosh, M., Carlin and Srivastava
(1995), Ghosh, M. and Yang (1996), Datta (1996) and Garvan and Ghosh, M. (1996)
among others. As we shall see, matching is obtained by solving differential equations. The
calculations are highly simplified if the parameter of interest is orthogonal to the nuisance
parameters (Cox and Reid, 1987). If θ is partitioned into two vectors θ1 and θ2 of length p1 and p2 respectively, where p1 + p2 = p, then θ1 is orthogonal to θ2 if the elements of the information matrix satisfy

i_θsθt = (1/n) Eθ(∂l/∂θs · ∂l/∂θt) = (1/n) Eθ(−∂²l/∂θs∂θt) = 0

for s = 1, . . . , p1, t = p1 + 1, . . . , p1 + p2; this is to hold for all θ in the parameter space. Note that l(θ) is the log-likelihood and i refers to information per observation, which will be assumed to be O(1) as n → ∞.
Suppose that we have a scalar parameter of interest, that is, θ = (θ1, . . . , θp)^T, where θ1 is the parameter of interest and the rest are nuisance parameters. Writing I(θ) = ((Ijk)) as the Fisher Information matrix, if θ1 is orthogonal to (θ2, . . . , θp)^T, that is, I1k = 0 for all k = 2, . . . , p, then, extending the previous intuitive argument, π(θ) ∝ I11^(1/2)(θ) is a probability
matching prior. In the presence of nuisance parameters, Jeffreys’ prior may not satisfy the
quantile matching property. As an example, consider the Behrens-Fisher problem (Ghosh, M. and Kim, 2001). The model is represented by the density

(1/(γ1γ2)) φ((x(1) − µ1)/γ1) φ((x(2) − µ2)/γ2), x(1), x(2) ∈ R¹,

where µ1, µ2 (∈ R¹) and γ1, γ2 (> 0) are unknown parameters and φ is the standard normal p.d.f. Interest lies in the difference µ1 − µ2. Reparameterize as

θ1 = µ1 − µ2, θ2 = [(µ1/γ1²) + (µ2/γ2²)] / [(1/γ1²) + (1/γ2²)], θ3 = γ1, θ4 = γ2,

where θ1, θ2 ∈ R¹, and θ3, θ4 > 0. Then θ1 is the parameter of interest and the above is an orthogonal parameterization. In this example, first order matching is achieved if and only if

π(θ) = d(θ(2)) (θ3² + θ4²)^(−1/2),

where θ(2) = (θ2, θ3, θ4)^T. Moreover, it can be seen that

d(θ(2)) ∝ (θ3² + θ4²)^(3/2) / (θ3θ4)³

satisfies the second order matching condition. However, Jeffreys’ prior for this model is proportional to (θ3θ4)^(−2) and is hence not even a first order probability matching prior.
In the following sections we review and characterize the different matching priors.
1.2 Matching Via Posterior Quantiles
A major part of recent research still involves priors centered around those which
ensure approximate frequentist validity of the posterior quantiles. Specifically, we consider
priors π(.) for which the relation
Pθ{θ1 ≤ θ1^(1−α)(π, X)} = 1 − α + o(n^(−r/2)) (1.2.1)
holds for r = 1 or 2 and for each α (0 < α < 1). Here n is the sample size, θ = (θ1, . . . , θp)T
is an unknown parameter vector, θ1 is the one-dimensional parameter of interest, Pθ{.} is the frequentist probability measure under θ, and θ1^(1−α)(π, X) is the (1 − α)th posterior quantile of θ1, under π(.), given the data X. Priors satisfying (1.2.1) for r=1 or 2 are called first or second order matching priors respectively. Clearly, they ensure that one-sided Bayesian credible sets of the form (−∞, θ1^(1−α)(π, X)] for θ1 have correct frequentist
coverage as well up to the order of approximation indicated in (1.2.1). In the presence
of nuisance parameters, a first order matching prior is not unique. The study of second
order matching priors, which ensures correct frequentist coverage to a higher order of
approximation, can help significantly in narrowing down the class of competing first order
matching priors.
1.2.1 Notation and Differential Equation
Consider a sequence {Xi}, i ≥ 1, of i.i.d. possibly vector valued random variables with common density f(x; θ), where the parameter vector θ = (θ1, . . . , θp)^T belongs to R^p or some open subset thereof and θ1 is the parameter of interest. Let X = (X1, . . . , Xn)^T, where n is the sample size, and let θ̂ = (θ̂1, . . . , θ̂p)^T be the MLE of θ based on X. Let θ have a prior density π(.), assumed continuously differentiable over the entire parameter space. Also let l(θ) = n^(−1) Σᵢ log f(Xi; θ) and, with Dj = ∂/∂θj, for 1 ≤ j, r, s, u ≤ p, let

Vj = Dj log f(X1; θ), L_j,r,s = Eθ(VjVrVs), L_j,rs = Eθ(VjVrs), L_jrs = Eθ(Vjrs). (1.2.1.1)

Let I = ((Ijr)) be the per observation Fisher Information matrix at θ. Define I^(−1) = ((I^jr)); then

τ^jr = I^j1 I^r1 / I^11, σ^jr = I^jr − τ^jr. (1.2.1.2)

These are considered to be smooth functions of θ. Also, for 1 ≤ j, r ≤ p, let

πj(θ) = Dj π(θ), πjr(θ) = Dj Dr π(θ), π̂ = π(θ̂), π̂j = πj(θ̂), π̂jr = πjr(θ̂). (1.2.1.3)
Now we give the theorem which characterizes the first and second order probability
matching priors.
Theorem 1.2.1 (a) A prior π(.) is first order probability matching if and only if it satisfies the partial differential equation

Σ_{j=1}^p Dj{(I^11)^(−1/2) I^j1 π(θ)} = 0. (1.2.1.4)

(b) A prior π(.) is second order probability matching if and only if it satisfies, in addition, the partial differential equation

Σj Σr Σs Σu (1/3) Du{π τ^jr L_jrs (3σ^su + τ^su)} − Σ_{j=1}^p Σ_{r=1}^p Dj Dr{π τ^jr} = 0. (1.2.1.5)
Part (a) was proved originally by Peers (1965) and part (b) by Mukerjee and Ghosh,
M. (1997).
1.2.2 Special Cases
We now focus attention on the consequences of Theorem 1.2.1 in some important special cases.
1.2.2.1 Case p=1
First consider matching priors in the one parameter models. For p=1 we have θ = θ1.
In this case, both θ and I are scalars. Then the first order matching equation (1.2.1.4)
reduces to

(d/dθ){π(θ)/I^(1/2)} = 0,

leading to the unique solution

π(θ) ∝ I^(1/2), (1.2.2.1)
which is the Jeffreys’ (1961) prior. Thus in this case, Jeffreys’ prior is the unique first
order matching prior. Furthermore, for p=1, by (1.2.1.1) and (1.2.1.2), (1.2.1.5) reduces to

(1/3){π(θ)L111/I²} − (d/dθ){π(θ)/I} = constant. (1.2.2.2)

Now for Jeffreys’ prior, given by (1.2.2.1), and using the standard regularity conditions, it follows from Bartlett (1953) that

(d/dθ)I = −(L1,11 + L111), (1.2.2.3)

and

L1,1,1 + 3L1,11 + L111 = 0. (1.2.2.4)

Thus, the left hand side of (1.2.2.2) simplifies to

(1/3)I^(−3/2)L111 − (d/dθ)I^(−1/2) = (1/6)I^(−3/2)L1,1,1. (1.2.2.5)
Summarizing the above results we get the following theorem.
Theorem 1.2.2 (a) For p=1, Jeffreys’ prior is the unique first order probability
matching prior.
(b) Furthermore, it is also second order probability matching if and only if I−3/2L1,1,1
is a constant free from θ.
Apart from this early result on matching priors, another result, again due to Welch
and Peers (1963), is presented in the next theorem.
Theorem 1.2.3 Under the one parameter location model
f(x; θ) = f*(x − θ), x ∈ R¹, (1.2.2.6)
Jeffreys’ prior, given by π(θ) = constant, is exact matching.
In the one parameter scale model

f(x; θ) = (1/θ) f*(x/θ), (1.2.2.7)

where θ > 0 and f*(.) is a density with support either R¹ or [0, ∞), Jeffreys’ prior, given by π(θ) ∝ θ^(−1), is second order probability matching. In this case too, it can be shown that
the matching is exact. Even beyond the standard one parameter location or scale models,
Jeffreys’ prior can enjoy the second order matching property. On the other hand there can
be models where the condition in Theorem 1.2.2 (b) does not hold and consequently no
second order matching prior is available.
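The exact matching in the scale model can also be checked by simulation. A minimal sketch for the exponential scale density f*(u) = e^(−u) (parameter values arbitrary; numpy and scipy assumed available): under π(θ) ∝ θ^(−1), the posterior of θ given S = Σ Xi is S/G with G ~ Gamma(n, 1), so the (1 − α) posterior quantile is S divided by the α-quantile of Gamma(n, 1).

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
theta, n, alpha = 3.0, 5, 0.05

# alpha-quantile of Gamma(n, 1); the (1 - alpha) posterior quantile of
# theta is then S / g_alpha.
g_alpha = gamma.ppf(alpha, a=n)

# Frequentist coverage: since S/theta ~ Gamma(n, 1), the coverage of
# (-inf, S/g_alpha] is exactly 1 - alpha, matching the posterior level.
reps = 200_000
S = rng.exponential(theta, size=(reps, n)).sum(axis=1)
coverage = np.mean(theta <= S / g_alpha)
print(round(coverage, 3))   # ~ 0.95
```

Note that even for n = 5 the coverage is exact, not merely asymptotic, which is the content of the scale-model matching result above.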
Example 1.2.1 Consider the bivariate normal model with zero means, unit variances
and correlation coefficient θ, where |θ| < 1. Then
I = (1 + θ²)/(1 − θ²)², L1,1,1 = −2θ(3 + θ²)/(1 − θ²)³.

Here I^(−3/2)L1,1,1 is not free from θ. Hence by Theorem 1.2.2, Jeffreys’ prior is only first
order probability matching and no second order matching prior is available.
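A quick numeric check of this claim, using only the closed forms stated above:

```python
# Closed forms from Example 1.2.1, as functions of the correlation theta.
def info(t):
    return (1 + t**2) / (1 - t**2)**2

def L111(t):   # the third-moment quantity L_{1,1,1}
    return -2 * t * (3 + t**2) / (1 - t**2)**3

def ratio(t):  # I^{-3/2} L_{1,1,1}: constant iff a second order prior exists
    return info(t)**-1.5 * L111(t)

for t in (0.0, 0.2, 0.5, 0.8):
    print(t, ratio(t))
# ratio(0) = 0 while ratio(t) < 0 for t > 0, so the quantity varies with theta
```

Evaluating on a grid is enough here: the ratio moves from 0 at θ = 0 to increasingly negative values, confirming that the condition of Theorem 1.2.2(b) fails.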
1.2.2.2 Case p=2
We now discuss two parameter models. Let p=2 where both the interest and nuisance
parameters are one-dimensional. This situation covers many models of interest such as the
location-scale family. By (1.2.1.2), here

I^11 = Q, I^12 = I^21 = −Qζ, I^22 = Q I11/I22,
τ^11 = Q, τ^12 = τ^21 = −Qζ, τ^22 = Qζ²,
σ^11 = 0, σ^12 = σ^21 = 0, σ^22 = I22^(−1), (1.2.2.8)

where

Q = (I11 − I12²/I22)^(−1), ζ = I12/I22. (1.2.2.9)
Hence using (1.2.2.8), the partial differential equations (1.2.1.4) and (1.2.1.5), for first and
second order probability matching, can be expressed as
D1{π(θ)Q^(1/2)} − D2{π(θ)Q^(1/2)ζ} = 0, (1.2.2.10)
and
(1/3)D1{π(θ)Q²(L111 − 3L112ζ + 3L122ζ² − L222ζ³)}
− (1/3)D2{π(θ)Q²ζ(L111 − 3L112ζ + 3L122ζ² − L222ζ³)}
+ D2{π(θ)Q I22^(−1)(L112 − 2L122ζ + L222ζ²)}
− D1²{π(θ)Q} + 2D1D2{π(θ)Qζ} − D2²{π(θ)Qζ²} = 0, (1.2.2.11)
respectively. The second order matching condition (1.2.2.11) for the case p = 2 is due to
Mukerjee and Dey (1993).
1.2.3 Orthogonal Parameterization
The study of matching priors can get substantially simplified when there is an
orthogonal parameterization. We shall now discuss the implications of this phenomenon in
some detail. Let
I1j = 0 (2 ≤ j ≤ p), (1.2.3.1)

identically in θ. Then I^1j = 0, 2 ≤ j ≤ p, and by (1.2.1.2),

τ^11 = I^11 = I11^(−1);
τ^jr = 0, if (j, r) ≠ (1, 1);
σ^jr = 0, if either j = 1 or r = 1;
σ^jr = I^jr, if j ≥ 2 and r ≥ 2.
Hence the partial differential equations (1.2.1.4) and (1.2.1.5), for first and second order
probability matching, can be expressed as
D1{π(θ) I11^(−1/2)} = 0, (1.2.3.2)
and

Σ_{s=2}^p Σ_{u=2}^p Du{π(θ) I11^(−1) I^su L11s} + (1/3)D1{π(θ) I11^(−2) L111} − D1²{π(θ) I11^(−1)} = 0, (1.2.3.3)
respectively.
A prior π(.) satisfies (1.2.3.2), and is hence first order probability matching, if and only if it is of the form

π(θ) = d(θ(2)) I11^(1/2), (1.2.3.4)

where d(.) (> 0) is any smooth function of θ(2) = (θ2, . . . , θp)^T. This result is due to Tibshirani (1989). Nicolaou (1993) also proved it using another approach. By (1.2.3.3), a prior of the form (1.2.3.4) is second order probability matching if and only if

Σ_{s=2}^p Σ_{u=2}^p Du{d(θ(2)) I11^(−1/2) I^su L11s} + (1/6) d(θ(2)) D1{I11^(−3/2) L1,1,1} = 0. (1.2.3.5)
1.2.4 Examples
Example 1.2.2 (Sun, 1997) Consider the Weibull model given by the density

(µ1/µ2)(x/µ2)^(µ1−1) exp{−(x/µ2)^µ1}, x > 0,

where µ1, µ2 (> 0) are unknown parameters. Interest lies in the shape parameter µ1. Reparameterize (µ1, µ2) as

θ1 = µ1, θ2 = µ2 exp(w/µ1), (1.2.4.1)

where w = ∫₀^∞ (u log u) exp(−u) du and θ1, θ2 > 0. Then θ1 is the parameter of interest and θ2 is orthogonal to θ1. Also,

I11 ∝ θ1^(−2), I22 = θ1²/θ2², L1,1,1 ∝ θ1^(−3), L112 ∝ (θ1θ2)^(−1).

By (1.2.3.4), therefore, first order matching is achieved if and only if π(θ) = d(θ2)/θ1. Moreover, by (1.2.3.5) such a prior is second order matching if and only if d(θ2) ∝ θ2^(−1). Hence π(θ) ∝ (θ1θ2)^(−1) is the unique second order matching prior. This prior becomes proportional to (µ1µ2)^(−1) when reverted back to the original parameterization.
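The orthogonality of (1.2.4.1) can be checked numerically. The sketch below (parameter values, sample size and seed arbitrary; numpy assumed available) computes central-difference scores in (θ1, θ2) at data simulated from the model and verifies that their expected product is near zero; in the original (µ1, µ2) parameterization the corresponding cross-information is w/µ2, of order 0.6 at these settings.

```python
import numpy as np

gamma_e = 0.5772156649015329          # Euler-Mascheroni constant
w = 1.0 - gamma_e                     # w = ∫ u log(u) e^{-u} du = 1 - gamma

def logf(x, t1, t2):
    # Weibull log-density in the (theta1, theta2) parameterization:
    # mu1 = theta1 (shape), mu2 = theta2 * exp(-w/theta1) (scale).
    m1, m2 = t1, t2 * np.exp(-w / t1)
    return np.log(m1 / m2) + (m1 - 1) * np.log(x / m2) - (x / m2) ** m1

rng = np.random.default_rng(2)
t1, t2 = 1.7, 0.9
scale = t2 * np.exp(-w / t1)
x = scale * rng.weibull(t1, size=400_000)   # a large sample from the model

h = 1e-5                                    # central-difference scores
V1 = (logf(x, t1 + h, t2) - logf(x, t1 - h, t2)) / (2 * h)
V2 = (logf(x, t1, t2 + h) - logf(x, t1, t2 - h)) / (2 * h)
print(np.mean(V1 * V2))   # ~ 0, as orthogonality requires
```

The Monte Carlo average of V1·V2 estimates the (1, 2) entry of the information matrix in the new parameterization; up to simulation noise it vanishes at any (θ1, θ2).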
Example 1.2.3 (Mukerjee and Dey, 1993) This concerns the ratio of two independent normal means and corresponds to a simpler version of the Fieller (1954) and Creasy (1954) problem. Let the model be represented by the density

φ(x(1) − µ1) φ(x(2) − µ2), x(1), x(2) ∈ R¹,

where µ1, µ2 (> 0) are unknown parameters and φ(.) represents the standard univariate normal density. Interest lies in the ratio µ1/µ2. Reparameterize (µ1, µ2) as

θ1 = µ1/µ2, θ2 = (µ1² + µ2²)^(1/2), (1.2.4.2)

where θ1, θ2 > 0. Then θ1 is the parameter of interest and one can check that (1.2.4.2) is an orthogonal parameterization. Furthermore,

I11 = θ2²/(θ1² + 1)², I22 = 1, L1,1,1 = 0, L112 = −θ2/(θ1² + 1)².

Hence by (1.2.3.4), first order matching is achieved if and only if π(θ) = d(θ2)θ2/(θ1² + 1), whereas by (1.2.3.5) such a prior is second order matching if and only if, in addition, d(θ2) is a constant. Thus π(θ) ∝ θ2/(θ1² + 1) is the unique second order matching prior. Interestingly, under the original parameterization, this gets transformed to the uniform prior on (µ1, µ2).
Example 1.2.4 (Tibshirani, 1989) Continuing with the setup of the last example, now suppose that interest lies in the product µ1µ2. Reparameterize as

θ1 = µ1µ2, θ2 = µ2² − µ1², (1.2.4.3)

where θ1 > 0, θ2 ∈ R¹. Then θ1 is the parameter of interest and (1.2.4.3) is an orthogonal parameterization. Furthermore,

I11 = 4I22 = (4θ1² + θ2²)^(−1/2), L1,1,1 = 0, L112 = (1/2)θ2(4θ1² + θ2²)^(−3/2). (1.2.4.4)

By (1.2.3.4), first order matching is achieved if and only if

π(θ) = d(θ2)(4θ1² + θ2²)^(−1/4). (1.2.4.5)

Such a prior is also second order matching if and only if d(θ2) satisfies (1.2.3.5) which, in view of (1.2.4.4), reduces to

D2{d(θ2)θ2(4θ1² + θ2²)^(−3/4)} = 0.

Clearly the above equation does not admit any solution for d(θ2). Thus no second order matching prior is available in this example. Taking d(θ2) as constant in (1.2.4.5), one gets the first order matching prior π(θ) ∝ (4θ1² + θ2²)^(−1/4). Under the original (µ1, µ2)-parameterization, this is proportional to (µ1² + µ2²)^(1/2).
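The final transformation step is easy to verify symbolically (sympy assumed available): 4θ1² + θ2² factors as (µ1² + µ2²)², and multiplying (4θ1² + θ2²)^(−1/4) by the Jacobian determinant of (1.2.4.3) gives a prior proportional to (µ1² + µ2²)^(1/2).

```python
import sympy as sp

m1, m2 = sp.symbols('mu1 mu2', positive=True)
t1, t2 = m1 * m2, m2**2 - m1**2                 # the map (1.2.4.3)

# 4*theta1^2 + theta2^2 collapses to (mu1^2 + mu2^2)^2.
base = sp.factor(4 * t1**2 + t2**2)
assert sp.simplify(base - (m1**2 + m2**2)**2) == 0

# Transforming the prior back to (mu1, mu2): multiply by the Jacobian
# determinant of the map, which is 2*(mu1^2 + mu2^2).
J = sp.Matrix([t1, t2]).jacobian([m1, m2]).det()
prior_mu = sp.simplify(base**sp.Rational(-1, 4) * J)
print(prior_mu)   # proportional to sqrt(mu1**2 + mu2**2)
```

The constant factor 2 is irrelevant since priors are specified only up to proportionality.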
Example 1.2.5 (Garvan and Ghosh, M., 1999) Consider the Fisher-Von Mises probability density function

f(y|µ, λ) = [1/(2πI0(λ))] exp{λ cos(y − µ)}.

Let µ be the parameter of interest and λ be the nuisance parameter. Then a second order probability matching prior for µ, denoted as πµ(2)(µ, λ), is obtained as

πµ(2)(µ, λ) = λ[1 − A(λ)/λ − A²(λ)],

where A(λ) = I1(λ)/I0(λ), I1(λ) and I0(λ) being modified Bessel functions. Jeffreys’ prior, however, is

πJ ∝ {λA(λ)}^(1/2) [1 − A(λ)/λ − A²(λ)]^(1/2).

Garvan and Ghosh, M. (1999) have investigated the performance of these priors by calculating the frequentist coverage probabilities of some one-sided posterior credible intervals for µ. Let µα denote the posterior α-quantile of µ given y = (y1, . . . , yn). Also, let

P(µ,λ)(α; µ) = P(µ,λ)(µ ≤ µα | µ, λ) = P(µ,λ)(F(µ) ≤ F(µα) | µ, λ).

If the marginal posterior distribution of µ under the prior π yields quantiles so that P(µ,λ)(α; µ) is close to α, then there is evidence that the chosen prior performs well. In Tables 1-1 and 1-2, results are summarized for simulated tail probabilities of posterior distributions of µ when µ = π, λ = 1, sample sizes n=5 and 10, and α is set at 0.05 and 0.95. It is evident that πµ(2)(µ, λ) is very close to the target.
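Both priors are straightforward to evaluate numerically with scipy's modified Bessel functions (an assumption of this sketch). As a sanity check on the bracketed quantity shared by the two priors, 1 − A(λ)/λ − A²(λ) is exactly the derivative A′(λ), by the standard Bessel recurrences I0′ = I1, I1′ = (I0 + I2)/2 and I2 = I0 − (2/λ)I1:

```python
import numpy as np
from scipy.special import iv   # modified Bessel function I_nu(lam)

def A(lam):
    return iv(1, lam) / iv(0, lam)

lam = np.linspace(0.5, 5.0, 10)
bracket = 1 - A(lam) / lam - A(lam)**2   # the factor common to both priors

# Central-difference check that bracket == dA/dlam on the grid.
h = 1e-6
dA = (A(lam + h) - A(lam - h)) / (2 * h)
print(np.max(np.abs(dA - bracket)))   # small: identity holds to numerical precision
```

The bracket is the Fisher information for λ in this model (a variance, hence positive), which is why it appears under the square root in Jeffreys' prior.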
1.3 Matching Priors for Distribution Functions
Matching priors for posterior quantiles were discussed at length in the previous
section. Since quantiles are intimately linked with the cumulative distribution function
(c.d.f.), one may wonder how far these results hold when matching is done via the c.d.f.
instead of quantiles. The results show that first order matching priors for quantiles remain
so when the analysis is based on a comparison of the posterior and frequentist c.d.f.’s.
In the situation where interest lies in several parameters, posterior quantiles are not
well defined but the joint posterior c.d.f. remains meaningful and provides a route for
finding matching priors.
1.3.1 Notation and Differential Equation
In this section, the setup and notations on both the prior and the model are the same
as in the previous section. We target priors π which achieve matching via distribution
functions of some standardized variables. More specifically, when θ1 is the parameter of
interest, while (θ2, . . . , θp)T is the vector of nuisance parameters, writing θ1 as the MLE of
θ1 with n−12 I11,
(I = ((Ijj′)), I
−1 = ((Ijj′))), as its asymptotic variance, we consider the
random variable y =√
n(θ1 − θ1)/(I11)
1/2. Specifically, if P π denotes the posterior of y
given the data X, what we want to achieve is the asymptotic matching
E[P π(y ≤ w|X)|θ] = P (y ≤ w|θ) + o(n−1). (1.3.1.1)
A prior π(.) ensures first order matching of the posterior and frequentist c.d.f.’s if and
only if it satisfies the partial differential equation
Σ_{j=1}^p Dj{π(θ) I^j1 (I^11)^(−1/2)} = 0. (1.3.1.2)
Further, a prior π(.) ensures matching in the same sense at the second order if and only if
it satisfies the partial differential equations
Σj Σr (DjDr{τ^jr π(θ)} − 2Dr{τ^jr πj(θ)}) − Σ_{j,r,s,v} (Dv{Ljrs τ^jr σ^sv π(θ)} − Dr{Ljsv τ^jr σ^sv π(θ)}) = 0

and

Σ_{j,r,s,v} Dv{Ljrs τ^jr τ^sv π(θ)} = 0. (1.3.1.3)
The above results are due to Mukerjee and Ghosh, M. (1997). The two approaches, based on quantiles (1.2.1.4) and on c.d.f.’s (1.3.1.2), lead to the same first order matching condition. However, the corresponding second order matching conditions (1.2.1.5) and (1.3.1.3) are not identical. The second order conditions (1.3.1.3) are more restrictive than (1.2.1.5) and often do not have a solution.
1.3.2 Orthogonal Parameterization
Under orthogonality of θ1 with (θ2, …, θp), it follows from (3.2.5)–(3.2.7) of Datta and Mukerjee (2004) that such a prior π is of the form $I_{11}^{1/2}\,g(\theta_2,\ldots,\theta_p)$, where in addition one needs to satisfy the two differential equations
$$A_1 = \frac{\partial^2}{\partial\theta_1^2}\bigl(I^{11}\pi(\theta)\bigr) - 2\frac{\partial}{\partial\theta_1}\Bigl(I^{11}\frac{\partial\pi(\theta)}{\partial\theta_1}\Bigr) - \sum_{s=2}^{p}\sum_{v=2}^{p}\frac{\partial}{\partial\theta_v}\Bigl\{E\Bigl(\frac{\partial^3\log f}{\partial\theta_1^2\,\partial\theta_s}\Bigr)I^{11}I^{sv}\pi(\theta)\Bigr\} - \sum_{s=2}^{p}\sum_{v=2}^{p}\frac{\partial}{\partial\theta_1}\Bigl\{E\Bigl(\frac{\partial^3\log f}{\partial\theta_1\,\partial\theta_s\,\partial\theta_v}\Bigr)I^{11}I^{sv}\pi(\theta)\Bigr\} = 0 \qquad (1.3.2.1)$$
and
$$A_2 = \sum_{s=2}^{p}\sum_{v=2}^{p}\frac{\partial}{\partial\theta_v}\Bigl\{E\Bigl(\frac{\partial^3\log f}{\partial\theta_1^2\,\partial\theta_s}\Bigr)I^{11}I^{sv}\pi(\theta)\Bigr\} = 0. \qquad (1.3.2.2)$$
Suppose now that interest lies in finding c.d.f. matching priors for a one dimensional
smooth function g(θ) of the parameter vector θ. Following Datta and Ghosh, J.K. (1995b),
the first order c.d.f. matching condition for g(θ) is presented below.
Let ∇g(θ) = (D1g(θ), …, Dpg(θ))^T be the gradient vector of the parametric function g(θ). Define the vector η(θ) = (η1, …, ηp)^T by
$$\eta = \bigl[\{\nabla g(\theta)\}^T I^{-1}\{\nabla g(\theta)\}\bigr]^{-1/2}\,I^{-1}\nabla g(\theta).$$
Then a prior π(·) ensures first order c.d.f. matching for g(θ) if and only if it satisfies
$$\sum_{j=1}^{p} D_j\{\eta_j\,\pi(\theta)\} = 0. \qquad (1.3.2.3)$$
1.3.3 Examples
Example 1.3.1 (Datta and Ghosh, J.K., 1995b) Consider the log-normal model given by the density
$$f(x;\theta) = (x\theta_2)^{-1}\,\phi\Bigl(\frac{\log x - \theta_1}{\theta_2}\Bigr), \quad x > 0,$$
where θ1 ∈ R and θ2 > 0. Suppose interest lies in g(θ) = exp(θ1 + θ2²/2), the population mean. Here
$$I_{11} = \theta_2^{-2}, \qquad I_{22} = 2\theta_2^{-2}, \qquad I_{12} = 0.$$
Hence, solving (1.3.2.3), the solution obtained is
$$\pi(\theta) = \theta_2^{-2}\Bigl(1 + \tfrac{1}{2}\theta_2^{2}\Bigr)^{1/2}.$$
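The claim can be checked symbolically. The following sketch (an addition, not part of the original text; assumes sympy is available) forms the vector η(θ) from the Fisher information and gradient given above and verifies that the stated prior solves (1.3.2.3):

```python
import sympy as sp

t1 = sp.Symbol('theta1', real=True)
t2 = sp.Symbol('theta2', positive=True)

# Fisher information of the log-normal model and the function of interest
I = sp.diag(t2**-2, 2*t2**-2)
g = sp.exp(t1 + t2**2/2)                      # population mean
grad = sp.Matrix([sp.diff(g, t1), sp.diff(g, t2)])

Iinv = I.inv()
norm = sp.sqrt((grad.T * Iinv * grad)[0])
eta = (Iinv * grad) / norm                    # the vector eta(theta) defined above

pi = t2**-2 * sp.sqrt(1 + t2**2/2)            # the stated solution of (1.3.2.3)

# left-hand side of (1.3.2.3): sum_j D_j{eta_j * pi}
lhs = sp.diff(eta[0]*pi, t1) + sp.diff(eta[1]*pi, t2)
lhs_simplified = sp.simplify(lhs)
```

Here `lhs_simplified` reduces to zero, confirming the example.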
1.4 Matching Priors for Highest Posterior Density Regions
Often a Bayesian analysis culminates in the construction of a posterior or predictive distribution. This distribution constitutes a full description of the information about the parameter or future observation. A useful summary of it, perhaps to augment a graph, is provided by a highest posterior density (HPD) region, or indeed a family of such regions.
With a possibly multidimensional interest parameter θ, such a region is of the form
$$\{\theta : \pi(\theta \mid X) \ge K\},$$
where π(θ|X) is the posterior density of θ, under a prior π(·), given the data X, and K depends on π(·) and X in addition to the chosen posterior credibility level. By the Neyman–Pearson lemma, an HPD region has the smallest possible volume, given X, at a chosen level of credibility. In this section, we consider priors that ensure approximate frequentist validity of HPD regions with margin of error o(n^{-1}), where n is the sample size. Priors of this kind are called HPD matching priors. They can be useful even when the interest parameter is multidimensional, since HPD regions remain well defined in such situations.
1.4.1 Notation and Differential Equation
We continue with the setup and notation of Section 1.2. Suppose interest lies in the entire parameter vector θ = (θ1, …, θp)^T, i.e., there is no nuisance parameter. HPD matching priors are required to ensure
$$P_{\theta}\{\theta \in Q^{(1-\alpha)}(\pi, X)\} = 1 - \alpha + o(n^{-1})$$
for all α and θ. We now give a characterization of HPD matching priors when interest lies in the entire parameter vector θ. The result is due to Ghosh, J.K. and Mukerjee (1993b), who reported it in an equivalent form.

Theorem 1.4.1 A prior π(·) is HPD matching for θ if and only if it satisfies the partial differential equation
$$\sum_{j,r,s,u} D_u\{\pi(\theta)L_{jrs}I^{jr}I^{su}\} - \sum_{j,r} D_jD_r\{\pi(\theta)I^{jr}\} = 0. \qquad (1.4.1.1)$$
1.4.2 Special Case: p=1
Peers (1968) and Severini (1991) explored HPD matching priors for scalar θ. Then p = 1, θ = θ1, I becomes a scalar, and (1.4.1.1) becomes
$$\pi(\theta)L_{111}I^{-2} - \frac{d}{d\theta}\{\pi(\theta)I^{-1}\} = \text{constant}.$$
Using regularity condition (1.2.2.3), the above is equivalent to
$$I^{-1}\,\frac{d\pi(\theta)}{d\theta} + \pi(\theta)L_{1,11}I^{-2} = \text{constant}. \qquad (1.4.2.1)$$
The HPD matching condition (1.4.2.1), arising for p = 1, was first reported in Peers (1968) and is equivalent to the corresponding condition given in Severini (1991).

Continuing with p = 1 and again using (1.2.2.3), a prior of the form π(θ) ∝ I^r, where r is a real number, satisfies (1.4.2.1) if and only if
$$I^{r-2}\{(1-r)L_{1,11} - rL_{111}\} = \text{constant}. \qquad (1.4.2.2)$$
In particular, taking r = 1/2 in the above, (1.4.2.1) holds for Jeffreys' prior if and only if
$$I^{-3/2}(L_{1,11} - L_{111}) = \text{constant}. \qquad (1.4.2.3)$$
The condition (1.4.2.3) holds for the one parameter location and scale models introduced in (1.2.2.6) and (1.2.2.7) respectively. For these models, Jeffreys' prior is HPD matching for θ. However, even with p = 1, Jeffreys' prior does not always enjoy the HPD matching property.

Example 1.4.1 Consider the bivariate normal model with zero means, unit variances and correlation coefficient θ, where |θ| < 1. Then
$$I = \frac{1+\theta^2}{(1-\theta^2)^2}, \qquad L_{1,11} = -\tfrac{1}{2}L_{111} = \frac{2\theta(3+\theta^2)}{(1-\theta^2)^3}.$$
Then (1.4.2.3) does not hold, but (1.4.2.2) is satisfied by r = −1. Hence Jeffreys' prior is not HPD matching for θ, but π(θ) ∝ I^{-1} enjoys this property.
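A symbolic check of this example (an addition, not from the original text; assumes sympy), using the relation L_{1,11} = −(1/2)L_{111} quoted above:

```python
import sympy as sp

th = sp.Symbol('theta')
r = sp.Symbol('r')

I = (1 + th**2) / (1 - th**2)**2
L1_11 = 2*th*(3 + th**2) / (1 - th**2)**3    # L_{1,11}
L111 = -2 * L1_11                            # since L_{1,11} = -(1/2) L_{111}

# left-hand side of (1.4.2.2)
expr = I**(r - 2) * ((1 - r)*L1_11 - r*L111)

at_minus1 = sp.simplify(expr.subs(r, -1))             # identically 0, hence constant
at_half = sp.simplify(expr.subs(r, sp.Rational(1, 2)))  # depends on theta: (1.4.2.3) fails
```

With r = −1 the bracket 2L_{1,11} + L_{111} vanishes identically, while for r = 1/2 (Jeffreys' prior) the expression depends on θ.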
1.4.3 Orthogonal Parameterization

Now suppose interest lies in θ1, with θ2, …, θp as nuisance parameters. As before, an HPD matching prior for θ1 is defined as one that ensures frequentist validity of HPD regions for θ1 with margin of error o(n^{-1}). In order to facilitate the presentation, we work under an orthogonal parameterization. Since θ1 is one dimensional, this can always be achieved by suitably choosing the nuisance parameters (Cox and Reid, 1987). The following theorem is due to Ghosh, J.K. and Mukerjee (1995).

Theorem 1.4.2 Suppose orthogonal parameterization holds. Then a prior π(·) is HPD matching for θ1 if and only if it satisfies the partial differential equation
$$\sum_{s=2}^{p}\sum_{u=2}^{p} D_u\{\pi(\theta)I_{11}^{-1}I^{su}L_{11s}\} + D_1\{\pi(\theta)I_{11}^{-2}L_{111}\} - D_1^2\{\pi(\theta)I_{11}^{-1}\} = 0. \qquad (1.4.3.1)$$
It can be verified that for models where
$$D_1\bigl(I_{11}^{-3/2}L_{111}\bigr) = 0, \qquad (1.4.3.2)$$
under orthogonal parameterization, any second order matching prior for posterior quantiles of θ1 is also HPD matching for θ1. HPD matching priors are invariant under reparameterization as long as the object of interest, viewed either as a parametric function under an original parameterization or as a canonical parameter after reparameterization, remains unaltered.
1.4.4 Examples

Example 1.4.2. (Ghosh, J.K. and Mukerjee, 1995a) We revisit Example (1.2.3), where interest lies in the ratio of independent normal means. Here orthogonal parameterization holds. Furthermore,
$$I_{11} = \frac{\theta_2^2}{(\theta_1^2+1)^2}, \qquad I_{22} = 1, \qquad L_{112} = -\frac{\theta_2}{(\theta_1^2+1)^2}, \qquad L_{111} = -3L_{1,11} = \frac{6\theta_1\theta_2^2}{(\theta_1^2+1)^3}.$$
Condition (1.4.3.2) is not satisfied. In addition, no first order matching prior for posterior quantiles of θ1 is HPD matching for θ1. Solutions to the HPD matching condition (1.4.3.1) for θ1 however exist. For example, any prior of the form
$$\pi(\theta) \propto \frac{\theta_1^{r_1}\theta_2^{r_2}}{(\theta_1^2+1)^{r_3}},$$
where r1, r2 and r3 are real, satisfies (1.4.3.1) if and only if one of the following holds:
(a) r1 = 0, r2 = 6, r3 = 3/2,
(b) r1 = 1, r2 = 13, r3 = 2,
(c) r1 = 0, r2 = 1, r3 = −1,
(d) r1 = 1, r2 = −2, r3 = −1/2.
Example 1.4.3 Consider the gamma model
$$f(x;\theta) = \frac{x^{\theta_1-1}e^{-x/\theta_2}}{\theta_2^{\theta_1}\Gamma(\theta_1)}, \quad x > 0,$$
in a natural parameterization, where θ1, θ2 > 0. Then
$$I_{11} = \frac{d^2}{d\theta_1^2}\log\Gamma(\theta_1), \qquad I_{22} = \theta_1\theta_2^{-2}, \qquad I_{12} = \theta_2^{-1},$$
$$L_{111} = -\frac{d^3}{d\theta_1^3}\log\Gamma(\theta_1), \qquad L_{112} = 0, \qquad L_{122} = \theta_2^{-2}, \qquad L_{222} = 4\theta_1\theta_2^{-3}.$$
Hence the prior π(θ) ∝ (θ1³θ2)^{-1} satisfies (1.4.1.1) and is HPD matching for θ = (θ1, θ2)^T.
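As a numerical sanity check (an addition, not part of the original text; assumes numpy and scipy are available), the information entries quoted above can be recovered by Monte Carlo from the score components of this gamma model:

```python
import numpy as np
from scipy.special import polygamma

rng = np.random.default_rng(0)
t1, t2 = 2.5, 1.7                 # illustrative shape theta1 and scale theta2
x = rng.gamma(t1, t2, size=2_000_000)

# score components of log f(x; theta) = (t1-1)log x - x/t2 - t1 log t2 - log Gamma(t1)
s1 = np.log(x) - np.log(t2) - polygamma(0, t1)   # d log f / d theta1
s2 = x/t2**2 - t1/t2                             # d log f / d theta2

# I_jk = E(s_j s_k); compare with the closed forms above
print(np.mean(s1*s1), polygamma(1, t1))   # I11 = trigamma(theta1)
print(np.mean(s2*s2), t1/t2**2)           # I22
print(np.mean(s1*s2), 1/t2)               # I12
```

The simulated values agree with the closed forms to within Monte Carlo error.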
1.5 Matching Priors Associated with Other Credible Regions

In this section we focus on priors that ensure approximate frequentist validity of posterior credible regions obtained by inverting certain commonly used statistics. The results, when combined with those of the HPD matching case, can help narrow down the choice of matching priors, especially when the interest parameter is multidimensional. We begin with the matching priors associated with the LR statistic, followed by those associated with Rao's score and Wald's statistics.
1.5.1 Matching Priors Associated with the LR Statistic
1.5.1.1 Introduction
Suppose that interest lies in the entire parameter vector θ = (θ1, …, θp)^T, i.e., there is no nuisance parameter. The LR statistic for θ is given by
$$M_{LR}(\theta, X) = 2n\{l(\hat\theta) - l(\theta)\},$$
where X = (X1, …, Xn)^T, $l(\theta) = n^{-1}\sum_{i=1}^{n}\log f(X_i;\theta)$, and $\hat\theta$ is the MLE. Given a prior π(·), the inversion of the above statistic yields a posterior credible region for θ as
$$Q_{LR}^{(1-\alpha)}(\pi, X) = \{\theta : M_{LR}(\theta, X) \le k_{1-\alpha}(\pi, X)\},$$
where k_{1−α}(π, X), which may depend on π(·) and X but not on θ, has to be so chosen that the relation
$$P^{\pi}\{\theta \in Q_{LR}^{(1-\alpha)}(\pi, X) \mid X\} = 1 - \alpha + o(n^{-1})$$
holds.
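To make the inversion concrete, here is a small illustration (an addition, not from the dissertation; assumes numpy) for the scalar model N(θ, 1) with a flat prior, where the posterior of θ is N(x̄, 1/n) and M_LR(θ, X) = n(x̄ − θ)²:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 50, 0.05
x = rng.normal(1.0, 1.0, size=n)       # N(theta, 1) data, illustrative
xbar = x.mean()                        # MLE of theta

# Under the flat prior the posterior of theta is N(xbar, 1/n), so
# M_LR(theta, X) = n*(xbar - theta)^2 is chi^2_1 under the posterior.
theta_post = rng.normal(xbar, 1/np.sqrt(n), size=100_000)
m_lr = n * (xbar - theta_post)**2
k = np.quantile(m_lr, 1 - alpha)       # k_{1-alpha}(pi, X)

# inverting M_LR <= k gives the credible interval
lo, hi = xbar - np.sqrt(k/n), xbar + np.sqrt(k/n)
post_prob = np.mean((theta_post >= lo) & (theta_post <= hi))
```

Here `k` approaches the χ²₁ upper 5% point (≈ 3.84), and the inverted region carries posterior probability ≈ 0.95 by construction.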
1.5.1.2 Differential equation
The matching priors are characterized in the theorem below.

Theorem 1.5.1 A prior π(·) is LR matching for θ if and only if it satisfies the partial differential equation
$$\sum_{j,r,s,u} D_u\{\pi(\theta)L_{jrs}I^{jr}I^{su}\} - \sum_{j,r} D_jD_r\{\pi(\theta)I^{jr}\} + 2\sum_{j,r} D_r\{I^{jr}\pi_j(\theta)\} = 0. \qquad (1.5.1.1)$$
The above result is due to Ghosh, J.K. and Mukerjee (1991).
1.5.1.3 Special case: p=1
If p = 1, then θ = θ1, I is a scalar, and (1.5.1.1) becomes
$$\pi(\theta)L_{111}I^{-2} - \frac{d}{d\theta}\{\pi(\theta)I^{-1}\} + 2I^{-1}\,\frac{d\pi(\theta)}{d\theta} = \text{constant}.$$
Using regularity condition (1.2.2.3), the above is equivalent to
$$I^{-1}\,\frac{d\pi(\theta)}{d\theta} - \pi(\theta)L_{1,11}I^{-2} = \text{constant}. \qquad (1.5.1.2)$$
Equation (1.5.1.2) is in agreement with the findings of Severini (1991), who studied this problem for scalar θ.

Continuing with p = 1 and again using (1.2.2.3), a prior of the form π(θ) ∝ I^r, where r is a real number, satisfies (1.5.1.2) if and only if
$$I^{r-2}\{rL_{111} + (1+r)L_{1,11}\} = \text{constant}. \qquad (1.5.1.3)$$
In particular, taking r = 1/2 in the above and using the regularity condition (1.2.2.4), it follows that Jeffreys' prior satisfies (1.5.1.2) if and only if
$$I^{-3/2}L_{111} = \text{constant}. \qquad (1.5.1.4)$$
The above, by Theorem 1.2.2, is also the condition under which Jeffreys' prior is second order matching for the posterior quantiles of θ.

The condition (1.5.1.4) holds for the one parameter location and scale models introduced in (1.2.2.6) and (1.2.2.7) respectively. For these models, Jeffreys' prior is LR matching for θ. On the other hand, for the bivariate normal model considered in Examples (1.2.1) and (1.4.1), the condition (1.5.1.4) is not met, but (1.5.1.3) holds with r = 1. Thus for this model π(θ) ∝ I is LR matching for θ, though Jeffreys' prior does not enjoy this property.
1.5.1.4 Nuisance parameters and orthogonality
The LR matching condition (1.5.1.1) given above allows θ to be possibly multidimensional but presumes that nuisance parameters are absent. Several characterizations of LR matching priors in the presence of nuisance parameters have been reported in the literature. DiCiccio and Stern (1994) allowed both the interest and the nuisance parameters to be possibly multidimensional and made no assumption of orthogonal parameterization. Earlier, Ghosh, J.K. and Mukerjee (1992b, 1994b) derived special cases of the matching conditions when both the interest and nuisance parameters are one dimensional and orthogonal parameterization holds.

Theorem 1.5.2 Suppose orthogonal parameterization holds. Then a prior π(·) is LR matching for θ1 if and only if it satisfies the partial differential equation
$$\sum_{s=2}^{p}\sum_{u=2}^{p} D_u\{\pi(\theta)I_{11}^{-1}I^{su}L_{11s}\} + D_1\Bigl[I_{11}^{-1}\Bigl\{\frac{\partial\pi(\theta)}{\partial\theta_1} - \pi(\theta)\Bigl(I_{11}^{-1}L_{1,11} - \sum_{s=2}^{p}\sum_{u=2}^{p} I^{su}L_{1su}\Bigr)\Bigr\}\Bigr] = 0. \qquad (1.5.1.5)$$
Comparing the LR and HPD matching conditions (1.5.1.5) and (1.4.3.1) for θ1, the difference of their left hand sides reveals that an HPD matching prior satisfying (1.4.3.1) is also LR matching for θ1 if and only if it satisfies
$$D_1\Bigl[I_{11}^{-1}\Bigl\{2\pi_1(\theta) + \pi(\theta)\sum_{s=2}^{p}\sum_{u=2}^{p} I^{su}L_{1su}\Bigr\}\Bigr] = 0.$$
1.5.2 Matching Priors Associated with Rao’s Score and Wald’s Statistic
1.5.2.1 Introduction
Two other statistics enjoy widespread popularity for constructing matching priors: Rao's score statistic and Wald's statistic. Suppose interest lies in the entire parameter vector θ. Rao's score statistic is based on the score vector
$$\nabla l(\theta) = (D_1 l(\theta), \ldots, D_p l(\theta))^T.$$
In the posterior setup, with h = (h1, …, hp)^T = n^{1/2}(θ − \hatθ),
$$n^{1/2}\nabla l(\theta) = -Ch + o(1),$$
where C = −∇²l(\hatθ), and n^{1/2}∇l(θ) is asymptotically p-variate normal with null mean vector and dispersion matrix C. A posterior version of Rao's score statistic is given by
$$M_{Rao}(\theta, X) = n\{\nabla l(\theta)\}^T C^{-1}\{\nabla l(\theta)\}.$$
Similarly, Wald's statistic is based on h = n^{1/2}(θ − \hatθ), which is asymptotically p-variate normal with null mean vector and dispersion matrix C^{-1}. A posterior version of Wald's statistic is given by
$$M_{Wald}(\theta, X) = n(\theta - \hat\theta)^T C(\theta - \hat\theta).$$
1.5.2.2 Differential equation
We present the characterization of the matching priors via Rao's score statistic below.

Theorem 1.5.3 A prior π(·) ensures frequentist validity, with margin of error o(n^{-1}), of posterior credible regions for θ given by the inversion of Rao's score statistic if and only if it satisfies the partial differential equations
$$\sum_{j}\sum_{r}\bigl[2D_r\{I^{jr}\pi_j(\theta)\} - D_jD_r\{\pi(\theta)I^{jr}\}\bigr] = 0 \qquad (1.5.2.1)$$
and
$$\sum_{j,r,s,u} D_u\{\pi(\theta)L_{jrs}I^{jr}I^{su}\} = 0. \qquad (1.5.2.2)$$
As noted in Rao and Mukerjee (1995), the classes of matching priors based on Rao's score and Wald's statistics are identical. Lee (1989) also studied the matching problem associated with Wald's statistic.

Equations (1.5.2.1) and (1.5.2.2) add up to the matching condition (1.5.1.1) for the LR statistic. Therefore, any matching prior arising from Rao's score or Wald's statistic also enjoys the same property for the LR statistic. The converse is, however, not true in general.
1.5.2.3 Special case: p=1
Using the regularity condition (1.2.2.3), the matching conditions (1.5.2.1) and (1.5.2.2) reduce to
$$I^{-1}\,\frac{d\pi(\theta)}{d\theta} - \pi(\theta)(L_{1,11} + L_{111})I^{-2} = \text{constant} \qquad (1.5.2.3)$$
and
$$\pi(\theta)L_{111}I^{-2} = \text{constant}, \qquad (1.5.2.4)$$
respectively. By (1.2.2.3), these conditions are met by Jeffreys' prior if and only if
$$I^{-3/2}L_{1,11} = \text{constant}; \qquad I^{-3/2}L_{111} = \text{constant}. \qquad (1.5.2.5)$$
(1.5.2.5) entails the corresponding condition for the LR statistic. For the one parameter location and scale models, (1.5.2.5) again holds. Thus in these situations Jeffreys' prior enjoys the matching property for both the score statistic and the Wald statistic. On the other hand, in Example (1.4.1), concerning a bivariate normal model with unknown correlation coefficient, not only does (1.5.2.5) fail to hold, but no solution to the matching conditions (1.5.2.3) and (1.5.2.4) is available.
Table 1-1. Simulated tail probabilities of posterior distributions in Fisher–von Mises, P(µ,λ)(0.05; µ)

  n     πJ        πµ(2)(µ, λ)
  5     0.0605    0.0518
  10    0.0564    0.0538

Table 1-2. Simulated tail probabilities of posterior distributions in Fisher–von Mises, P(µ,λ)(0.95; µ)

  n     πJ        πµ(2)(µ, λ)
  5     0.9489    0.9570
  10    0.9421    0.9475
CHAPTER 2
MATCHING PRIORS FOR SOME BIVARIATE NORMAL PARAMETERS

In the last chapter we have seen that matching is accomplished through either (a) posterior quantiles, (b) distribution functions, (c) highest posterior density (HPD) regions, or (d) inversion of certain test statistics. However, priors based on (a), (b), (c), or (d) need not always be identical. Specifically, it may so happen that there does not exist any prior satisfying all four criteria.

In this chapter, we consider the bivariate normal distribution where the parameter of interest is (i) the regression coefficient, (ii) the generalized variance, i.e., the determinant of the variance-covariance matrix, or (iii) the ratio of the conditional variance of one variable given the other to the marginal variance of the other variable. We have been able to find a prior which meets all four matching criteria for every one of these parameters.
2.1 The Orthogonal Reparameterization
Let (X1i, X2i), i = 1, …, n, be independent and identically distributed random vectors having a bivariate normal distribution with means µ1 and µ2, variances σ1²(>0) and σ2²(>0), and correlation coefficient ρ (|ρ| < 1). We use the transformation
$$\beta = \rho\sigma_2/\sigma_1, \qquad \theta = \sigma_1\sigma_2(1-\rho^2)^{1/2}, \qquad \eta = \sigma_2(1-\rho^2)^{1/2}/\sigma_1. \qquad (2.1.1)$$
With this reparameterization, the bivariate normal density can be rewritten as
$$f(X_1, X_2) = (2\pi\theta)^{-1}\exp\Bigl[-\frac{1}{2}\Bigl\{\frac{\bigl(X_2-\mu_2-\beta(X_1-\mu_1)\bigr)^2}{\theta\eta} + \frac{\eta(X_1-\mu_1)^2}{\theta}\Bigr\}\Bigr]. \qquad (2.1.2)$$
It may be noted that β is the regression coefficient of X2 on X1, while θ² = σ1²σ2²(1−ρ²) is the determinant of the variance-covariance matrix. Also, η² = V(X2|X1)/V(X1).
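A small numerical sketch (an addition, not part of the original text; assumes numpy) confirming the interpretations of β, θ and η under the transformation (2.1.1):

```python
import numpy as np

rng = np.random.default_rng(2)
s1, s2, rho = 1.3, 0.8, 0.6                      # illustrative values

# the transformation (2.1.1)
beta = rho * s2 / s1
theta = s1 * s2 * np.sqrt(1 - rho**2)
eta = s2 * np.sqrt(1 - rho**2) / s1

# theta^2 is the generalized variance; eta^2 = V(X2|X1)/V(X1);
# theta/eta = Var(X1); theta*eta = V(X2|X1)  (checked in the tests below)

# beta is the regression coefficient of X2 on X1: check by simulation
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
x = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)
slope = np.cov(x[:, 0], x[:, 1])[0, 1] / np.var(x[:, 0], ddof=1)
```

The simulated regression slope agrees with β to within Monte Carlo error.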
With the above reparameterization, and using E(X1 − µ1)² = θ/η and E{X2 − µ2 − β(X1 − µ1)}² = θη, the Fisher information matrix reduces to
$$I(\mu_1,\mu_2,\beta,\theta,\eta) = \begin{pmatrix} A & 0 \\ 0 & \mathrm{Diag}(\eta^{-2},\,\theta^{-2},\,\eta^{-2}) \end{pmatrix}, \qquad (2.1.3)$$
where
$$A = \begin{pmatrix} \dfrac{\beta^2}{\theta\eta} + \dfrac{\eta}{\theta} & -\dfrac{\beta}{\theta\eta} \\[1ex] -\dfrac{\beta}{\theta\eta} & \dfrac{1}{\theta\eta} \end{pmatrix}.$$
This establishes immediately the mutual orthogonality of (µ1, µ2), β, θ and η in the sense of Huzurbazar (1950) and Cox and Reid (1987). Such orthogonality is often referred to as "Fisher orthogonality".
The inverse of the information matrix is then
$$I^{-1}(\mu_1,\mu_2,\beta,\theta,\eta) = \begin{pmatrix} A^{-1} & 0 \\ 0 & \mathrm{Diag}(\eta^{2},\,\theta^{2},\,\eta^{2}) \end{pmatrix}, \qquad (2.1.4)$$
where
$$A^{-1} = \begin{pmatrix} \dfrac{\theta}{\eta} & \dfrac{\beta\theta}{\eta} \\[1ex] \dfrac{\beta\theta}{\eta} & \dfrac{\beta^2\theta}{\eta} + \theta\eta \end{pmatrix}.$$
Since the parameters of interest are orthogonal to (µ1, µ2), and it is customary to use a uniform prior on (µ1, µ2) over R², we shall consider only priors of the form
$$\Pi_0(\mu_1,\mu_2,\beta,\theta,\eta) \propto \pi(\beta,\theta,\eta), \qquad (2.1.5)$$
and find π such that the matching criteria given in (a)-(d) are all satisfied for each of β, θ and η individually. This we are going to explore in the next four sections.

Before ending this section we state a lemma which is used repeatedly in the sequel.
Lemma 2.1 For the bivariate normal density given in (2.1.2),
$$E(\partial\log f/\partial\beta)^3 = 0, \qquad E[(\partial\log f/\partial\beta)(\partial^2\log f/\partial\beta^2)] = 0; \qquad (2.1.6)$$
$$E(\partial^3\log f/\partial\beta^3) = 0, \qquad E(\partial^3\log f/\partial\beta^2\partial\theta) = (\theta\eta^2)^{-1}, \qquad E(\partial^3\log f/\partial\beta^2\partial\eta) = \eta^{-3}; \qquad (2.1.7)$$
$$E(\partial^3\log f/\partial\beta\,\partial\theta^2) = 0, \qquad E(\partial^3\log f/\partial\beta\,\partial\eta^2) = 0; \qquad (2.1.8)$$
$$E(\partial\log f/\partial\theta)^3 = 2/\theta^3, \qquad E[(\partial\log f/\partial\theta)(\partial^2\log f/\partial\theta^2)] = -2/\theta^3; \qquad (2.1.9)$$
$$E(\partial^3\log f/\partial\theta^3) = 4/\theta^3, \qquad E(\partial^3\log f/\partial\theta^2\partial\eta) = 0, \qquad E(\partial^3\log f/\partial\theta\,\partial\eta^2) = (\theta\eta^2)^{-1}; \qquad (2.1.10)$$
$$E(\partial\log f/\partial\eta)^3 = 0, \qquad E[(\partial\log f/\partial\eta)(\partial^2\log f/\partial\eta^2)] = -\eta^{-3}, \qquad E(\partial^3\log f/\partial\eta^3) = 3\eta^{-3}. \qquad (2.1.11)$$
Proof. The proofs are based on the independence of X2 − µ2 − β(X1 − µ1) and X1 − µ1, along with the facts that X2 − µ2 − β(X1 − µ1) ∼ N(0, θη) and X1 − µ1 ∼ N(0, θ/η). For brevity, write U = X2 − µ2 − β(X1 − µ1) and V = X1 − µ1. Then
$$E(V^2) = \sigma_1^2 = \theta/\eta$$
and
$$E(U^2) = \sigma_2^2 + \beta^2\sigma_1^2 - 2\beta\rho\sigma_1\sigma_2 = \sigma_2^2 + \rho^2\sigma_2^2 - 2\rho^2\sigma_2^2 = \sigma_2^2(1-\rho^2) = \theta\eta.$$
We begin with (2.1.6):
$$E\Bigl(\frac{\partial\log f}{\partial\beta}\Bigr)^3 = E\Bigl(\frac{U^3V^3}{\theta^3\eta^3}\Bigr) = \frac{E(V^3)\,E(U^3)}{\theta^3\eta^3} = 0,$$
by independence and because odd central moments of a normal distribution vanish; likewise
$$E\Bigl(\frac{\partial\log f}{\partial\beta}\,\frac{\partial^2\log f}{\partial\beta^2}\Bigr) = -E\Bigl(\frac{UV^3}{\theta^2\eta^2}\Bigr) = -\frac{E(V^3)\,E(U)}{\theta^2\eta^2} = 0.$$
Next, since ∂²log f/∂β² = −V²/(θη) is free from β, E(∂³log f/∂β³) = 0 in (2.1.7). Further,
$$E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr) = E\Bigl(\frac{V^2}{\eta\theta^2}\Bigr) = \frac{\theta/\eta}{\eta\theta^2} = (\theta\eta^2)^{-1}$$
and
$$E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr) = E\Bigl(\frac{V^2}{\eta^2\theta}\Bigr) = \frac{\theta/\eta}{\theta\eta^2} = \eta^{-3}.$$
To show (2.1.8), note that
$$E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) = 2E\Bigl(\frac{UV}{\theta^3\eta}\Bigr) = 0 \quad\text{and}\quad E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr) = 2E\Bigl(\frac{UV}{\theta\eta^3}\Bigr) = 0,$$
again by independence and the vanishing of odd central moments. To verify (2.1.9), write a = U²/η + ηV², so that ∂log f/∂θ = −1/θ + a/(2θ²) and ∂²log f/∂θ² = 1/θ² − a/θ³. By independence and the normal moment formulas, E(a) = 2θ, E(a²) = 8θ², and E(a³) = 48θ³. Hence
$$E\Bigl(\frac{\partial\log f}{\partial\theta}\Bigr)^3 = E\Bigl(-\frac{1}{\theta^3} + \frac{3a}{2\theta^4} - \frac{3a^2}{4\theta^5} + \frac{a^3}{8\theta^6}\Bigr) = -\frac{1}{\theta^3} + \frac{3}{\theta^3} - \frac{6}{\theta^3} + \frac{6}{\theta^3} = \frac{2}{\theta^3}$$
and
$$E\Bigl(\frac{\partial\log f}{\partial\theta}\,\frac{\partial^2\log f}{\partial\theta^2}\Bigr) = E\Bigl(-\frac{1}{\theta^3} + \frac{3a}{2\theta^4} - \frac{a^2}{2\theta^5}\Bigr) = -\frac{1}{\theta^3} + \frac{3}{\theta^3} - \frac{4}{\theta^3} = -\frac{2}{\theta^3}.$$
Next, (2.1.10) holds because
$$E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr) = E\Bigl(-\frac{2}{\theta^3} + \frac{3a}{\theta^4}\Bigr) = -\frac{2}{\theta^3} + \frac{6\theta}{\theta^4} = \frac{4}{\theta^3},$$
$$E\Bigl(\frac{\partial^3\log f}{\partial\theta^2\,\partial\eta}\Bigr) = E\Bigl(\frac{U^2}{\theta^3\eta^2} - \frac{V^2}{\theta^3}\Bigr) = \frac{\theta\eta}{\theta^3\eta^2} - \frac{\theta/\eta}{\theta^3} = 0,$$
and
$$E\Bigl(\frac{\partial^3\log f}{\partial\theta\,\partial\eta^2}\Bigr) = E\Bigl(\frac{U^2}{\theta^2\eta^3}\Bigr) = \frac{\theta\eta}{\theta^2\eta^3} = (\theta\eta^2)^{-1}.$$
Finally, to show (2.1.11), note that ∂log f/∂η = U²/(2θη²) − V²/(2θ), so that
$$E\Bigl(\frac{\partial\log f}{\partial\eta}\Bigr)^3 = \frac{E(U^6)}{8\theta^3\eta^6} - \frac{3E(U^4)E(V^2)}{8\theta^3\eta^4} + \frac{3E(U^2)E(V^4)}{8\theta^3\eta^2} - \frac{E(V^6)}{8\theta^3} = \frac{15}{8\eta^3} - \frac{9}{8\eta^3} + \frac{9}{8\eta^3} - \frac{15}{8\eta^3} = 0.$$
Also, since ∂²log f/∂η² = −U²/(θη³),
$$E\Bigl(\frac{\partial\log f}{\partial\eta}\,\frac{\partial^2\log f}{\partial\eta^2}\Bigr) = -\frac{E(U^4)}{2\theta^2\eta^5} + \frac{E(U^2)E(V^2)}{2\theta^2\eta^3} = -\frac{3}{2\eta^3} + \frac{1}{2\eta^3} = -\eta^{-3}$$
and
$$E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr) = E\Bigl(\frac{3U^2}{\theta\eta^4}\Bigr) = \frac{3\theta\eta}{\theta\eta^4} = 3\eta^{-3}.$$
This completes the proof.
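The moment identities in Lemma 2.1 can also be verified mechanically. The following sketch (an addition, not part of the original text; assumes sympy) computes expectations of polynomials in the two independent normal variables and checks (2.1.9):

```python
import sympy as sp

theta, eta = sp.symbols('theta eta', positive=True)
U, V = sp.symbols('U V')   # U = X2 - mu2 - beta*(X1 - mu1), V = X1 - mu1

def gauss_moment(var, k):
    """E(Z^k) for Z ~ N(0, var): zero for odd k, var^(k/2)*(k-1)!! for even k."""
    return sp.Integer(0) if k % 2 else var**(k // 2) * sp.factorial2(k - 1)

def expect(expr):
    """Expectation of a polynomial in the independent normals U and V."""
    poly = sp.Poly(sp.expand(expr), U, V)
    return sp.simplify(sum(
        coeff * gauss_moment(theta*eta, pu) * gauss_moment(theta/eta, pv)
        for (pu, pv), coeff in poly.terms()
    ))

dl = -1/theta + U**2/(2*theta**2*eta) + eta*V**2/(2*theta**2)   # dlogf/dtheta
d2l = 1/theta**2 - (U**2/eta + eta*V**2)/theta**3               # d2logf/dtheta2

m3 = expect(dl**3)        # equals 2/theta^3, as in (2.1.9)
m_cross = expect(dl*d2l)  # equals -2/theta^3
```

The remaining identities of the lemma can be verified the same way.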
2.2 Quantile Matching Priors
Here one is interested in the approximate frequentist validity of the posterior quantiles of a one-dimensional interest parameter. When β is the parameter of interest, from (1.2.3.4) it follows that the class of first order probability matching priors for β is given by
$$\pi(\cdot) \propto \eta^{-1}g_0(\theta,\eta), \qquad (2.2.1)$$
where g0 is an arbitrary smooth function of (θ, η). In order that such a prior satisfies the second order matching property, we need to find g0 by solving (see (1.2.3.5))
$$\frac{\partial}{\partial\theta}\Bigl\{g_0(\theta,\eta)\,\eta\,\theta^2\,E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr)\Bigr\} + \frac{\partial}{\partial\eta}\Bigl\{g_0(\theta,\eta)\,\eta\,\eta^2\,E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr)\Bigr\} + \frac{1}{6}\,g_0(\theta,\eta)\,\frac{\partial}{\partial\beta}\Bigl\{\eta^3 E\Bigl(\frac{\partial\log f}{\partial\beta}\Bigr)^3\Bigr\} = 0. \qquad (2.2.2)$$
From (2.1.6) and (2.1.7) in Lemma 2.1, (2.2.2) reduces to
$$\frac{\partial}{\partial\theta}\bigl[g_0(\theta,\eta)(\theta/\eta)\bigr] + \frac{\partial}{\partial\eta}\,g_0(\theta,\eta) = 0, \qquad (2.2.3)$$
and a solution to (2.2.3) is provided by g0(θ, η) ∝ θ^{-1}. Thus the prior π(β, θ, η) = θ^{-1}η^{-1} satisfies the second order matching property.

Next we proceed towards finding a second order matching prior for θ. First, from (1.2.3.4), we obtain the class of first order matching priors for θ as given by
$$\pi(\beta,\theta,\eta) = \theta^{-1}g_1(\beta,\eta). \qquad (2.2.4)$$
In order to find a second order probability matching prior for θ, we now need to solve
$$\frac{\partial}{\partial\beta}\Bigl\{g_1(\beta,\eta)\,\theta\,\eta^2\,\frac{E[(X_1-\mu_1)(X_2-\mu_2-\beta(X_1-\mu_1))]}{\theta^3\eta}\Bigr\} + \frac{\partial}{\partial\eta}\Bigl\{g_1(\beta,\eta)\,\theta\,\eta^2\,E\Bigl(\frac{3(X_2-\mu_2-\beta(X_1-\mu_1))^2}{\theta^3\eta^2} - \frac{(X_1-\mu_1)^2}{\theta^3}\Bigr)\Bigr\} + \frac{1}{6}\,g_1(\beta,\eta)\,\frac{\partial}{\partial\theta}\Bigl\{\theta^3 E\Bigl(\frac{\partial\log f}{\partial\theta}\Bigr)^3\Bigr\} = 0. \qquad (2.2.5)$$
Since E[(X1 − µ1)(X2 − µ2 − β(X1 − µ1))] = 0, E[(X2 − µ2 − β(X1 − µ1))²] = θη, and E[(X1 − µ1)²] = θ/η, from (2.1.9) of Lemma 2.1 and (2.2.5), one needs to solve 2∂/∂η[g1(β,η)(η/θ)] = 0. Any g1(β,η) ∝ η^{-1}g*(β) provides a solution. In particular, taking g* = 1, π(β,θ,η) ∝ (θη)^{-1} is a second order matching prior for θ.
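Both of the reductions above are easily confirmed symbolically; the following sketch (an addition, not part of the original text; assumes sympy) checks that g0 ∝ θ^{-1} and g1 ∝ η^{-1} solve the respective equations:

```python
import sympy as sp

beta, theta, eta = sp.symbols('beta theta eta', positive=True)

# beta as interest parameter: g0 = 1/theta solves (2.2.3)
g0 = 1/theta
eq_beta = sp.diff(g0 * theta/eta, theta) + sp.diff(g0, eta)

# theta as interest parameter: g1 = 1/eta solves 2*d/deta[g1*eta/theta] = 0
g1 = 1/eta
eq_theta = 2*sp.diff(g1 * eta/theta, eta)
```

Both left-hand sides vanish identically, so π ∝ (θη)^{-1} passes both checks.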
Finally, when η is the parameter of interest, from (1.2.3.4), once again, the class of first order matching priors is given by
$$\pi(\beta,\theta,\eta) = \eta^{-1}g_2(\beta,\theta). \qquad (2.2.6)$$
In order to find a second order matching prior for η, we need to solve
$$\frac{\partial}{\partial\theta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\theta^2\,E\Bigl(\frac{\partial^3\log f}{\partial\theta\,\partial\eta^2}\Bigr)\Bigr\} - \frac{\partial^2}{\partial\eta^2}\bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\bigr\} + \frac{1}{3}\,\frac{\partial}{\partial\eta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^4\,E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr)\Bigr\} = 0. \qquad (2.2.7)$$
From (2.1.10) and (2.1.11) of Lemma 2.1, (2.2.7) reduces to
$$\frac{\partial}{\partial\theta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\theta^2\,\frac{1}{\theta\eta^2}\Bigr\} - \frac{\partial^2}{\partial\eta^2}\bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^2\bigr\} + \frac{1}{3}\,\frac{\partial}{\partial\eta}\Bigl\{g_2(\beta,\theta)\,\eta^{-1}\eta^4\,\frac{1}{\eta^3}\Bigr\} = 0. \qquad (2.2.8)$$
Hence a solution to (2.2.8) is provided by g2(β,θ) ∝ θ^{-1}, so the prior π(β,θ,η) = θ^{-1}η^{-1} satisfies the second order matching property in this case too. Thus a second order quantile matching prior which works for each of β, θ and η is given by π(µ1,µ2,β,θ,η) ∝ (θη)^{-1}. Back in the original parameterization (µ1, µ2, σ1, σ2, ρ), this reduces to the prior
$$\pi(\sigma_1,\sigma_2,\rho) = \sigma_1^{-2}(1-\rho^2)^{-1}. \qquad (2.2.9)$$
This prior has been identified by Berger and Sun (2007) as the right-Haar prior and the one-at-a-time reference prior, and indeed it provides exact quantile matching, rather than just asymptotic matching, for a variety of parameters of interest, including the ones considered here. Moreover, as shown by these authors, when β is the parameter of interest, any prior of the form σ1^{−(3−a)}(1−ρ²)^{−1}, a > 0, for (µ1, µ2, σ1, σ2, ρ) achieves exact matching, while when θ is the parameter of interest, both the priors σ1^{−2}(1−ρ²)^{−1} and σ1^{−1}σ2^{−1}(1−ρ²)^{−3/2} achieve exact matching.
2.3 Matching Via Distribution Functions
In this section, we target priors π which achieve matching via distribution functions of some standardized variables. When β is the parameter of interest, distribution function matching priors are obtained by solving the differential equations
$$\frac{\partial}{\partial\beta}\Bigl(\eta^2\Bigl\{\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr) + \eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr)\Bigr\}\pi(\beta,\theta,\eta)\Bigr) = 0 \qquad (2.3.1)$$
and
$$\frac{\partial}{\partial\beta}\Bigl(\eta^2\Bigl\{\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) + \eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr)\Bigr\}\pi(\beta,\theta,\eta)\Bigr) = 0. \qquad (2.3.2)$$
Using (2.1.7) and (2.1.8) of Lemma 2.1, (2.3.1) simplifies to ∂/∂β[(θ + η)π(β,θ,η)] = 0, which holds trivially for any prior π(β,θ,η) which does not depend on β, including the prior π(β,θ,η) ∝ (θη)^{-1} found in the previous section. Again, with β as the parameter of interest, for any prior π(β,θ,η) which does not depend on β, we solve
$$\frac{\partial}{\partial\theta}\bigl\{(\theta\eta^2)^{-1}\eta^2\theta^2\,\pi(\beta,\theta,\eta)\bigr\} + \frac{\partial}{\partial\eta}\bigl\{\eta^{-3}\eta^2\eta^2\,\pi(\beta,\theta,\eta)\bigr\} = \frac{\partial}{\partial\theta}\bigl\{\theta\,\pi(\beta,\theta,\eta)\bigr\} + \frac{\partial}{\partial\eta}\bigl\{\eta\,\pi(\beta,\theta,\eta)\bigr\} = 0.$$
Once again π(β,θ,η) ∝ (θη)^{-1} satisfies this. For this prior, (2.3.2) reduces to
$$\eta^2\theta^2\,\frac{\partial}{\partial\beta}E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) + \eta^4\,\frac{\partial}{\partial\beta}E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr) = 0.$$
From (2.1.8) of Lemma 2.1, E(∂³log f/∂β∂θ²) = E(∂³log f/∂β∂η²) = 0. Hence matching via distribution functions is achieved with any prior of the form π(µ1,µ2,β,θ,η) ∝ h(θ,η), and in particular h(θ,η) ∝ (θη)^{-1}.
Next, when θ is the parameter of interest, one first needs to solve
$$\frac{\partial^2}{\partial\theta^2}(\theta^2\pi) - 2\frac{\partial}{\partial\theta}\Bigl(\theta^2\frac{\partial\pi}{\partial\theta}\Bigr) - 2\frac{\partial}{\partial\theta}\Bigl(\theta^4 E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr)\pi\Bigr) = 0, \qquad (2.3.3)$$
which, since E(∂³log f/∂θ³) = 4/θ³, simplifies to
$$\frac{\partial^2}{\partial\theta^2}(\theta^2\pi) - 2\frac{\partial}{\partial\theta}\Bigl(\theta^2\frac{\partial\pi}{\partial\theta}\Bigr) - 8\frac{\partial}{\partial\theta}(\theta\pi) = 0.$$
Hence any prior π(·) ∝ θ^{-1}g(β,η) satisfies this equation, as each term vanishes separately. Such a prior also satisfies
$$\frac{\partial}{\partial\theta}\Bigl(\theta^4 E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr)\pi\Bigr) = 0. \qquad (2.3.4)$$
Finally, when η is the parameter of interest, one needs to solve
$$\frac{\partial^2}{\partial\eta^2}(\eta^2\pi) - 2\frac{\partial}{\partial\eta}\Bigl(\eta^2\frac{\partial\pi}{\partial\eta}\Bigr) - \frac{\partial}{\partial\theta}\Bigl(E\Bigl(\frac{\partial^3\log f}{\partial\theta\,\partial\eta^2}\Bigr)\eta^2\theta^2\pi\Bigr) - \frac{\partial}{\partial\eta}\Bigl(E\Bigl(\frac{\partial^3\log f}{\partial\eta\,\partial\beta^2}\Bigr)\eta^4\pi\Bigr) = 0 \qquad (2.3.5)$$
and
$$\frac{\partial}{\partial\eta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr)\pi\Bigr) = 0. \qquad (2.3.6)$$
Again, using (2.1.7), (2.1.10) and (2.1.11) of Lemma 2.1, the prior π(β,θ,η) = (θη)^{-1} satisfies the second order matching property.
2.4 Highest Posterior Density (HPD) Matching Priors
We now turn attention to HPD matching priors for each of the parameters β, θ and η. While the quantile matching property is quite desirable for the construction of one-sided credible intervals, HPD matching, or matching via inversion of test statistics, seems more appropriate for the construction of two-sided credible intervals. We first consider the HPD region for β. In view of the orthogonality result derived in Section 2.1, such a prior π0(µ1,µ2,β,θ,η) ∝ π(β,θ,η) must satisfy (see (1.4.3.1) of Chapter 1)
$$\frac{\partial}{\partial\theta}\Bigl(\eta^2\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\eta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\beta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\beta^3}\Bigr)\pi\Bigr) - \frac{\partial^2}{\partial\beta^2}(\eta^2\pi) = 0. \qquad (2.4.1)$$
Using (2.1.7) of Lemma 2.1, (2.4.1) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + \frac{\partial}{\partial\eta}(\eta\pi) - \frac{\partial^2}{\partial\beta^2}(\eta^2\pi) = 0. \qquad (2.4.2)$$
Clearly the prior π(β,θ,η) ∝ (θη)^{-1} satisfies (2.4.2).
Next, consider θ as the parameter of interest. Now one needs to solve
$$\frac{\partial}{\partial\beta}\Bigl(\eta^2\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\theta^2\,\partial\beta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\theta^2\eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\theta^2\,\partial\eta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\theta}\Bigl(\theta^4 E\Bigl(\frac{\partial^3\log f}{\partial\theta^3}\Bigr)\pi\Bigr) - \frac{\partial^2}{\partial\theta^2}(\theta^2\pi) = 0. \qquad (2.4.3)$$
Again from (2.1.8) and (2.1.10) of Lemma 2.1, (2.4.3) simplifies to
$$4\frac{\partial}{\partial\theta}(\theta\pi) - \frac{\partial^2}{\partial\theta^2}(\theta^2\pi) = 0, \qquad (2.4.4)$$
which is satisfied by the prior π(β,θ,η) ∝ (θη)^{-1}.
Finally, when η is the parameter of interest, we need to solve
$$\frac{\partial}{\partial\beta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\eta^2\,\partial\beta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\theta}\Bigl(\eta^2\theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\eta^2\,\partial\theta}\Bigr)\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\eta^4 E\Bigl(\frac{\partial^3\log f}{\partial\eta^3}\Bigr)\pi\Bigr) - \frac{\partial^2}{\partial\eta^2}(\eta^2\pi) = 0. \qquad (2.4.5)$$
From (2.1.8), (2.1.10) and (2.1.11) of Lemma 2.1, (2.4.5) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + 3\frac{\partial}{\partial\eta}(\eta\pi) - \frac{\partial^2}{\partial\eta^2}(\eta^2\pi) = 0. \qquad (2.4.6)$$
Again π(β,θ,η) ∝ (θη)^{-1} will do.
2.5 Matching Priors Via Inversion of Test Statistics
One traditional way to derive frequentist confidence intervals is inversion of certain test statistics, the most popular being the likelihood ratio test. When β is the parameter of interest, an LR matching prior π for β is obtained by solving the differential equation
$$\frac{\partial}{\partial\theta}\bigl(\eta^2\theta^2(\theta\eta^2)^{-1}\pi\bigr) + \frac{\partial}{\partial\eta}\bigl(\eta^4\eta^{-3}\pi\bigr) + \frac{\partial}{\partial\beta}\Bigl(\eta^2\Bigl\{\frac{\partial\pi}{\partial\beta} - \pi\Bigl[\eta^2 E\Bigl(\frac{\partial\log f}{\partial\beta}\,\frac{\partial^2\log f}{\partial\beta^2}\Bigr) - \theta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\theta^2}\Bigr) - \eta^2 E\Bigl(\frac{\partial^3\log f}{\partial\beta\,\partial\eta^2}\Bigr)\Bigr]\Bigr\}\Bigr) = 0. \qquad (2.5.1)$$
From (2.1.6) and (2.1.8) of Lemma 2.1, (2.5.1) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + \frac{\partial}{\partial\eta}(\eta\pi) + \eta^2\,\frac{\partial^2\pi}{\partial\beta^2} = 0.$$
Again π ∝ (θη)^{-1} provides a solution.
Next, if θ is the parameter of interest, an LR matching prior π for θ is obtained by solving the differential equation
$$\frac{\partial}{\partial\beta}\bigl(\eta^2\theta^2\cdot 0\cdot\pi\bigr) + \frac{\partial}{\partial\eta}\bigl(\eta^2\theta^2\cdot 0\cdot\pi\bigr) + \frac{\partial}{\partial\theta}\Bigl(\theta^2\Bigl\{\frac{\partial\pi}{\partial\theta} - \pi\Bigl[\theta^2 E\Bigl(\frac{\partial\log f}{\partial\theta}\,\frac{\partial^2\log f}{\partial\theta^2}\Bigr) - \eta^2\Bigl(E\Bigl(\frac{\partial^3\log f}{\partial\beta^2\,\partial\theta}\Bigr) + E\Bigl(\frac{\partial^3\log f}{\partial\eta^2\,\partial\theta}\Bigr)\Bigr)\Bigr]\Bigr\}\Bigr) = 0. \qquad (2.5.2)$$
Again from (2.1.7), (2.1.9) and (2.1.10) in Lemma 2.1, (2.5.2) reduces to
$$\frac{\partial}{\partial\theta}\Bigl[\theta^2\Bigl(\frac{\partial\pi}{\partial\theta} + \frac{2}{\theta}\pi + \frac{2}{\theta}\pi\Bigr)\Bigr] = 0,$$
or, upon simplifying,
$$\frac{\partial}{\partial\theta}\Bigl[\theta^2\,\frac{\partial\pi}{\partial\theta} + 4\theta\pi\Bigr] = 0,$$
which holds for π ∝ (θη)^{-1}.
Finally, when η is the parameter of interest, the LR matching prior is obtained by solving
$$\frac{\partial}{\partial\beta}\bigl(\eta^4\eta^{-3}\cdot 0\cdot\pi\bigr) + \frac{\partial}{\partial\theta}\Bigl(\eta^2\theta^2\,\frac{1}{\theta\eta^2}\,\pi\Bigr) + \frac{\partial}{\partial\eta}\Bigl(\eta^2\Bigl\{\frac{\partial\pi}{\partial\eta} - \pi\Bigl[\eta^2 E\Bigl(\frac{\partial\log f}{\partial\eta}\,\frac{\partial^2\log f}{\partial\eta^2}\Bigr) - \eta^2\,\frac{\theta/\eta}{\theta\eta^2} - 0\Bigr]\Bigr\}\Bigr) = 0. \qquad (2.5.3)$$
Once again, using (2.1.11) in Lemma 2.1, (2.5.3) reduces to
$$\frac{\partial}{\partial\theta}(\theta\pi) + \frac{\partial}{\partial\eta}\Bigl[\eta^2\Bigl(\frac{\partial\pi}{\partial\eta} + \frac{2}{\eta}\pi\Bigr)\Bigr] = 0,$$
and the prior π ∝ (θη)^{-1} provides a solution.
2.6 Propriety of Posteriors and Simulation Study
The prior π(µ1, µ2, β, θ, η) ∝ (θη)^{-1} is improper. In this section, we prove the propriety of the posterior under this prior. In addition, we find the marginal posteriors of β, θ and η, and discuss methods for finding the HPD intervals for each of these parameters.

First we derive the posterior pdf of β. It turns out to be a proper t-density. This immediately implies the propriety of the joint posterior as well, because otherwise the marginal posterior of β could not be proper.

To this end, writing Xi = (X1i, X2i)^T, i = 1, …, n, the joint posterior is given by
$$\pi(\mu_1,\mu_2,\beta,\theta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n}\exp\Bigl[-\frac{1}{2}\sum_{i=1}^{n}\Bigl\{\frac{\bigl(X_{2i}-\mu_2-\beta(X_{1i}-\mu_1)\bigr)^2}{\theta\eta} + \frac{\eta(X_{1i}-\mu_1)^2}{\theta}\Bigr\}\Bigr](\theta\eta)^{-1}. \qquad (2.6.1)$$
From the identities
$$\sum_{i=1}^{n}\bigl(X_{2i}-\mu_2-\beta(X_{1i}-\mu_1)\bigr)^2 = \sum_{i=1}^{n}\bigl\{(X_{2i}-\bar X_2)-\beta(X_{1i}-\bar X_1)\bigr\}^2 + n\bigl(\bar X_2-\mu_2-\beta(\bar X_1-\mu_1)\bigr)^2$$
and
$$\sum_{i=1}^{n}(X_{1i}-\mu_1)^2 = \sum_{i=1}^{n}(X_{1i}-\bar X_1)^2 + n(\bar X_1-\mu_1)^2,$$
first integrating out µ2, and then µ1, one gets from (2.6.1) after simplification
$$\pi(\beta,\theta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n}\eta^{-1}\exp\Bigl[-\frac{1}{2\theta}\sum_{i=1}^{n}\Bigl\{\frac{\bigl((X_{2i}-\bar X_2)-\beta(X_{1i}-\bar X_1)\bigr)^2}{\eta} + \eta(X_{1i}-\bar X_1)^2\Bigr\}\Bigr]. \qquad (2.6.2)$$
Next, integrating out θ in (2.6.2) and writing $S_{jk} = \sum_{i=1}^{n}(X_{ji}-\bar X_j)(X_{ki}-\bar X_k)$, j, k = 1, 2, one obtains
$$\pi(\beta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \eta^{n-2}\bigl(\eta^2 S_{11} + S_{22} + \beta^2 S_{11} - 2\beta S_{12}\bigr)^{-(n-1)}. \qquad (2.6.3)$$
From (2.6.3), the marginal posterior of β is given by
$$\pi(\beta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \int_0^{\infty} \eta^{n-2}\Bigl(\eta^2 + \frac{S_{22}+\beta^2 S_{11}-2\beta S_{12}}{S_{11}}\Bigr)^{-(n-1)}\,d\eta.$$
Putting η = z{(S22 + β²S11 − 2βS12)/S11}^{1/2} in the above integral, one gets after simplification
$$\pi(\beta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \Bigl(1 + \frac{S_{11}(\beta - S_{12}/S_{11})^2}{S_{22.1}}\Bigr)^{-\frac{n-1}{2}}, \qquad (2.6.4)$$
where S22.1 = S22 − S12²/S11. This posterior is a t-distribution with location parameter S12/S11, scale parameter {S22.1/((n−2)S11)}^{1/2}, and n−2 degrees of freedom.
Next we find the posterior of θ. Integrating out β in (2.6.2), one gets
$$\pi(\theta,\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n+\frac{1}{2}}\eta^{-\frac{1}{2}}\exp\Bigl(-\frac{1}{2}\Bigl(\frac{S_{22.1}}{\theta\eta} + \frac{\eta S_{11}}{\theta}\Bigr)\Bigr). \qquad (2.6.5)$$
Now the posterior of θ is given by
$$\pi(\theta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n+\frac{1}{2}}\int_0^{\infty}\eta^{-\frac{1}{2}}\exp\Bigl(-\frac{1}{2\theta}\Bigl(\frac{S_{22.1}}{\eta} + \eta S_{11}\Bigr)\Bigr)\,d\eta. \qquad (2.6.6)$$
Putting η = z^{-1}, (2.6.6) is rewritten as
$$\pi(\theta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-n+\frac{1}{2}}\int_0^{\infty} z^{-\frac{3}{2}}\exp\Bigl(-\frac{S_{22.1}}{2\theta z}\Bigl(z - \frac{S_{11}^{1/2}}{S_{22.1}^{1/2}}\Bigr)^2 - \frac{S_{11}^{1/2}S_{22.1}^{1/2}}{\theta}\Bigr)\,dz. \qquad (2.6.7)$$
Recalling that an inverse Gaussian random variable U with mean µ and scale parameter λ has pdf
$$f_{\mu,\lambda}(u) = \Bigl(\frac{\lambda}{2\pi u^3}\Bigr)^{1/2}\exp\Bigl(-\frac{\lambda(u-\mu)^2}{2\mu^2 u}\Bigr), \quad u > 0,\ \mu > 0,\ \lambda > 0,$$
it follows from (2.6.7) that
$$\pi(\theta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \theta^{-(n-1)}\exp\bigl(-S_{11}^{1/2}S_{22.1}^{1/2}/\theta\bigr)\,I_{[\theta>0]}, \qquad (2.6.8)$$
so that θ^{-1} has a gamma distribution with shape parameter n−2 and scale parameter (S11S22.1)^{-1/2}.
Finally, integrating out θ in (2.6.5), the marginal posterior of η is given by
$$\pi(\eta \mid \mathbf{X}_1,\mathbf{X}_2) \propto \eta^{-1/2}\Bigl(\frac{S_{22.1}}{\eta} + \eta S_{11}\Bigr)^{-(n-3/2)} \propto \eta^{n-2}\Bigl(\eta^2 + \frac{S_{22.1}}{S_{11}}\Bigr)^{-(n-3/2)}. \qquad (2.6.9)$$
The construction of HPD credible intervals is fairly simple. The posterior of β being a univariate t (thus symmetric and unimodal), from (2.6.4) the 100(1 − α)% HPD credible interval for β is given by S12/S11 ± {S22.1/((n−2)S11)}^{1/2} t_{n−2;α/2}, where t_{n−2;α/2} denotes the upper 100(α/2)% point of a Student's t-distribution with n−2 degrees of freedom.
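A minimal implementation of this interval (an addition, not part of the original text; assumes numpy and scipy), with the scale including the S11 factor from (2.6.4):

```python
import numpy as np
from scipy.stats import t as t_dist

def beta_hpd(x1, x2, alpha=0.05):
    """HPD interval for beta under the matching prior pi ~ 1/(theta*eta):
    the posterior of beta is Student-t (2.6.4), symmetric about S12/S11."""
    n = len(x1)
    d1, d2 = x1 - x1.mean(), x2 - x2.mean()
    S11, S12, S22 = d1 @ d1, d1 @ d2, d2 @ d2
    S221 = S22 - S12**2 / S11
    center = S12 / S11
    scale = np.sqrt(S221 / ((n - 2) * S11))
    q = t_dist.ppf(1 - alpha/2, n - 2)
    return center - q*scale, center + q*scale

rng = np.random.default_rng(3)
x1 = rng.normal(size=30)
x2 = 0.5*x1 + rng.normal(scale=0.7, size=30)   # illustrative data, true beta = 0.5
lo, hi = beta_hpd(x1, x2)
```

The interval is centered at the least squares slope S12/S11, as the symmetry of the t-posterior requires.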
Next, observing that the posterior of θ is log-concave, the 100(1 − α)% HPD region for θ is given by [θ_1, θ_2], where θ_1 and θ_2 satisfy

θ_1^{−(n−1)} exp(−S_{11}^{1/2} S_{22.1}^{1/2}/θ_1) = θ_2^{−(n−1)} exp(−S_{11}^{1/2} S_{22.1}^{1/2}/θ_2)    (2.6.10)

and

∫_{θ_1}^{θ_2} θ^{−(n−1)} exp(−S_{11}^{1/2} S_{22.1}^{1/2}/θ) {(S_{11} S_{22.1})^{(n−2)/2}/Γ(n−2)} dθ = 1 − α.    (2.6.11)
If w = θ^{−1}, then the posterior pdf of w is given by

π(w | X_1, X_2) ∝ w^{n−3} exp(−w S_{11}^{1/2} S_{22.1}^{1/2}).
Noting the log-concavity of this pdf as well, the HPD region [w_1, w_2] for w is obtained by solving

w_1^{n−3} exp(−w_1 S_{11}^{1/2} S_{22.1}^{1/2}) = w_2^{n−3} exp(−w_2 S_{11}^{1/2} S_{22.1}^{1/2})    (2.6.12)

and

∫_{w_1}^{w_2} {w^{n−3}/Γ(n−2)} exp(−w S_{11}^{1/2} S_{22.1}^{1/2}) (S_{11} S_{22.1})^{(n−2)/2} dw = 1 − α.    (2.6.13)

Clearly the solution [w_1, w_2] of (2.6.12) and (2.6.13) is different from the solution [θ_2^{−1}, θ_1^{−1}] of (2.6.10) and (2.6.11).
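Equations (2.6.12) and (2.6.13) have no closed-form solution, but they are easy to solve numerically for the gamma posterior of w. A minimal sketch, assuming SciPy and a shape parameter larger than one (the helper `gamma_hpd` is hypothetical, not from the dissertation):

```python
import numpy as np
from scipy import stats, optimize

def gamma_hpd(shape, rate, alpha=0.05):
    """Solve the equal-density and coverage equations for the HPD
    interval [w1, w2] of a Gamma(shape, rate) posterior (shape > 1)."""
    dist = stats.gamma(shape, scale=1.0 / rate)

    def equations(v):
        w1, w2 = v
        return [dist.pdf(w1) - dist.pdf(w2),                 # equal density
                dist.cdf(w2) - dist.cdf(w1) - (1 - alpha)]   # 1 - alpha mass

    # the equal-tailed interval is a reasonable starting point
    w1, w2 = optimize.fsolve(equations, dist.ppf([alpha / 2, 1 - alpha / 2]))
    return w1, w2
```

For the posterior above one would take shape n − 2 and rate (S_{11} S_{22.1})^{1/2}.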
Finally, observing that the posterior of η in (2.6.9) is log-concave, the 100(1 − α)% HPD interval [η_1, η_2] for η is obtained by solving

η_1^{n−2} (η_1^2 + S_{22.1}/S_{11})^{−(n−3/2)} = η_2^{n−2} (η_2^2 + S_{22.1}/S_{11})^{−(n−3/2)}

and

c ∫_{η_1}^{η_2} η^{n−2} (η^2 + S_{22.1}/S_{11})^{−(n−3/2)} dη = 1 − α,

where c is the normalizing constant.
We now evaluate the frequentist coverage probabilities of the HPD credible intervals based on the marginal posterior densities of β, θ and η under our probability matching prior, for several values of ρ and n. Ideally, the frequentist coverage of a 100(1 − α)% HPD interval should be close to 1 − α. This is done numerically. Table 2-1 gives numerical values of the frequentist coverage probabilities of 95% HPD intervals for β, θ and η.
The computation of these numerical values is based on simulation. In particular, for fixed (µ_1, µ_2, σ_1^2, σ_2^2, ρ) and n, we take 5,000 independent random samples of (X_1, X_2) from the bivariate normal model. In our simulation study, we take µ_1 = µ_2 = 0 without loss of generality. Under the prior π, the frequentist coverage probability can be estimated by the relative frequency of HPD intervals containing the true parameter value. An inspection of Table 2-1 reveals that the agreement between the frequentist and posterior coverage probabilities of HPD intervals is quite good for the probability matching prior even if n is small.
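The coverage computation for β can be sketched as follows. This is a simplified illustration, not the authors' code: it treats β as the regression slope estimated by S_{12}/S_{11} (so that the true β equals ρ when σ_1 = σ_2 = 1) and uses fewer replicates than the 5,000 used in the study.

```python
import numpy as np
from scipy import stats

def coverage_beta(rho, n, sims=2000, alpha=0.05, seed=1):
    """Relative frequency of 95% HPD intervals for beta covering the truth."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)
    hits = 0
    for _ in range(sims):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        d1, d2 = x[:, 0] - x[:, 0].mean(), x[:, 1] - x[:, 1].mean()
        s11, s22, s12 = d1 @ d1, d2 @ d2, d1 @ d2
        s221 = s22 - s12**2 / s11
        half = t * np.sqrt(s221 / ((n - 2) * s11))
        hits += abs(s12 / s11 - rho) <= half
    return hits / sims
```

For example, with ρ = 0.5 and n = 12 the returned estimate is close to the nominal 0.95, in line with Table 2-1.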
Table 2-1. Frequentist Coverage Probabilities of 95% HPD Intervals for β, θ and η when σ_1^2 = 1 and σ_2^2 = 1

ρ     n    β      θ      η
0.25  4    0.952  0.947  0.949
      8    0.946  0.955  0.950
      12   0.954  0.952  0.948
      16   0.952  0.954  0.950
      20   0.945  0.948  0.950
0.50  4    0.950  0.952  0.949
      8    0.944  0.952  0.948
      12   0.954  0.953  0.944
      16   0.946  0.950  0.949
      20   0.952  0.948  0.949
0.75  4    0.955  0.952  0.953
      8    0.953  0.948  0.949
      12   0.950  0.946  0.947
      16   0.948  0.946  0.951
      20   0.956  0.946  0.951
CHAPTER 3
THE BIVARIATE NORMAL CORRELATION COEFFICIENT
One of the classic problems in statistics is inference for the correlation coefficient,
ρ, in a bivariate normal distribution. Beginning with Fisher’s hyperbolic tangent
transformation, there have been many proposals, both frequentist and Bayesian, which
address this problem. Added to this is the fiducial approach as found in Fisher (1930,
1956) and Pratt (1963).
Bayesian inference for ρ began in the early sixties with the work of Brillinger (1962) and Geisser and Cornfield (1963). The main objective of these authors was to find whether the fiducial distribution of ρ could be identified as a Bayesian posterior under some default or objective prior, and the conclusion was that this was most likely not possible.

After a long fallow period, interest in this problem revived with the recent work of Berger and Sun (2006). These authors considered various parametric functions arising from the bivariate normal distribution and derived objective priors which satisfy the quantile matching property. In addition, they showed that the resulting posterior matched the fiducial distribution of ρ as studied by Brillinger (1962) and Geisser and Cornfield (1963).
In this chapter, as before, we first construct an orthogonal parameterization with ρ as the parameter of interest, and then find priors, if any, which meet the various matching criteria, at least asymptotically. In addition, we have considered several likelihood-based methods for similar inferential purposes, based on certain modifications of the profile likelihood, namely the conditional profile likelihood, the adjusted profile likelihood and the integrated likelihood.
3.1 The Orthogonal Parameterization
Let (X_{1i}, X_{2i}), (i = 1, . . . , n) be independent and identically distributed random variables having a bivariate normal distribution with means µ_1 and µ_2, variances σ_1^2 (> 0) and σ_2^2 (> 0), and correlation coefficient ρ (|ρ| < 1). Using the transformation

θ_1 = σ_1/σ_2, θ_2 = σ_1 σ_2 (1 − ρ^2)^{1/2} and ψ = ρ,    (3.1.1)

the bivariate normal pdf can be rewritten as

f(X_1, X_2 | µ_1, µ_2, θ_1, θ_2, ψ) ∝ (1/θ_2) exp{−(1/(2(1 − ψ^2)^{1/2} θ_2))[(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)]}.    (3.1.2)
With this reparameterization, the Fisher information matrix reduces to

I(µ_1, µ_2, θ_1, θ_2, ψ) = [ A  0 ; 0  D ],    (3.1.3)

where

A = [ 1/(θ_1 θ_2 (1 − ψ^2)^{1/2})   −ψ/(θ_2 (1 − ψ^2)^{1/2}) ; −ψ/(θ_2 (1 − ψ^2)^{1/2})   θ_1/(θ_2 (1 − ψ^2)^{1/2}) ]

and

D = Diag(1/(θ_1^2 (1 − ψ^2)), 1/θ_2^2, 1/(1 − ψ^2)^2).

This establishes immediately the mutual orthogonality of (µ_1, µ_2), θ_1, θ_2 and ψ. The inverse of the information matrix is then simply

I^{−1}(µ_1, µ_2, θ_1, θ_2, ψ) = [ A^{−1}  0 ; 0  D^{−1} ],    (3.1.4)

where

A^{−1} = [ θ_1 θ_2/(1 − ψ^2)^{1/2}   ψθ_2/(1 − ψ^2)^{1/2} ; ψθ_2/(1 − ψ^2)^{1/2}   θ_2/(θ_1 (1 − ψ^2)^{1/2}) ]    (3.1.5)

and

D^{−1} = Diag(θ_1^2 (1 − ψ^2), θ_2^2, (1 − ψ^2)^2).    (3.1.6)
For subsequent sections, we also need a few other results, which are collected in the following lemma.

Lemma 3.1 For the bivariate normal density given in (3.1.2),

E(∂^3 log f/∂ψ^2 ∂θ_1) = 0,  E(∂^3 log f/∂ψ^2 ∂θ_2) = 1/(θ_2 (1 − ψ^2)^2);    (3.1.7)

E(∂^3 log f/∂ψ^3) = −6ψ/(1 − ψ^2)^3;    (3.1.8)

E((∂ log f/∂ψ)(∂^2 log f/∂ψ^2)) = 2ψ/(1 − ψ^2)^3;    (3.1.9)

E(∂^3 log f/∂ψ ∂θ_1^2) = −ψ/(θ_1^2 (1 − ψ^2)^2),  E(∂^3 log f/∂ψ ∂θ_2^2) = 0.    (3.1.10)
Proof. To prove the lemma, note that E(X_1 − µ_1)^2 = θ_1 θ_2 (1 − ψ^2)^{−1/2}, E(X_2 − µ_2)^2 = θ_1^{−1} θ_2 (1 − ψ^2)^{−1/2}, and E{(X_1 − µ_1)(X_2 − µ_2)} = ψθ_2 (1 − ψ^2)^{−1/2}. We begin with (3.1.7).

E(∂^3 log f/∂ψ^2 ∂θ_1)
= −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} E(−(X_1 − µ_1)^2/θ_1^2 + (X_2 − µ_2)^2)
= −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} (−θ_2/(θ_1 (1 − ψ^2)^{1/2}) + θ_2/(θ_1 (1 − ψ^2)^{1/2}))
= 0,

and

E(∂^3 log f/∂ψ^2 ∂θ_2)
= E(−2ψ(X_1 − µ_1)(X_2 − µ_2)/(θ_2^2 (1 − ψ^2)^{3/2}) + (1/(2θ_2^2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} {(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)})
= −(2ψ/(θ_2^2 (1 − ψ^2)^{3/2}))(ψθ_2/(1 − ψ^2)^{1/2}) + (1/(2θ_2^2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} (θ_2/(1 − ψ^2)^{1/2} + θ_2/(1 − ψ^2)^{1/2} − 2ψ^2 θ_2/(1 − ψ^2)^{1/2})
= −2ψ^2/(θ_2 (1 − ψ^2)^2) + (1 + 2ψ^2)/(θ_2 (1 − ψ^2)^2)
= 1/(θ_2 (1 − ψ^2)^2).
Next, (3.1.8) holds because

E(∂^3 log f/∂ψ^3)
= E(3(1 + 2ψ^2)(X_1 − µ_1)(X_2 − µ_2)/(θ_2 (1 − ψ^2)^{5/2}) − (1/(2θ_2)) {(9ψ + 6ψ^3)/(1 − ψ^2)^{7/2}} {(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)})
= (3(1 + 2ψ^2)/(θ_2 (1 − ψ^2)^{5/2}))(ψθ_2/(1 − ψ^2)^{1/2}) − (1/(2θ_2)) ((9ψ + 6ψ^3)/(1 − ψ^2)^{7/2}) (θ_2/(1 − ψ^2)^{1/2} + θ_2/(1 − ψ^2)^{1/2} − 2ψ^2 θ_2/(1 − ψ^2)^{1/2})
= 3ψ(1 + 2ψ^2)/(1 − ψ^2)^3 − 3ψ(3 + 2ψ^2)/(1 − ψ^2)^3
= −6ψ/(1 − ψ^2)^3.
To show that (3.1.9) holds, note that, using the Bartlett identity, we get

E((∂ log f/∂ψ)(∂^2 log f/∂ψ^2)) = −(d/dψ I_{ψψ} + E(∂^3 log f/∂ψ^3))
= −(d/dψ (1 − ψ^2)^{−2} − 6ψ/(1 − ψ^2)^3)
= 2ψ/(1 − ψ^2)^3,
and finally, in order to show that (3.1.10) holds, we have

E(∂^3 log f/∂ψ ∂θ_1^2) = E(−ψ(X_1 − µ_1)^2/((1 − ψ^2)^{3/2} θ_2 θ_1^3)) = −ψ/(θ_1^2 (1 − ψ^2)^2),

and

E(∂^3 log f/∂ψ ∂θ_2^2)
= E(2(X_1 − µ_1)(X_2 − µ_2)/(θ_2^3 (1 − ψ^2)^{1/2}) − (1/θ_2^3) {ψ/(1 − ψ^2)^{3/2}} {(X_1 − µ_1)^2/θ_1 + θ_1 (X_2 − µ_2)^2 − 2ψ(X_1 − µ_1)(X_2 − µ_2)})
= 2ψ/(θ_2^2 (1 − ψ^2)) − (ψ/(θ_2^3 (1 − ψ^2)^{3/2})) (θ_2/(1 − ψ^2)^{1/2} + θ_2/(1 − ψ^2)^{1/2} − 2ψ^2 θ_2/(1 − ψ^2)^{1/2})
= 0.
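Since log f is linear in the three quadratic forms (X_1 − µ_1)^2, (X_2 − µ_2)^2 and (X_1 − µ_1)(X_2 − µ_2), so is each of its third derivatives, and taking expectations amounts to substituting the three moments listed at the start of the proof. The identities of Lemma 3.1 can therefore be checked symbolically; a sketch assuming SymPy:

```python
import sympy as sp

th1, th2 = sp.symbols('theta1 theta2', positive=True)
psi = sp.symbols('psi', real=True)
Q1, Q2, Q12 = sp.symbols('Q1 Q2 Q12')  # (X1-mu1)^2, (X2-mu2)^2, (X1-mu1)(X2-mu2)

# log f up to an additive constant, as in (3.1.2)
logf = -sp.log(th2) - (Q1/th1 + th1*Q2 - 2*psi*Q12) / (2*sp.sqrt(1 - psi**2)*th2)

# expectations of the quadratic forms, as in the proof of Lemma 3.1
moments = {Q1: th1*th2/sp.sqrt(1 - psi**2),
           Q2: th2/(th1*sp.sqrt(1 - psi**2)),
           Q12: psi*th2/sp.sqrt(1 - psi**2)}

# (3.1.7): E(d^3 log f / d psi^2 d theta1) = 0
assert sp.simplify(sp.diff(logf, psi, 2, th1).subs(moments)) == 0
# (3.1.8): E(d^3 log f / d psi^3) = -6 psi / (1 - psi^2)^3
E = sp.simplify(sp.diff(logf, psi, 3).subs(moments))
assert sp.simplify(E + 6*psi/(1 - psi**2)**3) == 0
```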
We derive the matching priors in the next few sections.
3.2 Quantile Matching Priors

Here, as before, one is interested in the approximate frequentist validity of the posterior quantiles of a one-dimensional interest parameter. Due to the orthogonality of ψ with (µ_1, µ_2, θ_1, θ_2), from (1.2.3.4), the class of first order matching priors is characterized by

π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ (1 − ψ^2)^{−1} g_0(µ_1, µ_2, θ_1, θ_2).    (3.2.1)

As it is often customary to assign a uniform prior to (µ_1, µ_2) on R^2, we will consider only the subclass of priors where g_0(µ_1, µ_2, θ_1, θ_2) = g(θ_1, θ_2).
A prior of the form π ∝ (1 − ψ^2)^{−1} g(θ_1, θ_2) satisfies the second-order probability matching property if and only if (see (1.2.3.5)) g satisfies the relation

∂/∂θ_1 {g(1 − ψ^2) I^{θ_1θ_1} E(∂^3 log f/∂ψ^2 ∂θ_1)} + ∂/∂θ_2 {g(1 − ψ^2) I^{θ_2θ_2} E(∂^3 log f/∂ψ^2 ∂θ_2)}
+ (g/3) ∂/∂ψ {(1 − ψ^2)^3 E(∂^3 log f/∂ψ^3)} − g ∂^2/∂ψ^2 {(1 − ψ^2)} = 0.    (3.2.2)
Now, by (3.1.7) and (3.1.8) from Lemma 3.1 and (3.1.6), (3.2.2) simplifies to (1 − ψ^2)^{−1} ∂/∂θ_2 (gθ_2) − 2g (∂/∂ψ)ψ − g (∂^2/∂ψ^2)(1 − ψ^2) = 0, which reduces to ∂/∂θ_2 (gθ_2) = 0, and a solution is provided by g(θ_1, θ_2) ∝ h(θ_1) θ_2^{−1}. Thus every prior π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ h(θ_1) θ_2^{−1} (1 − ψ^2)^{−1} is a second order probability matching prior for ψ, for an arbitrary smooth function h of θ_1. In particular, if we let h(θ_1) = θ_1^{−1}, then from Theorem 1 of Datta and Ghosh (1995), the one-at-a-time reference or reverse reference prior for ψ is given by θ_1^{−1} θ_2^{−1} (1 − ψ^2)^{−1}. This prior was first proposed in Lindley (1965), and was subsequently shown to be a one-at-a-time reference prior by Bayarri (1981). Due to the invariance property of such a prior, back in the original parameterization, a second order matching prior for ρ is π(µ_1, µ_2, σ_1, σ_2, ρ) ∝ σ_1^{−1} σ_2^{−1} (1 − ρ^2)^{−1}.
The first order quantile matching prior π ∝ (1 − ψ^2)^{−1} g(θ_1, θ_2) is also first order matching via distribution functions. It follows from (1.3.2.1) and (1.3.2.2) of Chapter 1 that in order that this prior is also a second order distribution function matching prior, it needs to satisfy

∂^2/∂ψ^2 {I^{ψψ} π} − 2 ∂/∂ψ {I^{ψψ} ∂π/∂ψ} − ∂/∂θ_2 {E(∂^3 log f/∂ψ^2 ∂θ_2) I^{ψψ} I^{θ_2θ_2} π} − ∂/∂ψ {E(∂^3 log f/∂ψ ∂θ_1^2) I^{ψψ} I^{θ_1θ_1} π} = 0    (3.2.3)

and

∂/∂ψ {E(∂^3 log f/∂ψ^3) (I^{ψψ})^2 π} = 0.    (3.2.4)

It is easily verified that for the prior π ∝ (1 − ψ^2)^{−1} g(θ_1, θ_2), the left hand side of (3.2.4) reduces to −6g. The above prior also fails to satisfy (3.2.3) for any g. Hence we do not have a prior that satisfies the second order distribution function matching criteria.
3.3 Highest Posterior Density (HPD) Matching Priors

We now turn attention to HPD matching priors for ρ. Due to the orthogonality of ψ with (µ_1, µ_2, θ_1, θ_2) (see (1.4.3.1) of Chapter 1), we need a prior π which satisfies the differential equation

∂/∂θ_1 {(1 − ψ^2)^2 I^{θ_1θ_1} E(∂^3 log f/∂ψ^2 ∂θ_1) π} + ∂/∂θ_2 {(1 − ψ^2)^2 I^{θ_2θ_2} E(∂^3 log f/∂ψ^2 ∂θ_2) π}
+ ∂/∂ψ {(1 − ψ^2)^4 E(∂^3 log f/∂ψ^3) π} − ∂^2/∂ψ^2 {(1 − ψ^2)^2 π} = 0.    (3.3.1)
Using (3.1.7) and (3.1.8) from Lemma 3.1 and (3.1.6), (3.3.1) reduces to

∂/∂θ_2 {θ_2 π} − 6 ∂/∂ψ {ψ(1 − ψ^2) π} − ∂^2/∂ψ^2 {(1 − ψ^2)^2 π} = 0.    (3.3.2)
Consider the class of priors π(θ_1, θ_2, ψ) ∝ h(θ_1) θ_2^a (1 − ψ^2)^b. With this prior, (3.3.2) can be written as

h(θ_1) θ_2^a [(a + 1)(1 − ψ^2)^b + 2(b − 1){(1 − ψ^2)^{b+1} − 2(b + 1)ψ^2 (1 − ψ^2)^b}] = 0.

On further simplification, this reduces to

h(θ_1) θ_2^a (1 − ψ^2)^b [a + 1 + 2(b − 1)(1 − ψ^2) − 4(b^2 − 1)ψ^2] = 0.    (3.3.3)

Since this needs to hold for all ψ ∈ (−1, 1), one gets

a + 1 = 4(b^2 − 1) = −2(b − 1).

Hence the two possible solutions are a = −1, b = 1 and a = 4, b = −3/2. This results in π ∝ h(θ_1) θ_2^{−1} (1 − ψ^2) and π ∝ h(θ_1) θ_2^4 (1 − ψ^2)^{−3/2}, which are both HPD matching for ψ. In particular, for h(θ_1) = θ_1^{−1}, back in the original parameterization, we obtain π ∝ σ_1^{−1} σ_2^{−1} (1 − ρ^2) and π ∝ σ_1^4 σ_2^4 (1 − ρ^2) as HPD matching for ρ. In general, HPD matching priors suffer from lack of invariance. However, if the same object of interest is considered over the two parameterizations, then they are invariant to the parameterization adopted. This has been discussed in detail in Datta and Mukerjee (2004, p. 74).
3.4 Matching Priors Via Inversion of Test Statistics

The most popular test obtained by inverting certain test statistics is the likelihood ratio test. But tests based on Rao's score statistic or the Wald statistic are also of importance, and are first order equivalent (i.e., up to o(n^{−1/2})) to the likelihood ratio tests. From (1.5.1.5), a likelihood ratio matching prior π is obtained by solving

Σ_{s=2}^p Σ_{u=2}^p ∂/∂θ_u {π(θ) I_{11}^{−1} I^{su} E(∂^3 log f/∂θ_1^2 ∂θ_s)}
+ ∂/∂θ_1 (I_{11}^{−1} {∂π/∂θ_1 − π(θ)(I_{11}^{−1} E((∂ log f/∂θ_1)(∂^2 log f/∂θ_1^2)) − Σ_{s=2}^p Σ_{u=2}^p I^{su} E(∂^3 log f/∂θ_1 ∂θ_u ∂θ_s))}) = 0.    (3.4.1)
Under the orthogonal parameterization obtained in (3.1.1), (3.4.1) can be rewritten as

∂/∂θ_1 {(1 − ψ^2)^2 I^{θ_1θ_1} E(∂^3 log f/∂ψ^2 ∂θ_1) π} + ∂/∂θ_2 {(1 − ψ^2)^2 I^{θ_2θ_2} E(∂^3 log f/∂ψ^2 ∂θ_2) π}
+ ∂/∂ψ {(1 − ψ^2)^2 {∂π/∂ψ − π((1 − ψ^2)^2 E((∂ log f/∂ψ)(∂^2 log f/∂ψ^2)) − I^{θ_1θ_1} E(∂^3 log f/∂ψ ∂θ_1^2) − I^{θ_2θ_2} E(∂^3 log f/∂ψ ∂θ_2^2))}} = 0.    (3.4.2)
Again, using results (3.1.7), (3.1.9) and (3.1.10) from Lemma 3.1 and (3.1.6), (3.4.2) reduces to

∂/∂θ_2 {θ_2 π} + ∂/∂ψ {(1 − ψ^2)^2 {∂π/∂ψ − π(3ψ/(1 − ψ^2))}} = 0.    (3.4.3)
Consider again the class of priors π = h(θ_1) θ_2^a (1 − ψ^2)^b. Then (3.4.3) further reduces to

a + 1 − (2b + 3){1 − ψ^2 − 2(b + 1)ψ^2} = 0.    (3.4.4)

In order that (3.4.4) holds for all ψ ∈ (−1, 1), a unique solution is obtained with a = −1 and b = −3/2. Hence the unique prior within the considered class that satisfies the likelihood ratio matching property is given by π ∝ h(θ_1) θ_2^{−1} (1 − ψ^2)^{−3/2}. Once again, if we let h(θ_1) = θ_1^{−1}, then back in the original parameterization, π ∝ σ_1^{−1} σ_2^{−1} (1 − ρ^2)^{−3/2} satisfies the likelihood ratio matching property for ρ.
3.5 Propriety of the Posteriors

We now establish the propriety of the posteriors. We choose h(θ_1) = θ_1^{−1}. Then a prior of the form π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ (1 − ψ^2)^a θ_1^{−1} θ_2^{−1} satisfies the various matching properties discussed above for different values of a. Also, the joint posterior of µ_1, µ_2, θ_1, θ_2, ψ given X is

π(µ_1, µ_2, θ_1, θ_2, ψ | X) ∝ θ_2^{−n} exp{−(1/(2(1 − ψ^2)^{1/2} θ_2)) Σ_{i=1}^n [(X_{1i} − µ_1)^2/θ_1 + θ_1 (X_{2i} − µ_2)^2 − 2ψ(X_{1i} − µ_1)(X_{2i} − µ_2)]} × (θ_1 θ_2)^{−1} (1 − ψ^2)^a.    (3.5.1)
Next consider the transformation

ρ = ψ, σ_1^2 = θ_1 θ_2/(1 − ψ^2)^{1/2} and σ_2^2 = θ_2/(θ_1 (1 − ψ^2)^{1/2}).

The Jacobian is given by (1 − ρ^2)^{1/2}/σ_2^2. The posterior under this transformation can be written as

π(µ_1, µ_2, σ_1^2, σ_2^2, ρ | X) ∝ exp{−(1/(2(1 − ρ^2))) Σ_{i=1}^n [(X_{1i} − µ_1)^2/σ_1^2 + (X_{2i} − µ_2)^2/σ_2^2 − 2ρ(X_{1i} − µ_1)(X_{2i} − µ_2)/(σ_1 σ_2)]} × (σ_1^2 σ_2^2)^{−n/2−1} (1 − ρ^2)^{−n/2+a}.    (3.5.2)
Now, integrating out µ_1 and µ_2, we obtain

π(σ_1^2, σ_2^2, ρ | X) ∝ (σ_1^2 σ_2^2)^{−(n+1)/2} (1 − ρ^2)^{−(n−1)/2+a} exp{−(1/(2(1 − ρ^2)))[S_{11}/σ_1^2 + S_{22}/σ_2^2 − 2ρ S_{12}/(σ_1 σ_2)]}.    (3.5.3)
Consider another transformation

z_1 = σ_1^2 (1 − ρ^2), z_2 = σ_2^2 (1 − ρ^2) and z_3 = ρ.

The Jacobian here is given by (1 − z_3^2)^{−2}. Then, using a series expansion, the posterior can be written as

π(z_1, z_2, z_3 | X) ∝ (1 − z_3^2)^{(n−1)/2+a} (z_1 z_2)^{−(n+1)/2} exp{−(1/2)(S_{11}/z_1 + S_{22}/z_2 − 2z_3 S_{12}/(z_1 z_2)^{1/2})}
= (1 − z_3^2)^{(n−1)/2+a} (z_1 z_2)^{−(n+1)/2} exp{−(1/2)(S_{11}/z_1 + S_{22}/z_2)} × Σ_{r=0}^∞ (z_3 S_{12})^r/(z_1^{r/2} z_2^{r/2} r!).
The marginal posterior of z_3 is obtained by integrating out z_1 and z_2. So

π(z_3 | X) ∝ (1 − z_3^2)^{(n−1)/2+a} Σ_{r=0}^∞ ∫_0^∞ ∫_0^∞ (z_1 z_2)^{−(n+r+1)/2} (S_{12}^r/r!) z_3^r exp{−(1/2)(S_{11}/z_1 + S_{22}/z_2)} dz_1 dz_2
∝ (1 − z_3^2)^{(n−1)/2+a} Σ_{r=0}^∞ (S_{12}^r z_3^r/(r! S_{11}^{r/2} S_{22}^{r/2})) ∫_0^∞ ∫_0^∞ (w_1 w_2)^{(n+r−1)/2−1} exp{−(1/2)(w_1 S_{11} + w_2 S_{22})} dw_1 dw_2
= (1 − z_3^2)^{(n−1)/2+a} Σ_{r=0}^∞ (z_3^r R^r/r!) Γ^2((n + r − 1)/2),    (3.5.4)

where R = S_{12}/(S_{11} S_{22})^{1/2}.
Clearly, for r odd, the integral ∫_{−1}^1 z_3^r (1 − z_3^2)^{(n−1)/2+a} dz_3 = 0. So in order to show the propriety, we only need to show that

I = ∫_{−1}^1 (1 − z_3^2)^{(n−1)/2+a} {Σ_{r=0}^∞ (z_3^{2r} R^{2r}/(2r)!) Γ^2((n + 2r − 1)/2)} dz_3 < ∞.
To this end, we first note that

∫_{−1}^1 z_3^{2r} (1 − z_3^2)^{(n−1)/2+a} dz_3 = 2 ∫_0^1 z_3^{2r} (1 − z_3^2)^{(n−1)/2+a} dz_3
= ∫_0^1 u^{r−1/2} (1 − u)^{(n−1)/2+a} du
= Beta(r + 1/2, (n + 1)/2 + a)
= Γ(r + 1/2) Γ((n + 1)/2 + a)/Γ((n + 2r + 2)/2 + a), for a > −(n + 1)/2.
Hence

I = Σ_{r=0}^∞ (R^{2r}/(2r)!) Γ(r + 1/2) Γ^2((n + 2r − 1)/2)/Γ((n + 2r + 2)/2 + a).    (3.5.5)
By the Legendre duplication formula, (2r)! = Γ(2r + 1) = Γ(r + 1/2) Γ(r + 1) 2^{2r}/π^{1/2}. Hence, writing k (> 0) for a generic constant which does not depend on r,

I = k Σ_{r=0}^∞ (R^{2r}/(4^r r!)) Γ^2(r + (n − 1)/2)/Γ(r + (n + 2 + 2a)/2) = Σ_{r=0}^∞ b_r (say).
Note that

b_{r+1}/b_r = (R^2/4)(r + (n − 1)/2)^2/((r + 1)(r + (n + 2 + 2a)/2)) → R^2/4 < 1 as r → ∞.

Hence, by the ratio test, Σ_{r=0}^∞ b_r < ∞, which proves the propriety of the posteriors.
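The behavior of the term ratio is easy to inspect numerically; a small sketch (the function name is ours, and R = S_{12}/(S_{11} S_{22})^{1/2}, so |R| < 1 for non-degenerate data):

```python
def b_ratio(r, n, a, R):
    """Ratio b_{r+1}/b_r of consecutive terms in the series for I."""
    return (R**2 / 4) * (r + (n - 1) / 2)**2 / ((r + 1) * (r + (n + 2 + 2*a) / 2))

# for large r the ratio approaches R^2/4 < 1, so the series converges
print(b_ratio(10**6, n=10, a=-1.0, R=0.9))  # close to 0.81/4 = 0.2025
```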
3.6 Likelihood Based Inference
The objective of this section is to describe methods for likelihood-based inference
for the bivariate normal correlation coefficient. As a general rule, the basic approach for
inference in the presence of nuisance parameters is to replace the nuisance parameters
in the likelihood function by their maximum likelihood estimates and examine the
resulting profile likelihood as a function of the parameter of interest. This procedure
is known to give inconsistent or inefficient estimates for problems where the number of
nuisance parameters grows in direct proportion to the sample size. The conditional profile
likelihood (Cox and Reid, 1987) on the other hand is based on the conditional likelihood
given maximum likelihood estimates of the orthogonalized parameters. It corrects the
inconsistency of the profile likelihood in some problems and also makes “degrees of
freedom” adjustments for estimating the normal variance. An alternative simpler approach
is to adjust the profile log-likelihood so that the mean score is zero and the variance of the
score function equals its negative expected derivative matrix, so that the score function
is unbiased (Godambe, 1960) and information unbiased (Lindsay, 1982). This method,
known as the adjusted profile likelihood, was proposed by McCullagh and Tibshirani
(1990). Also, an integrated likelihood (Kalbfleisch and Sprott, 1970), which is defined as
the integral over the nuisance parameter space of the likelihood times the prior density,
can be used for inference. Here one must be willing to specify a joint prior distribution for
the nuisance parameters conditional on the parameters of interest.
We begin with the profile likelihood for (θ_1, θ_2, ψ), given by

L_p(θ_1, θ_2, ψ) = (2πθ_2^2)^{−n/2} exp{−(1/(2(1 − ψ^2)^{1/2} θ_2))(S_1^2/θ_1 + θ_1 S_2^2 − 2ψ r S_1 S_2)},    (3.6.1)

where S_1^2 = Σ_{i=1}^n (X_{1i} − X̄_1)^2 and S_2^2 = Σ_{i=1}^n (X_{2i} − X̄_2)^2. Let l_p ≡ log L_p.
To obtain the maximizer of the profile likelihood, we first obtain

∂l_p/∂θ_1 = −(1/(2(1 − ψ^2)^{1/2} θ_2))(−S_1^2/θ_1^2 + S_2^2),

which on equating to zero leads to

θ̂_1(ψ) = (S_1^2/S_2^2)^{1/2}.
On differentiating L_p(θ_1, θ_2, ψ) with respect to θ_2 and equating to zero, we obtain

n = (1/(2(1 − ψ^2)^{1/2} θ_2))(S_1^2/θ_1 + θ_1 S_2^2 − 2ψ r S_1 S_2).

This gives

θ̂_2(ψ) = S_1 S_2 (1 − ψr)/(n(1 − ψ^2)^{1/2}).    (3.6.2)
Thus the profile likelihood for ψ is given by

L_p(ψ) ∝ exp(−n)(1 − ψ^2)^{n/2}/(1 − ψr)^n ∝ (1 − ψ^2)^{n/2}/(1 − ψr)^n.    (3.6.3)
Next, from (3.1.3), the determinant of A and that of the (θ_1, θ_2) block of D are 1/θ_2^2 and 1/(θ_1^2 θ_2^2 (1 − ψ^2)) respectively. From Cox and Reid (1987), one now obtains the conditional profile likelihood

L_cp(ψ) ∝ ((1 − ψ^2)^{n/2}/(1 − ψr)^n) θ̂_1(ψ) θ̂_2^2(ψ)(1 − ψ^2)^{1/2}
∝ ((1 − ψ^2)^{n/2}/(1 − ψr)^n)((1 − ψr)^2/(1 − ψ^2))(1 − ψ^2)^{1/2}
= (1 − ψ^2)^{(n−1)/2}/(1 − ψr)^{n−2}.    (3.6.4)
Next we derive the adjusted profile likelihood for ψ. Let λ denote the vector of nuisance parameters and U(ψ) the score function, that is, U(ψ) = d log L_p(ψ)/dψ. Then let m(ψ) = E_{ψ,λ̂_ψ} U(ψ) and w(ψ) = {−E_{ψ,λ̂_ψ} (d^2/dψ^2) l_p(ψ) + (d/dψ) m(ψ)}/var_{ψ,λ̂_ψ}(U(ψ)). Also let Ũ(ψ) = {U(ψ) − m(ψ)} w(ψ). The adjusted profile log likelihood for ψ is obtained as l_ap(ψ) = ∫^ψ Ũ(t) dt. From (3.6.3), the score function is then given by

U(ψ) = d log L_p(ψ)/dψ = n(r − ψ)/((1 − ψr)(1 − ψ^2)).    (3.6.5)
From Kendall and Stuart (Vol. 1, 1968, p. 390) and (3.6.3),

m(ψ) = E[U(ψ)] = ψ/(2(1 − ψ^2)) + O(n^{−1}),    (3.6.6)

and

m′(ψ) = (1 + ψ^2)/(2(1 − ψ^2)^2) + O(n^{−1}).    (3.6.7)
Next, by Taylor expansion we obtain

E{U^2(ψ)} = (n^2/(1 − ψ^2)^2) E{(r − ψ)^2/(1 − ψr)^2}
= (n^2/(1 − ψ^2)^2) E{(r − ψ)^2 {1 − ψ(r − ψ) − ψ^2}^{−2}}
= (n^2/(1 − ψ^2)^4) E{(r − ψ)^2 {1 − ψ(r − ψ)/(1 − ψ^2)}^{−2}}
= (n^2/(1 − ψ^2)^4) E{(r − ψ)^2 {1 + 2ψ(r − ψ)/(1 − ψ^2) + 3ψ^2(r − ψ)^2/(1 − ψ^2)^2 + . . .}}
= (n^2/(1 − ψ^2)^4){(1 − ψ^2)^2/n + O(n^{−2})}
= n/(1 − ψ^2)^2 + O(1).    (3.6.8)
Hence

m(ψ) = O(1), m′(ψ) = O(1), V(U(ψ)) = n/(1 − ψ^2)^2 + O(1).    (3.6.9)
Further,

d^2 l_p(ψ)/dψ^2 = n (d/dψ){(r − ψ)/((1 − ψr)(1 − ψ^2))}
= n (d/dψ){(1/ψ)(1/(1 − ψr) − 1/(1 − ψ^2))}
= n (d/dψ){1/ψ + r/(1 − ψr) − 1/(ψ(1 − ψ^2))}
= n{−1/ψ^2 + r^2/(1 − ψr)^2 + 1/(ψ^2(1 − ψ^2)) − 2/(1 − ψ^2)^2}
= n{(ψ^2 r^2 − (1 − 2ψr + ψ^2 r^2))/(ψ^2(1 − ψr)^2) + (1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2)}
= n{((2ψr − 1)/ψ^2){1 − ψ(r − ψ) − ψ^2}^{−2} + (1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2)}
= n{((2ψ(r − ψ) + 2ψ^2 − 1)/(ψ^2(1 − ψ^2)^2)){1 − ψ(r − ψ)/(1 − ψ^2)}^{−2} + (1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2)}.
Then

E[−d^2 l_p(ψ)/dψ^2] = −n{(2ψ^2 − 1 + 1 − 3ψ^2)/(ψ^2(1 − ψ^2)^2) + O(n^{−1})}
= n/(1 − ψ^2)^2 + O(1).    (3.6.10)
This leads to

w(ψ) = (E[−(d^2/dψ^2) l_p(ψ)] + m′(ψ))/var[U(ψ)]
= (n/(1 − ψ^2)^2 + O(1))/(n/(1 − ψ^2)^2 + O(1))
= 1 + O(n^{−1}).    (3.6.11)
Thus

Ũ(ψ) = [U(ψ) − m(ψ)] w(ψ)
= [U(ψ) − ψ/(2(1 − ψ^2)) + O(n^{−1})][1 + O(n^{−1})]
= U(ψ) − ψ/(2(1 − ψ^2)) + O(n^{−1}).    (3.6.12)

In other words,

d l_ap(ψ)/dψ = U(ψ) − ψ/(2(1 − ψ^2)),

and on integrating we obtain

l_ap(ψ) = l_p(ψ) + (1/4) log(1 − ψ^2).
Therefore

L_ap(ψ) = L_p(ψ)(1 − ψ^2)^{1/4} ∝ (1 − ψ^2)^{n/2+1/4}(1 − ψr)^{−n}.    (3.6.13)
Finally, we wish to find the integrated likelihood. This requires specification of a prior distribution for the nuisance parameters conditional on the parameter of interest. In particular, the conditional reference prior of Berger, Liseo and Wolpert (1999) is given by π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ θ_1^{−1} θ_2^{−2} (1 − ψ^2)^{−1/2}. Then calculations similar to those done for proving the propriety of the posterior of ψ lead to the conditional reference integrated likelihood given by

L_I(ψ) ∝ (1 − ψ^2)^{(n−1)/2} Σ_{a=0}^∞ (ψ^a r^a/a!) Γ^2((n + a)/2).    (3.6.14)

One common feature of all the modified likelihoods L_p(ψ), L_cp(ψ), L_ap(ψ) and L_I(ψ) is that they all depend on the data only through the sample correlation coefficient r.
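The first three modified likelihoods have simple closed forms in (r, n), so their shapes are easy to compare numerically; a sketch assuming NumPy (an illustration of the formulas, not the authors' code):

```python
import numpy as np

def modified_likelihoods(psi, r, n):
    """L_p, L_cp and L_ap of (3.6.3), (3.6.4) and (3.6.13), up to constants."""
    lp = (1 - psi**2)**(n / 2) / (1 - psi * r)**n
    lcp = (1 - psi**2)**((n - 1) / 2) / (1 - psi * r)**(n - 2)
    lap = (1 - psi**2)**(n / 2 + 0.25) / (1 - psi * r)**n
    return lp, lcp, lap

# each curve depends on the data only through r; L_p is maximized at psi = r
psi = np.linspace(-0.95, 0.95, 381)
lp, lcp, lap = modified_likelihoods(psi, r=0.5, n=20)
print(round(psi[np.argmax(lp)], 3))  # 0.5
```

The conditional and adjusted versions peak slightly closer to zero than the profile likelihood, reflecting their extra (1 − ψ^2) and (1 − ψr) factors.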
3.7 Simulation Study

In order to evaluate the three different priors, we undertook a simulation study where data were generated from a bivariate normal distribution with (µ_1, µ_2, σ_1, σ_2) = (0, 0, 1, 1) and varying values of ρ and varying sample sizes n. Throughout our simulation study, we annotate the priors as follows:

Prior 1: π ∝ (1 − ρ^2) σ_1^{−1} σ_2^{−1}
Prior 2: π ∝ (1 − ρ^2)^{−3/2} σ_1^{−1} σ_2^{−1}
Prior 3: π ∝ (1 − ρ^2)^{−1} σ_1^{−1} σ_2^{−1}.

Since the full conditional distributions of the parameters under any of the three priors do not follow a standard distributional form, we used Gibbs sampling with componentwise Metropolis-Hastings updates at each iteration to generate random numbers from the conditional posterior distributions of each parameter (Robert and Casella, 2001). We ran two chains with different initial values and allowed a burn-in of 4,000 each. A random-walk jumping density, with normal noise added to the existing value in the chain, was used for the means and log standard deviations. The correlation was also updated by a random walk, adding a small normal noise to the old value. Each chain was run for 10,000 iterations, and convergence was judged by the Gelman-Rubin (Gelman and Rubin, 1992) diagnostic. The trace plot presenting the time history of all 10,000 iterations for all five parameters is presented for a sample simulated dataset with ρ = 0.3, under Prior 3 and sample size 10, in Figure 3-1. Figure 3-2 presents the plot of the Gelman-Rubin diagnostic for the ρ chain under the same setting, with diagnostic values close to 1 suggesting convergence. Figures 3-3, 3-4 and 3-5 are posterior distributions for ρ under the three different priors for four different sample sizes n = 10, 20, 30, 40. One can immediately make the following
observations. Though there are certain numerical differences, the posterior distribution
of ρ does not seem to vary widely between Priors 1 and 3, even for smaller sample sizes. Prior 2, on the other hand, appears to produce a smaller spread in the posterior distribution than the other two priors. As data information increases with sample size, the posterior distributions become very similar under the three priors. Some skewness can be observed in the posterior distributions for smaller sample sizes, which was often noted during our simulation, but the distribution becomes fairly symmetric as n becomes large. The posterior distribution also becomes more concentrated around the true value of ρ with increasing n.
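A stripped-down, single-chain version of the sampler described above can be sketched as follows. This is our illustration, assuming NumPy, with one common random-walk scale for all components rather than individually tuned scales; the means and log standard deviations are updated by random walks as described, and the σ^{−1} priors are flat on the log-standard-deviation scale.

```python
import numpy as np

def log_post(mu1, mu2, ls1, ls2, rho, x1, x2):
    """Log posterior under Prior 3, pi ∝ (1 - rho^2)^{-1} sigma1^{-1} sigma2^{-1},
    parameterized by (mu1, mu2, log sigma1, log sigma2, rho)."""
    if abs(rho) >= 1:
        return -np.inf
    z1 = (x1 - mu1) / np.exp(ls1)
    z2 = (x2 - mu2) / np.exp(ls2)
    n = len(x1)
    quad = np.sum(z1**2 + z2**2 - 2 * rho * z1 * z2) / (2 * (1 - rho**2))
    loglik = -n * (ls1 + ls2) - (n / 2) * np.log(1 - rho**2) - quad
    return loglik - np.log(1 - rho**2)   # sigma^{-1} priors are flat in log sigma

def metropolis_within_gibbs(x1, x2, iters=10_000, step=0.2, seed=0):
    """Componentwise random-walk Metropolis updates, cycled at each iteration."""
    rng = np.random.default_rng(seed)
    theta = np.array([x1.mean(), x2.mean(),
                      np.log(x1.std()), np.log(x2.std()), 0.0])
    cur = log_post(*theta, x1, x2)
    draws = np.empty((iters, 5))
    for t in range(iters):
        for j in range(5):                     # update one component at a time
            prop = theta.copy()
            prop[j] += step * rng.normal()
            new = log_post(*prop, x1, x2)
            if np.log(rng.uniform()) < new - cur:
                theta, cur = prop, new
        draws[t] = theta
    return draws                               # column 4 holds the rho chain
```

After discarding a burn-in, the last column of `draws` gives posterior draws of ρ from which means, quantile intervals and HPD intervals can be computed.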
We repeated our Gibbs sampling estimation technique for 500 datasets under each configuration of ρ and n. Each time, we computed the posterior mean, the 95% quantile interval (as given by the 2.5th and 97.5th sample percentiles of the randomly generated parameter values after the burn-in period) and the 95% HPD interval. Table 3-1 presents the average of the posterior means, the mean squared error, and the frequentist coverage of the Bayesian credible intervals (as estimated by the proportion of times the true parameter value falls in the corresponding credible interval) across the 500 datasets and under the three different priors. Some interesting differences can be noted in the behavior for smaller sample sizes. Prior 1 appears to perform worse than Priors 2 and 3 as far as point estimation of ρ is concerned, with higher bias, though the MSE is not necessarily larger for all values of ρ. On the other hand, Prior 1 has appreciably better coverage properties for the HPD intervals for smaller sample sizes than Priors 2 and 3, and is in fact the theoretically established HPD matching prior. Priors 1 and 3 are very comparable in terms of coverage of the quantile intervals, with Prior 3 having a slight edge over Prior 1 as it attains nominal coverage for a smaller n in many cases. Prior 2, derived from inversion of the likelihood-ratio statistic, appears to be the least attractive from the frequentist coverage perspective. Based on our simulation results, if one is concerned about both point and interval estimation, Prior 3 appears to have a slight edge over the other two contenders.
Figure 3-1. Plot of Gelman-Rubin Diagnostic Statistic for ρ Under Prior III for n=10 Under the Simulation Setting of Section 3.7.
Table 3-1. Simulation Result to Compare the Three Different Priors Suggested for Bivariate Normal Correlation Parameter. The True Parameter Settings are µ_1 = µ_2 = 0, σ_1 = σ_2 = 1 and Varying Values of ρ as Listed. Prior 1: π ∝ (1 − ρ^2) σ_1^{−1} σ_2^{−1}, Prior 2: π ∝ (1 − ρ^2)^{−3/2} σ_1^{−1} σ_2^{−1}, Prior 3: π ∝ (1 − ρ^2)^{−1} σ_1^{−1} σ_2^{−1}. Results are Based on 500 Simulated Datasets. *: Average value for posterior mean of ρ, averaged across 500 simulated datasets.

                 Prior 1                            Prior 2                            Prior 3
ρ      n    ρ*     MSE  Cov.(Q) Cov.(HPD)     ρ*     MSE  Cov.(Q) Cov.(HPD)     ρ*     MSE  Cov.(Q) Cov.(HPD)
-0.8   10  -0.65  0.05   0.87    0.91        -0.80  0.02   0.86    0.82        -0.77  0.02   0.88    0.86
       20  -0.71  0.02   0.87    0.90        -0.78  0.01   0.90    0.89        -0.77  0.01   0.90    0.90
       30  -0.74  0.01   0.90    0.92        -0.79  0.01   0.92    0.91        -0.78  0.01   0.93    0.92
       40  -0.76  0.01   0.94    0.95        -0.79  0.00   0.93    0.91        -0.79  0.00   0.94    0.92
-0.5   10  -0.35  0.07   0.92    0.92        -0.47  0.08   0.90    0.87        -0.44  0.07   0.92    0.89
       20  -0.41  0.04   0.91    0.91        -0.48  0.04   0.91    0.90        -0.46  0.03   0.91    0.91
       30  -0.44  0.02   0.94    0.93        -0.49  0.02   0.94    0.93        -0.48  0.02   0.95    0.93
       40  -0.44  0.02   0.95    0.96        -0.48  0.01   0.96    0.94        -0.47  0.01   0.97    0.95
-0.2   10  -0.14  0.06   0.94    0.92        -0.18  0.10   0.90    0.85        -0.17  0.09   0.91    0.87
       20  -0.16  0.03   0.94    0.96        -0.20  0.05   0.94    0.90        -0.19  0.04   0.95    0.92
       30  -0.17  0.03   0.94    0.93        -0.19  0.03   0.92    0.91        -0.19  0.03   0.93    0.92
       40  -0.17  0.02   0.95    0.94        -0.19  0.03   0.94    0.92        -0.18  0.02   0.94    0.94
 0     10   0.01  0.06   0.96    0.92         0.01  0.11   0.89    0.86         0.01  0.09   0.91    0.88
       20   0.02  0.04   0.96    0.93         0.02  0.05   0.93    0.90         0.02  0.05   0.93    0.90
       30  -0.01  0.02   0.93    0.92        -0.01  0.03   0.92    0.90        -0.01  0.03   0.91    0.90
       40   0.00  0.02   0.96    0.95         0.00  0.03   0.95    0.92         0.00  0.03   0.94    0.94
 0.2   10   0.15  0.06   0.94    0.93         0.21  0.10   0.90    0.85         0.19  0.09   0.93    0.88
       20   0.15  0.04   0.95    0.93         0.18  0.05   0.93    0.89         0.17  0.05   0.93    0.90
       30   0.17  0.03   0.94    0.93         0.20  0.03   0.93    0.92         0.19  0.03   0.94    0.92
       40   0.17  0.02   0.95    0.94         0.19  0.02   0.93    0.92         0.18  0.02   0.93    0.93
 0.5   10   0.35  0.07   0.93    0.91         0.48  0.08   0.88    0.84         0.45  0.07   0.92    0.87
       20   0.41  0.03   0.93    0.92         0.48  0.03   0.93    0.90         0.47  0.03   0.93    0.92
       30   0.44  0.02   0.93    0.93         0.49  0.02   0.92    0.92         0.48  0.02   0.92    0.92
       40   0.45  0.02   0.95    0.95         0.49  0.02   0.96    0.94         0.48  0.02   0.95    0.95
 0.8   10   0.63  0.06   0.84    0.89         0.78  0.03   0.86    0.82         0.74  0.03   0.89    0.86
       20   0.71  0.02   0.89    0.92         0.79  0.01   0.93    0.91         0.77  0.01   0.93    0.93
       30   0.75  0.01   0.90    0.93         0.80  0.01   0.94    0.92         0.79  0.01   0.95    0.93
       40   0.76  0.01   0.92    0.94         0.79  0.00   0.94    0.93         0.79  0.00   0.94    0.94
Figure 3-2. Sample Trace Plot for All the Parameters under Prior III for n=10 Under the Simulation Setting of Section 3.7

Figure 3-3. Posterior Distribution for ρ under Prior I for Different Sample Sizes, Under the Simulation Setting of Section 3.7

Figure 3-4. Posterior Distribution for ρ under Prior II for Different Sample Sizes, Under the Simulation Setting of Section 3.7

Figure 3-5. Sample Posterior Distribution for ρ under Prior III for Different Sample Sizes, Under the Simulation Setting of Section 3.7
CHAPTER 4
RATIO OF VARIANCES

There are many experimental situations in which an investigator wants to estimate the ratio of variances of two independent normal populations. Study of the ratio of variances dates back to 1920, when Fisher developed the F-statistic for testing the variance ratio. The most widely used example involves testing the hypothesis that the standard deviations of two normally distributed populations are equal. Although ratios of variances have been vigorously studied in the case of two independent normal samples, both in the frequentist and in the Bayesian literature, little study has been done for a possibly correlated bivariate normal population. For testing the equality of variances in a bivariate normal population, Pitman (1939) and Morgan (1939) introduced a variable transformation which reduces the problem to testing that a bivariate normal correlation coefficient equals zero. The same idea can easily be extended to test the null hypothesis that a variance ratio equals a particular value. Inverting this test statistic, Roy and Potthoff (1958) obtained confidence bounds on the ratio of variances in the correlated bivariate normal distribution. Since the test statistic has a Student's t-distribution under the null hypothesis, the resulting confidence bounds involve percentiles of a Student's t-distribution.

The objective of this chapter is to find priors according to the different matching criteria when the ratio of variances in the bivariate normal distribution is the parameter of interest, and to compare their performance for moderate sample sizes. It turns out that there is a general class of priors which satisfies all the matching criteria.
4.1 The Orthogonal Parameterization

We continue in the setup of Section 3.1 of Chapter 3, where (X_{1i}, X_{2i}), (i = 1, . . . , n) are independent and identically distributed random variables having a bivariate normal distribution with means µ_1 and µ_2, variances σ_1^2 (> 0) and σ_2^2 (> 0), and correlation coefficient ρ (|ρ| < 1). We use the same transformation

θ_1 = σ_1/σ_2, θ_2 = σ_1 σ_2 (1 − ρ^2)^{1/2} and ψ = ρ,    (4.1.1)

and obtain the bivariate normal pdf as in (3.1.2).
With this reparameterization, the Fisher information matrix reduces to

I(µ_1, µ_2, θ_1, θ_2, ψ) = [ A  0 ; 0  D ],    (4.1.2)

where

A = [ 1/(θ_1 θ_2 (1 − ψ^2)^{1/2})   −ψ/(θ_2 (1 − ψ^2)^{1/2}) ; −ψ/(θ_2 (1 − ψ^2)^{1/2})   θ_1/(θ_2 (1 − ψ^2)^{1/2}) ]

and

D = Diag(1/(θ_1^2 (1 − ψ^2)), 1/θ_2^2, 1/(1 − ψ^2)^2).
This establishes immediately the mutual orthogonality of (µ_1, µ_2), θ_1, θ_2 and ψ in the sense of Huzurbazar (1950) and Cox and Reid (1987). Such orthogonality is often referred to as "Fisher orthogonality".
The inverse of the information matrix is then simply

I^{−1}(µ_1, µ_2, θ_1, θ_2, ψ) = [ A^{−1}  0 ; 0  D^{−1} ],    (4.1.3)

where

A^{−1} = [ θ_1 θ_2/(1 − ψ^2)^{1/2}   ψθ_2/(1 − ψ^2)^{1/2} ; ψθ_2/(1 − ψ^2)^{1/2}   θ_2/(θ_1 (1 − ψ^2)^{1/2}) ]    (4.1.4)

and

D^{−1} = Diag(θ_1^2 (1 − ψ^2), θ_2^2, (1 − ψ^2)^2).    (4.1.5)
For subsequent sections, we also need a few other results, which are collected in the following lemma.
Lemma 4.1 For the bivariate normal density given in (3.1.2),

E(∂^3 log f/∂θ_1^2 ∂ψ) = −ψ/(θ_1^2 (1 − ψ^2)^2),  E(∂^3 log f/∂θ_1^2 ∂θ_2) = 1/(θ_1^2 θ_2 (1 − ψ^2));    (4.1.6)

E(∂^3 log f/∂θ_1^3) = 3/(θ_1^3 (1 − ψ^2));    (4.1.7)

E((∂ log f/∂θ_1)(∂^2 log f/∂θ_1^2)) = −1/(θ_1^3 (1 − ψ^2));    (4.1.8)

E(∂^3 log f/∂θ_1 ∂ψ^2) = 0,  E(∂^3 log f/∂θ_1 ∂θ_2^2) = 0.    (4.1.9)
Proof. Note that E(X_1 − µ_1)^2 = θ_1 θ_2 (1 − ψ^2)^{−1/2} and E(X_2 − µ_2)^2 = θ_1^{−1} θ_2 (1 − ψ^2)^{−1/2}. We begin with (4.1.6).

E(∂^3 log f/∂θ_1^2 ∂ψ) = E(−ψ(X_1 − µ_1)^2/((1 − ψ^2)^{3/2} θ_1^3 θ_2)) = −ψ/(θ_1^2 (1 − ψ^2)^2)

and

E(∂^3 log f/∂θ_1^2 ∂θ_2) = E((X_1 − µ_1)^2/((1 − ψ^2)^{1/2} θ_1^3 θ_2^2)) = θ_1 θ_2/(θ_1^3 θ_2^2 (1 − ψ^2)) = 1/(θ_1^2 θ_2 (1 − ψ^2)).

To prove (4.1.7), we see that

E(∂^3 log f/∂θ_1^3) = E(3(X_1 − µ_1)^2/((1 − ψ^2)^{1/2} θ_1^4 θ_2)) = 3θ_1 θ_2/(θ_1^4 θ_2 (1 − ψ^2)) = 3/(θ_1^3 (1 − ψ^2)).

Next, (4.1.8) holds because, from the Bartlett identity,

E((∂ log f/∂θ_1)(∂^2 log f/∂θ_1^2)) = −E(∂^3 log f/∂θ_1^3) − ∂/∂θ_1 (1/(θ_1^2 (1 − ψ^2)))
= −3/(θ_1^3 (1 − ψ^2)) + 2/(θ_1^3 (1 − ψ^2))
= −1/(θ_1^3 (1 − ψ^2)).
Finally, we see that (4.1.9) holds because

E(∂^3 log f/∂ψ^2 ∂θ_1) = −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} E(−(X_1 − µ_1)^2/θ_1^2 + (X_2 − µ_2)^2)
= −(1/(2θ_2)) {(1 + 2ψ^2)/(1 − ψ^2)^{5/2}} (−θ_2/(θ_1 (1 − ψ^2)^{1/2}) + θ_2/(θ_1 (1 − ψ^2)^{1/2}))
= 0,

while

E(∂^3 log f/∂θ_1 ∂θ_2^2) = −(1/((1 − ψ^2)^{1/2} θ_2^3)) E(−(X_1 − µ_1)^2/θ_1^2 + (X_2 − µ_2)^2)
= −(1/((1 − ψ^2)^{1/2} θ_2^3)) (−θ_1 θ_2/(θ_1^2 (1 − ψ^2)^{1/2}) + θ_2/(θ_1 (1 − ψ^2)^{1/2}))
= 0.
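As with Lemma 3.1, these identities can be verified symbolically, since each derivative of log f is linear in the quadratic forms; a sketch assuming SymPy:

```python
import sympy as sp

th1, th2 = sp.symbols('theta1 theta2', positive=True)
psi = sp.symbols('psi', real=True)
Q1, Q2, Q12 = sp.symbols('Q1 Q2 Q12')  # quadratic forms, as in Lemma 3.1

logf = -sp.log(th2) - (Q1/th1 + th1*Q2 - 2*psi*Q12) / (2*sp.sqrt(1 - psi**2)*th2)
moments = {Q1: th1*th2/sp.sqrt(1 - psi**2),
           Q2: th2/(th1*sp.sqrt(1 - psi**2)),
           Q12: psi*th2/sp.sqrt(1 - psi**2)}

# (4.1.7): E(d^3 log f / d theta1^3) = 3 / (theta1^3 (1 - psi^2))
E7 = sp.simplify(sp.diff(logf, th1, 3).subs(moments))
assert sp.simplify(E7 - 3/(th1**3 * (1 - psi**2))) == 0
# (4.1.9), second part: E(d^3 log f / d theta1 d theta2^2) = 0
assert sp.simplify(sp.diff(logf, th1, 1, th2, 2).subs(moments)) == 0
```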
We derive the matching priors in the next few sections.
4.2 Quantile Matching Priors

Due to the orthogonality of (θ_1, θ_2) with (µ_1, µ_2, ψ), from (1.2.3.4), the class of first order matching priors is characterized by

π(µ_1, µ_2, θ_1, θ_2, ψ) ∝ θ_1^{−1} (1 − ψ^2)^{−1/2} g_0(µ_1, µ_2, ψ, θ_2).    (4.2.1)

As it is often customary to assign a uniform prior to (µ_1, µ_2) on R^2, we will consider only the subclass of priors where g_0(µ_1, µ_2, ψ, θ_2) = g(ψ, θ_2).
A prior of the form π ∝ θ1^{-1}(1 − ψ^2)^{-1/2} g(ψ, θ2) satisfies the second-order quantile
matching property if and only if (see (1.2.3.5) of Chapter 1) g satisfies the relation

    (∂/∂θ2){ θ1^{-1}(1 − ψ^2)^{1/2} g θ1^2 θ2^2 E(∂^3 log f/∂θ1^2∂θ2) }
    + (∂/∂ψ){ θ1^{-1}(1 − ψ^2)^{1/2} g θ1^2 (1 − ψ^2)^2 E(∂^3 log f/∂θ1^2∂ψ) }
    + (1/6)(1 − ψ^2)^{-1/2} g (∂/∂θ1){ θ1^3(1 − ψ^2)^{3/2} E(∂ log f/∂θ1)^3 } = 0.    (4.2.2)

From (4.1.5)-(4.1.7), (4.2.2) simplifies to

    θ1^{-1}(1 − ψ^2)^{-1/2} (∂/∂θ2){ g θ2 } − θ1^{-1} (∂/∂ψ){ g ψ(1 − ψ^2)^{1/2} } = 0.    (4.2.3)
Now let g be the class of functions given by g(θ2, ψ) = θ2^a |ψ|^a (1 − ψ^2)^{-(a+2)/2}. With this
choice of g, the left hand side of the above equation reduces to

    θ1^{-1}|ψ|^a(1 − ψ^2)^{-(a+3)/2} (∂/∂θ2) θ2^{a+1} − θ1^{-1}θ2^a (∂/∂ψ){ |ψ|^{a+1}(1 − ψ^2)^{-(a+3)/2+1} sgn(ψ) }
    = (a + 1)θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}
      − θ1^{-1}θ2^a{ |ψ|^{a+1}(−(a + 1)/2)(1 − ψ^2)^{-(a+3)/2}(−2ψ)sgn(ψ) + (1 − ψ^2)^{-(a+3)/2+1}(a + 1)|ψ|^a sgn^2(ψ) }
    = (a + 1)θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}{ 1 − (ψ^2 + (1 − ψ^2)) }
    = 0.    (4.2.4)

Thus every prior π(µ1, µ2, θ1, θ2, ψ) ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2} is a second order probability
matching prior for θ1. Due to the invariance property of such a prior, back in the original
parameterization, a second order matching prior for σ1/σ2 is given by

    π(µ1, µ2, σ1, σ2, ρ) ∝ σ1^a σ2^a |ρ|^a (1 − ρ^2)^{-1}.
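The cancellation establishing (4.2.4) can also be confirmed symbolically. The sketch below (illustrative, not part of the dissertation) checks that g(θ2, ψ) = θ2^a ψ^a (1 − ψ^2)^{-(a+2)/2} solves (4.2.3) on ψ > 0, where |ψ| = ψ and the sgn terms drop out:

```python
import sympy as sp

t1, t2, psi, a = sp.symbols('theta1 theta2 psi a', positive=True)  # psi > 0 so |psi| = psi

# g(theta2, psi) = theta2^a * psi^a * (1 - psi^2)^{-(a+2)/2}
g = t2**a * psi**a * (1 - psi**2)**(-(a + 2)/2)

# Left-hand side of (4.2.3)
lhs = (t1**-1 * (1 - psi**2)**sp.Rational(-1, 2) * sp.diff(g * t2, t2)
       - t1**-1 * sp.diff(g * psi * sp.sqrt(1 - psi**2), psi))

# Exact symbolic check at a = 2, plus a numeric spot check for symbolic a
assert sp.simplify(lhs.subs(a, 2)) == 0
assert abs(float(lhs.subs({t1: 1.3, t2: 0.7, psi: 0.4, a: 5.0}))) < 1e-9
```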
4.3 Matching Via Distribution Functions

The class of first order quantile matching priors is also first order matching via
distribution functions. Under orthogonality of θ1 with (θ2, . . . , θp), it follows from (1.3.2.1)
and (1.3.2.2) of Chapter 1 that, in order for this class of priors also to satisfy the second
order distribution function matching criterion, it needs to satisfy the two differential
equations

    A1 = (∂^2/∂θ1^2)( I^{11}π(θ) ) − 2(∂/∂θ1)( I^{11}(∂/∂θ1)π )
         − Σ_{s=2}^p Σ_{v=2}^p (∂/∂θs){ E(∂^3 log f/∂θ1^2∂θs) I^{11}I^{sv}π(θ) }
         − Σ_{s=2}^p Σ_{v=2}^p (∂/∂θ1){ E(∂^3 log f/∂θ1∂θs∂θv) I^{11}I^{sv}π(θ) } = 0    (4.3.1)

and

    A2 = (∂/∂θ1){ E(∂^3 log f/∂θ1^3)(I^{11})^2 π(θ) } = 0.    (4.3.2)
In our context, when θ1 = σ1/σ2 is the parameter of interest, any class of priors of the form
π ∝ θ1^{-1}(1 − ψ^2)^{-1/2} g(ψ, θ2) ensures matching of the posterior and frequentist cumulative
distribution functions at the second order if, from (4.3.1), (4.3.2) and (4.1.5),

    g (∂^2/∂θ1^2){ θ1^{-1}θ1^2(1 − ψ^2)^{1/2} } − 2g (∂/∂θ1){ θ1^2(1 − ψ^2)^{1/2} (∂/∂θ1)θ1^{-1} }
    − (∂/∂θ2){ θ1^{-1} g θ1^2(1 − ψ^2)^{1/2} θ2^2 E(∂^3 log f/∂θ1^2∂θ2) }
    − (∂/∂ψ){ θ1^{-1} g θ1^2(1 − ψ^2)^{1/2} (1 − ψ^2)^2 E(∂^3 log f/∂θ1^2∂ψ) }
    − g (∂/∂θ1){ E(∂^3 log f/∂θ1∂θ2^2) θ1^2(1 − ψ^2)^{1/2} θ2^2 θ1^{-1} }
    − g (∂/∂θ1){ E(∂^3 log f/∂θ1∂ψ^2) θ1^2(1 − ψ^2)^{1/2} (1 − ψ^2)^2 θ1^{-1} } = 0    (4.3.3)

and

    g(1 − ψ^2)^{-1/2} (∂/∂θ1){ E(∂^3 log f/∂θ1^3) θ1^{-1} θ1^4 (1 − ψ^2)^2 } = 0.    (4.3.4)
From (4.1.6) and (4.1.9) of Lemma 4.1, (4.3.3) reduces to

    −θ1^{-1}(1 − ψ^2)^{-1/2} (∂/∂θ2){ g θ2 } + θ1^{-1} (∂/∂ψ){ g ψ(1 − ψ^2)^{1/2} } = 0,    (4.3.5)

the negative of the left hand side of (4.2.3) set equal to zero, while the left hand side of
(4.3.4) reduces to 3g(1 − ψ^2)^{-1/2} (∂/∂θ1){ (1 − ψ^2) }, which is clearly 0 for any g. So we
need to find g such that (4.3.5) is satisfied. In particular, (4.3.5) is satisfied if we once
again let g be the class of functions g(θ2, ψ) = θ2^a |ψ|^a (1 − ψ^2)^{-(a+2)/2}. In other words,
the same class of priors enjoys second order matching both for quantiles and for
distribution functions.
4.4 Highest Posterior Density (HPD) Matching Priors

We now turn attention to HPD matching priors for θ1. We will consider priors which
ensure that HPD regions with credibility level 1 − α also have asymptotically the same
frequentist coverage probability, the error of approximation being o(n^{-1}). From (3.1.1)
of Chapter 1, any second order matching prior for posterior quantiles of θ1 is also HPD
matching for θ1 in the special case of models satisfying

    (∂/∂θ1)( (I^{θ1θ1})^{3/2} E(∂^3 log f/∂θ1^3) ) = 0.    (4.5.1)

It is easy to check that when θ1 = σ1/σ2 is the parameter of interest, from (4.1.5)
and (4.1.7), (4.5.1) holds, and hence the second order quantile matching prior
π ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2} is also HPD matching.
4.5 Matching Priors Via Inversion of Test Statistics

In this section, once again, we focus on priors that ensure approximate frequentist
validity of posterior credible regions, this time obtained by inverting the likelihood ratio
test statistic. From (1.5.1.5), a likelihood ratio matching prior π is obtained by solving

    (∂/∂θ2){ π θ1^2(1 − ψ^2) θ2^2 E(∂^3 log f/∂θ1^2∂θ2) } + (∂/∂ψ){ π θ1^2(1 − ψ^2)(1 − ψ^2)^2 E(∂^3 log f/∂θ1^2∂ψ) }
    + (∂/∂θ1){ θ1^2(1 − ψ^2){ (∂/∂θ1)π − π( θ1^2(1 − ψ^2) E((∂ log f/∂θ1)(∂^2 log f/∂θ1^2))
      − θ2^2 E(∂^3 log f/∂θ1∂θ2^2) − (1 − ψ^2)^2 E(∂^3 log f/∂θ1∂ψ^2) ) } } = 0.    (4.6.1)
Then, from (4.1.6), (4.1.8) and (4.1.9) of Lemma 4.1, (4.6.1) reduces to

    (∂/∂θ2){ π θ2 } − (∂/∂ψ){ π ψ(1 − ψ^2) } + (∂/∂θ1){ θ1^2(1 − ψ^2)( (∂/∂θ1)π + π θ1^{-1} ) } = 0.    (4.6.2)
Consider once again π(µ1, µ2, θ1, θ2, ψ) ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}. Then
(∂/∂θ1)π + π θ1^{-1} = 0, and the left hand side of (4.6.2) simplifies to

    θ1^{-1}|ψ|^a(1 − ψ^2)^{-(a+3)/2} (∂/∂θ2) θ2^{a+1} − θ1^{-1}θ2^a (∂/∂ψ){ |ψ|^{a+1}(1 − ψ^2)^{-(a+3)/2+1} sgn(ψ) },

which is exactly the same as the left hand side of (4.2.4), and leads to the same class of
matching priors as before. With this we conclude that we have been able to find a class
of priors π(µ1, µ2, σ1, σ2, ρ) ∝ σ1^a σ2^a |ρ|^a (1 − ρ^2)^{-1} which satisfies all the different
matching criteria.
4.6 Propriety of the Posteriors

We now establish the propriety of the posteriors. A prior of the form
π ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2} satisfies the various matching properties discussed
above. The joint posterior of µ1, µ2, θ1, θ2, ψ given X is

    π(µ1, µ2, θ1, θ2, ψ|X)
    ∝ θ2^{-n} exp{ −(1/(2(1 − ψ^2)^{1/2}θ2)) Σ_{i=1}^n [ (X1i − µ1)^2/θ1 + θ1(X2i − µ2)^2 − 2ψ(X1i − µ1)(X2i − µ2) ] }
    × θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}.    (4.7.1)
Next consider the transformation

    ρ = ψ,   σ1 = (θ1θ2)^{1/2}/(1 − ψ^2)^{1/4}   and   σ2 = θ2^{1/2}/(θ1^{1/2}(1 − ψ^2)^{1/4}).

Under this transformation the posterior can be written as

    π(µ1, µ2, σ1, σ2, ρ|X)
    ∝ exp{ −(1/(2(1 − ρ^2))) Σ_{i=1}^n [ (X1i − µ1)^2/σ1^2 + (X2i − µ2)^2/σ2^2 − 2ρ(X1i − µ1)(X2i − µ2)/(σ1σ2) ] }
    × (σ1σ2)^{a−n}|ρ|^a(1 − ρ^2)^{-n/2−1}.    (4.7.2)
Now, integrating out µ1 and µ2, we obtain

    π(σ1, σ2, ρ|X) ∝ (σ1σ2)^{-n+a+1}|ρ|^a(1 − ρ^2)^{-(n+1)/2} exp{ −(1/(2(1 − ρ^2))){ S11/σ1^2 + S22/σ2^2 − 2ρS12/(σ1σ2) } },    (4.7.3)

where S11 = Σ_{i=1}^n (X1i − X̄1)^2, S22 = Σ_{i=1}^n (X2i − X̄2)^2 and S12 = Σ_{i=1}^n (X1i − X̄1)(X2i − X̄2).
Consider another transformation

    z1 = σ1^2(1 − ρ^2),   z2 = σ2^2(1 − ρ^2)   and   z3 = ρ.
Then, expanding the cross-product term of the exponent in a power series, the posterior
can be written as

    π(z1, z2, z3|X) ∝ (1 − z3^2)^{-(n+1)/2+n−a}|z3|^a(z1z2)^{-(n−a)/2} exp{ −(1/2){ S11/z1 + S22/z2 − 2z3S12/(z1z2)^{1/2} } }
    ∝ (1 − z3^2)^{(n−1)/2−a}(z1z2)^{-(n−a)/2} exp{ −(1/2)( S11/z1 + S22/z2 ) } × Σ_{r=0}^∞ (z3S12)^r/( z1^{r/2} z2^{r/2} r! ).
On integrating out z1 and z2, the marginal posterior of z3 is

    π(z3|X) ∝ |z3|^a(1 − z3^2)^{(n−1)/2−a} Σ_{r=0}^∞ ( z3^r(2R)^r/r! ) Γ^2( (n + r − a − 2)/2 )   for a < n − 2,    (4.7.4)

where R = S12/(S11S22)^{1/2}. Clearly, for r odd, the integral
∫_{−1}^1 z3^r|z3|^a(1 − z3^2)^{(n−1)/2−a} dz3 = 0. So, in order to show the propriety, we need
only show that

    I = ∫_{−1}^1 |z3|^a(1 − z3^2)^{(n−1)/2−a} { Σ_{r=0}^∞ z3^{2r}( (2R)^{2r}/(2r)! ) Γ^2( (n − a + 2r − 2)/2 ) } dz3 < ∞.
To this end, we first note that

    ∫_{−1}^1 z3^{2r}|z3|^a(1 − z3^2)^{(n−1)/2−a} dz3 = 2 ∫_0^1 z3^{2r+a}(1 − z3^2)^{(n−1−2a)/2} dz3
    = ∫_0^1 u^{r+(a+1)/2−1}(1 − u)^{(n+1−2a)/2−1} du
    = Beta( r + (a+1)/2, (n+1−2a)/2 )
    = Γ(r + (a+1)/2)Γ((n+1−2a)/2) / Γ((n + 2r − a − 2)/2).
So the integral now reduces to

    I = Σ_{r=0}^∞ ( (2R)^{2r}/(2r)! ) Γ(r + (a+1)/2) Γ((n − a + 2r − 2)/2).    (4.7.5)
By the Legendre duplication formula, (2r)! = Γ(2r + 1) = Γ(r + 1/2)Γ(r + 1)2^{2r}/π^{1/2}.
Hence, writing k (> 0) for a generic constant which does not depend on r,

    I = k Σ_{r=0}^∞ ( R^{2r}/r! ) Γ(r + (n − a − 2)/2) Γ(r + (a+1)/2) / Γ(r + 1/2).

Writing the sum as Σ_{r=0}^∞ a_r, it follows that

    a_{r+1}/a_r = R^2 (r + (a+1)/2)(r + (n − a − 2)/2) / ((r + 1)(r + 1/2)) → R^2 < 1   as r → ∞.

Hence the summation converges by the ratio test, so that the posterior π(z3|X) and,
accordingly, the joint posterior π(µ1, µ2, σ1, σ2, ρ|X) are proper.
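The ratio-test argument can be illustrated numerically. The sketch below (the values of R, n and a are arbitrary choices, not from the dissertation) evaluates the terms a_r of the series (4.7.5) on the log scale via lgamma and checks that successive ratios approach R^2 < 1:

```python
import math

def log_a(r, R, n, a):
    # log of a_r = (2R)^{2r}/(2r)! * Gamma(r + (a+1)/2) * Gamma((n - a + 2r - 2)/2)
    return (2 * r * math.log(2 * R) - math.lgamma(2 * r + 1)
            + math.lgamma(r + (a + 1) / 2) + math.lgamma((n - a + 2 * r - 2) / 2))

R, n, a = 0.6, 20, 1.0   # any 0 < R < 1 and a < n - 2
ratios = [math.exp(log_a(r + 1, R, n, a) - log_a(r, R, n, a)) for r in range(500)]

# Successive term ratios tend to R^2 < 1, so the series converges (ratio test)
assert abs(ratios[-1] - R**2) < 2e-2

# The partial sums are accordingly finite
partial = sum(math.exp(log_a(r, R, n, a)) for r in range(200))
assert math.isfinite(partial) and partial > 0
```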
4.7 Simulation Study

Using the parameterization θ1 = σ1/σ2, θ2 = σ1σ2(1 − ρ^2)^{1/2} and ψ = ρ, as before, our
parameter of interest is θ1. A general class of priors was obtained as
π ∝ θ1^{-1}θ2^a|ψ|^a(1 − ψ^2)^{-(a+3)/2}. This prior satisfies the quantile matching, matching
via distribution functions, HPD matching and likelihood ratio matching properties.
There are three priors that we wish to compare. The first is π ∝ θ1^{-1}. This was
recommended by Staicu (2007) in her PhD dissertation, where she showed that this prior
achieves matching up to O(n^{-3/2}). The second prior is π ∝ θ1^{-1}(1 − ψ^2)^{-3/2}, suggested
by Mukerjee and Reid (2001); it is the special case (a = 0) of the class of priors that we
obtained satisfying all the matching criteria. Finally, the prior π ∝ θ1^{-1}θ2^{-1}(1 − ψ^2)^{-1}
was recommended by Berger and Sun (2007). This is also the one-at-a-time reference prior
for each of the parameters θ1, θ2 and ψ, and it satisfies the first order matching property.
In order to evaluate the three different priors, we undertook a simulation study in which
data were generated from a bivariate normal distribution with (µ1, µ2, σ2, ρ) = (0, 0, 1, 0.5),
varying values of σ1 and varying sample sizes n. The values of θ1 varied from 0.5 to 2.0.
Since the full conditional distributions of the parameters under any of the three priors
do not follow a standard distributional form, we used Gibbs sampling with componentwise
Metropolis-Hastings updates at each iteration to generate random numbers from the
conditional posterior distributions of each parameter (Robert and Casella, 2001). We
ran two chains with different initial values and allowed a burn-in of 10,000 iterations
each. For the means and the log standard deviations we used a random-walk jumping
density, adding normal noise to the existing value in the chain; the correlation was
likewise updated by a random walk, adding a small normal noise to the old value. Each
chain was run for 40,000 iterations, and convergence was judged by the Gelman-Rubin
diagnostic (Gelman and Rubin, 1992). The trace plot presenting the time history of the
last 8,000 iterations for all five parameters is presented in Figure 4-1 for a sample
simulated dataset with θ1 = 0.7, under Prior 3 and sample size 20. Figure 4-2 presents
the plot of the Gelman-Rubin diagnostic for the θ1 chain under the same setting, with
diagnostic values close to 1 suggesting convergence. Figures 4-3, 4-4 and 4-5 are posterior
distributions for θ1 under the three different priors for four different sample sizes,
n = 10, 20, 30, 40. One can immediately make the following observations. Though there
are certain numerical differences, the posterior distribution of θ1 does not seem to vary
widely among Priors 1, 2 and 3, even for smaller sample sizes, though Prior 2 typically
gave smaller posterior standard deviations. As data information increases with sample
size, the posterior distributions become very similar under the three priors. Some
skewness can be observed in the posterior distributions for smaller sample sizes, which
was often noted during our simulation, but the distribution becomes fairly symmetric as
n becomes large. The posterior distribution also becomes more concentrated around the
true value of θ1 with increasing n, as expected.
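A minimal, self-contained sketch of the componentwise random-walk Metropolis sampler described above, written here for Prior 2, which in the (µ1, µ2, σ1, σ2, ρ) parameterization is the a = 0 member π ∝ (1 − ρ^2)^{-1} of the matching class of Section 4.2. The function names, step sizes and chain lengths are illustrative assumptions, not the settings used for Table 4-1:

```python
import math
import random

def log_post(p, xs, ys):
    """Unnormalized log posterior of (mu1, mu2, log s1, log s2, rho) under
    Prior 2; sampling sigma1, sigma2 on the log scale adds a Jacobian s1*s2."""
    mu1, mu2, ls1, ls2, rho = p
    if abs(rho) >= 1.0:
        return -math.inf
    s1, s2 = math.exp(ls1), math.exp(ls2)
    n, om = len(xs), 1.0 - rho * rho
    q = sum((x - mu1)**2 / s1**2 + (y - mu2)**2 / s2**2
            - 2.0 * rho * (x - mu1) * (y - mu2) / (s1 * s2)
            for x, y in zip(xs, ys))
    loglik = -n * (ls1 + ls2 + 0.5 * math.log(om)) - q / (2.0 * om)
    return loglik - math.log(om) + ls1 + ls2  # prior (1 - rho^2)^{-1} plus Jacobian

def sample_theta1(xs, ys, iters=4000, burn=1000, step=0.1, seed=7):
    """Componentwise random-walk Metropolis; returns draws of theta1 = s1/s2."""
    rng = random.Random(seed)
    n = len(xs)
    m1, m2 = sum(xs) / n, sum(ys) / n
    v1 = sum((x - m1)**2 for x in xs) / n
    v2 = sum((y - m2)**2 for y in ys) / n
    c = sum((x - m1) * (y - m2) for x, y in zip(xs, ys)) / n
    # start at moment-based estimates
    p = [m1, m2, 0.5 * math.log(v1), 0.5 * math.log(v2), c / math.sqrt(v1 * v2)]
    lp = log_post(p, xs, ys)
    draws = []
    for it in range(iters):
        for j in range(5):  # update one coordinate at a time
            prop = list(p)
            prop[j] += rng.gauss(0.0, step)
            lp_new = log_post(prop, xs, ys)
            if math.log(rng.random()) < lp_new - lp:
                p, lp = prop, lp_new
        if it >= burn:
            draws.append(math.exp(p[2] - p[3]))  # theta1 = sigma1 / sigma2
    return draws
```

For data simulated with σ1 = σ2 = 1 and ρ = 0.5, the draws of θ1 settle around the true ratio 1; in practice the step sizes would be tuned and convergence monitored with the Gelman-Rubin diagnostic, as described above.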
We repeated our Gibbs sampling estimation technique for 500 datasets under each
configuration of θ1 and n. Each time, we computed the posterior mean, the 95% quantile
interval (given by the 2.5th and 97.5th sample percentiles of the randomly generated
parameter values after the burn-in period) and the 95% HPD interval. Table 4-1 presents
the average of the posterior means, the mean squared error, and the frequentist coverage
of the Bayesian credible intervals (estimated by the proportion of times the true parameter
value falls in the corresponding credible interval) across the 500 datasets under the three
different priors. Some interesting differences can be noted in the behavior for smaller
sample sizes. Prior 2 appears to perform best in terms of coverage of both quantile and
HPD intervals, and also has excellent point estimation properties in terms of average
posterior mean and MSE for smaller sample sizes. For larger sample sizes, all three priors
become almost indistinguishable in terms of their performance.
Table 4-1. Simulation Result to Compare the Three Different Priors Suggested for the
Bivariate Normal Ratio of Standard Deviations Parameter θ1. The True Parameter
Settings are µ1 = µ2 = 0, σ2 = 1 and Varying Values of θ1 = σ1 as Listed. Prior 1:
π ∝ θ1^{-1}; Prior 2: π ∝ (1 − ρ^2)^{-3/2}θ1^{-1}; Prior 3: π ∝ (1 − ψ^2)^{-1}θ1^{-1}θ2^{-1}.
Results are based on 500 simulated datasets. *: Average value for the posterior mean of
θ1, averaged across the 500 simulated datasets.

              Prior 1                        Prior 2                        Prior 3
 θ1    n   θ1*   MSE  Cov.   Cov.     θ1*   MSE  Cov.   Cov.     θ1*   MSE  Cov.   Cov.
                      (Qu.)  (HPD)                (Qu.)  (HPD)                (Qu.)  (HPD)
 0.5  10   0.52  0.02  0.93   0.94    0.51  0.02  0.95   0.95    0.52  0.02  0.93   0.93
 0.5  20   0.51  0.01  0.94   0.94    0.50  0.01  0.95   0.94    0.51  0.01  0.94   0.94
 0.5  30   0.50  0.01  0.94   0.94    0.50  0.01  0.95   0.96    0.50  0.01  0.94   0.94
 0.5  40   0.51  0.01  0.96   0.95    0.51  0.01  0.96   0.95    0.51  0.01  0.95   0.95
 1.0  10   1.06  0.14  0.93   0.94    1.06  0.12  0.95   0.95    1.07  0.14  0.93   0.93
 1.0  20   1.01  0.04  0.96   0.94    1.01  0.03  0.95   0.95    1.01  0.04  0.95   0.94
 1.0  30   1.01  0.03  0.96   0.95    1.01  0.02  0.96   0.95    1.01  0.03  0.95   0.95
 1.0  40   1.01  0.02  0.95   0.94    1.01  0.02  0.95   0.95    1.01  0.02  0.95   0.94
 1.5  10   1.61  0.25  0.92   0.94    1.61  0.23  0.95   0.95    1.61  0.26  0.93   0.94
 1.5  20   1.54  0.12  0.94   0.92    1.54  0.11  0.93   0.93    1.54  0.12  0.93   0.93
 1.5  30   1.51  0.06  0.94   0.94    1.51  0.05  0.93   0.92    1.51  0.06  0.93   0.93
 1.5  40   1.52  0.05  0.94   0.94    1.52  0.05  0.95   0.95    1.52  0.05  0.95   0.95
 2.0  10   2.11  0.44  0.94   0.94    2.11  0.42  0.95   0.94    2.11  0.42  0.96   0.94
 2.0  20   2.04  0.18  0.95   0.95    2.04  0.18  0.95   0.95    2.04  0.18  0.95   0.95
 2.0  30   2.04  0.11  0.95   0.96    2.04  0.10  0.95   0.95    2.04  0.11  0.94   0.95
 2.0  40   2.03  0.08  0.95   0.95    2.03  0.07  0.95   0.95    2.03  0.08  0.95   0.94

Cov. (Qu.) and Cov. (HPD) denote the coverage of the 95% quantile and 95% HPD
intervals, respectively.
Figure 4-1. Sample trace plot for all the parameters under Prior 3 for n = 20 under the
simulation setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-2. Plot of Gelman-Rubin Diagnostic Statistic for θ1 under Prior 3 for n = 20
under the simulation setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-3. Posterior Distribution for θ1 under Prior 1 for Different Sample Sizes, under
the Simulation Setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-4. Posterior Distribution for θ1 under Prior 2 for Different Sample Sizes, under
the Simulation Setting of Section 4.7. True value of θ1 = 0.7.

Figure 4-5. Sample Posterior Distribution for θ1 under Prior 3 for Different Sample Sizes,
under the Simulation Setting of Section 4.7. True value of θ1 = 0.7.
CHAPTER 5
SUMMARY
The study of probability matching priors, which ensure approximate frequentist validity of
posterior credible sets, has received much attention in recent years. In this dissertation,
we developed such priors for parameters, and some functions of the parameters, of a
bivariate normal distribution. The criterion used is the asymptotic matching of coverage
probabilities of Bayesian credible intervals with the corresponding frequentist coverage
probabilities. The dissertation uses various matching criteria, namely, quantile matching,
matching of distribution functions, highest posterior density matching, and matching via
inversion of test statistics. Orthogonal parameterizations were obtained which simplified
the differential equations that needed to be solved to obtain these matching priors.

First, we considered as parameters of interest (i) the regression coefficient, (ii) the
generalized variance, i.e. the determinant of the variance-covariance matrix, and (iii) the
ratio of the conditional variance of one variable given the other to the marginal variance
of the other variable. Here we have been able to find a single prior which meets all four
matching criteria for every one of these parameters. The agreement between the
frequentist and posterior coverage probabilities of HPD intervals is quite good for the
probability matching priors, even for small sample sizes.

Next, we considered the bivariate normal correlation coefficient as the parameter of
interest. Here we obtained different priors satisfying the different matching criteria and
compared their performance for moderate sample sizes. There does not, however, exist
a prior that satisfies the matching via distribution functions criterion. In addition,
we developed inference based on certain modifications of the profile likelihood, namely
the conditional profile likelihood, the adjusted profile likelihood and the integrated
likelihood. One common feature of all the modified likelihoods is that they depend on the
data only through the sample correlation coefficient r.
Finally, we considered the ratio of the standard deviations of the bivariate normal
distribution as our parameter of interest. A general class of priors was obtained in this
case which satisfied all the matching criteria. A specific prior from this class was chosen,
and its performance was compared with some other commonly used priors. The chosen
prior appears to perform the best in terms of coverage of both quantile and HPD
intervals, and also has excellent point estimation properties in terms of average posterior
mean and MSE for smaller sample sizes.

Recently, Sun and Berger (2006) illustrated objective Bayesian inference for the
multivariate normal distribution using different types of formal objective priors, different
modes of inference and different criteria for selecting optimal objective priors. They focus,
in particular, on reference priors, and show that the right-Haar prior is a one-at-a-time
reference prior for many parameters and functions of parameters. Our future research
will concentrate on finding probability matching priors for the multivariate analogs of
the bivariate normal parameters. Here interest lies in several parameters or parametric
functions; for instance, we may be interested in the generalized variance, the regression
matrix or the correlation matrix. Then posterior quantiles are not well-defined, but HPD
regions and credible regions via the LR statistic remain meaningful and of much interest,
and can be used to find matching priors. The joint posterior c.d.f. also remains meaningful
and provides a viable route for finding matching priors. Orthogonal parameterizations are
not guaranteed; however, if found, they will simplify the computations.
REFERENCES
Bayarri, M. J. (1981), "Inferencia bayesiana sobre el coeficiente de correlación de una población normal bivariante," Trabajos de Estadistica e Investigacion Operativa, 32, 18-31.

Bartlett, M. S. (1937), "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of London, Ser. A, 160, 268-282.

Berger, J., and Bernardo, J. M. (1992a), "On the Development of Reference Priors" (with discussion), in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, U.K.: Oxford University Press, pp. 35-60.

Berger, J., Liseo, B., and Wolpert, R. L. (1999), "Integrated Likelihood Methods for Eliminating Nuisance Parameters," Statistical Science, 14, 1-22.

Berger, J., and Sun, D. (2007), "Objective Priors for the Bivariate Normal Model," to appear in the Annals of Statistics.

Berger, J., and Sun, D. (2006), "Objective Priors for a Bivariate Normal Model with Multivariate Generalizations," ISDS Technical Report, Duke University.

Bernardo, J. M. (1979), "Reference Posterior Distributions for Bayesian Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 41, 113-147.

Cox, D. R., and Reid, N. (1987), "Orthogonal Parameters and Approximate Conditional Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 49, 1-39.

Datta, G. S., and Ghosh, J. K. (1995a), "Noninformative Priors for Maximal Invariant Parameter in Group Models," Test, 4, 95-114.

Datta, G. S., and Ghosh, J. K. (1995b), "On Priors Providing Frequentist Validity for Bayesian Inference," Biometrika, 82, 37-45.

Datta, G. S., and Ghosh, M. (1995a), "Some Remarks on Noninformative Priors," Journal of the American Statistical Association, 90, 1357-1363.

Datta, G. S., and Ghosh, M. (1996), "On the Invariance of Noninformative Priors," Annals of Statistics, 24, 141-159.

Datta, G. S., Ghosh, M., and Mukerjee, R. (2000), "Some New Results on Probability Matching Priors," Calcutta Statistical Association Bulletin, 50, 179-192.

Datta, G. S., and Mukerjee, R. (2004), Probability Matching Priors: Higher Order Asymptotics, Lecture Notes in Statistics, New York: Springer.

Datta, G. S., and Sweeting, T. J. (2005), "Probability Matching Priors," in Handbook of Statistics, Vol. 25: Bayesian Thinking: Modeling and Computation, eds. D. Dey and C. R. Rao, Elsevier, pp. 91-114.

Dawid, A. P., Stone, M., and Zidek, J. V. (1973), "Marginalization Paradoxes in Bayesian and Structural Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 35, 189-233.

DiCiccio, T. J., and Stern, S. E. (1994), "Frequentist and Bayesian Bartlett Correction of Test Statistics Based on Adjusted Profile Likelihoods," Journal of the Royal Statistical Society, Ser. B, 56, 397-408.

Fisher, R. A. (1956), Statistical Methods and Scientific Inference, Edinburgh: Oliver and Boyd.

Garvan, C. W., and Ghosh, M. (1997), "Noninformative Priors for Dispersion Models," Biometrika, 84, 976-982.

Garvan, C. W., and Ghosh, M. (1999), "On the Property of Posteriors for Dispersion Models," Journal of Statistical Planning and Inference, 78, 229-241.

Gelman, A., and Rubin, D. B. (1992), "Inference from Iterative Simulation Using Multiple Sequences," Statistical Science, 7, 457-472.

Ghosh, J. K. (1994), Higher Order Asymptotics, Hayward, CA: Institute of Mathematical Statistics and American Statistical Association.

Ghosh, J. K., and Mukerjee, R. (1991), "Characterization of Priors under which Bayesian and Frequentist Bartlett Corrections are Equivalent in the Multiparameter Case," Journal of Multivariate Analysis, 38, 385-393.

Ghosh, J. K., and Mukerjee, R. (1992b), "Bayesian and Frequentist Bartlett Corrections for Likelihood Ratio and Conditional Likelihood Ratio Tests," Journal of the Royal Statistical Society, Ser. B, 54, 867-875.

Ghosh, J. K., and Mukerjee, R. (1993a), "On Priors that Match Posterior and Frequentist Distribution Functions," Canadian Journal of Statistics, 21, 89-96.

Ghosh, J. K., and Mukerjee, R. (1993b), "Frequentist Validity of Highest Posterior Density Regions in the Multiparameter Case," Annals of the Institute of Statistical Mathematics, 45, 293-302.

Ghosh, J. K., and Mukerjee, R. (1994b), "Adjusted Versus Conditional Likelihood: Power Properties and Bartlett-type Adjustment," Journal of the Royal Statistical Society, Ser. B, 56, 185-188.

Ghosh, J. K., and Mukerjee, R. (1995), "Frequentist Validity of Highest Posterior Density Regions in the Presence of Nuisance Parameters," Statistics and Decisions, 13, 131-139.

Ghosh, M., Carlin, B. P., and Srivastava, M. S. (1995), "Probability Matching Priors for Linear Calibration," Test, 4, 333-357.

Ghosh, M., and Kim, Y.-H. (2001), "The Behrens-Fisher Problem Revisited: A Bayes-Frequentist Synthesis," Canadian Journal of Statistics, 29, 5-17.

Ghosh, M., and Mukerjee, R. (1998), "Recent Developments on Probability Matching Priors," in Applied Statistical Science, III, eds. S. E. Ahmed, M. Ahsanullah, and B. K. Sinha, New York: Nova Science Publishers, pp. 227-252.

Ghosh, M., and Yang, M. C. (1996), "Noninformative Priors for the Two-Sample Normal Problem," Test, 5, 145-157.

Ghosh, M. (2001), "Interval Estimation for a Binomial Proportion: Comment," Statistical Science, 16, 124-125.

Godambe, V. P. (1960), "An Optimum Property of Regular Maximum Likelihood Estimation," Annals of Mathematical Statistics, 31, 1208-1211.

Huzurbazar, V. S. (1950), "Probability Distributions and Orthogonal Parameters," Proceedings of the Cambridge Philosophical Society, 46, 281-284.

Jeffreys, H. (1961), Theory of Probability, Oxford, U.K.: Oxford University Press.

Kalbfleisch, J. D., and Sprott, D. A. (1970), "Application of Likelihood Methods to Models Involving Large Numbers of Parameters" (with discussion), Journal of the Royal Statistical Society, Ser. B, 32, 175-208.

Kass, R. E., and Wasserman, L. (1996), "The Selection of Prior Distributions by Formal Rules," Journal of the American Statistical Association, 91, 1343-1370.

Kendall, M. G., and Stuart, A. (1969), The Advanced Theory of Statistics, Vol. 1, p. 390, New York: Hafner Publishing Company.

Lee, C. B. (1989), Comparison of Frequentist Coverage Probability and Bayesian Posterior Coverage Probability, and Applications, unpublished Ph.D. dissertation, Purdue University, Indiana.

Lindsay, B. (1982), "Conditional Score Functions: Some Optimality Results," Biometrika, 69, 503-512.

Lindley, D. V. (1965), Introduction to Probability and Statistics from a Bayesian Viewpoint, Cambridge: Cambridge University Press.

McCullagh, P., and Tibshirani, R. (1990), "A Simple Method for the Adjustment of Profile Likelihoods," Journal of the Royal Statistical Society, Ser. B, 52, 325-344.

Morgan, W. A. (1939), "A Test for the Significance of the Difference Between the Two Variances in a Sample from a Normal Bivariate Population," Biometrika, 31, 13-19.

Mukerjee, R., and Dey, D. K. (1993), "Frequentist Validity of Posterior Quantiles in the Presence of Nuisance Parameters: Higher Order Asymptotics," Biometrika, 80, 499-505.

Mukerjee, R., and Ghosh, M. (1997), "Second Order Probability Matching Priors," Biometrika, 84, 970-975.

Mukerjee, R., and Reid, N. (2001), "Second-Order Probability Matching Priors for a Parametric Function with Application to Bayesian Tolerance Limits," Biometrika, 88, 587-592.

Nicolaou, A. (1993), "Bayesian Intervals with Good Frequency Behavior in the Presence of Nuisance Parameters," Journal of the Royal Statistical Society, Ser. B, 55, 377-390.

Peers, H. W. (1965), "Confidence Properties of Bayesian Interval Estimates," Journal of the Royal Statistical Society, Ser. B, 30, 535-544.

Pitman, E. J. G. (1939), "A Note on Normal Correlation," Biometrika, 31, 9-12.

Rao, C. R., and Mukerjee, R. (1995), "On Posterior Credible Sets Based on the Score Statistic," Statistica Sinica, 5, 781-791.

Roy, S. N., and Potthoff, R. F. (1958), "Confidence Bounds on Vector Analogues of the 'Ratio of Means' and the 'Ratio of Variances' for Two Correlated Normal Variates and Some Associated Tests," Annals of Mathematical Statistics, 29, 829-841.

Severini, T. A. (1991), "On the Relationship between Bayesian and Non-Bayesian Interval Estimates," Journal of the Royal Statistical Society, Ser. B, 53, 611-618.

Severini, T. A., Mukerjee, R., and Ghosh, M. (2002), "On an Exact Probability Matching Property of Right-Invariant Priors," Biometrika, 89, 952-957.

Staicu, A. (2007), "On Some Aspects of Likelihood Methods with Applications in Biostatistics," unpublished Ph.D. dissertation, University of Toronto, Toronto.

Stein, C. (1985), "On the Coverage Probability of Confidence Sets Based on a Prior Distribution," in Sequential Methods in Statistics, Banach Center Publications, 16, Warsaw: Polish Scientific Publishers, pp. 485-514.

Sun, D., and Berger, J. (2006), "Objective Bayesian Analysis for the Multivariate Normal Model," ISDS Technical Report, Duke University; to appear in Bayesian Statistics 8, eds. J. M. Bernardo et al., Oxford, U.K.: Oxford University Press.

Tibshirani, R. (1989), "Noninformative Priors for One Parameter of Many," Biometrika, 76, 604-608.

Welch, B. L., and Peers, H. W. (1963), "On Formulae for Confidence Points Based on Integrals of Weighted Likelihoods," Journal of the Royal Statistical Society, Ser. B, 25, 318-329.

Yin, M., and Ghosh, M. (1997), "A Note on the Probability Difference Between Matching Priors Based on Posterior Quantiles and on Inversion of Conditional Likelihood Ratio Statistics," Calcutta Statistical Association Bulletin, 47, 59-65.
BIOGRAPHICAL SKETCH
Upasana Santra was born on March 4, 1977 in Kanpur, India. She graduated from St.
Mary’s Convent High School, Kanpur in 1995. She earned her B.Sc. from Banaras Hindu
University, Varanasi and her M.Sc. from Indian Institute of Technology, Kanpur in 1998
and 2000, respectively, majoring in Statistics.
Upon arriving in the United States with her husband, Swadeshmukul Santra, she
worked as a statistical consultant in the Statistics Unit of IFAS at the University of
Florida. She earned her M.S. in statistics in 2003 from the University of Florida and
continued for her Ph.D. degree thereafter.