
The Annals of Statistics, 2008, Vol. 36, No. 2, 963–982. DOI: 10.1214/07-AOS501. © Institute of Mathematical Statistics, 2008

OBJECTIVE PRIORS FOR THE BIVARIATE NORMAL MODEL

BY JAMES O. BERGER¹ AND DONGCHU SUN²

Duke University and University of Missouri-Columbia

Study of the bivariate normal distribution raises the full range of issues involving objective Bayesian inference, including the different types of objective priors (e.g., Jeffreys, invariant, reference, matching), the different modes of inference (e.g., Bayesian, frequentist, fiducial) and the criteria involved in deciding on optimal objective priors (e.g., ease of computation, frequentist performance, marginalization paradoxes). Summary recommendations as to optimal objective priors are made for a variety of inferences involving the bivariate normal distribution.

In the course of the investigation, a variety of surprising results were found, including the availability of objective priors that yield exact frequentist inferences for many functions of the bivariate normal parameters, including the correlation coefficient.

1. Introduction and prior distributions.

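1.1. Notation and problem statement. The bivariate normal distribution of $(x_1, x_2)'$ has mean parameters $\mu = (\mu_1, \mu_2)'$ and covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

where $\rho$ is the correlation between $x_1$ and $x_2$. The density is

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{\sigma_2^2(x_1-\mu_1)^2 + \sigma_1^2(x_2-\mu_2)^2 - 2\rho\sigma_1\sigma_2(x_1-\mu_1)(x_2-\mu_2)}{2\sigma_1^2\sigma_2^2(1-\rho^2)}\right\}.$$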

The data consists of an independent random sample $X = (x_k = (x_{1k}, x_{2k}),\ k = 1, \ldots, n)$ of size $n \ge 3$, for which the sufficient statistics are

$$\bar{x} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \end{pmatrix} \quad\text{and}\quad S = \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})' = \begin{pmatrix} s_{11} & r\sqrt{s_{11}s_{22}} \\ r\sqrt{s_{11}s_{22}} & s_{22} \end{pmatrix}, \tag{1}$$

¹ Supported in part by NSF Grant DMS-01-03265.

² Supported in part by NSF Grant SES-03-51523 and NIH Grant R01-MH071418.

AMS 2000 subject classifications. Primary 62F10, 62F15, 62F25; secondary 62A01, 62E15, 62H10, 62H20.

Key words and phrases. Reference priors, matching priors, Jeffreys priors, right-Haar prior, fiducial inference, frequentist coverage, marginalization paradox, rejection sampling, constructive posterior distributions.


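where, for $i, j = 1, 2$,

$$\bar{x}_i = n^{-1}\sum_{k=1}^{n} x_{ik}, \qquad s_{ij} = \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) \qquad\text{and}\qquad r = \frac{s_{12}}{\sqrt{s_{11}s_{22}}}.$$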

We will denote prior densities as $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, and the corresponding posterior densities as $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho \mid X)$ (all with respect to $d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2\, d\rho$).

We consider objective inference for parameters of the bivariate normal distribution and functions of these parameters, with special focus on development of objective confidence or credible sets. Section 1.2 introduces many of the key issues to be covered, through a summary of some of the most interesting results involving priors yielding exact frequentist procedures; this section also raises interesting historical and philosophical issues. For easy access, Section 1.3 presents our summary recommendations as to which priors to utilize.

Often, the posteriors for the recommended priors are essentially available in computational closed form, allowing direct Monte Carlo simulation. Section 2 provides simple accept-reject schemes for computing with the recommended priors in other cases. Sections 3 and 4 develop the needed theory, concerning what are called reference priors and matching priors, respectively, and also present various simulations that were conducted to enable summary recommendations to be made.

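Notation: In addition to $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, the following parameters will be considered:

$$\eta_1 = \frac{1}{\sigma_1}, \qquad \eta_2 = \frac{1}{\sigma_2\sqrt{1-\rho^2}}, \qquad \eta_3 = -\frac{\rho}{\sigma_1\sqrt{1-\rho^2}}, \tag{2}$$

$$\theta_1 = \frac{\rho\sigma_2}{\sigma_1}, \qquad \theta_2 = \sigma_2^2(1-\rho^2), \qquad \theta_3 \equiv |\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2), \qquad \theta_4 = \frac{\sigma_2\sqrt{1-\rho^2}}{\sigma_1}, \tag{3}$$

$$\theta_5 = \frac{\mu_1}{\sigma_1}, \qquad \theta_6 = \sigma_1^2\sigma_2^2, \qquad \theta_7 = \frac{\sigma_2}{\sigma_1}, \qquad \theta_8 = \frac{\mu_2}{\sigma_2}, \qquad \theta_9 \equiv \sigma_{12} = \rho\sigma_1\sigma_2, \tag{4}$$

$$\theta_{10} = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2, \tag{5}$$

$$\theta_{11} = d'\Sigma d \quad [d' = (d_1, d_2) \text{ not proportional to } (0, 1)], \tag{6}$$

$$\lambda_1 = \mathrm{ch}_{\max}(\Sigma), \qquad \lambda_2 = \mathrm{ch}_{\min}(\Sigma). \tag{7}$$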

Some of these parameters have straightforward statistical interpretations. Since $(x_2 \mid x_1, \mu, \Sigma) \sim N(\mu_2 + \theta_1(x_1 - \mu_1), \theta_2)$, it is clear that $\theta_1$ is a regression coefficient, $\theta_2$ is a conditional variance, and $\eta_2^2$ is the corresponding precision. For the marginal distribution of $x_1$, $\eta_1^2$ is the precision and $\theta_5$ is the reciprocal of the coefficient of variation. $\theta_3$ is usually called the generalized variance. $(\eta_1, \eta_2, \eta_3)$ gives a type of Cholesky decomposition of the precision matrix $\Sigma^{-1}$ [see (13) in Section 2.1]. $\theta_{10}$ is the variance of $x_1 - x_2$, and $\theta_{11}$ is the variance of $d_1x_1 + d_2x_2$. Finally, $\lambda_1$ and $\lambda_2$ are the largest and smallest eigenvalues of $\Sigma$.

Technical issue. We will assume that $|\rho| < 1$ and $|r| < 1$ in virtually all expressions and results that follow. This is because, if either equals 1 in absolute value, then $\rho = \{\text{sign of } r\}$ with probability 1 (either frequentist or Bayesian posterior, as relevant). Indeed, the situation then essentially collapses to the univariate version of the problem, which is standard.

1.2. Matching, constructive posteriors and fiducial distributions. The bivariate normal distribution has been extensively studied from frequentist, fiducial and objective Bayesian perspectives. Table 1 summarizes a number of interesting results.

• For a variety of parameters, it presents objective priors (discussed below) for which the resulting Bayesian posterior credible sets of level $1 - \alpha$ are also exact frequentist confidence sets at the same level; in this case, the priors are said to be exact frequentist matching. This is a very desirable situation: see [23] and [2] for general discussion and the many earlier references.

• For $\mu_1, \mu_2, \sigma_1, \sigma_2$ and $\rho$, the constructive posterior distributions are also the fiducial distributions for the parameters, as found in Fisher [14, 15] and [21].

• Posterior distributions are presented as constructive random distributions, that is, by a description of how to simulate from them. Thus to simulate from the posterior distribution of $\sigma_1$, given the data (actually, only $s_{11}$ is needed), one draws independent $\chi^2_{n-1}$ random variables and simply computes the corresponding $\sqrt{s_{11}/\chi^2_{n-1}}$; this yields an independent sample from the fiducial/posterior distribution of $\sigma_1$.
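The recipe in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data vector `x1`, the function name `draw_sigma1_posterior` and the number of draws are hypothetical and do not come from the paper.

```python
import numpy as np

def draw_sigma1_posterior(s11, n, size, rng):
    """Draw sigma_1* = sqrt(s11 / chi2*_{n-1}), the constructive posterior of sigma_1."""
    chi2 = rng.chisquare(df=n - 1, size=size)   # independent chi-squared(n-1) draws
    return np.sqrt(s11 / chi2)

rng = np.random.default_rng(0)
x1 = np.array([1.2, 0.4, 2.9, 1.7, 0.8])        # hypothetical observations of x_1
s11 = np.sum((x1 - x1.mean()) ** 2)             # only s11 is needed
draws = draw_sigma1_posterior(s11, n=x1.size, size=10_000, rng=rng)
upper_95 = np.quantile(draws, 0.95)             # one-sided 95% credible bound for sigma_1
```

Any posterior summary (quantiles, means of functions of $\sigma_1$) can then be read off the array of draws.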

Table 1 also lists the objective prior distributions that yield the indicated objective posterior. The notation $\pi_{ab}$ in the table stands for the important class of prior densities (a subclass of the generalized Wishart distributions of [8])

$$\pi_{ab}(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = \frac{1}{\sigma_1^{3-a}\,\sigma_2^{2-b}\,(1-\rho^2)^{2-b/2}}. \tag{8}$$

Special cases of this class are the Jeffreys-rule prior $\pi_J = \pi_{10}$, the right-Haar prior $\pi_H = \pi_{12}$, the independence Jeffreys prior $\pi_{IJ} = \pi_{21} = \sigma_1^{-1}\sigma_2^{-1}(1-\rho^2)^{-3/2}$ and $\pi_{RO}$, which has $a = b = 1$. The independence Jeffreys prior follows from using a constant prior for the means, and then the Jeffreys prior for the covariance matrix with means given.
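As a quick check of how the special cases fall out of (8), the following sketch evaluates the unnormalized density for given $(a, b)$. The function name `pi_ab` and the test point are illustrative assumptions.

```python
import numpy as np

def pi_ab(sigma1, sigma2, rho, a, b):
    """Unnormalized prior density (8); it does not depend on the means."""
    return 1.0 / (sigma1 ** (3 - a) * sigma2 ** (2 - b) * (1 - rho**2) ** (2 - b / 2))

s1, s2, rho = 1.5, 0.7, 0.3                       # arbitrary test point
jeffreys = pi_ab(s1, s2, rho, a=1, b=0)           # 1 / (s1^2 s2^2 (1-rho^2)^2)
right_haar = pi_ab(s1, s2, rho, a=1, b=2)         # 1 / (s1^2 (1-rho^2))
indep_jeffreys = pi_ab(s1, s2, rho, a=2, b=1)     # 1 / (s1 s2 (1-rho^2)^{3/2})
```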

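TABLE 1
Parameters with exact matching priors of the form $\pi_{ab}$, and associated constructive posteriors: Here $Z^*$ is a standard normal random variable, and $\chi^{2*}_{n-1}$ and $\chi^{2*}_{n-2}$ are chi-squared random variables with the indicated degrees of freedom, all random variables being independent. For $\mu_1, \mu_2, \sigma_1, \sigma_2$ and $\rho$, the indicated posteriors are also fiducial distributions

| Parameter | Prior | Posterior |
|---|---|---|
| $\mu_1$ | $\pi_{1b}, \forall b$ (including $\pi_J$ and $\pi_H$) | $\bar{x}_1 + \dfrac{Z^*}{\sqrt{\chi^{2*}_{n-1}}}\sqrt{\dfrac{s_{11}}{n}}$ |
| $\mu_2$ | $\pi_J = \pi_{10}$ | $\bar{x}_2 + \dfrac{Z^*}{\sqrt{\chi^{2*}_{n-1}}}\sqrt{\dfrac{s_{22}}{n}}$ |
| $d'(\mu_1, \mu_2)'$, $d \in \mathbb{R}^2$ | $\pi_J = \pi_{10}$ and $\tilde{\pi}_H$ (see Table 4) | $d'(\bar{x}_1, \bar{x}_2)' + \dfrac{Z^*}{\sqrt{\chi^{2*}_{n-1}}}\sqrt{\dfrac{d'Sd}{n}}$ |
| $\sigma_1$ | $\pi_{1b}, \forall b$ (including $\pi_J$ and $\pi_H$) | $\sqrt{\dfrac{s_{11}}{\chi^{2*}_{n-1}}}$ |
| $\rho$ | $\pi_H = \pi_{12}$ | $\psi\!\left(-\dfrac{Z^*}{\sqrt{\chi^{2*}_{n-1}}} + \dfrac{\sqrt{\chi^{2*}_{n-2}}}{\sqrt{\chi^{2*}_{n-1}}}\dfrac{r}{\sqrt{1-r^2}}\right)$, where $\psi(y) = y/\sqrt{1+y^2}$ |
| $\eta_3 = -\dfrac{\rho}{\sigma_1\sqrt{1-\rho^2}}$ | $\pi_{a2}, \forall a$ (including $\pi_H$) | $\dfrac{Z^*}{\sqrt{s_{11}}} - \dfrac{\sqrt{\chi^{2*}_{n-2}}}{\sqrt{s_{11}}}\dfrac{r}{\sqrt{1-r^2}}$ |
| $\theta_1 = \dfrac{\rho\sigma_2}{\sigma_1}$ | $\pi_{a2}, \forall a$ (including $\pi_H$) | $\dfrac{r\sqrt{s_{22}}}{\sqrt{s_{11}}} - \dfrac{Z^*}{\sqrt{\chi^{2*}_{n-2}}}\dfrac{\sqrt{1-r^2}\sqrt{s_{22}}}{\sqrt{s_{11}}}$ |
| $\theta_2 = \sigma_2^2(1-\rho^2)$ | $\pi_{a2}, \forall a$ (including $\pi_H$) | $\dfrac{s_{22}(1-r^2)}{\chi^{2*}_{n-2}}$ |
| $\theta_3 = \lvert\Sigma\rvert$ | $\pi_H = \pi_{12}$ and $\pi_{IJ} = \pi_{21}$ | $\dfrac{\lvert S\rvert}{\chi^{2*}_{n-1}\chi^{2*}_{n-2}}$ |
| $\theta_4 = \dfrac{\sigma_2\sqrt{1-\rho^2}}{\sigma_1}$ | $\pi_H = \pi_{12}$ | $\dfrac{\sqrt{\chi^{2*}_{n-1}}}{\sqrt{\chi^{2*}_{n-2}}}\dfrac{\sqrt{s_{22}(1-r^2)}}{\sqrt{s_{11}}}$ |
| $\theta_5 = \dfrac{\mu_1}{\sigma_1}$ | $\pi_{1b}, \forall b$ (including $\pi_J$ and $\pi_H$) | $\dfrac{Z^*}{\sqrt{n}} + \dfrac{\bar{x}_1\sqrt{\chi^{2*}_{n-1}}}{\sqrt{s_{11}}}$ |
| $d'\Sigma d$ | $\pi_J = \pi_{10}$ and $\tilde{\pi}_H$ (see Table 4) | $\dfrac{d'Sd}{\chi^{2*}_{n-1}}$ |

We highlight the results about $\rho$ in Table 1 because they are interesting from practical, historical and philosophical perspectives. First, it does not seem to be known that the indicated prior for $\rho$ is exact frequentist matching (proved here in Theorem 2). Indeed, standard statistical software utilizes various approximations to arrive at frequentist confidence sets for $\rho$, missing the fact that a simple exact confidence set exists, even for $n = 3$. It was, of course, known that exact frequentist confidence procedures could be constructed (cf. Exercise 54, Chapter 6 of [18]), but explicit expressions do not seem to be available.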
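To make the "simple exact confidence set" concrete, here is a minimal Monte Carlo sketch based on the Table 1 posterior for $\rho$ under the right-Haar prior. The function names, the equal-tailed construction and the default number of draws are assumptions made for illustration; the interval endpoints are exact only up to simulation error.

```python
import numpy as np

def rho_posterior_draws(r, n, size, rng):
    """Draws of rho* = psi(Y*) from the Table 1 posterior under the right-Haar prior."""
    z = rng.standard_normal(size)
    chi_n1 = rng.chisquare(n - 1, size)
    chi_n2 = rng.chisquare(n - 2, size)
    y = -z / np.sqrt(chi_n1) + np.sqrt(chi_n2 / chi_n1) * r / np.sqrt(1 - r**2)
    return y / np.sqrt(1 + y**2)                 # psi(y) = y / sqrt(1 + y^2)

def rho_interval(r, n, level=0.95, size=200_000, seed=0):
    """Equal-tailed interval for rho from the sample correlation r and sample size n >= 3."""
    rng = np.random.default_rng(seed)
    alpha = 1 - level
    draws = rho_posterior_draws(r, n, size, rng)
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

lo, hi = rho_interval(r=0.5, n=10)
```

Because each one-sided bound is exact frequentist matching, the equal-tailed interval built from two such bounds inherits the nominal coverage.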

The historically interesting aspect of this posterior for $\rho$ is that it is also the fiducial distribution of $\rho$. Geisser and Cornfield [16] studied the question of whether the fiducial distribution of $\rho$ could be reproduced as an objective Bayesian posterior, and they concluded that this was most likely not possible. The strongest evidence for this arose from Brillinger [7], which used results from [19] and a difficult analytic argument to show that there does not exist a prior $\pi(\rho)$ such that the fiducial density of $\rho$ equals $f(r \mid \rho)\pi(\rho)$, where $f(r \mid \rho)$ is the density of $r$ given $\rho$. Since the fiducial distribution of $\rho$ only depends on $r$, it was certainly reasonable to speculate that if it were not possible to derive this distribution from the density of $r$ and a prior, then it would not be possible to do so in general. The above result, of course, shows that this speculation was incorrect.

The philosophically interesting aspect of this situation is that Brillinger's result does show that the fiducial/posterior distribution for $\rho$ provides another example of the marginalization paradox ([13]). This leads to an interesting philosophical conundrum of a type that we have not previously seen: a complete fiducial/objective Bayesian/frequentist unification can be obtained for inference about $\rho$, but only if violation of the marginalization paradox is accepted. We will shortly introduce a prior distribution that avoids the marginalization paradox for $\rho$, but which is not exactly frequentist matching. We know of no way to adjudicate between the competing goals of exact frequentist matching and avoidance of the marginalization paradox, and so will simply present both as possible objective Bayesian approaches. (Note that the same conundrum also arises for $\theta_5 = \mu_1/\sigma_1$; the exact frequentist matching prior results in a marginalization paradox, as shown in [24].) Some interesting examples of improper priors resulting in marginalization paradoxes can be found in Ghosh and Yang [17] and Datta and Ghosh [10, 11].

1.3. Recommended priors. It is actually rare to have exact matching priors for parameters of interest. Also, one is often interested in very complex functions of parameters (e.g., predictive distributions) and/or joint distributions of parameters. For such problems it is important to have a general objective prior that seems to perform reasonably well for all quantities of interest. Furthermore, it is unappealing to many Bayesians to change the prior according to which parameter is declared to be of interest, and an objective prior that performs well overall is often sought.

The five priors we recommend for various purposes are $\pi_J$, $\pi_H$,

$$\pi_{R\rho} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)}, \qquad \pi_{R\sigma} \propto \frac{\sqrt{1+\rho^2}}{\sigma_1\sigma_2(1-\rho^2)} \tag{9}$$

and

$$\pi_{R\lambda} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)\sqrt{(\sigma_1/\sigma_2 - \sigma_2/\sigma_1)^2 + 4\rho^2}}. \tag{10}$$

The first prior in (9) was developed in [20] and was studied extensively in [1], where it was shown to be a one-at-a-time reference prior (see Section 3). The second prior in (9) is new and is derived in Section 3. $\pi_{R\lambda}$ was developed as a one-at-a-time reference prior in [25].

With these definitions, we can make our summary recommendations. Table 2 gives the four objective priors that are recommended for use, and indicates for which parameters (or functions thereof) they are recommended. These recommendations are based on three criteria: (i) the degree of frequentist matching, discussed in Section 4; (ii) being a one-at-a-time reference prior, discussed in Section 3; and (iii) ease of computation. The rationale for each of the entries in the table, based on these criteria, is given in Section 4.5.

TABLE 2
Recommendations of objective priors for various parameters in the bivariate normal model: † indicates that the posterior will not be exact frequentist matching. (For $\mu_2$ and parameters with $\sigma_1$ replaced by $\sigma_2$, use the right-Haar prior with the variances interchanged.)

| Prior | Parameter |
|---|---|
| $\pi_{R\rho}$ | $\rho$†, $\sigma_1/\sigma_2$†, general use |
| $\pi_H$ | $\mu_1$, $\sigma_1$, $\rho$, $\eta_3$, $\dfrac{\rho\sigma_2}{\sigma_1}$, $\sigma_2^2(1-\rho^2)$, $\lvert\Sigma\rvert$, $\dfrac{\sigma_2}{\sigma_1}\sqrt{1-\rho^2}$, $\dfrac{\mu_1}{\sigma_1}$ |
| $\tilde{\pi}_H$ (see Table 4) | $d'(\mu_1, \mu_2)'$, $d'\Sigma d$ |
| $\pi_{R\lambda}$ | $\mathrm{ch}_{\max}(\Sigma)$ |
| $\pi_{R\sigma}$ | $\sigma_{12} = \rho\sigma_1\sigma_2$ |

Another commonly used prior is the "scale prior," $\pi_S \propto (\sigma_1\sigma_2)^{-1}$. The motivation that is often given for this prior is that it is "standard" to use $\sigma_i^{-1}$ as the prior for a standard deviation $\sigma_i$, while $-1 < \rho < 1$ is on a bounded set and so one can use a constant prior in $\rho$. We do not recommend this prior, but do consider its performance in Section 4.5.

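2. Computation. In this paper, a constant prior is always used for $(\mu_1, \mu_2)$, so that

$$\left(\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} \,\Big|\, \Sigma, X\right) \sim N_2\left(\begin{pmatrix}\bar{x}_1\\ \bar{x}_2\end{pmatrix},\ n^{-1}\Sigma\right). \tag{11}$$

Generation from this conditional posterior distribution is standard, so the challenge of simulation from the posterior distribution requires only sampling from $(\sigma_1, \sigma_2, \rho \mid X)$.

The marginal likelihood of $(\sigma_1, \sigma_2, \rho)$ satisfies

$$L_1(\sigma_1, \sigma_2, \rho) \propto \frac{1}{|\Sigma|^{(n-1)/2}} \exp\left(-\tfrac{1}{2}\,\mathrm{trace}(S\Sigma^{-1})\right). \tag{12}$$

It is immediate that, under the priors $\pi_J$ and $\pi_{IJ}$, the marginal posteriors of $\Sigma$ are Inverse Wishart $(S^{-1}, n)$ and Inverse Wishart $(S^{-1}, n-1)$, respectively.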
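The two-stage simulation implied by (11) and (12) might be sketched as follows. One assumption needs flagging: the paper's "Inverse Wishart $(S^{-1}, n-1)$" is mapped here to SciPy's `invwishart(df=n-1, scale=S)` by matching the posterior kernel $|\Sigma|^{-(n+2)/2}\exp\{-\tfrac12\mathrm{trace}(S\Sigma^{-1})\}$ to SciPy's density; likewise `df=n` for $\pi_J$. The function name and data are hypothetical.

```python
import numpy as np
from scipy.stats import invwishart

def draw_mu_sigma(x, prior="IJ", size=1000, seed=0):
    """Draw (mu, Sigma) from the posterior under pi_IJ (prior='IJ') or pi_J (prior='J')."""
    if prior not in ("IJ", "J"):
        raise ValueError("prior must be 'IJ' or 'J'")
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    xbar = x.mean(axis=0)
    centered = x - xbar
    S = centered.T @ centered                      # sufficient statistic (1)
    df = n - 1 if prior == "IJ" else n             # assumed SciPy parameterization
    sigmas = invwishart.rvs(df=df, scale=S, size=size, random_state=rng)
    # Conditional posterior (11): mu | Sigma, X ~ N2(xbar, Sigma / n)
    chol = np.linalg.cholesky(sigmas / n)          # batched Cholesky factors
    z = rng.standard_normal((size, 2))
    mus = xbar + np.einsum("kij,kj->ki", chol, z)
    return mus, sigmas
```

Each row of `mus` pairs with the corresponding matrix in `sigmas`, so joint functionals of $(\mu, \Sigma)$ can be evaluated draw by draw.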

Berger, Strawderman and Tang [4] gave a Metropolis–Hastings algorithm to generate from $(\sigma_1, \sigma_2, \rho \mid X)$ based on the prior $\pi_{R\lambda}$. The following sections deal with the other priors we consider.

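TABLE 3
Ratio $\pi/\pi_{IJ}$, upper bound $M$, rejection step and acceptance probability for $\rho = 0.80, 0.95, 0.99$, when $\pi = \pi_{R\rho}$, $\pi_{R\sigma}$, $\tilde{\pi}_{R\sigma}$, $\pi_S$ and $\pi_{MS}$

| Prior | Ratio $\pi/\pi_{IJ}$ | Bound $M$ | Rejection step | Acceptance probability, $\rho = 0.80$ | $\rho = 0.95$ | $\rho = 0.99$ |
|---|---|---|---|---|---|---|
| $\pi_{R\rho}$ | $\sqrt{1-\rho^2}$ | $1$ | $u \le \sqrt{1-\rho^2}$ | 0.6000 | 0.3122 | 0.1410 |
| $\pi_{R\sigma}$ | $\sqrt{1-\rho^4}$ | $1$ | $u \le \sqrt{1-\rho^4}$ | 0.7684 | 0.4307 | 0.1985 |
| $\tilde{\pi}_{R\sigma}$ | $\sqrt{\dfrac{1-\rho^2}{2-\rho^2}}$ | $\dfrac{1}{\sqrt{2}}$ | $u \le \sqrt{\dfrac{2(1-\rho^2)}{2-\rho^2}}$ | 0.7276 | 0.4215 | 0.1975 |
| $\pi_S$ | $(1-\rho^2)^{3/2}$ | $1$ | $u \le (1-\rho^2)^{3/2}$ | 0.2160 | 0.0304 | 0.0028 |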

2.1. Marginal posteriors of $(\sigma_1, \sigma_2, \rho)$ under $\pi_{R\rho}$, $\pi_{R\sigma}$, $\tilde{\pi}_{R\sigma}$, and $\pi_S$. For these priors, an independent sample from $\pi(\sigma_1, \sigma_2, \rho \mid X)$ can be obtained by the following acceptance-rejection algorithm:

Simulation step. Generate $(\sigma_1, \sigma_2, \rho)$ from the independence Jeffreys posterior $\pi_{IJ}(\sigma_1, \sigma_2, \rho \mid X)$ [the Inverse Wishart $(S^{-1}, n-1)$ distribution] and, independently, sample $u \sim \text{Uniform}(0, 1)$.

Rejection step. Suppose $M \equiv \sup_{(\sigma_1, \sigma_2, \rho)} \dfrac{\pi(\sigma_1, \sigma_2, \rho)}{\pi_{IJ}(\sigma_1, \sigma_2, \rho)} < \infty$. If $u \le \pi(\sigma_1, \sigma_2, \rho)/[M\pi_{IJ}(\sigma_1, \sigma_2, \rho)]$, accept $(\sigma_1, \sigma_2, \rho)$; else, return to Simulation step.

For each of the priors listed in Table 3, the key ratio, $\pi/\pi_{IJ}$, is listed in the table, along with the upper bound $M$, the Rejection step and the resulting acceptance probability for $\rho = 0.80, 0.95, 0.99$. The rejection algorithm is quite efficient for sampling these posteriors. Indeed, for $\rho \approx 0$, the algorithms accept with probability near one and, even for large $|\rho|$, the acceptance probabilities are very reasonable for the priors $\pi_{R\rho}$, $\pi_{R\sigma}$, and $\tilde{\pi}_{R\sigma}$. For large $|\rho|$, the algorithm is less efficient for the posteriors under the prior $\pi_S$, but even these acceptance rates may well be fine in practice, given the simplicity of the algorithm.
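The Simulation and Rejection steps can be sketched as below, using the ratios and bounds of Table 3. The dictionary keys, function names and batch size are illustrative assumptions, as is the mapping of the paper's Inverse Wishart $(S^{-1}, n-1)$ to SciPy's `invwishart(df=n-1, scale=S)`.

```python
import numpy as np
from scipy.stats import invwishart

# prior label -> (ratio pi/pi_IJ as a function of rho, upper bound M), from Table 3
RATIOS = {
    "R_rho": (lambda rho: np.sqrt(1 - rho**2), 1.0),
    "R_sigma": (lambda rho: np.sqrt(1 - rho**4), 1.0),
    "R_sigma_tilde": (lambda rho: np.sqrt((1 - rho**2) / (2 - rho**2)), 1 / np.sqrt(2)),
    "S": (lambda rho: (1 - rho**2) ** 1.5, 1.0),
}

def acceptance_probability(prior, rho):
    """Probability that a proposal with correlation rho survives the Rejection step."""
    ratio, bound = RATIOS[prior]
    return ratio(rho) / bound

def sample_posterior(S, n, prior, size, seed=0):
    """Accept-reject draws of (sigma1, sigma2, rho) given X under the named prior."""
    rng = np.random.default_rng(seed)
    kept, total = [], 0
    while total < size:
        # Simulation step: propose from the independence Jeffreys posterior
        batch = invwishart.rvs(df=n - 1, scale=S, size=max(4 * size, 16), random_state=rng)
        s1 = np.sqrt(batch[:, 0, 0])
        s2 = np.sqrt(batch[:, 1, 1])
        rho = np.clip(batch[:, 0, 1] / (s1 * s2), -1.0, 1.0)
        # Rejection step: accept when u <= pi / (M * pi_IJ)
        u = rng.uniform(size=rho.size)
        accepted = np.column_stack([s1, s2, rho])[u <= acceptance_probability(prior, rho)]
        kept.append(accepted)
        total += len(accepted)
    return np.concatenate(kept)[:size]
```

Evaluating `acceptance_probability` at $\rho = 0.80, 0.95, 0.99$ gives the pointwise acceptance probabilities tabulated in Table 3.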

2.2. Computation under $\pi_{ab}$. The most interesting prior of this form (besides the Jeffreys and independence Jeffreys priors) is the right-Haar prior $\pi_H$, although other priors such as $\pi_{11}$ arise as reference priors, and hence are potentially of interest. While Table 1 gave an explicit form for the most important marginal posteriors arising from priors of this form, it is of considerable interest that essentially closed form generation from the full posterior of any prior of this form is possible (see, e.g., [8]). This is briefly reviewed in this section, since the expressions for the resulting constructive posteriors are needed for later results on frequentist coverage.

It is most convenient to work with the parameters $(\eta_1, \eta_2, \eta_3)$ given in (2). This parameterization gives a type of Cholesky decomposition of the precision matrix $\Sigma^{-1}$,

$$\Sigma^{-1} = \begin{pmatrix} \eta_1 & \eta_3 \\ 0 & \eta_2 \end{pmatrix}\begin{pmatrix} \eta_1 & 0 \\ \eta_3 & \eta_2 \end{pmatrix}, \tag{13}$$

which accounts for the simplicity of ensuing computations. Note that (2) is equivalent to

$$\sigma_1 = \frac{1}{\eta_1}, \qquad \sigma_2 = \frac{\sqrt{\eta_1^2 + \eta_3^2}}{\eta_1\eta_2}, \qquad \rho = -\frac{\eta_3}{\sqrt{\eta_1^2 + \eta_3^2}}. \tag{14}$$

The prior $\pi_{ab}$ of (8) for $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ transforms to the extended conjugate class of priors for $(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3)$, given by $\pi_{ab}(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3) = \eta_1^{-a}\eta_2^{-b}$.
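The relations (2), (13) and (14) are easy to verify numerically. The sketch below does so at an arbitrary point; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def to_eta(sigma1, sigma2, rho):
    """Map (sigma1, sigma2, rho) to (eta1, eta2, eta3) as in (2)."""
    root = np.sqrt(1 - rho**2)
    return 1 / sigma1, 1 / (sigma2 * root), -rho / (sigma1 * root)

def from_eta(eta1, eta2, eta3):
    """Inverse map (14)."""
    h = np.hypot(eta1, eta3)                     # sqrt(eta1^2 + eta3^2)
    return 1 / eta1, h / (eta1 * eta2), -eta3 / h

def precision_from_eta(eta1, eta2, eta3):
    """Precision matrix via the Cholesky-type factorization (13)."""
    upper = np.array([[eta1, eta3], [0.0, eta2]])
    return upper @ upper.T

sigma1, sigma2, rho = 2.0, 0.5, -0.6             # arbitrary test point
eta = to_eta(sigma1, sigma2, rho)
cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])
```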

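LEMMA 1. Consider the prior $\pi_{ab}$.

(a) The marginal posterior of $\eta_3$ given $(\eta_1, \eta_2; X)$ is $N(-\eta_2 r\sqrt{s_{22}/s_{11}},\ 1/s_{11})$.

(b) The marginal posterior distributions of $\eta_1$ and $\eta_2$ are independent and

$$(\eta_1^2 \mid X) \sim \text{Gamma}\left(\tfrac{1}{2}(n-a),\ \tfrac{1}{2}s_{11}\right); \qquad (\eta_2^2 \mid X) \sim \text{Gamma}\left(\tfrac{1}{2}(n-b),\ \tfrac{1}{2}s_{22}(1-r^2)\right).$$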

See [5] for a proof of this result. We next present the constructive posteriors of $(\eta_1, \eta_2, \eta_3)$, and from these derive the constructive posteriors of $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ and other parameters. All results follow directly from Lemma 1 and (14).

In presenting the constructive posteriors, we will use a star to represent a random draw from the implied distribution; thus $\mu_1^*$ will represent a random draw from its posterior distribution, $Z_1^*, Z_2^*, Z_3^*$ will be independent draws from the standard normal distribution, and $\chi^{2*}_{n-a}$ and $\chi^{2*}_{n-b}$ will be independent draws from chi-squared distributions with the indicated degrees of freedom. Recall that these constructive posteriors are not only useful for simulation, but will be the key to proving exact frequentist matching results.

FACT 1. (a) The constructive posterior of $(\eta_1, \eta_2, \eta_3)$ given $X$ can be expressed as

$$\eta_1^* = \sqrt{\frac{\chi^{2*}_{n-a}}{s_{11}}}, \qquad \eta_2^* = \sqrt{\frac{\chi^{2*}_{n-b}}{s_{22}(1-r^2)}}, \qquad \eta_3^* = \frac{Z_3^*}{\sqrt{s_{11}}} - \frac{\sqrt{\chi^{2*}_{n-b}}}{\sqrt{s_{11}}}\frac{r}{\sqrt{1-r^2}}. \tag{15}$$


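(b) The constructive posterior of $(\sigma_1, \sigma_2, \rho)$ given $X$ can be expressed as

$$\sigma_1^* = \sqrt{\frac{s_{11}}{\chi^{2*}_{n-a}}}, \tag{16}$$

$$\sigma_2^* = \sqrt{s_{22}(1-r^2)}\,\sqrt{\frac{1}{\chi^{2*}_{n-b}} + \frac{1}{\chi^{2*}_{n-a}}\left(\frac{Z_3^*}{\sqrt{\chi^{2*}_{n-b}}} - \frac{r}{\sqrt{1-r^2}}\right)^2}, \tag{17}$$

$$\rho^* = \psi(Y^*), \qquad Y^* = -\frac{Z_3^*}{\sqrt{\chi^{2*}_{n-a}}} + \frac{\sqrt{\chi^{2*}_{n-b}}}{\sqrt{\chi^{2*}_{n-a}}}\frac{r}{\sqrt{1-r^2}}, \tag{18}$$

where $\psi(x) = x/\sqrt{1+x^2}$.

(c) The constructive posterior for $\mu_1$ and $\mu_2$ can be written

$$\mu_1^* = \bar{x}_1 + \frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}}\sqrt{\frac{s_{11}}{n}}, \tag{19}$$

$$\mu_2^* = \bar{x}_2 + \frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}}\frac{r\sqrt{s_{22}}}{\sqrt{n}} + \left(\frac{Z_2^*}{\sqrt{\chi^{2*}_{n-b}}} - \frac{Z_3^*}{\sqrt{\chi^{2*}_{n-b}}}\frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}}\right)\sqrt{\frac{s_{22}(1-r^2)}{n}}. \tag{20}$$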
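Fact 1 translates directly into a simulator for the full posterior under any $\pi_{ab}$. The sketch below is one possible transcription of (15) through (20); the function name, argument order and dictionary keys are assumptions made for illustration, and it requires $n > \max(a, b)$.

```python
import numpy as np

def constructive_posterior(xbar, s11, s22, r, n, a, b, size, seed=0):
    """Joint posterior draws under pi_ab, transcribing (15)-(20) of Fact 1."""
    rng = np.random.default_rng(seed)
    z1, z2, z3 = rng.standard_normal((3, size))
    chi_a = rng.chisquare(n - a, size)
    chi_b = rng.chisquare(n - b, size)
    t = r / np.sqrt(1 - r**2)
    # (15): the eta parameters
    eta1 = np.sqrt(chi_a / s11)
    eta2 = np.sqrt(chi_b / (s22 * (1 - r**2)))
    eta3 = z3 / np.sqrt(s11) - np.sqrt(chi_b / s11) * t
    # (16)-(18): standard deviations and correlation
    sigma1 = np.sqrt(s11 / chi_a)
    sigma2 = np.sqrt(s22 * (1 - r**2)) * np.sqrt(
        1 / chi_b + (z3 / np.sqrt(chi_b) - t) ** 2 / chi_a)
    y = -z3 / np.sqrt(chi_a) + np.sqrt(chi_b / chi_a) * t
    rho = y / np.sqrt(1 + y**2)
    # (19)-(20): means
    mu1 = xbar[0] + z1 / np.sqrt(chi_a) * np.sqrt(s11 / n)
    mu2 = (xbar[1] + z1 / np.sqrt(chi_a) * r * np.sqrt(s22 / n)
           + (z2 / np.sqrt(chi_b) - z3 / np.sqrt(chi_b) * z1 / np.sqrt(chi_a))
           * np.sqrt(s22 * (1 - r**2) / n))
    return {"eta1": eta1, "eta2": eta2, "eta3": eta3, "sigma1": sigma1,
            "sigma2": sigma2, "rho": rho, "mu1": mu1, "mu2": mu2}
```

Setting `a=1, b=2` gives the right-Haar posterior; the $\sigma$ and $\rho$ draws agree with applying (14) to the $\eta$ draws, which is a convenient self-check.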

3. Reference priors. This paper began with an effort to derive and catalogue the possible reference priors for the bivariate normal distribution. The reference prior theory (cf. Bernardo [6] and Berger and Bernardo [3]) has arguably been the most successful technique for deriving objective priors. Reference priors depend on (i) specification of a parameter of interest; (ii) specification of nuisance parameters; (iii) specification of a grouping of parameters; and (iv) ordering of the groupings. These are all conveyed by the shorthand notation used in Table 4. Thus, $\{(\mu_1, \mu_2), (\sigma_1, \sigma_2, \rho)\}$ indicates that $(\mu_1, \mu_2)$ is the parameter of interest, with the others being nuisance parameters, and there are two groupings with the indicated ordering. (The resulting reference prior is the independence Jeffreys prior, $\pi_{IJ}$.) As another example, $\{\lambda_1, \lambda_2, \vartheta, \mu_1, \mu_2\}$ introduces the eigenvalues $\lambda_1 > \lambda_2$ of $\Sigma$ as being primarily of interest, with $\vartheta$ (the angle defining the orthogonal matrix that diagonalizes $\Sigma$), $\mu_1$ and $\mu_2$ being the nuisance parameters.

Based on experience with numerous examples, the reference priors that are typically judged to be best are one-at-a-time reference priors, in which each parameter is listed separately as its own group. Hence we will focus on these priors. It turns out to be the case that, for the one-at-a-time reference priors, the ordering of $\mu_1$ and $\mu_2$ among the variables is irrelevant. Hence if $\mu_1$ and $\mu_2$ are omitted from a listing in Table 4, the resulting reference prior is to be viewed as any one-at-a-time reference prior with the indicated ordering of other variables, with the $\mu_i$ being inserted anywhere in the ordering.


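TABLE 4
Reference priors for the bivariate normal model (where $\tilde{\mu}_1 = d'(\mu_1, \mu_2)'$, $(\tilde{\sigma}_1)^2 = \theta_7$, $\tilde{\rho} = d'\Sigma(0, 1)'/(\sigma_1\sqrt{\theta_7})$, $\tilde{\theta}_2 = \sigma_2^2[1 - (\tilde{\rho})^2]$ and $\tilde{\theta}_1 = \tilde{\rho}\sigma_2/\tilde{\sigma}_1$); $\{\{\ \}\}$ indicates that any ordering of the parameters yields the same reference prior

| Prior $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ | For parameter ordering | Has form (8) with |
|---|---|---|
| $\pi_J \propto \dfrac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)^2}$ | $\{(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)\}$ | $(a, b) = (1, 0)$ |
| $\pi_{IJ} \propto \dfrac{1}{\sigma_1\sigma_2(1-\rho^2)^{3/2}}$ | $\{(\mu_1, \mu_2), (\sigma_1, \sigma_2, \rho)\}$ | $(a, b) = (2, 1)$ |
| $\pi_{R\rho} \propto \dfrac{1}{\sigma_1\sigma_2(1-\rho^2)}$ | $\{\rho, \sigma_1, \sigma_2\}$, $\{\theta_7, \theta_6, \rho\}$ | |
| $\pi_{R\sigma} \propto \dfrac{\sqrt{1+\rho^2}}{\sigma_1\sigma_2(1-\rho^2)}$ | $\{\sigma_1, \sigma_2, \rho\}$ | |
| $\tilde{\pi}_{R\sigma} \propto \dfrac{1}{\sigma_1\sigma_2(1-\rho^2)\sqrt{2-\rho^2}}$ | $\{\sigma_1, \rho, \sigma_2\}$ | |
| $\pi_{RO} \propto \dfrac{1}{\sigma_1^2\sigma_2(1-\rho^2)^{3/2}}$ | $\{\sigma_1, \eta_3, \theta_2\}$, $\{\sigma_1, \theta_2, \eta_3\}$ | $(a, b) = (1, 1)$ |
| $\pi_{R\lambda} \propto \dfrac{[((\sigma_1/\sigma_2) - (\sigma_2/\sigma_1))^2 + 4\rho^2]^{-1/2}}{\sigma_1\sigma_2(1-\rho^2)}$ | $\{\lambda_1, \lambda_2, \vartheta\}$ | |
| $\pi_H \propto \dfrac{1}{\sigma_1^2(1-\rho^2)}$ | $\{\{\sigma_1, \theta_1, \theta_2\}\}$, $\{\{\theta_1, \theta_3, \theta_4\}\}$, $\{\{\eta_1, \eta_2, \theta_1\}\}$, $\{\{\eta_1, \theta_1, \theta_2\}\}$ | $(a, b) = (1, 2)$ |
| $\tilde{\pi}_H \propto \dfrac{d\tilde{\mu}_1\, d\mu_2\, d\tilde{\sigma}_1\, d\sigma_2\, d\tilde{\rho}}{(\tilde{\sigma}_1)^2[1 - (\tilde{\rho})^2]}$ | $\{\{d'(\mu_1, \mu_2)', \mu_2, \theta_{11}, \tilde{\theta}_2, \tilde{\theta}_1\}\}$ | |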

We are interested in finding one-at-a-time reference priors for the parameters $\mu_1, \mu_2, \sigma_1, \sigma_2, \rho, \eta_3, \theta_1, \ldots, \theta_9$ and $\lambda_1$. This is done in [5], with the results summarized in Table 4, for all these parameters (i.e., the parameter appears as the first entry in the parameter ordering) except $\eta_3$, $\sigma_{12}$, and $\mu_i/\sigma_i$; finding one-at-a-time reference priors for these parameters is technically challenging. (We do not explicitly list the reference priors for $\sigma_2$ in the table, since they can be found by simply switching with $\sigma_1$ in the various expressions.)

4. Comparisons of priors via frequentist matching.

4.1. Frequentist coverage probabilities and exact matching. Suppose a posterior distribution is used to create one-sided credible intervals $(\theta_L, \theta_{1-\alpha}(X))$, where $\theta_L$ is the lower limit in the relevant parameter space and $\theta_{1-\alpha}(X)$ is the posterior quantile of the parameter $\theta$ of interest, defined by $P(\theta < \theta_{1-\alpha}(X) \mid X) = 1 - \alpha$. (Here $\theta$ is the random variable.) Of interest is the frequentist coverage of the corresponding confidence interval, that is, $C(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = P(\theta < \theta_{1-\alpha}(X) \mid \mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$. (Here $X$ is the random variable.) The closer $C(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ is to the nominal $1 - \alpha$, the better the procedure (and corresponding objective prior) is judged to be.

The main results about exact matching are given in Theorems 1 through 8. The proofs of Theorems 1, 2 and 8 are given in Section 5; the rest can be found in [5].


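The following technical lemmas will be repeatedly utilized. The first lemma is from (3d.2.8) in [22]. Lemma 3 is easy.

LEMMA 2. For $n \ge 3$ and given $\sigma_1, \sigma_2, \rho$, the following three random variables are independent and have the indicated distributions:

$$T_2 = \left[\frac{s_{11}}{\sigma_2^2(1-\rho^2)}\right]^{1/2}\left[\frac{r\sqrt{s_{22}}}{\sqrt{s_{11}}} - \frac{\rho\sigma_2}{\sigma_1}\right] \equiv Z_3 \quad\text{(standard normal)}, \tag{21}$$

$$T_3 = \frac{s_{22}(1-r^2)}{\sigma_2^2(1-\rho^2)} \equiv \chi^2_{n-2}, \tag{22}$$

$$T_5 = \frac{s_{11}}{\sigma_1^2} \equiv \chi^2_{n-1}. \tag{23}$$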
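A small simulation can make Lemma 2 tangible by computing the three pivots from simulated samples and comparing their moments with the stated distributions. This is a sanity check under assumed parameter values, not a proof; the function names are hypothetical.

```python
import numpy as np

def pivots(x, sigma1, sigma2, rho):
    """Compute (T2, T3, T5) of (21)-(23) for samples x of shape (reps, n, 2)."""
    c = x - x.mean(axis=-2, keepdims=True)
    s11 = (c[..., 0] ** 2).sum(axis=-1)
    s22 = (c[..., 1] ** 2).sum(axis=-1)
    s12 = (c[..., 0] * c[..., 1]).sum(axis=-1)
    r = s12 / np.sqrt(s11 * s22)
    t2 = np.sqrt(s11 / (sigma2**2 * (1 - rho**2))) * (
        r * np.sqrt(s22 / s11) - rho * sigma2 / sigma1)
    t3 = s22 * (1 - r**2) / (sigma2**2 * (1 - rho**2))
    t5 = s11 / sigma1**2
    return t2, t3, t5

def simulate_pivots(n, sigma1, sigma2, rho, reps, seed=0):
    """Simulate reps bivariate normal samples of size n and return their pivots."""
    rng = np.random.default_rng(seed)
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    x = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
    return pivots(x, sigma1, sigma2, rho)

t2, t3, t5 = simulate_pivots(n=6, sigma1=2.0, sigma2=1.0, rho=0.7, reps=20_000)
```

With $n = 6$ the sample means of `t3` and `t5` should be near $n - 2 = 4$ and $n - 1 = 5$, and `t2` should have mean near 0 and variance near 1.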

LEMMA 3. Let $Y_{1-\alpha}$ denote the $1 - \alpha$ quantile of any random variable $Y$.

(a) If $g(\cdot)$ is a monotonically increasing function, $[g(Y)]_{1-\alpha} = g(Y_{1-\alpha})$ for any $\alpha \in (0, 1)$.

(b) If $W$ is a positive random variable, $(WY)_{1-\alpha} \ge 0$ if and only if $Y_{1-\alpha} \ge 0$.

We will reserve quantile notation for posterior quantiles, with respect to the $*$ distributions. Thus the quantile $[(\sigma_1 Z_3^* - rZ_3)/\chi^2_{n-1} + \rho\sqrt{s_{11}}\,\chi^{2*}_{n-b}]_{1-\alpha}$ would be computed based on the joint distribution of $(Z_3^*, \chi^{2*}_{n-b})$, while holding $(\sigma_1, \rho, r, s_{11}, Z_3, \chi^2_{n-1})$ fixed.

4.2. Credible intervals for a class of functions of $(\sigma_1, \sigma_2, \rho)$. We consider the one-sided credible intervals of $\sigma_1$, $\sigma_2$ and $\rho$ and some functions of the form

$$\theta = \sigma_1^{d_1}\sigma_2^{d_2}g(\rho), \tag{24}$$

for $d_1, d_2 \in \mathbb{R}$ and some function $g(\cdot)$. We also consider a class of scale-invariant priors

$$\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) \propto \frac{h(\rho)}{\sigma_1^{c_1}\sigma_2^{c_2}}, \tag{25}$$

for some $c_1, c_2 \in \mathbb{R}$ and a positive function $h$.

THEOREM 1. Denote the $1 - \alpha$ posterior quantile of $\theta$ by $\theta_{1-\alpha}(X)$ under the prior (25). For any fixed $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, the frequentist coverage of the credible interval $(\theta_L, \theta_{1-\alpha}(X))$ depends only on $\rho$. Here $\theta_L$ is the lower boundary of the parameter space for $\theta$.

Note that parameters $\rho, \eta_1, \eta_2, \eta_3, \theta_1, \ldots, \theta_4$ are all functions of the form (24). From Theorem 1, under any of the priors $\pi_J, \pi_{IJ}, \pi_{R\sigma}, \pi_{R\rho}, \pi_{RO}, \pi_H, \pi_S$, the frequentist coverage probabilities of credible intervals for any of these parameters will depend only on $\rho$. We will show that the frequentist coverage probabilities could be exact under the prior $\pi_{ab}$. Since $\eta_1$ ($\eta_2$) is a monotone function of $\sigma_1$ ($\theta_2$), we consider only $\rho$ and the last 5 parameters.

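4.3. Coverage probabilities under $\pi_{ab}$.

THEOREM 2. (a) For $\psi$ defined in (18), the posterior $1 - \alpha$ quantile of $\rho$ is $\rho^*_{1-\alpha} = \psi(Y^*_{1-\alpha})$.

(b) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

$$P(\rho < \rho^*_{1-\alpha} \mid \xi, \rho) = P\left(\frac{\sqrt{1-\rho^2}\,Z_3 + \rho\sqrt{\chi^2_{n-1}}}{\sqrt{\chi^2_{n-2}}} > \left(\frac{\sqrt{1-\rho^2}\,Z_3^* + \rho\sqrt{\chi^{2*}_{n-a}}}{\sqrt{\chi^{2*}_{n-b}}}\right)_{\alpha} \,\Big|\, \rho\right). \tag{26}$$

(c) (26) equals $1 - \alpha$ if and only if the right Haar prior is used, that is, $(a, b) = (1, 2)$.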
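Theorem 2 can be probed empirically by simulating many data sets, computing the posterior quantile of $\rho$ from the constructive posterior (18), and recording how often it covers the true value. The sketch below is a rough Monte Carlo check; the function name, the choice $\sigma_1 = \sigma_2 = 1$ and the simulation sizes are assumptions, and the result carries simulation error.

```python
import numpy as np

def rho_coverage(rho, n, a=1, b=2, alpha=0.05, reps=2000, draws=1000, seed=0):
    """Monte Carlo estimate of P(rho < rho*_{1-alpha}) under the prior pi_ab."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
    c = x - x.mean(axis=1, keepdims=True)
    s11 = (c[..., 0] ** 2).sum(axis=1)
    s22 = (c[..., 1] ** 2).sum(axis=1)
    r = (c[..., 0] * c[..., 1]).sum(axis=1) / np.sqrt(s11 * s22)
    t = (r / np.sqrt(1 - r**2))[:, None]
    # Posterior draws of Y* in (18), one row of draws per simulated data set
    z = rng.standard_normal((reps, draws))
    chi_a = rng.chisquare(n - a, (reps, draws))
    chi_b = rng.chisquare(n - b, (reps, draws))
    y = -z / np.sqrt(chi_a) + np.sqrt(chi_b / chi_a) * t
    y_q = np.quantile(y, 1 - alpha, axis=1)
    rho_q = y_q / np.sqrt(1 + y_q**2)            # Theorem 2(a): quantile of psi(Y*)
    return np.mean(rho < rho_q)

coverage_haar = rho_coverage(rho=0.5, n=5, a=1, b=2, seed=1)
```

Under the right-Haar prior the estimate should sit near the nominal $1 - \alpha$; changing `(a, b)` shows how the coverage moves away from it.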

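THEOREM 3. (a) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

$$P(\eta_3 < (\eta_3^*)_{1-\alpha} \mid \xi, \rho) = P\left(\frac{Z_3 + \frac{\rho}{\sqrt{1-\rho^2}}\sqrt{\chi^2_{n-1}}}{\sqrt{\chi^2_{n-2}}} < \left(\frac{Z_3^* + \frac{\rho}{\sqrt{1-\rho^2}}\sqrt{\chi^2_{n-1}}}{\sqrt{\chi^{2*}_{n-b}}}\right)_{1-\alpha} \,\Big|\, \rho\right). \tag{27}$$

(b) (27) equals $1 - \alpha$ for any $-1 < \rho < 1$ if and only if $b = 2$.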

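THEOREM 4. (a) The constructive posterior of $\theta_1 = \rho\sigma_2/\sigma_1$ has the expression

$$\theta_1^* = \frac{r\sqrt{s_{22}}}{\sqrt{s_{11}}} - \frac{Z_3^*}{\sqrt{\chi^{2*}_{n-b}}}\frac{\sqrt{1-r^2}\sqrt{s_{22}}}{\sqrt{s_{11}}}.$$

(b) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

$$P(\theta_1 < (\theta_1^*)_{1-\alpha} \mid \xi, \rho) = P\left(t_{n-2} < \sqrt{\frac{n-2}{n-b}}\,(t^*_{n-b})_{1-\alpha}\right), \tag{28}$$

which does not depend on $\rho$. Furthermore, (28) equals $1 - \alpha$ if and only if $b = 2$.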

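THEOREM 5. (a) The constructive posterior of $\theta_2 = \sigma_2^2(1-\rho^2)$ is $\theta_2^* = s_{22}(1-r^2)/\chi^{2*}_{n-b}$.

(b) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

$$P(\theta_2 < (\theta_2^*)_{1-\alpha} \mid \xi, \rho) = P\left(\chi^2_{n-2} > (\chi^{2*}_{n-b})_{\alpha}\right), \tag{29}$$

which does not depend on $\rho$. Furthermore, (29) equals $1 - \alpha$ if and only if $b = 2$.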
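Because (28) and (29) involve only standard distributions, the coverages in Theorems 4 and 5 can be evaluated in closed form. The sketch below does this with SciPy's Student-t and chi-squared distributions; the function names are illustrative, and both functions require $n > \max(b, 2)$.

```python
from math import sqrt
from scipy.stats import chi2, t

def coverage_theta1(n, b, alpha=0.05):
    """Frequentist coverage (28) of the one-sided credible interval for theta_1."""
    q = t.ppf(1 - alpha, df=n - b)               # posterior (1 - alpha) t quantile
    return t.cdf(sqrt((n - 2) / (n - b)) * q, df=n - 2)

def coverage_theta2(n, b, alpha=0.05):
    """Frequentist coverage (29) of the one-sided credible interval for theta_2."""
    q = chi2.ppf(alpha, df=n - b)                # posterior alpha chi-squared quantile
    return chi2.sf(q, df=n - 2)                  # P(chi2_{n-2} > q)

exact_1 = coverage_theta1(n=5, b=2)              # b = 2: nominal coverage
exact_2 = coverage_theta2(n=5, b=2)
off_1 = coverage_theta1(n=5, b=1)                # b = 1: coverage differs from nominal
off_2 = coverage_theta2(n=5, b=1)
```

Sweeping `b` for small `n` shows how quickly the coverage departs from $1 - \alpha$ once $b \ne 2$.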


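THEOREM 6. (a) The constructive posterior of $\theta_3 = |\Sigma|$ is $\theta_3^* = |S|/(\chi^{2*}_{n-a}\chi^{2*}_{n-b})$.

(b) For any $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

$$P(\theta_3 < (\theta_3^*)_{1-\alpha} \mid \xi, \rho) = P\left(\chi^2_{n-1}\chi^2_{n-2} > (\chi^{2*}_{n-a}\chi^{2*}_{n-b})_{\alpha}\right), \tag{30}$$

which does not depend on $\rho$. Furthermore, (30) equals $1 - \alpha$ iff $(a, b)$ is $(1, 2)$ or $(2, 1)$.

THEOREM 7. (a) The constructive posterior of $\theta_4$ is

$$\theta_4^* = \sqrt{\frac{\chi^{2*}_{n-a}}{\chi^{2*}_{n-b}}}\sqrt{\frac{s_{22}(1-r^2)}{s_{11}}}.$$

(b) For any $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

$$P(\theta_4 < (\theta_4^*)_{1-\alpha} \mid \xi, \rho) = P\left(\chi^2_{n-1}/\chi^2_{n-2} < (\chi^{2*}_{n-a}/\chi^{2*}_{n-b})_{1-\alpha}\right), \tag{31}$$

which does not depend on $\rho$. Furthermore, (31) equals $1 - \alpha$ iff $(a, b) = (1, 2)$.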

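An interesting function of $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ not of the form (24) is $\theta_5 = \mu_1/\sigma_1$.

THEOREM 8. (a) The constructive posterior of $\theta_5 = \mu_1/\sigma_1$ is

$$\theta_5^* = \frac{Z_1^*}{\sqrt{n}} + \frac{\bar{x}_1}{\sqrt{s_{11}}}\sqrt{\chi^{2*}_{n-a}}.$$

(b) For any $\alpha \in (0, 1)$, the frequentist coverage of the credible interval $(-\infty, (\theta_5^*)_{1-\alpha})$ is

$$P(\theta_5 < (\theta_5^*)_{1-\alpha} \mid \mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = P\left(\frac{Z_1 - \theta_5\sqrt{n}}{\sqrt{\chi^2_{n-1}}} < \left(\frac{Z_1^* - \theta_5\sqrt{n}}{\sqrt{\chi^{2*}_{n-a}}}\right)_{1-\alpha} \,\Big|\, \theta_5\right), \tag{32}$$

which depends on $\theta_5$ only and equals $1 - \alpha$ if and only if $a = 1$.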

4.4. First order asymptotic matching. Datta and Mukerjee [9] and Datta and Ghosh [12] discuss how to determine first-order matching priors for functions of parameters; these are priors such that the frequentist coverage of a one-sided credible interval is equal to the Bayesian coverage up to a term of order $n^{-1}$. For each of the nine objective priors $\pi_J$, $\pi_{IJ}$, $\pi_{R\rho}$, $\tilde{\pi}_{R\sigma}$, $\pi_{RO}$, $\pi_{R\lambda}$, $\pi_H$, $\pi_S$ and $\pi_{R\sigma}$, [5] determines if it is a first-order matching prior for each of the parameters $\mu_1, \mu_2, \sigma_1, \sigma_2, \rho, \eta_3, \theta_1, \ldots, \theta_{10}$. The results are listed in Table 5. For example, $\pi_J$ is a first order matching prior for $\mu_1, \mu_2, \sigma_1, \sigma_2, \theta_1, \theta_5, \theta_7, \theta_8$, and $\theta_{10}$, but not for $\eta_3, \theta_2, \theta_3$ and $\theta_9$.


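TABLE 5. The first-order asymptotic matching of objective priors for $\mu_1, \mu_2, \sigma_1, \sigma_2, \rho, \mu_1 - \mu_2, \eta_3, \theta_j$, $j = 1, \ldots, 10$. Here a boldface letter indicates exact matching

| Prior $\pi(\mu_1,\mu_2,\sigma_1,\sigma_2,\rho)$ | Asymptotic matching: Yes | Asymptotic matching: No |
|---|---|---|
| $\pi_J \propto \dfrac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)^2}$ | $\mu_1, \mu_2, \sigma_1, \sigma_2, \mu_1-\mu_2, \theta_1, \theta_5, \theta_7, \theta_8, \theta_{10}$ | $\rho, \eta_3, \theta_2, \theta_3, \theta_9$ |
| $\pi_{IJ} \propto \dfrac{1}{\sigma_1\sigma_2(1-\rho^2)^{3/2}}$ | $\mu_1, \mu_2, \mu_1-\mu_2, \theta_1, \theta_3, \theta_7$ | $\sigma_1, \sigma_2, \rho, \eta_3, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$ |
| $\pi_{R\rho} \propto \dfrac{1}{\sigma_1\sigma_2(1-\rho^2)}$ | $\mu_1, \mu_2, \rho, \mu_1-\mu_2, \theta_3, \theta_7$ | $\sigma_1, \sigma_2, \eta_3, \theta_1, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$ |
| $\tilde{\pi}_{R\sigma} \propto \dfrac{1}{\sigma_1\sigma_2(1-\rho^2)\sqrt{2-\rho^2}}$ | $\mu_1, \mu_2, \mu_1-\mu_2, \eta_3, \theta_3, \theta_7$ | $\sigma_1, \sigma_2, \rho, \theta_1, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$ |
| $\pi_{RO} \propto \dfrac{1}{\sigma_1^2\sigma_2(1-\rho^2)^{3/2}}$ | $\mu_1, \mu_2, \sigma_1, \mu_1-\mu_2, \theta_1, \theta_5$ | $\sigma_2, \rho, \eta_3, \theta_2, \theta_3, \theta_7, \theta_8, \theta_9, \theta_{10}$ |
| $\pi_{R\lambda} \propto \dfrac{[\sigma_1\sigma_2(1-\rho^2)]^{-1}}{\sqrt{((\sigma_1/\sigma_2)-(\sigma_2/\sigma_1))^2+4\rho^2}}$ | $\mu_1, \mu_2, \mu_1-\mu_2, \theta_3$ | $\sigma_1, \sigma_2, \rho, \eta_3, \theta_1, \theta_2, \theta_5, \theta_7, \theta_8, \theta_9, \theta_{10}$ |
| $\pi_H \propto \dfrac{1}{\sigma_1^2(1-\rho^2)}$ | $\mu_1, \mu_2, \sigma_1, \rho, \mu_1-\mu_2, \eta_3, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5$ | $\sigma_2, \theta_7, \theta_8, \theta_9, \theta_{10}$ |
| $\pi_S \propto \dfrac{1}{\sigma_1\sigma_2}$ | $\mu_1, \mu_2, \mu_1-\mu_2, \theta_3, \theta_7$ | $\sigma_1, \sigma_2, \rho, \eta_3, \theta_1, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$ |
| $\pi_{R\sigma} \propto \dfrac{\sqrt{1+\rho^2}}{\sigma_1\sigma_2(1-\rho^2)}$ | $\mu_1, \mu_2, \mu_1-\mu_2, \theta_3, \theta_7, \theta_9$ | $\sigma_1, \sigma_2, \rho, \theta_1, \theta_2, \eta_3, \theta_5, \theta_8, \theta_{10}$ |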
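For readers who want to evaluate or plot the priors of Table 5, the unnormalized densities can be written down directly. The sketch below is only a transcription of the table under stated assumptions: the dictionary keys and the function signature are names invented here, the densities are taken as functions of $(\sigma_1, \sigma_2, \rho)$ with a constant factor in $(\mu_1, \mu_2)$, and no normalization is attempted since the priors are improper.

```python
import math

# Unnormalized prior densities in (sigma1, sigma2, rho), transcribed from Table 5.
# Keys are hypothetical labels; each prior is constant in (mu1, mu2).
PRIORS = {
    "J": lambda s1, s2, r: 1.0 / (s1**2 * s2**2 * (1 - r**2) ** 2),
    "IJ": lambda s1, s2, r: 1.0 / (s1 * s2 * (1 - r**2) ** 1.5),
    "R_rho": lambda s1, s2, r: 1.0 / (s1 * s2 * (1 - r**2)),
    "R_sigma_tilde": lambda s1, s2, r: 1.0 / (s1 * s2 * (1 - r**2) * math.sqrt(2 - r**2)),
    "RO": lambda s1, s2, r: 1.0 / (s1**2 * s2 * (1 - r**2) ** 1.5),
    # Undefined (division by zero) at sigma1 == sigma2 with rho == 0.
    "R_lambda": lambda s1, s2, r: 1.0
    / (s1 * s2 * (1 - r**2) * math.sqrt((s1 / s2 - s2 / s1) ** 2 + 4 * r**2)),
    "H": lambda s1, s2, r: 1.0 / (s1**2 * (1 - r**2)),
    "S": lambda s1, s2, r: 1.0 / (s1 * s2),
    "R_sigma": lambda s1, s2, r: math.sqrt(1 + r**2) / (s1 * s2 * (1 - r**2)),
}


if __name__ == "__main__":
    for name, density in PRIORS.items():
        print(name, density(2.0, 1.0, 0.5))
```

One visible feature is that the right-Haar prior `"H"` does not involve $\sigma_2$ at all, which is the asymmetry between the two coordinates that the table reflects.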

4.5. Numerically computed coverage and recommendations. First-order matching is only an asymptotic property, and finite sample performance is also crucial. We thus also implemented a modest numerical study, comparing the numerical values of frequentist coverages of the one-sided credible sets $P(\theta > q_{0.05})$ and $P(\theta < q_{0.95})$, for the parameters, $\theta$, listed in Table 6 and for the eight objective priors $\pi_J$, $\pi_{IJ}$, $\pi_{R\rho}$, $\pi_{R\sigma}$, $\pi_{RO}$, $\pi_{R\lambda}$, $\pi_H$ and $\pi_S$. As usual, $q_\alpha = q_\alpha(X)$ is the posterior $\alpha$-quantile of $\theta$, and the coverage probability is computed based on the sampling distribution of $q_\alpha(X)$ for the fixed parameter $(\mu_1,\mu_2,\sigma_1,\sigma_2)$ and $\rho$. Many of the coverage probabilities depend only on $\rho$, which was thus chosen to be the x-axis in the graphs. We considered the case $n = 3$ (the minimal possible sample size and hence the most challenging in terms of obtaining good coverage) and the two scenarios Case a: $(\mu_1,\mu_2,\sigma_1,\sigma_2) = (0,0,1,1)$, and Case b: $(\mu_1,\mu_2,\sigma_1,\sigma_2) = (0,0,2,1)$.
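A coverage study of this kind can be sketched for $\rho$ when the posterior has a constructive form. The code below is an illustration only, not the study reported here: it uses the constructive posterior $\rho^* = \psi\bigl(-Z_3^*/\sqrt{\chi^{2*}_{n-a}} + \sqrt{\chi^{2*}_{n-b}/\chi^{2*}_{n-a}}\; r/\sqrt{1-r^2}\bigr)$ that appears in the proof of Theorem 2, with $\psi(y) = y/\sqrt{1+y^2}$ obtained by inverting $\psi^{-1}(\rho) = \rho/\sqrt{1-\rho^2}$. Function names, replication counts, and the restriction to Case a are assumptions made for the example.

```python
import math
import random


def psi(y):
    """Map the real line onto (-1, 1); inverse of rho -> rho / sqrt(1 - rho^2)."""
    return y / math.sqrt(1.0 + y * y)


def sample_t(n, rho, rng):
    """Simulate n standard bivariate normal pairs (Case a) and return r / sqrt(1 - r^2)."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for u in x1]
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    # r / sqrt(1 - r^2) = s12 / sqrt(s11 * s22 - s12^2); the floor guards against rounding.
    return s12 / math.sqrt(max(s11 * s22 - s12 * s12, 1e-300))


def coverage_rho(rho, n=3, a=1, b=2, alpha=0.05, reps=1000, draws=1000, seed=0):
    """Estimate P(rho < q_{1-alpha}(X)) for the constructive posterior indexed by (a, b)."""
    rng = random.Random(seed)
    hits = 0
    for _rep in range(reps):
        t = sample_t(n, rho, rng)
        post = []
        for _draw in range(draws):
            chi_a = rng.gammavariate((n - a) / 2.0, 2.0)
            chi_b = rng.gammavariate((n - b) / 2.0, 2.0)
            post.append(psi((-rng.gauss(0.0, 1.0) + math.sqrt(chi_b) * t) / math.sqrt(chi_a)))
        post.sort()
        hits += rho < post[int((1 - alpha) * draws)]
    return hits / reps


if __name__ == "__main__":
    for rho in (-0.8, 0.0, 0.8):
        print(rho, coverage_rho(rho))
```

With $(a,b) = (1,2)$, the case Theorem 2 identifies as exact, the estimated coverage should stay near $0.95$ across $\rho$ even at $n = 3$, up to Monte Carlo error.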


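TABLE 6. Performance of objective priors for each of the parameters

| Parameter | Bad | Medium | Good |
|---|---|---|---|
| $\mu_1$ | | rest | $\pi_{RO}, \pi_H, \pi_J$ |
| $\mu_1 - \mu_2$ | | rest | $\pi_J, \pi_{RO}$ |
| $\sigma_1$ | $\pi_{IJ}$ | rest | $\pi_H, \pi_{R\lambda}, \pi_{MS}$ |
| $\sigma_2$ | $\pi_H, \pi_{RO}, \pi_{IJ}$ | rest | $\pi_J$ |
| $\rho$ | $\pi_J, \pi_{IJ}, \pi_S, \pi_{RO}$ | | $\pi_{R\rho}, \pi_{R\sigma}, \pi_{R\lambda}, \pi_H, \pi_{MS}$ |
| $\lambda_1$ | rest | $\pi_J, \pi_{R\lambda}, \pi_{RO}$ | |
| $\theta_3 = \lvert\Sigma\rvert$ | $\pi_{RO}, \pi_J$ | rest | $\pi_{IJ}, \pi_H$ |
| $\theta_7 = \sigma_2/\sigma_1$ | $\pi_H, \pi_J, \pi_{RO}, \pi_{R\lambda}$ | | rest |
| $\theta_9 = \sigma_{12}$ | $\pi_J, \pi_{IJ}$ (due to size) | rest | $\pi_H, \pi_{R\rho}, \pi_{R\sigma}$ |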

Here we present the numerical results concerning coverage for only two of the parameters: $\rho$ in Figure 1 and $\theta_7 = \sigma_2/\sigma_1$ in Figure 2. Table 6 summarizes the results from the entire numerical study, the details of which can be found in [5]. The recommendations made in Table 2 for the boxed parameters are justified from these numerical results as follows.

FIG. 1. Frequentist coverages for $\rho$, where Case a: $(\mu_1,\mu_2,\sigma_1,\sigma_2) = (0,0,1,1)$, and Case b: $(\mu_1,\mu_2,\sigma_1,\sigma_2) = (0,0,2,1)$. The x-axis is for $\rho \in (-1,1)$.


FIG. 2. Frequentist coverages for $\theta_7 = \sigma_2/\sigma_1$, where Case a: $(\mu_1,\mu_2,\sigma_1,\sigma_2) = (0,0,1,1)$ and Case b: $(\mu_1,\mu_2,\sigma_1,\sigma_2) = (0,0,2,1)$. The x-axis is for $\rho \in (-1,1)$.

The inferences involving the nonboxed parameters in Table 2 are given in closed form in Table 1 (and so are computationally simple), and are exact frequentist matching. Furthermore, with the exception of $\mu_1/\sigma_1$ and $\eta_3$, the nonboxed parameters have the indicated priors as one-at-a-time reference priors, so all three criteria point to the indicated recommendation.

For $\rho$, we recommend using $\pi_{R\rho}$, since this prior is a one-at-a-time reference prior for $\rho$, is first-order matching (as shown in Table 5), and has excellent numerical coverage as shown in Figure 1. Note that some might prefer to use the right-Haar prior because of its exact matching for $\rho$ (even though it exhibits a marginalization paradox). For $\sigma_2/\sigma_1$, the one-at-a-time reference prior was also $\pi_{R\rho}$. As this was first-order frequentist matching and among the best in terms of numerical coverage (see Figure 2), we also recommend it for this parameter.

For $\lambda_1$, the situation is unclear. The one-at-a-time reference prior is $\pi_{R\lambda}$ and is hence our recommendation, but first-order matching results for this parameter are not known, and the numerical coverages of all priors were rather bad. For $\sigma_{12}$, the only first-order matching prior among our candidates is $\pi_{R\sigma}$. It also had the best numerical coverages, and so is a clear recommendation. Note, however, that we were not able to determine if it is a one-at-a-time reference prior for $\sigma_{12}$, so the recommendation should be considered tentative.

The most interesting question is what to recommend for general use, as an all-purpose prior. Looking at Table 2, it might seem that $\pi_H$ or even $\pi_J$ would be good choices, since they are optimal for so many parameters. However, both these priors can also give quite bad coverages, as indicated in Figure 2 for $\pi_H$ and in Figures 1 and 2 for $\pi_J$. Indeed, from Table 6, the only priors that did not have significantly poor performance for at least one parameter (other than $\lambda_1$, for which no prior gave good coverages) were $\pi_{R\rho}$ and $\pi_{R\sigma}$. The numerical coverages for $\pi_{R\rho}$ and $\pi_{R\sigma}$ are virtually identical for all the parameters, so there is no principled way to choose between them. $\pi_{R\rho}$ is a commonly used prior and somewhat simpler, so it becomes our recommended choice for a general prior.

5. Proofs. Due to space limitations, we give only the proofs of Theorems 1, 2 and 8, because their proofs are quite different. The proofs of the other theorems in Section 4 are relatively easy consequences of Fact 1 and Lemmas 1–3. For details of these other proofs, see [5].

5.1. Proof of Theorem 1. With the constant prior for $(\mu_1,\mu_2)$, the marginal likelihood of $(\sigma_1,\sigma_2,\rho)$ depends on $S$ and is proportional to

$$|\Sigma|^{-(n-1)/2}\exp\bigl\{-\tfrac{1}{2}\,\mathrm{trace}(S\Sigma^{-1})\bigr\}.$$

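Define

$$D = \bigl\{(\sigma_1^*,\sigma_2^*,\rho^*) : (\sigma_1^*)^{d_1}(\sigma_2^*)^{d_2}\,g(\rho^*) < \sigma_1^{d_1}\sigma_2^{d_2}\,g(\rho)\bigr\},$$

$$G(X,\sigma_1,\sigma_2,\rho) = \int_D \pi(\sigma_1^*,\sigma_2^*,\rho^* \mid S)\,d\sigma_1^*\,d\sigma_2^*\,d\rho^*.$$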

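Clearly, the frequentist coverage probability is

$$P\{\theta < \theta_{1-\alpha}(X) \mid \mu_1,\mu_2,\sigma_1,\sigma_2,\rho\} = P\{G(S,\sigma_1,\sigma_2,\rho) < 1-\alpha \mid \sigma_1,\sigma_2,\rho\}.$$

Under the prior (25),

$$G(X,\sigma_1,\sigma_2,\rho) = \frac{\displaystyle\iiint_D \frac{h(\rho^*)\exp\bigl(-0.5\,\mathrm{trace}(S\Sigma^{*-1})\bigr)}{(\sigma_1^*)^{n-1+c_1}(\sigma_2^*)^{n-1+c_2}(1-\rho^{*2})^{(n-1)/2}}\,d\sigma_1^*\,d\sigma_2^*\,d\rho^*}{\displaystyle\iiint \frac{h(\rho^*)\exp\bigl(-0.5\,\mathrm{trace}(S\Sigma^{*-1})\bigr)}{(\sigma_1^*)^{n-1+c_1}(\sigma_2^*)^{n-1+c_2}(1-\rho^{*2})^{(n-1)/2}}\,d\sigma_1^*\,d\sigma_2^*\,d\rho^*},$$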

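where $\Sigma^*$ is the $2 \times 2$ symmetric matrix whose diagonal elements are $\sigma_1^{*2}$ and $\sigma_2^{*2}$, and whose off-diagonal element is $\sigma_1^*\sigma_2^*\rho^*$. Denote $\Lambda = \mathrm{diag}(1/\sigma_1, 1/\sigma_2)$ and make the transformations

$$T = \Lambda S \Lambda = \begin{pmatrix} \dfrac{S_{11}}{\sigma_1^2} & \dfrac{S_{12}}{\sigma_1\sigma_2} \\[2mm] \dfrac{S_{12}}{\sigma_1\sigma_2} & \dfrac{S_{22}}{\sigma_2^2} \end{pmatrix} \quad\text{and}\quad \Omega = \Lambda\Sigma^*\Lambda = \begin{pmatrix} \omega_1^2 & \omega_1\omega_2\rho^* \\ \omega_1\omega_2\rho^* & \omega_2^2 \end{pmatrix},$$

where $\omega_i = \sigma_i^*/\sigma_i$. Clearly $\mathrm{trace}(S\Sigma^{*-1}) = \mathrm{trace}(T\Omega^{-1})$, and then

$$G(X,\sigma_1,\sigma_2,\rho) = \frac{\displaystyle\iiint_{\tilde{D}} \frac{h(\rho^*)\exp\bigl(-0.5\,\mathrm{trace}(T\Omega^{-1})\bigr)}{\omega_1^{n-1+c_1}\omega_2^{n-1+c_2}(1-\rho^{*2})^{(n-1)/2}}\,d\omega_1\,d\omega_2\,d\rho^*}{\displaystyle\iiint \frac{h(\rho^*)\exp\bigl(-0.5\,\mathrm{trace}(T\Omega^{-1})\bigr)}{\omega_1^{n-1+c_1}\omega_2^{n-1+c_2}(1-\rho^{*2})^{(n-1)/2}}\,d\omega_1\,d\omega_2\,d\rho^*},$$

where $\tilde{D} = \{(\omega_1,\omega_2,\rho^*) : \omega_1^{d_1}\omega_2^{d_2}\,g(\rho^*) < g(\rho)\}$. Since the sampling distribution of $T$ depends only on $\rho$, so does the sampling distribution of $G(X,\sigma_1,\sigma_2,\rho)$. Also $\tilde{D}$ depends on $\rho$ only. The result thus holds.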
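The key step of this proof is that rescaling both $S$ and $\Sigma^*$ by $\mathrm{diag}(1/\sigma_1, 1/\sigma_2)$ leaves the trace term unchanged. The short check below illustrates that invariance for arbitrary numbers. It is a sketch under stated assumptions: the helper names are invented here, and the variable names `lam`, `T` and `Omega` are labels chosen for the scaling matrix and the two rescaled matrices.

```python
def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(A):
    """Inverse of a nonsingular 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def trace2(A):
    return A[0][0] + A[1][1]


def rescale(M, sigma1, sigma2):
    """Return lam * M * lam with lam = diag(1/sigma1, 1/sigma2)."""
    lam = [[1.0 / sigma1, 0.0], [0.0, 1.0 / sigma2]]
    return matmul2(matmul2(lam, M), lam)


def trace_pair(S, Sigma_star, sigma1, sigma2):
    """Return trace(S Sigma*^{-1}) and trace(T Omega^{-1}) for the rescaled matrices."""
    T = rescale(S, sigma1, sigma2)
    Omega = rescale(Sigma_star, sigma1, sigma2)
    return trace2(matmul2(S, inv2(Sigma_star))), trace2(matmul2(T, inv2(Omega)))


if __name__ == "__main__":
    # Arbitrary illustrative values, not taken from the paper.
    S = [[4.0, 1.0], [1.0, 3.0]]
    Sigma_star = [[2.0, 0.5], [0.5, 1.5]]
    print(trace_pair(S, Sigma_star, sigma1=2.0, sigma2=0.7))
```

The two printed numbers should agree up to floating-point rounding, because the scaling matrices cancel inside the trace.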

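5.2. Proof of Theorem 2. It follows from (18) and Lemma 3 (a) that

$$P(\rho < \rho^*_{1-\alpha} \mid \xi, \rho) = P\left\{\left[\psi\left(\frac{-Z_3^*}{\sqrt{\chi^{2*}_{n-a}}} + \frac{\sqrt{\chi^{2*}_{n-b}}}{\sqrt{\chi^{2*}_{n-a}}}\,\frac{r}{\sqrt{1-r^2}}\right)\right]_{1-\alpha} > \rho \,\Bigm|\, \rho\right\}.$$

Note that $\psi$, defined in (18), is invertible, and $\psi^{-1}(\rho) = \rho/\sqrt{1-\rho^2}$, for $|\rho| < 1$.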

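It follows from Lemma 3 (a) and (b) that

$$\begin{aligned} P(\rho < \rho^*_{1-\alpha} \mid \xi, \rho) &= P\left(\left(\frac{-Z_3^*}{\sqrt{\chi^{2*}_{n-a}}} + \frac{\sqrt{\chi^{2*}_{n-b}}}{\sqrt{\chi^{2*}_{n-a}}}\,\frac{r}{\sqrt{1-r^2}} - \frac{\rho}{\sqrt{1-\rho^2}}\right)_{1-\alpha} > 0 \,\Bigm|\, \rho\right) \\ &= P\left(\left(\frac{-Z_3^*}{\sqrt{\chi^{2*}_{n-b}}} - \frac{\rho}{\sqrt{1-\rho^2}}\,\frac{\sqrt{\chi^{2*}_{n-a}}}{\sqrt{\chi^{2*}_{n-b}}}\right)_{1-\alpha} + \frac{r}{\sqrt{1-r^2}} > 0 \,\Bigm|\, \rho\right). \end{aligned}$$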

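It follows from (21)–(23) that

$$\begin{aligned} \frac{r}{\sqrt{1-r^2}} &= \frac{s_{12}/\sqrt{s_{11}}}{\sqrt{s_{22}(1-r^2)}} = \frac{\sigma_2\sqrt{1-\rho^2}\,Z_3 + (\rho\sigma_2/\sigma_1)\sqrt{s_{11}}}{\sigma_2\sqrt{1-\rho^2}\,\sqrt{\chi^2_{n-2}}} \\ &= \frac{Z_3}{\sqrt{\chi^2_{n-2}}} + \frac{\rho}{\sqrt{1-\rho^2}}\,\frac{\sqrt{\chi^2_{n-1}}}{\sqrt{\chi^2_{n-2}}}. \end{aligned}$$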

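Consequently,

$$P(\rho < \rho^*_{1-\alpha} \mid \xi, \rho) = P\left(\frac{Z_3}{\sqrt{\chi^2_{n-2}}} + \frac{\rho}{\sqrt{1-\rho^2}}\,\frac{\sqrt{\chi^2_{n-1}}}{\sqrt{\chi^2_{n-2}}} < \left(\frac{Z_3^*}{\sqrt{\chi^{2*}_{n-b}}} + \frac{\rho}{\sqrt{1-\rho^2}}\,\frac{\sqrt{\chi^{2*}_{n-a}}}{\sqrt{\chi^{2*}_{n-b}}}\right)_{1-\alpha} \,\Bigm|\, \rho\right).$$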

This completes the proof of part (a). For part (b), if (26) equals $1-\alpha$ for any $-1 < \rho < 1$, choose $\rho = 0$ and get

$$P\left(\frac{Z_3}{\sqrt{\chi^2_{n-2}}} < \left(\frac{Z_3^*}{\sqrt{\chi^{2*}_{n-b}}}\right)_{1-\alpha}\right) = 1-\alpha,$$

which implies that $b = 2$. Substituting $b = 2$ into (26) shows that $a = 1$.
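The algebra behind the pivotal form of $r/\sqrt{1-r^2}$ can be checked numerically. The sketch below builds $s_{11}$, $s_{12}$ and $s_{22}$ from given values of $Z_3$, $\chi^2_{n-1}$ and $\chi^2_{n-2}$ using the relations read off from the displayed chain, together with the assumption (standard, but not restated in this section) that $s_{11} = \sigma_1^2\chi^2_{n-1}$. It then compares $r/\sqrt{1-r^2}$ computed from the statistics with the closed form. The function name and the test values are hypothetical.

```python
import math


def r_transform_two_ways(sigma1, sigma2, rho, z3, chi_n1, chi_n2):
    """Return r/sqrt(1 - r^2) computed from (s11, s12, s22) and from the closed form."""
    root = math.sqrt(1.0 - rho * rho)
    s11 = sigma1**2 * chi_n1                      # assumed: s11 = sigma1^2 * chi2_{n-1}
    # s12 / sqrt(s11) = sigma2 sqrt(1 - rho^2) Z3 + (rho sigma2 / sigma1) sqrt(s11)
    s12 = math.sqrt(s11) * (sigma2 * root * z3 + (rho * sigma2 / sigma1) * math.sqrt(s11))
    # s22 (1 - r^2) = s22 - s12^2 / s11 = sigma2^2 (1 - rho^2) chi2_{n-2}
    s22 = sigma2**2 * (1.0 - rho * rho) * chi_n2 + s12**2 / s11
    r = s12 / math.sqrt(s11 * s22)
    direct = r / math.sqrt(1.0 - r * r)
    closed = z3 / math.sqrt(chi_n2) + rho / root * math.sqrt(chi_n1 / chi_n2)
    return direct, closed


if __name__ == "__main__":
    # Arbitrary illustrative inputs.
    print(r_transform_two_ways(1.0, 2.0, 0.3, 0.4, 5.2, 3.1))
```

Both returned values should coincide up to rounding, and neither depends on $\sigma_1$ or $\sigma_2$, which is why the coverage in this proof is a function of $\rho$ alone.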


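5.3. Proof of Theorem 8. Part (a) is obvious. For part (b), since $\bar{x}_1 = \mu_1 + Z_1\sigma_1/\sqrt{n}$ and $Z_1$ and $\chi^2_{n-1}$ are independent, we have

$$\bigl(\theta_5 < (\theta_5^*)_{1-\alpha}\bigr) = \left(\left[\frac{Z_1^*}{\sqrt{n}} + \theta_5\left(\sqrt{\frac{\chi^{2*}_{n-a}}{\chi^2_{n-1}}} - 1\right) + \frac{Z_1}{\sqrt{n}}\sqrt{\frac{\chi^{2*}_{n-a}}{\chi^2_{n-1}}}\right]_{1-\alpha} > 0\right).$$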

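It follows from Lemma 3 (a) and (b) that

$$\begin{aligned} \bigl(\theta_5 < (\theta_5^*)_{1-\alpha}\bigr) &= \left(\left[\frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}} + \theta_5\left(\frac{\sqrt{n}}{\sqrt{\chi^2_{n-1}}} - \frac{\sqrt{n}}{\sqrt{\chi^{2*}_{n-a}}}\right) + \frac{Z_1}{\sqrt{\chi^2_{n-1}}}\right]_{1-\alpha} > 0\right) \\ &= \left(-\frac{Z_1}{\sqrt{\chi^2_{n-1}}} - \theta_5\,\frac{\sqrt{n}}{\sqrt{\chi^2_{n-1}}} < \left(\frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}} - \theta_5\,\frac{\sqrt{n}}{\sqrt{\chi^{2*}_{n-a}}}\right)_{1-\alpha}\right). \end{aligned}$$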

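Because $Z_1$ and $-Z_1$ have the same distribution and $Z_1$ and $\chi^2_{n-1}$ are independent, (32) holds. If (32) equals $1-\alpha$ for any $\theta_5$, choose $\theta_5 = 0$,

$$P\left(\frac{Z_1}{\sqrt{\chi^2_{n-1}}} < \left(\frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}}\right)_{1-\alpha}\right) = 1-\alpha,$$

which implies that $a = 1$. The result holds.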

Acknowledgments. The authors are grateful to Fei Liu for performing the numerical frequentist coverage computations, to Xiaoyan Lin for computing the matching priors in Table 5, and to Susie Bayarri for helpful discussions. The authors gratefully acknowledge the very constructive comments of the editor, an associate editor and two referees.

REFERENCES

[1] BAYARRI, M. J. (1981). Inferencia bayesiana sobre el coeficiente de correlación de una población normal bivariante. Trabajos de Estadistica e Investigacion Operativa 32 18–31. MR0697200

[2] BAYARRI, M. J. and BERGER, J. (2004). The interplay between Bayesian and frequentist analysis. Statist. Sci. 19 58–80. MR2082147

[3] BERGER, J. O. and BERNARDO, J. M. (1992). On the development of reference priors (with discussion). In Bayesian Statistics 4 35–60. Oxford Univ. Press. MR1380269

[4] BERGER, J. O., STRAWDERMAN, W. and TANG, D. (2005). Posterior propriety and admissibility of hyperpriors in normal hierarchical models. Ann. Statist. 33 606–646. MR2163154

[5] BERGER, J. O. and SUN, D. (2006). Objective priors for a bivariate normal model with multivariate generalizations. Technical Report 07-06, ISDS, Duke Univ.

[6] BERNARDO, J. M. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. Roy. Statist. Soc. Ser. B 41 113–147. MR0547240

[7] BRILLINGER, D. R. (1962). Examples bearing on the definition of fiducial probability with a bibliography. Ann. Math. Statist. 33 1349–1355. MR0142183


[8] BROWN, P., LE, N. and ZIDEK, J. (1994). Inference for a covariance matrix. In Aspects of Uncertainty: A Tribute to D. V. Lindley (P. R. Freeman and A. F. M. Smith, eds.) 77–92. Wiley, Chichester. MR1309689

[9] DATTA, G. and MUKERJEE, R. (2004). Probability Matching Priors: Higher Order Asymptotics. Springer, New York. MR2053794

[10] DATTA, G. S. and GHOSH, J. K. (1995a). On priors providing frequentist validity for Bayesian inference. Biometrika 82 37–45. MR1332838

[11] DATTA, G. S. and GHOSH, J. K. (1995b). Noninformative priors for maximal invariant parameter in group models. Test 4 95–114. MR1365042

[12] DATTA, G. S. and GHOSH, M. (1995c). Some remarks on noninformative priors. J. Amer. Statist. Assoc. 90 1357–1363. MR1379478

[13] DAWID, A. P., STONE, M. and ZIDEK, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. Ser. B 35 189–233. MR0365805

[14] FISHER, R. A. (1930). Inverse probability. Proc. Cambridge Philos. Soc. 26 528–535.

[15] FISHER, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh.

[16] GEISSER, S. and CORNFIELD, J. (1963). Posterior distributions for multivariate normal parameters. J. Roy. Statist. Soc. Ser. B 25 368–376. MR0171354

[17] GHOSH, M. and YANG, M.-C. (1996). Noninformative priors for the two sample normal problem. Test 5 145–157. MR1410459

[18] LEHMANN, E. L. (1986). Testing Statistical Hypotheses, 2nd ed. Wiley, New York. MR0852406

[19] LINDLEY, D. V. (1961). The use of prior probability distributions in statistical inference and decisions. Proc. 4th Berkeley Sympos. Math. Statist. Probab. 1 (J. Neyman and E. L. Scott, eds.) 453–468. Univ. California Press, Berkeley. MR0156437

[20] LINDLEY, D. V. (1965). Introduction to Probability and Statistics from a Bayesian Viewpoint. Cambridge Univ. Press.

[21] PRATT, J. W. (1963). Shorter confidence intervals for the mean of a normal distribution with known variance. Ann. Math. Statist. 34 574–586. MR0148150

[22] RAO, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York. MR0346957

[23] SEVERINI, T. A., MUKERJEE, R. and GHOSH, M. (2002). On an exact probability matching property of right-invariant priors. Biometrika 89 952–957. MR1946524

[24] STONE, M. and DAWID, A. P. (1972). Un-Bayesian implications of improper Bayes inference in routine statistical problems. Biometrika 59 369–375. MR0431449

[25] YANG, R. and BERGER, J. (1994). Estimation of a covariance matrix using the reference prior. Ann. Statist. 22 1195–1211. MR1311972

ISDS
DUKE UNIVERSITY
BOX 90251
DURHAM, NORTH CAROLINA NC 27708-0251
USA
E-MAIL: [email protected]
URL: www.stat.duke.edu/~berger

DEPARTMENT OF STATISTICS
UNIVERSITY OF MISSOURI-COLUMBIA
146 MIDDLEBUSH HALL
COLUMBIA, MISSOURI 65211-6100
USA
E-MAIL: [email protected]
URL: www.stat.missouri.edu/~dsun