General Variance Modifications for Linear Bayes Estimators

Journal of the American Statistical Association, Vol. 78, No. 383 (September 1983), pp. 616-618.

MICHAEL GOLDSTEIN*

* Michael Goldstein is Reader in Statistics, Department of Mathematical Statistics, University of Hull, Hull, England. This article has been greatly clarified by the comments of the referees.

We consider the problem of modifying the linear Bayes estimator for the mean of a distribution of unknown form by using an estimate of sample variance. The general conditions under which there is no advantageous variance modification to the linear Bayes rule are identified. These results are applied to the problem of sampling from a normal distribution with unknown mean and variance, and the implications are discussed.

KEY WORDS: Linear Bayes rule; Normal-gamma prior distribution; Linear variance estimation.

1. INTRODUCTION

The formal Bayesian approach to estimation relies completely on user-specified prior and likelihood structures. If the restrictions thus implied by a standard formulation do not accord well with our judgment about an application, then we must consider precisely what information is available to us and the ways in which it may be incorporated into the analysis. As an example, consider the following problem.

We have two statistics X, Y based on observations drawn from F, a distribution of unknown form on the real line, where X is unbiased for $\mu(F)$, the mean of F, and Y is unbiased for $\sigma^2(F)$, the variance of X given F. We wish to estimate $\mu(F)$ using X and Y. (Thus, for example, X may be the sample mean and Y may be the sample variance divided by sample size for a random sample from F.) The formal Bayesian specification involves a prior distribution on the possible distributions F and a joint conditional distribution $G(x, y \mid F)$. These distributions determine the joint prior distribution of $\mu(F)$ and $\sigma^2(F)$, and also the joint conditional distribution $H(x, y \mid \mu, \sigma^2)$. In practice we may define these marginal distributions directly, and, for our purposes, we may only need to specify certain prior expectations for these distributions.

Using X alone, we can estimate $\mu(F)$ by $\mu^*$, the linear Bayes rule in X for $\mu(F)$, defined, when X = x, by

$$\mu^* = \frac{\bar{\mu}\, V^{-1}(\mu) + x\, U^{-1}}{V^{-1}(\mu) + U^{-1}}, \qquad (1.1)$$

where $\bar{\mu} = E(\mu)$, $V(\mu) = E(\mu - \bar{\mu})^2$, and $U = E(\sigma^2)$, all expectations being taken with respect to our prior measure over the set of all probability distributions on the real line. Similarly, we can estimate $\sigma^2(F)$ by $V_y^*$, the linear Bayes rule in Y for $\sigma^2(F)$, defined, when Y = y, by

$$V_y^* = \frac{U\, V^{-1}(\sigma^2) + y\, W^{-1}}{V^{-1}(\sigma^2) + W^{-1}}, \qquad (1.2)$$

where $V(\sigma^2) = E(\sigma^2 - U)^2$ and $W = E(Y - \sigma^2)^2$. Finally, we define the variance-modified linear Bayes rule for $\mu$ in X and Y to be

$$\mu^{**} = \frac{\bar{\mu}\, V^{-1}(\mu) + x\, V_y^{*-1}}{V^{-1}(\mu) + V_y^{*-1}}. \qquad (1.3)$$
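As a concrete illustration, the following minimal sketch (Python; all prior moments and data are hypothetical values chosen for illustration, not taken from the paper) computes $\mu^*$, $V_y^*$, and $\mu^{**}$ from (1.1)-(1.3), taking X as the sample mean and Y as the sample variance divided by the sample size.

```python
import numpy as np

def linear_bayes_mean(x, mu_bar, v_mu, u):
    """mu* of (1.1): precision-weighted blend of prior mean and data."""
    return (mu_bar / v_mu + x / u) / (1.0 / v_mu + 1.0 / u)

def linear_bayes_variance(y, u, v_sigma2, w):
    """V_y* of (1.2): the linear Bayes rule in Y for sigma^2(F)."""
    return (u / v_sigma2 + y / w) / (1.0 / v_sigma2 + 1.0 / w)

def variance_modified_mean(x, y, mu_bar, v_mu, u, v_sigma2, w):
    """mu** of (1.3): as (1.1), with U replaced by the updated V_y*."""
    v_star = linear_bayes_variance(y, u, v_sigma2, w)
    return (mu_bar / v_mu + x / v_star) / (1.0 / v_mu + 1.0 / v_star)

# Hypothetical prior moments and a simulated sample.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=20)
n = sample.size
x = sample.mean()              # X: unbiased for mu(F)
y = sample.var(ddof=1) / n     # Y: unbiased for the variance of X given F

mu_bar, v_mu = 0.0, 4.0                # prior mean and prior variance of mu
u, v_sigma2, w = 0.15, 0.01, 0.005     # E(sigma^2), V(sigma^2), E(Y - sigma^2)^2

print("mu*  =", linear_bayes_mean(x, mu_bar, v_mu, u))
print("mu** =", variance_modified_mean(x, y, mu_bar, v_mu, u, v_sigma2, w))
```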

The motivation for this estimator, with a simple example, is given in Goldstein (1979). In that paper, the justification is essentially intuitive, in that the estimator appears to use the available information in a sensible fashion. However, it is clear that there will be situations where $\mu^{**}$ will be a worse estimator than $\mu^*$. In particular, consider the following familiar Bayesian model for estimating the population mean $\mu$ when the population variance $\sigma^2$ is also uncertain. We suppose that $X = (X_1, \ldots, X_n)$ is an independent random sample of size n from a normal distribution, with mean $\mu$ and precision $r = \sigma^{-2}$, both uncertain. The joint prior density for $\mu$ and $\sigma^2$ is the normal-gamma prior density; that is, the conditional prior density for $\mu$, given $r = r_0$, is normal with mean $\bar{\mu}$ and precision $\tau r_0$, for some positive constant $\tau$, and the marginal density of r is gamma, $p(r) \propto r^{\alpha - 1} e^{-\beta r}$, having parameters $\alpha, \beta > 0$.

With this prior specification,

$$E(\mu \mid x, y) = \frac{\tau \bar{\mu} + n \bar{x}}{\tau + n} = \mu^*,$$

where $\mu^*$ is as defined in (1.1) with $\bar{x}$, the sample mean, replacing x (see, e.g., DeGroot 1970, Ch. 9).
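A quick numerical check of this identity, with hypothetical prior parameters: under the normal-gamma form the implied moments are $V(\mu) = E(1/r)/\tau$ and $U = E(1/r)/n$ (with $E(1/r) = \beta/(\alpha - 1)$ for $\alpha > 1$), and substituting these into (1.1) collapses it to $(\tau\bar{\mu} + n\bar{x})/(\tau + n)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau, alpha, beta = 12, 3.0, 4.0, 2.0   # hypothetical prior parameters
mu_bar = 1.0

e_v = beta / (alpha - 1.0)    # E(1/r): prior mean of the population variance
v_mu = e_v / tau              # V(mu) = E(1/(tau * r))
u = e_v / n                   # U = E(var(sample mean | F)) = E(1/r) / n

x_bar = rng.normal(2.0, 1.0, size=n).mean()

mu_star = (mu_bar / v_mu + x_bar / u) / (1.0 / v_mu + 1.0 / u)
posterior_mean = (tau * mu_bar + n * x_bar) / (tau + n)
print(mu_star, posterior_mean)   # agree to rounding error
```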

Thus in this case, although the population variance is uncertain, we should not make any modification to the estimator $\mu^*$ based on the observed variation Y. We ask two questions. First, what general conditions on the prior-likelihood specification lead to preference for $\mu^*$ over any estimator, such as $\mu^{**}$, that is modified by observing Y? Second, in the particular case of sampling from the normal distribution, how wide a range of prior distributions satisfies these general conditions?

Several authors have considered the conditions under which rules such as $\mu^*$ will be optimal, for example by investigating the conditions under which $E(\mu \mid x) = \mu^*$ characterizes normality (see, e.g., Rao 1976; Diaconis and Ylvisaker 1979; Goel and DeGroot 1980). However, we may also be interested in linear rules for their conceptual and computational simplicity. In such cases, it may be of more immediate interest to ask how the proposed rule behaves compared with restricted classes of estimators of simple form; specifically, to assess limited optimality results. (In this article we restrict attention to strict optimality conditions within the limited class. Consideration of the conditions under which, for example, the rule $\mu^{**}$ is optimal to order $1/n$, where n is sample size, will be reported elsewhere.)

Thus in Section 2, we define a general class of variance-modified estimators for the mean and give a general condition for $\mu^*$ to be optimal in this class with respect to Bayes risk under quadratic loss. In Section 3, we apply these results to sampling from the normal distribution. Finally, in Section 4 we give a general discussion of the implications of our results.

2. GENERAL VARIANCE-MODIFIED LINEAR BAYES ESTIMATORS

Our notation is as in Section 1: X, Y are two statistics based on observations drawn from F, a distribution of unknown form on the real line; X is unbiased for $\mu(F)$, the mean of F; and Y is unbiased for $\sigma^2(F)$, the variance of X given F. We want to estimate $\mu$ by a convex combination of X and $\bar{\mu}$, the prior expectation of $\mu$, and we are prepared to allow the weightings for X and $\bar{\mu}$ to depend on Y. Thus, let B denote the class of such linear Bayes rules in X, modified by Y, that is, all rules of the form $\hat{\mu}$ given by

$$\hat{\mu} = b(y)\, x + (1 - b(y))\, \bar{\mu},$$

where b(y) is a general function of y taking values in [0, 1].

In some practical applications, it may be judged that the variance estimate does not give information directly about $\mu$. Thus, we consider the situation in which the observation Y gives direct information only about $\sigma^2$ and provides information about $\mu$ only through a priori associations between $\mu$ and $\sigma^2$. Thus, we suppose that given $\mu$ and $\sigma^2$, X and Y are independent, and that given $\sigma^2$, $\mu$ and Y are independent. (Cf. Goldstein 1979, in which the much stronger assumption was made that $\mu$ and Y were unconditionally independent.)

The first stage in our development is to identify the general conditions under which $\mu^*$ is the Bayes rule for $\mu$ in the class B. We start with the following theorem.

Theorem 1. Under the assumptions just given, $\mu^*$ will be the Bayes rule for $\mu$ with quadratic loss in the class B if and only if, for almost all y,

$$\frac{E(\sigma^2)}{E(\mu - \bar{\mu})^2} = \frac{E(\sigma^2 \mid y)}{E((\mu - \bar{\mu})^2 \mid y)}.$$

(The conditional expectations are both with respect to the posterior distribution given Y = y.)

Proof. After some manipulation, we may show that

$$E(\hat{\mu} - \mu)^2 = E\{b^2(Y)\, E(\sigma^2 \mid Y)\} + E\{(1 - b(Y))^2\, E((\mu - \bar{\mu})^2 \mid Y)\}.$$

Minimizing the integrand pointwise in b gives $b(y) = E((\mu - \bar{\mu})^2 \mid y) / [E(\sigma^2 \mid y) + E((\mu - \bar{\mu})^2 \mid y)]$, so the minimizing function $b(\cdot)$ will be constant if and only if, for almost all y,

$$E(\sigma^2 \mid y) = k\, E((\mu - \bar{\mu})^2 \mid y),$$

and the constant k is determined by taking the expectation of this relation over Y.
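A small sketch (Python; the conditional-moment functions are hypothetical, chosen only for illustration) of this pointwise weight, contrasting a case where the two conditional moments keep a fixed ratio with one where they do not:

```python
import numpy as np

def optimal_weight(post_var_mu, post_mean_sigma2):
    """Minimiser of b^2 E(sigma^2|y) + (1-b)^2 E((mu-mu_bar)^2|y):
    b(y) = E((mu-mu_bar)^2|y) / (E(sigma^2|y) + E((mu-mu_bar)^2|y))."""
    return post_var_mu / (post_mean_sigma2 + post_var_mu)

ys = np.linspace(0.5, 2.0, 4)

# Conditional moments in a fixed ratio: b(y) is constant, so mu* is optimal in B.
print(optimal_weight(0.5 * ys, ys))        # 1/3 in every position

# Ratio drifting with y: b(y) varies, so Y carries usable information about mu.
print(optimal_weight(0.5 * ys**2, ys))     # increases with y
```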

When will the conditions of the theorem be satisfied? Define the function $g(\cdot)$ by

$$g(\sigma^2) = E((\mu - \bar{\mu})^2 \mid \sigma^2)/\sigma^2.$$

If $g(\sigma^2) = k$ a.e., then

$$E((\mu - \bar{\mu})^2 \mid y) = E\big(E((\mu - \bar{\mu})^2 \mid \sigma^2, y) \mid y\big) = k\, E(\sigma^2 \mid y),$$

as Y and $\mu$ are independent given $\sigma^2$. Thus, from Theorem 1, we may deduce the following corollary.

Corollary 1. Under the assumptions of Theorem 1, $\mu^*$ will be the Bayes rule for $\mu$ in the class B if the function $g(\sigma^2)$, as defined above, is constant for almost all $\sigma^2$.

Thus, whatever the prior measure for $\sigma^2$, we may straightforwardly create a wide variety of joint prior measures for $\mu$ and $\sigma^2$ for which $\mu^*$ is indeed the Bayes rule in the class B. Moreover, in practice we will be able to determine whether this criterion is at least approximately satisfied from only fairly simple prior considerations. The obvious question is whether constancy of $g(\sigma^2)$ is also a necessary requirement for $\mu^*$ to be the Bayes rule in B, so that, when this condition fails, there is useful information about $\mu$ in Y.
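A Monte Carlo sketch of the construction in Corollary 1 (hypothetical choices throughout: a lognormal marginal for $\sigma^2$, $\mu \mid \sigma^2$ normal with $g(\sigma^2) = k$, and a convenient scaled chi-squared likelihood for Y that respects the assumed conditional independence of $\mu$ and Y given $\sigma^2$). Whatever the marginal, the two conditional moments in Theorem 1 keep the fixed ratio k across values of y:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 200_000, 10, 0.5     # prior draws, sample size, the constant g(sigma^2)

sigma2 = rng.lognormal(0.0, 0.5, m)              # arbitrary marginal for sigma^2
mu = rng.normal(0.0, np.sqrt(k * sigma2))        # Var(mu | sigma^2) = k * sigma^2
y = sigma2 * rng.chisquare(n - 1, m) / (n - 1)   # E(Y | sigma^2) = sigma^2

# Estimate E((mu - mu_bar)^2 | y) / E(sigma^2 | y) within quantile bins of y.
edges = np.quantile(y, np.linspace(0.0, 1.0, 11))
idx = np.digitize(y, edges[1:-1])
for j in range(10):
    sel = idx == j
    print(f"bin {j}: ratio = {np.mean(mu[sel] ** 2) / np.mean(sigma2[sel]):.3f}")
# Every bin prints roughly k = 0.5, as Corollary 1 predicts.
```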

We have the following theorem, in which we assume that the conditional distribution of Y given $\sigma^2$ may be represented by a density function $f(y \mid \sigma^2)$. (Thus, $f(y \mid \sigma^2)$ is the prior expected value of $f(y \mid F)$ over all F for which $\sigma^2(F) = \sigma^2$.)

Theorem 2. It is possible to construct a joint prior measure for $\mu$ and $\sigma^2$, with a given marginal prior distribution $P(\sigma^2)$ for $\sigma^2$ and fixed prior values $\bar{\mu}$, $V(\mu)$, for which $g(\sigma^2)$ is not constant a.e. but $\mu^*$ is the Bayes rule in the class B, if and only if the functions $f(y \mid \sigma^2)$ satisfy the property that there is some function $h(\sigma^2) \le \sigma^2$, nonzero with positive prior probability, for which

$$\int h(\sigma^2)\, dP(\sigma^2) = 0$$

and

$$\int h(\sigma^2)\, f(y \mid \sigma^2)\, dP(\sigma^2) = 0$$

for almost all y.

Proof. Suppose that $h(\sigma^2)$ satisfies the conditions of Theorem 2. Define the prior specification of $\mu$ given $\sigma^2$ so that

$$E((\mu - \bar{\mu})^2 \mid \sigma^2) = \big(\sigma^2 - h(\sigma^2)\big)\, V(\mu)/U.$$

This is a valid prior specification, since $\sigma^2 \ge h(\sigma^2)$, and

$$E\big(E((\mu - \bar{\mu})^2 \mid \sigma^2)\big) = V(\mu)$$

by the requirements on $h(\sigma^2)$. One may check that although $g(\sigma^2)$ is not constant, the resulting prior specification satisfies the conditions of Theorem 1, and so $\mu^*$ is the Bayes rule in B, as shown below.
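Spelling out that check: $\mu$ and Y are independent given $\sigma^2$, and $E(h(\sigma^2) \mid y) \propto \int h(\sigma^2)\, f(y \mid \sigma^2)\, dP(\sigma^2) = 0$ by the second property of h, so

$$E((\mu - \bar{\mu})^2 \mid y) = \frac{V(\mu)}{U}\, E\big(\sigma^2 - h(\sigma^2) \mid y\big) = \frac{V(\mu)}{U}\, E(\sigma^2 \mid y),$$

which is the fixed-ratio condition of Theorem 1.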

Conversely, suppose that we may find some nonconstant specification of $g(\sigma^2)$ for which $\mu^*$ is the Bayes rule in B. Then defining $h(\sigma^2)$ by

$$h(\sigma^2) = \sigma^2\big(1 - U\, g(\sigma^2)/V(\mu)\big)$$

and using the previous theorem, we may show that $h(\sigma^2)$ has the properties specified in the theorem, and the result follows.

Note that if the functions $f(y \mid \sigma^2)$ further satisfy the property that $\int f^2(y \mid \sigma^2)\, dP(\sigma^2) < \infty$ for all y, then Theorem 2 indicates that we may devise a nonconstant prior specification of $g(\sigma^2)$ for which $\mu^*$ is the Bayes rule in B if and only if the functions $\{f(y \mid \sigma^2)\}$ do not form a complete set in $L^2(\mathbb{R}^+, P)$, the Hilbert space of square-integrable functions on $\mathbb{R}^+$ with respect to P, with inner product $(f_1, f_2) = \int f_1 f_2\, dP$.

3. APPLICATION TO SAMPLING FROM THE NORMAL DISTRIBUTION

An application of the results just presented that has importance in its own right is the situation discussed in Section 1, for which $x = (x_1, \ldots, x_n)$ is an independent sample of size n from a normal distribution with unknown mean $\mu$ and unknown precision r. Here $\bar{x}$ is the sample mean and $y = (n(n-1))^{-1} \sum_i (x_i - \bar{x})^2$.

First, suppose that the prior distribution for $\mu$ and r is the normal-gamma prior distribution as described in Section 1. For this prior distribution, the function $g(\sigma^2)$ specified in Section 2 is a constant function of $\sigma^2$: conditional on the precision, $\mu$ has variance proportional to $\sigma^2$, so the ratio $E((\mu - \bar{\mu})^2 \mid \sigma^2)/\sigma^2$ does not depend on $\sigma^2$. Thus, as in Corollary 1, $\mu^*$ is the Bayes rule for $\mu$ in B, and this remains the case even if we modify the marginal distribution for $\sigma^2$ to some arbitrary general distribution.

Second, we may apply Theorem 2 to assert the converse. For a general joint prior specification for $\mu$ and $\sigma^2$, if $g(\sigma^2)$ is not constant, then $\mu^*$ cannot be the Bayes rule in B.

Corollary 2. With the notation just given, $\mu^*$ will be the Bayes rule in the class B if and only if $g(\sigma^2)$ is constant a.e.

Proof. As y is based on a sample from a normal distribution, we deduce that

$$f(y \mid \sigma^2) \propto \left(\frac{1}{\sigma^2}\right)^{(n-1)/2} y^{(n-3)/2} \exp\big(-(n-1)\, y/2\sigma^2\big).$$

Thus the conditions of Theorem 2 are, in this case, essentially equivalent to the condition that there is no nonzero function h with zero Laplace transform with respect to a general finite measure $P(\cdot)$, and the result follows.
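A Monte Carlo contrast of the two cases under normal sampling (hypothetical prior choices throughout; v denotes the population variance 1/r, so $\sigma^2 = v/n$). With a normal-gamma-style prior, $\text{Var}(\mu \mid v) \propto v$ and the optimal weight b(y) of Theorem 1 is flat in y; fixing $\text{Var}(\mu \mid v)$ at a constant makes g nonconstant, and b(y) then drifts with y, so $\mu^*$ is no longer optimal in B, as Corollary 2 predicts:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, tau = 400_000, 10, 4.0   # prior draws, sample size, hypothetical tau

def optimal_b_by_bin(var_mu_given_v, bins=8):
    """Estimate b(y) = E((mu-mu_bar)^2|y) / (E(sigma^2|y) + E((mu-mu_bar)^2|y))
    within quantile bins of y, for a given conditional prior variance of mu.
    The prior mean of mu is taken as 0, so mu^2 estimates (mu - mu_bar)^2."""
    r = rng.gamma(3.0, 1.0, m)                   # precision r ~ gamma
    v = 1.0 / r                                  # population variance
    mu = rng.normal(0.0, np.sqrt(var_mu_given_v(v)))
    sigma2 = v / n                               # variance of the sample mean
    y = v * rng.chisquare(n - 1, m) / (n * (n - 1))  # y = sample variance / n
    edges = np.quantile(y, np.linspace(0.0, 1.0, bins + 1))
    idx = np.digitize(y, edges[1:-1])
    return np.array([np.mean(mu[idx == j] ** 2) /
                     (np.mean(sigma2[idx == j]) + np.mean(mu[idx == j] ** 2))
                     for j in range(bins)])

# Normal-gamma-style prior: Var(mu | v) = v / tau, so g is constant.
print(np.round(optimal_b_by_bin(lambda v: v / tau), 3))        # flat across bins
# Constant Var(mu | v): g(sigma^2) = n * Var / v is not constant.
print(np.round(optimal_b_by_bin(lambda v: 0.5 + 0.0 * v), 3))  # drifts with y
```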

4. DISCUSSION

The general point illustrated by the preceding analysis is as follows. Suppose we have difficulty in assessing the uncertainties in a problem and that we attempt to overcome the difficulty by using a plausible class of likelihood functions and fitting a prior distribution over the parameters of the likelihood function. For even moderately complicated situations, this introduces a large number of extraneous restrictions that may have the effect of automatically suppressing various kinds of information. Thus, in the example of Section 3, a prior specification with $g(\sigma^2)$ constant, such as the normal-gamma prior distribution, is the only possible way of expressing the judgment that the sample variance provides no information, either directly or indirectly, relevant to the estimation of $\mu$ within the framework we are considering. Thus, if the normal-gamma prior distribution is chosen for lack of suitable alternatives, then effectively a choice has been made to suppress certain available information.

The problem for any Bayesian is to make good and clear use of his or her knowledge. This may or may not entail a complete specification of a likelihood function and a prior distribution. The danger in using a complete specification is that the prior assessments may relate to the model rather than to the individual. In such cases it may be more straightforward simply to restrict attention to prior and data quantities that are familiar and informative and to combine these quantities in an intuitively reasonable fashion. For example, the estimator $\mu^{**}$ may be constructed from the sample mean and the sample variance or, more generally, from any location statistic (subject to the unbiasedness constraint), such as a symmetrically trimmed mean, together with any assessment of the variability of this statistic, such as an estimate obtained from a pilot study, possibly in a related population.

This is not to claim that $\mu^{**}$ is in some sense an optimal estimate, but only that such an estimator is simple to compute, has an intuitively sensible form, and has reasonable risk properties with respect to a wide range of prior beliefs, whereas optimal rules will often be very difficult to compute and may have very poor risk properties outside the particular model chosen. (The risk properties of $\mu^{**}$ are briefly discussed in Goldstein 1979. More detailed treatment of the risk, limited optimality properties for $\mu^{**}$, and suggestions for modifying $\mu^*$ with more detailed prior specification will be reported elsewhere.)

[Received July 1980. Revised August 1982.]

REFERENCES

DEGROOT, M.H. (1970), Optimal Statistical Decisions, New York: McGraw-Hill.

DIACONIS, P., and YLVISAKER, D. (1979), "Conjugate Priors for Exponential Families," Annals of Statistics, 7, 269-281.

GOEL, P.K., and DEGROOT, M.H. (1980), "Only Normal Distributions Have Linear Posterior Expectations in Linear Regression," Journal of the American Statistical Association, 75, 895-900.

GOLDSTEIN, M. (1979), "The Variance Modified Linear Bayes Estimator," Journal of the Royal Statistical Society, Ser. B, 96-100.

RAO, C.R. (1976), "Characterization of Prior Distributions and Solution to a Compound Decision Problem," Annals of Statistics, 4, 823-835.
