An Empirical Bayes Estimation Problem
Author: Herbert Robbins
Source: Proceedings of the National Academy of Sciences of the United States of America, Vol. 77, No. 12, Part 1: Physical Sciences (Dec., 1980), pp. 6988-6989
Published by: National Academy of Sciences
Stable URL: http://www.jstor.org/stable/9541



Proc. Natl. Acad. Sci. USA Vol. 77, No. 12, pp. 6988-6989, December 1980 Statistics

An empirical Bayes estimation problem (compound Poisson distribution)

HERBERT ROBBINS

Columbia University, New York, New York 10027

Contributed by Herbert Robbins, August 25, 1980

ABSTRACT  Let x be a random variable such that, given θ, x is Poisson with mean θ, while θ has an unknown prior distribution G. In many statistical problems one wants to estimate as accurately as possible the parameter E(θ|x = a) for some given a = 0, 1, …. If one assumes that G is a Gamma prior with unknown parameters α and β, then the problem is straightforward, but the estimate may not be consistent if G is not Gamma. On the other hand, a more general empirical Bayes estimator will always be consistent but will be inefficient if in fact G is Gamma. It is shown that this dilemma can be more or less resolved for large samples by combining the two methods of estimation.

Let x, x₁, x₂, …, xₙ be independent with probability function

f(x) = ∫₀^∞ f(x|θ) dG(θ),   [1]

where f(x|θ) = e^(−θ) θ^x / x! (Poisson with mean θ) and G is unknown. We are concerned with estimating from observed values of x₁, …, xₙ the parameter

h = h(a, G) = E(θ|x = a) = ∫₀^∞ θ f(a|θ) dG(θ) / f(a) = (a + 1) f(a + 1) / f(a),   [2]

where a is some fixed integer 0, 1, …; h is in fact the minimum mean squared error estimator of θ when x is observed to have the value a.
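The identity in [2] holds because (a + 1) f(a + 1) = ∫ θ f(a|θ) dG(θ) for any prior G; it is what makes h estimable from the marginal frequencies alone. A quick numerical check, using an assumed two-point prior G chosen only for illustration (Python, standard library only):

```python
# Numerical check of identity [2]: E(theta | x = a) = (a+1) f(a+1) / f(a).
# The two-point prior below is a hypothetical choice for illustration.
from math import exp, factorial

thetas = [1.0, 3.0]   # support of the illustrative prior G
weights = [0.4, 0.6]  # its prior probabilities

def f_cond(x, theta):
    """Poisson probability f(x | theta) = e**(-theta) * theta**x / x!."""
    return exp(-theta) * theta**x / factorial(x)

def f(x):
    """Marginal f(x) = sum of f(x | theta) over the prior, as in [1]."""
    return sum(w * f_cond(x, t) for t, w in zip(thetas, weights))

a = 2
# Posterior mean from its definition ...
post_mean = sum(t * w * f_cond(a, t)
                for t, w in zip(thetas, weights)) / f(a)
# ... and from the right side of [2].
identity = (a + 1) * f(a + 1) / f(a)
assert abs(post_mean - identity) < 1e-12
```

The agreement is exact (up to rounding), not merely approximate, since (a + 1) f(a + 1|θ) = θ f(a|θ) term by term.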

Even in the absence of any information about G, a strongly consistent and asymptotically normal estimator Tₙ of h can be obtained as follows. Define

u = u(x) = 1 if x = a, 0 if x ≠ a;   v = v(x) = x if x = a + 1, 0 if x ≠ a + 1,   [3]

and set uᵢ = u(xᵢ), vᵢ = v(xᵢ). Then h = Ev/Eu, so that as n → ∞

Tₙ = Σᵢ vᵢ / Σᵢ uᵢ → h with probability 1,   [4]

and by the central limit theorem

√n (Tₙ − h) = [Σᵢ (vᵢ − huᵢ)/√n] / [Σᵢ uᵢ/n] → N(0, σ₁²),   [5]

where

σ₁² = E(v − hu)²/E²u = [(a + 1)² f(a + 1)/f²(a)] [1 + f(a + 1)/f(a)].

Replacing each f(j) by its empirical frequency n(j)/n gives the estimate

σ̂ₙ² = n (a + 1)² [1 + n(a + 1)/n(a)] n(a + 1)/n²(a),   [6]

with

n(j) = number of values x₁, …, xₙ that equal j (j = 0, 1, …).   [7]

From [5] and [6] it follows that

√n (Tₙ − h)/σ̂ₙ → N(0, 1),   [8]

so that for large n, an approximately 95% confidence interval for h is given by

(a + 1) n(a + 1)/n(a) ± 1.96 (a + 1) [√(n(a + 1)) / n(a)] [1 + n(a + 1)/n(a)]^½.   [9]

Although Tₙ, a "general" empirical Bayes estimator of h, is always consistent, there are circumstances in which its use would be inefficient. For example, if we assume that G is a member of the two-parameter Gamma family of conjugate priors with probability density function

g(θ) = G′(θ) = [α^β/Γ(β)] e^(−αθ) θ^(β−1)   (θ > 0),   [10]

where α and β are unknown positive parameters, then an easy computation shows that

f(x) = [Γ(x + β)/(x! Γ(β))] [1/(1 + α)]^x [α/(1 + α)]^β,   [11]

h = (a + 1) f(a + 1)/f(a) = (a + β)/(1 + α).   [12]

Now for [10]

E(θ^r) = Γ(r + β)/[Γ(β) α^r]   (r > 0),   [13]

so that

Eθ = β/α,   Var θ = β/α².   [14]

Moreover, from [1]

E(x|θ) = Var(x|θ) = θ,   [15]

so that

Ex = Eθ,   Var x = Eθ + Var θ.   [16]

From [16] and [14] we have

Ex = β/α,   Var x = β(1 + α)/α²,   [17]

(Footnote: The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.)
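The closed forms [11], [12], and [17] can be checked numerically. The sketch below uses arbitrary illustrative values of α, β, and a, and compares the closed-form marginal [11] with direct midpoint-rule quadrature over the prior [10]:

```python
# Numerical check of the Gamma-prior computations: the marginal f(x) of
# [11], the posterior mean (a + beta)/(1 + alpha) of [12], and the moments
# of [17].  The values of alpha, beta, and a are arbitrary illustrations.
from math import exp, log, lgamma, gamma, factorial

alpha, beta, a = 1.5, 2.5, 2

def g(theta):
    """Prior density [10]: Gamma with shape beta and rate alpha."""
    return alpha**beta / gamma(beta) * exp(-alpha * theta) * theta**(beta - 1)

def f_cond(x, theta):
    """Poisson probability f(x | theta)."""
    return exp(-theta) * theta**x / factorial(x)

def f_marginal(x):
    """Closed form [11], computed via lgamma to avoid overflow at large x."""
    log_f = (lgamma(x + beta) - lgamma(x + 1) - lgamma(beta)
             + x * log(1 / (1 + alpha)) + beta * log(alpha / (1 + alpha)))
    return exp(log_f)

# [11] agrees with direct quadrature of f(x|theta) g(theta) over (0, 40).
dt = 0.001
ts = [(i + 0.5) * dt for i in range(40_000)]
for x in range(5):
    quad = dt * sum(f_cond(x, t) * g(t) for t in ts)
    assert abs(quad - f_marginal(x)) < 1e-4

# [12]: the general (a + 1) f(a + 1)/f(a) of [2] reduces to (a+beta)/(1+alpha).
h = (a + 1) * f_marginal(a + 1) / f_marginal(a)
assert abs(h - (a + beta) / (1 + alpha)) < 1e-12

# [17]: Ex = beta/alpha and Var x = beta(1 + alpha)/alpha**2.
mean = sum(x * f_marginal(x) for x in range(200))
var = sum((x - mean)**2 * f_marginal(x) for x in range(200))
assert abs(mean - beta / alpha) < 1e-9
assert abs(var - beta * (1 + alpha) / alpha**2) < 1e-9
```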

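The behavior of Tₙ and the interval [9] is easy to see by simulation. The sketch below assumes a Gamma prior [10] with α = 1, β = 2, so the true h = (a + β)/(1 + α) is known from [12]; the Poisson sampler is Knuth's standard product-of-uniforms method, an implementation choice not taken from the article:

```python
# Simulation sketch of the general empirical Bayes estimator T_n of [4]
# and the approximate 95% confidence interval [9], under an illustrative
# Gamma prior [10] with alpha = 1, beta = 2 (true h = 1.5 by [12]).
import math, random

def poisson(lam, rng):
    """Poisson(lam) draw via Knuth's product-of-uniforms method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(0)
alpha, beta, a, n = 1.0, 2.0, 1, 200_000
h_true = (a + beta) / (1 + alpha)                  # known via [12]

# theta ~ Gamma [10] with shape beta and rate alpha (scale 1/alpha).
xs = [poisson(rng.gammavariate(beta, 1 / alpha), rng) for _ in range(n)]
n_a, n_a1 = xs.count(a), xs.count(a + 1)           # n(a), n(a+1) of [7]

T_n = (a + 1) * n_a1 / n_a                         # [4]
half = 1.96 * (a + 1) * math.sqrt(n_a1) / n_a * math.sqrt(1 + n_a1 / n_a)
lo, hi = T_n - half, T_n + half                    # interval [9]
assert abs(T_n - h_true) < 0.1
```

With n this large the interval half-width is of order 0.02, so Tₙ lands well within 0.1 of h.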

and hence

α = Ex/(Var x − Ex),   β = E²x/(Var x − Ex).   [18]

It follows from [12] that

h = Ex + (1 − Ex/Var x)(a − Ex),   [19]

and there may be priors G other than those of the form [10] for which [19] also holds for the given value of a; in other words, for which h = k, where h is defined by [2] and k by

k = Ex + (1 − Ex/Var x)(a − Ex)   (Var x < ∞).   [20]

For any such G, one could estimate h by replacing Ex and Var x in [19] by the consistent moment estimators

x̄ₙ = (1/n) Σᵢ xᵢ,   sₙ² = (1/n) Σᵢ (xᵢ − x̄ₙ)²,   [21]

obtaining

Wₙ = x̄ₙ + (1 − x̄ₙ/sₙ²)(a − x̄ₙ).   [22]

Whenever G has a finite fourth moment,

√n (Wₙ − k) → N(0, σ₂²),   [23]

where σ₂² is a function of a and moments of x that will be made explicit later. Now, if h = k (as it is for [10]), it is plausible that the "restricted" empirical Bayes estimator Wₙ should be more efficient than Tₙ as an estimator of h. However, in using Wₙ we run the risk that if in fact h ≠ k, then Wₙ will not even be a consistent estimator of h.

The foregoing remarks suggest combining Tₙ and Wₙ to obtain an estimator that will behave like Tₙ when h ≠ k and like Wₙ when h = k. One such estimator is

Zₙ = [Wₙ + n^ε (Tₙ − Wₙ)² Tₙ] / [1 + n^ε (Tₙ − Wₙ)²],   [24]

where ε is any constant such that 1/2 < ε < 1. Then

√n (Zₙ − h) = [√n (Wₙ − h) + n^ε (Tₙ − Wₙ)² √n (Tₙ − h)] / [1 + n^ε (Tₙ − Wₙ)²].   [25]

There are two cases to consider. (a) Suppose that h = k. Then from [5] and [23], Tₙ − Wₙ = O(n^(−1/2)) in probability, so that

n^ε (Tₙ − Wₙ)² = O(n^(ε−1)) → 0,   [26]

√n (Zₙ − h) ≈ √n (Wₙ − h) → N(0, σ₂²).   [27]

(b) Suppose h ≠ k. Then |Tₙ − Wₙ| → |h − k| > 0 in probability, so that

√n (Zₙ − h) ≈ √n (Tₙ − h) → N(0, σ₁²).   [28]

Thus, Zₙ is asymptotically equivalent to Wₙ if h = k and to Tₙ if h ≠ k.

Zₙ is, like Tₙ, a general empirical Bayes estimator of h, in that it is consistent no matter what the form of G, whereas Wₙ is consistent only when h = k. Thus Zₙ dominates Wₙ. However, does Zₙ dominate Tₙ? When h ≠ k, they are equivalent. When h = k, Zₙ is better than Tₙ iff σ₂² < σ₁². We have in [6] a formula for estimating σ₁²; we now exhibit one for σ₂², obtained by the usual asymptotics, as in ref. 1, chapter 28:

σ₂² = (2m − a)²/μ₂ + m²(m − a)²(μ₄ − μ₂²)/μ₂⁴ − 2m(m − a)(2m − a)μ₃/μ₂³,   [29]

where

m = Ex,   μᵢ = E(x − m)^i   (i ≥ 2).   [30]

In the Gamma case [10] some tedious algebra gives

σ₂² = [1/(β(1 + α)³)] {[2α + 3 + 2β(1 + α)] a²α² − 2β[α + 2 + 2β(1 + α)] aα + β²[α² + 2α + 2 + 2β(1 + α)]},   [31]

whereas

σ₁² = (a + β) a! Γ(β) (1 + α)^(a+β−2) [(1 + a)(1 + α) + a + β] / [α^β Γ(a + β)].   [32]

In particular, for a = 0, σ₂² < σ₁² iff

α^β [α² + 2α + 2 + 2β(1 + α)] < (1 + α)^(β+1) (1 + α + β).   [33]

For large β this clearly holds, but not for β near 0, since as β → 0

σ₂²/σ₁² → (α² + 2α + 2)/(α² + 2α + 1) > 1,   [34]

so that Tₙ is better than Wₙ or Zₙ for sufficiently small β. When α = 1, for example, the critical value of β is ≈0.48. That σ₂² is not always less than σ₁² in the case [10] may come as a surprise, but it is presumably due to the fact that the method of moments is not optimal for estimating α and β. In fact, if one is willing to assume a Gamma prior [10] (and not merely that h = k), then it would be reasonable to estimate h by replacing in [12] the unknown α and β by their maximum likelihood estimates α̂ₙ and β̂ₙ, obtaining the estimator

ĥₙ = (a + β̂ₙ)/(1 + α̂ₙ).   [35]

[I have not attempted to verify that the asymptotic variance of √n (ĥₙ − h) as a function of a, α, and β is always less than [32].] One could then define Zₙ by replacing Wₙ by ĥₙ in [24], thus hopefully assuring asymptotic efficiency in the case [10] and consistency in all cases. (But what of efficiency if G is not of the form [10] but h = k?) Many variations on this theme are possible. In fact, Zₙ could be any combination UₙWₙ + (1 − Uₙ)Tₙ with Uₙ → 1 if h = k and √n Uₙ → 0 if h ≠ k in probability; for example, Zₙ = Wₙ or Tₙ according as |Tₙ − Wₙ| ≤ n^(−1/4) or > n^(−1/4).

Instead of h we may want to estimate H = E(θ | x ≥ a) or to predict the future performance of that subgroup of persons whose x values are = a or ≥ a, etc. For an indication of the practical relevance of such problems, the reader may consult refs. 2-4. The idea embodied in [24] has many other applications in the direction of combining efficiency with robustness.

This research was supported by the National Science Foundation and the National Institute of General Medical Sciences.

1. Cramér, H. (1946) Mathematical Methods of Statistics (Princeton University Press, Princeton, NJ).
2. Robbins, H. (1977) Proc. Natl. Acad. Sci. USA 74, 2670-2671.
3. Robbins, H. (1980) Proc. Natl. Acad. Sci. USA 77, 2382-2383.
4. Robbins, H. (1980) Asymptotic Theory of Statistical Tests and Estimation (Academic, New York), pp. 251-257.