Chernoff 1953: Locally Optimal Designs for Estimating Parameters



    Locally Optimal Designs for Estimating Parameters

Author(s): Herman Chernoff
Source: The Annals of Mathematical Statistics, Vol. 24, No. 4 (Dec., 1953), pp. 586-602
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2236782
Accessed: 05/02/2010 06:29


LOCALLY OPTIMAL DESIGNS FOR ESTIMATING PARAMETERS

BY HERMAN CHERNOFF

Stanford University

1. Summary. It is desired to estimate $s$ parameters $\theta_1, \theta_2, \ldots, \theta_s$. There is available a set of experiments which may be performed. The probability distribution of the data obtained from any of these experiments may depend on $\theta_1, \theta_2, \ldots, \theta_k$, $k \ge s$. One is permitted to select a design consisting of $n$ of these experiments to be performed independently. The repetition of experiments is permitted in the design. We shall show that, under mild conditions, locally optimal designs for large $n$ may be approximated by selecting a certain set of $r \le k + (k-1) + \cdots + (k-s+1)$ of the experiments available and by repeating each of these $r$ experiments in certain specified proportions. Examples are given illustrating how this result simplifies considerably the problem of obtaining optimal designs. The criterion of optimality that is employed is one that involves the use of Fisher's information matrix. For the case where it is desired to estimate one of the $k$ parameters, this criterion corresponds to minimizing the variance of the asymptotic distribution of the maximum likelihood estimate of that parameter.

The result of this paper constitutes a generalization of a result of Elfving [1]. As in Elfving's paper, the results extend to the case where the cost depends on the experiment and the amount of money to be allocated on experimentation is determined instead of the sample size.

2. Introduction. Before formulating the problem precisely we shall consider a simple special example which will illustrate many of the points involved. Consider the regression problem

(1) $y = \gamma + \theta x + u, \qquad -1 \le x \le 1.$


half the time (if $n$ is even). It should be noted that if the set of experiments available were decreased so that $E_x$ is available only for $-1 < x < 1$, no optimal design could be found. This is essentially due to the fact that given any design, a better one can be obtained by spreading out the values of $x$ even more (i.e., by taking values of $x$ closer to the end points $-1$ and $+1$).

A peculiarity of this particular problem is that no matter how many times a particular experiment $E_x$ is repeated, no reasonable estimate of $\theta$ can be determined. At least two distinct experiments are required. Another peculiarity of this problem is that the variance of $\hat\theta$, the maximum likelihood estimate of $\theta$, does not depend on the values of $\gamma$ and $\theta$. In general, this latter property will not hold and we shall be restricted to obtaining locally optimal designs, that is, designs which are optimal if the parameters are known to be close to certain specified values.

We may consider a variation of the above problem. Suppose that it is desired to estimate $\gamma$ and that $\theta$ is the nuisance parameter. Then it is well known that an optimal design consists in repeating the experiment $E_0$, $n$ times. An equally optimal design may also be obtained by using any set of $x$'s so that $\bar x = 0$.
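The claim that the end-point design is best can be checked numerically. The sketch below (hypothetical code, not part of the paper) computes the asymptotic variance of the slope estimate for a design putting weight $p_i$ on $x_i$: it is the (1, 1) entry of the inverse of the normalized information matrix $\sum_i p_i X_{x_i}$, assuming unit error variance.

```python
import numpy as np

def theta_variance(points, weights):
    # (1,1) entry of the inverse normalized information matrix for the
    # design putting weight w on each x (unit error variance assumed).
    M = sum(w * np.array([[x * x, x], [x, 1.0]])
            for x, w in zip(points, weights))
    return np.linalg.inv(M)[0, 0]

# The design of Section 2: x = +1 and x = -1, each half the time.
v_opt = theta_variance([1.0, -1.0], [0.5, 0.5])    # -> 1.0

# Pulling the points inward, or off balance, only increases the variance.
v_inner = theta_variance([0.5, -0.5], [0.5, 0.5])  # -> 4.0
v_skew = theta_variance([1.0, -0.5], [0.5, 0.5])
assert v_opt < v_inner and v_opt < v_skew
```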

3. Information matrices and mixed experiments. The formulation of our problem will involve the concepts of information matrices [2] and of randomized or mixed experiments. For the sake of notational convenience and in order to clear up some technicalities that arise, we shall discuss these concepts before proceeding to the formulation.

R. A. Fisher defined the information matrix $X(\theta)$ for an experiment involving the parameter $\theta = (\theta_1, \ldots, \theta_k)$ by

(2) $X(\theta) = \|x_{ij}(\theta)\| = -E\left\|\frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j}\right\|, \qquad i, j = 1, 2, \ldots, k,$

where $L$ is the logarithm of the likelihood function. It should be noted that $X(\theta)$ ordinarily depends on $\theta$. It is easily seen and well known that

(3) $X(\theta) = E\left\|\frac{\partial L}{\partial\theta_i}\,\frac{\partial L}{\partial\theta_j}\right\|$

and hence that $X(\theta)$ is a nonnegative definite symmetric matrix.

Another well known property of information matrices is that of additivity. That is, if $E_1, E_2, \ldots, E_n$ are experiments yielding information matrices $X_1(\theta), X_2(\theta), \ldots, X_n(\theta)$, the combined experiment or design which consists in carrying out each of these experiments independently yields the information matrix $X_1(\theta) + X_2(\theta) + \cdots + X_n(\theta)$.

The experiment which consists in carrying out one of the available experiments, this one to be determined by a random device, is called a randomized or mixed experiment. Hence if $p_1, p_2, \ldots, p_n$ are positive numbers adding up to one, the experiment which consists in carrying out $E_i$ with probability $p_i$ is mixed. It is easily seen that this experiment has information matrix $p_1X_1(\theta) + p_2X_2(\theta) + \cdots + p_nX_n(\theta)$.
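Both the additivity property and the information matrix of a mixed experiment are easy to verify numerically for the regression example of Section 2, where $E_x$ has information matrix with rows $(x^2, x)$ and $(x, 1)$. A minimal sketch (function names are this sketch's own):

```python
import numpy as np

def info(x):
    # Information matrix of E_x in the Section 2 regression example,
    # with theta_1 = theta (slope) and theta_2 = gamma (intercept).
    return np.array([[x * x, x], [x, 1.0]])

# Additivity: performing E_{x_1}, ..., E_{x_n} independently
# yields the sum of the individual information matrices.
combined = sum(info(x) for x in [1.0, -1.0, 0.0])

# Mixture: carrying out E_{x_i} with probability p_i yields sum p_i X_i.
mixed = 0.5 * info(1.0) + 0.5 * info(-1.0)
assert np.allclose(mixed, np.eye(2))  # the optimal design of Section 2
```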


Let an experiment $E$ with positive definite information matrix $X(\theta)$ be carried out $m$ times and let $\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k)$ be the resulting maximum likelihood estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. Under mild conditions [3], the covariance matrix of the asymptotic (as $m \to \infty$) distribution of $\sqrt{m}(\hat\theta - \theta)$ is given by

(4) $X^{-1}(\theta) = \|x^{ij}(\theta)\|, \qquad i, j = 1, 2, \ldots, k,$

at all points of continuity of $X(\theta)$. This property suggests the usefulness of information matrices in comparing designs.

Unfortunately, it is possible for an information matrix to be singular and hence to fail to have an inverse. To allow for this situation, we extend the notion of inverse to the class of nonnegative definite symmetric matrices. Let $X$ be nonnegative definite symmetric and let $Y$ be any other symmetric matrix so that $X + \lambda Y$ is positive definite for positive $\lambda$ small enough. Then, let

(5) $X^{-1} = \|x^{ij}\| = \lim_{\lambda\to 0^+}(X + \lambda Y)^{-1}.$

In Appendix A it will be shown that this new definition is consistent with the usual definition and is statistically meaningful. Also, if $x^{ii}$ and $x^{jj}$ are finite, then $x^{ij}$ is finite and $x^{ii}$, $x^{jj}$ and $x^{ij}$ are independent of the particular $Y$ selected. It should be noted that $X^{-1}$ is a continuous function of $X$ on the set of positive definite symmetric matrices but that elements of $X^{-1}$ may fail to be continuous for $X$ singular.
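The limit in (5) can be explored numerically: for a singular $X$, shrink $\lambda$ and observe which entries of $(X + \lambda Y)^{-1}$ stabilize and which diverge. A small sketch (the matrices are arbitrary illustrations, not from the paper):

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 0.0]])   # singular, nonnegative definite
Y1 = np.eye(2)
Y2 = np.array([[2.0, 0.0], [0.0, 3.0]])  # another admissible symmetric Y

for lam in (1e-2, 1e-4, 1e-6):
    inv1 = np.linalg.inv(X + lam * Y1)
    inv2 = np.linalg.inv(X + lam * Y2)
    # The (1,1) entry tends to the same finite limit for both choices of Y,
    # while the (2,2) entry diverges like 1/lambda, as the text describes.
    print(lam, inv1[0, 0], inv2[0, 0], inv1[1, 1])
```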

4. Formulation. In this section we shall formulate our problem and then indicate the reasons behind this formulation. Using the special example previously mentioned, we shall examine conditions which we shall impose to obtain the desired results.

There is a set $\{E\}$ of experiments available. The distribution of the data from one of these experiments depends on $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The information matrix $X(\theta)$ may be characterized by the elements on and above the main diagonal. These elements arranged in some order may be considered as components of a vector in $k(k+1)/2$ dimensional space. This vector may be identified with the matrix. Since we are interested in locally optimal designs, that is, designs that are optimal when $\theta$ is known to be close to some given value, say $\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \ldots, \theta_k^{(0)})$, we confine our attention to $X(\theta^{(0)})$.

Let $R_1$ be the set of vectors corresponding to the $X(\theta^{(0)})$ for the experiments of $\{E\}$. Let $R$ be the convex hull of $R_1$, that is, a typical element of $R$ is the convex linear combination $p_1X_1 + p_2X_2 + \cdots + p_nX_n$ where $X_1, X_2, \ldots, X_n$ are elements of $R_1$ and $p_1, p_2, \ldots, p_n$ are positive numbers adding up to 1. From the previous section, it follows that $R$ represents the set of information matrices of the class of mixed experiments.

¹ The author is indebted to Max A. Woodbury and the referee, who independently pointed out a close relationship existing between this definition of inverse and the concept of the pseudo inverse of a matrix.


We shall be interested in showing that under certain conditions, an element $\bar X$ of $R$ which minimizes

(6) $v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}, \qquad s \le k,$

can be represented as a convex linear combination of

(7) $r \le k + (k-1) + \cdots + (k-s+1)$

elements $X_1, X_2, \ldots, X_r$ of $R_1$.

It is evident that $\bar X$ corresponds to a mixed experiment which is "optimal" in the sense that if $\hat\theta$ were based on $n$ repetitions of this experiment, the sum of the variances in the asymptotic (as $n \to \infty$) distribution of $\sqrt{n}(\hat\theta_1 - \theta_1), \sqrt{n}(\hat\theta_2 - \theta_2), \ldots, \sqrt{n}(\hat\theta_s - \theta_s)$ would be a minimum.

Certain questions naturally arise concerning the usefulness of this criterion. First, it may be asked whether this criterion is relevant if one desires to confine oneself to pure experiments, that is, elements of $\{E\}$. Here we note that as $n \to \infty$, $\bar X$ may be approximated by $(n_1X_1 + n_2X_2 + \cdots + n_rX_r)/n$ where $n_1, n_2, \ldots, n_r$ are positive integers adding up to $n$. The latter expression represents $1/n$ times the information matrix corresponding to the design where $E_i$ is carried out $n_i$ times. The answer to the last question would be yes if it were shown that $v_s(X)$ is continuous at $X = \bar X$ on the convex set generated by $X_1, X_2, \ldots, X_r$. One may also ask why our criterion should involve information matrices. Such a criterion has a certain aesthetic appeal. Furthermore, we shall discuss in Appendix B how the main result yields a justification of this criterion. Finally, one may seriously inquire whether a "good" design must minimize the sum of the asymptotic variances. In fact, we shall see in Appendix C that very often when one is interested in $s$ parameters, a sound criterion for a "good" design involves minimizing $\operatorname{tr}(AV)$ where $A$ is a nonnegative definite symmetric matrix of rank less than or equal to $s$ and $V$ is the covariance matrix of the asymptotic distribution of $\sqrt{n}(\hat\theta - \theta)$. By a linear transformation of $\theta$ this criterion may be transformed to that of minimizing the sum of no more than $s$ asymptotic variances.

Since certain conditions must be imposed to obtain our desired result, we shall explain these conditions by referring to the example considered in Section 2. In that example the experiment $E_x$ yields a likelihood function with logarithm given by

$L = -\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}(y - \gamma - \theta x)^2.$

Let $\theta_1 = \theta$ and $\theta_2 = \gamma$. The corresponding information matrix is given by

$X_x = \begin{pmatrix} x^2 & x \\ x & 1 \end{pmatrix}.$

For this example, $R_1$ is the set of all points in three dimensional space whose coordinates are $(x^2, x, 1)$, $-1 \le x \le 1$. This set represents a segment of a parabola lying in a plane of three dimensional space. The convex set $R$ generated by $R_1$ is the set bounded by $R_1$ and the line segment connecting the end points of $R_1$. The optimal design consisting in using $X_1$ and $X_{-1}$, each half the time, corresponds to the mid-point of the above-mentioned line segment, that is, the point $(1, 0, 1)$.

We mentioned previously that if $x$ is restricted to $-1 < x < 1$, no optimal $n$th order design exists. Note that in this case $R$ has been changed by deleting the boundary line segment on which the optimizing point $(1, 0, 1)$ lies. Although we can get arbitrarily close to this point when $-1 < x < 1$, we cannot reach it. In general, to prevent this minor difficulty we shall impose the condition that $R$ be closed. Then $R$ will contain all of its boundary points.

A second condition that we shall impose is that $R$ be bounded. That is, no element of $X_x$ can be made arbitrarily large by selecting $E_x$ properly. This condition is satisfied in our example, for there no element can exceed 1 in absolute value. If, however, the example were modified to permit all real values of $x$, the element of the first row and first column of $X_x$ would be unbounded. Note in this modified example, that if the parameter $\gamma$ were known, $\theta$ could be estimated with arbitrarily small variance from one experiment by taking $x$ large enough. This interpretation of the effect of unbounded $R$ applies to the general case, too. If some element of $X$ is unbounded, the fact that $X$ is nonnegative definite implies that some element of the main diagonal of $X$ is unbounded. If the $i$th element of the main diagonal of $X$ is unbounded, $\theta_i$ can be estimated with arbitrarily small asymptotic variance if all the other parameters are known.

5. Main results. In this section we state our main results. The proofs will first be given for $s = 1$ and then extended to $s > 1$.

THEOREM 1. If $R$ is closed and bounded there is an element $\bar X$ of $R$ which minimizes $v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}$ and which is a convex linear combination of $r \le k + (k-1) + \cdots + (k-s+1)$ elements $X_1, X_2, \ldots, X_r$ of $R_1$. Furthermore $X_1, X_2, \ldots, X_r$ may be chosen so that $v_s(X)$ is a continuous function at $X = \bar X$ with respect to the topology of the convex set generated by $X_1, X_2, \ldots, X_r$.

We treat the case $s = 1$ where we let

(8) $z(X) = v_1(X) = x^{11}.$

In outline, our proof for $s = 1$ consists in obtaining an expression for

$\delta(X, \Delta) = z(X + \Delta) - z(X)$

which will be used to show the existence of an $X^{(0)} \in R$ which minimizes $z(X)$ and such that $X^{(0)}$ lies on a supporting hyperplane of $R$. It will also be evident that $z(X)$ is constant on a sub-hyperplane. The dimension of this sub-hyperplane leads to the existence of $\bar X$ with the desired properties. The complexity of the details of the proof arises mainly from difficulties in the case that $X^{(0)}$ is singular, since $z(X)$ is not continuous at singular $X$.


LEMMA 1. If $X$ and $X + \Delta$ are nonnegative definite symmetric matrices and $z(X) \ne \infty$, then

(9) $\delta(X, \Delta) = z(X + \Delta) - z(X) = -\epsilon(X, \Delta) + \eta(X, \Delta)$

where

(10) $\epsilon(X, \Delta) = \lim_{\lambda\to 0^+} \epsilon_\lambda(X, \Delta),$

(11) $\epsilon_\lambda(X, \Delta) = [(X + \lambda I)^{-1}\Delta(X + \lambda I)^{-1}]_{11},$

(12) $\eta(X, \Delta) = \lim_{\lambda\to 0^+} \eta_\lambda(X, \Delta),$

(13) $\eta_\lambda(X, \Delta) = [(X + \lambda I)^{-1}\Delta(X + \lambda I + \Delta)^{-1}\Delta(X + \lambda I)^{-1}]_{11},$

and $\epsilon(X, \Delta)$ is a linear function in $\Delta$ and $\eta(X, \Delta) \ge 0$.

PROOF. Since the matrices $(X + \lambda I)$ and $(X + \Delta + \lambda I)$ are positive definite for $\lambda > 0$,

(14) $\delta(X, \Delta) = \lim_{\lambda\to 0^+}[z(X + \Delta + \lambda I) - z(X + \lambda I)],$

(15) $(X + \lambda I + \Delta)^{-1} - (X + \lambda I)^{-1} = -(X + \lambda I)^{-1}\Delta(X + \lambda I)^{-1} + (X + \lambda I)^{-1}\Delta(X + \lambda I + \Delta)^{-1}\Delta(X + \lambda I)^{-1}.$

Since $z(X) \ne \infty$ it follows (see Appendix A, Property 1) that $\lim_{\lambda\to 0^+}[(X + \lambda I)^{-1}]_{1i}$ exists and is finite for each $i$. Let us denote this limit by $x^{1i} = x^{i1}$. Hence

(16) $\epsilon(X, \Delta) = \lim_{\lambda\to 0^+}\epsilon_\lambda(X, \Delta) = \sum_{i,j} x^{1i}\Delta_{ij}x^{j1}.$

It follows that as $\lambda \to 0^+$, $\eta_\lambda(X, \Delta)$ converges (possibly to $+\infty$). Since the matrix, of which $\eta(X, \Delta)$ is the element of the first row and first column, is nonnegative definite it follows that $\eta(X, \Delta) \ge 0$.

LEMMA 2. If $X$ and $X + \Delta$ are nonnegative definite symmetric matrices and $z(X) = \infty$, then $\lim_{\Delta\to 0} z(X + \Delta) = \infty$. (We write $\Delta \to 0$ if each element of $\Delta$ approaches zero. Note that $\Delta$ converges to zero subject to the condition that $X + \Delta$ is nonnegative definite and symmetric.)

PROOF. If $A$ and $B$ are symmetric matrices we use the notation $A \le B$ or $B \ge A$ if $p'Ap \le p'Bp$ for every vector $p$. If $A$ and $B$ are positive definite and $A \le B$, it is easily seen that $B^{-1} \le A^{-1}$ by diagonalizing $B$ and $A$. Also,

$z(X + \Delta) = \lim_{\lambda\to 0^+}[(X + \Delta + \lambda I)^{-1}]_{11}.$

Let $d$ be the largest characteristic value of $\Delta$. Then

$(X + \Delta + \lambda I) \le (X + (\lambda + d)I), \qquad (X + \Delta + \lambda I)^{-1} \ge (X + (\lambda + d)I)^{-1},$

$z(X + \Delta) \ge [(X + dI)^{-1}]_{11}.$


As $\Delta \to 0$, $d \to 0$. Furthermore $z(X) = \infty$. Hence $\lim_{d\to 0}[(X + dI)^{-1}]_{11} = \infty$, and our lemma follows.

LEMMA 3. If $R$ is closed and bounded, $z(X)$ attains its minimum on $R$.

PROOF. Let $w = \inf_{X \in R} z(X)$. Because $R$ is bounded, $w > 0$. The case $w = \infty$ is trivial and hence we assume $0 < w < \infty$. Since $R$ is closed and bounded there is a sequence $\{X^{(i)}\}$ such that $X^{(i)} \in R$, $z(X^{(i)}) \to w$ and $\{X^{(i)}\}$ has a limit point $X^{(0)} \in R$. Let $\Delta^{(i)} = X^{(i)} - X^{(0)}$. It suffices to show that $z(X^{(0)}) \le w$. By Lemma 2, $z(X^{(0)}) \ne \infty$. Hence

$z(X^{(0)} + \Delta^{(i)}) - z(X^{(0)}) = -\epsilon(X^{(0)}, \Delta^{(i)}) + \eta(X^{(0)}, \Delta^{(i)}).$

Since $\epsilon$ is linear in $\Delta^{(i)}$,

$\lim_{i\to\infty} \epsilon(X^{(0)}, \Delta^{(i)}) = 0.$

But $\eta(X^{(0)}, \Delta^{(i)}) \ge 0$. Letting $i \to \infty$, we obtain $w - z(X^{(0)}) \ge 0$.

Hereafter we shall assume that $R$ is closed and bounded. Then let $p$ be the lowest rank associated with those elements of $R$ which minimize $z(X)$. Now we assume that $X^{(0)}$ minimizes $z(X)$ on $R$, $z(X^{(0)}) \ne \infty$ and $X^{(0)}$ has rank $p$. We shall now reduce the set under consideration from $R$ to $R \cap H_1$ where $H_1$ is a hyperplane containing $X^{(0)}$ and $H_1$ has dimension $p(p+1)/2$. In the event that $X^{(0)}$ is nonsingular, no reduction from $R$ has been effected. We shall not consider the trivial case $X^{(0)} = 0$, for then $w = \infty$.

We construct $H_1$ as follows. Corresponding to $X^{(0)}$, there is an orthogonal matrix $P$ such that

(17) $X^{(0)} = P'EP$

where

(18) $E = \begin{pmatrix} E_I & 0 \\ 0 & 0 \end{pmatrix}$

and $E_I$ is a diagonal $p \times p$ matrix where all the elements on the main diagonal are positive. We define $H_1$ as the set of $X$ for which

(19) $P(X - X^{(0)})P' = \begin{pmatrix} D_I & 0 \\ 0 & 0 \end{pmatrix}$

where $D_I$ is a symmetric $p \times p$ matrix. It is evident that $H_1$ is a $p(p+1)/2$ dimensional hyperplane containing $X^{(0)}$. We note that the nonnull set $R \cap H_1$ is the convex hull of $R_1 \cap H_1$ and is also closed and bounded.

LEMMA 4. If $X^{(1)} \in R \cap H_1$, $X^{(1)}$ has rank $p$, $z(X^{(1)}) \ne \infty$, and $X^{(1)} + \Delta \in H_1$, then $\eta(X^{(1)}, \nu\Delta)$ approaches zero at least quadratically as $\nu \to 0$ and $z(X)$ is continuous at $X = X^{(1)}$ (in the topology of $H_1$).

PROOF. Since $X^{(1)} \in R \cap H_1$ and $X^{(1)}$ has rank $p$,

$PX^{(1)}P' = \begin{pmatrix} F_I & 0 \\ 0 & 0 \end{pmatrix}$


where $F_I$ is a positive definite symmetric matrix. If $X^{(1)} + \Delta \in H_1$,

$P\Delta P' = \begin{pmatrix} D_I & 0 \\ 0 & 0 \end{pmatrix}$

where $D_I$ is symmetric. Let $p_I$ be the vector consisting of the first $p$ elements of the first column of $P$. If $F_I$ and $F_I + \nu D_I$ are nonnegative definite we have

(20) $\eta_\lambda(X^{(1)}, \nu\Delta) = p_I'(F_I + \lambda I)^{-1}(\nu D_I)(F_I + \lambda I + \nu D_I)^{-1}(\nu D_I)(F_I + \lambda I)^{-1}p_I.$

But $F_I$ is positive definite and for $\nu$ small enough $F_I + \nu D_I$ is also positive definite. Therefore

(21) $\eta(X^{(1)}, \nu\Delta) = p_I'F_I^{-1}(\nu D_I)(F_I + \nu D_I)^{-1}(\nu D_I)F_I^{-1}p_I$

and

$\lim_{\nu\to 0} \eta(X^{(1)}, \nu\Delta)/\nu^2 = p_I'F_I^{-1}D_IF_I^{-1}D_IF_I^{-1}p_I < \infty.$

Similarly one may obtain

(22) $\epsilon(X^{(1)}, \nu\Delta) = p_I'F_I^{-1}(\nu D_I)F_I^{-1}p_I.$

The continuity of $z(X)$ at $X = X^{(1)}$ follows immediately from equations (21) and (22).

LEMMA 5. There is a sub-hyperplane $H_2$ of $H_1$ which is a supporting hyperplane of $R \cap H_1$ at $X^{(0)}$. $H_2$ has dimension $\tfrac{1}{2}p(p+1) - 1$.

PROOF. Suppose $X = X^{(0)} + \Delta \in R \cap H_1$. By convexity,

$X^{(0)} + \nu\Delta \in R \cap H_1 \qquad \text{for } 0 \le \nu \le 1.$

If $\epsilon(X^{(0)}, \Delta) > 0$, it follows from Lemma 4 and the linearity of $\epsilon$ that $z(X^{(0)} + \nu\Delta) - z(X^{(0)}) < 0$ for small enough positive $\nu$. This contradicts the fact that $X^{(0)}$ minimizes $z(X)$ on $R$. Hence

$\epsilon(X^{(0)}, X - X^{(0)}) \le 0 \qquad \text{for } X \in R \cap H_1.$

The sub-hyperplane $H_2$ of $H_1$ defined by the restriction

(23) $\epsilon(X^{(0)}, X - X^{(0)}) = p_I'E_I^{-1}D_IE_I^{-1}p_I = 0$

is a supporting hyperplane of $R \cap H_1$ at $X^{(0)}$. The fact that equation (23) actually constitutes a restriction on $X$ depends on the fact that $p_I \ne 0$, and this in turn is easily established from $z(X^{(0)}) \ne \infty$, which implies that the last $k - p$ elements of the first column of $P$ are all zero.

LEMMA 6. There is a sub-hyperplane $H_3$ of $H_2$ so that $z(X) = z(X^{(0)})$ for $X \in R \cap H_3$. The dimension of $H_2$ minus that of $H_3$ is no more than $p - 1$.

PROOF. For $X \in H_2$, $\epsilon(X^{(0)}, X - X^{(0)}) = 0$. From equation (21) it follows that if $E_I + D_I$ is nonsingular, the restriction

(24) $p_I'E_I^{-1}D_I = 0$


implies $\eta(X^{(0)}, \Delta) = 0$ and hence $z(X^{(0)} + \Delta) = z(X^{(0)})$. This implication holds even if $E_I + D_I$ is singular and nonnegative definite. For then we may apply equation (20) with $F_I$ replaced by $E_I$ and $\nu = 1$. We note that subject to restriction (24), $p_I'(E_I + \lambda I)^{-1}D_I = O(\lambda)$. Furthermore $(E_I + \lambda I + D_I)^{-1} \le (1/\lambda)I$, whence $\eta_\lambda(X^{(0)}, \Delta) = O(\lambda)$ and $z(X^{(0)} + \Delta) = z(X^{(0)})$.

Equation (24) constitutes a set of at most $p$ linearly independent restrictions on $X = X^{(0)} + \Delta$. However, since the restriction $\epsilon(X^{(0)}, \Delta) = 0$ may be written $p_I'E_I^{-1}D_IE_I^{-1}p_I = 0$, it follows that on $H_2$ the restriction (24) constitutes a set of at most $p - 1$ linearly independent restrictions. Let $H_3$ be the sub-hyperplane of $H_2$ on which $p_I'E_I^{-1}D_I = 0$. Lemma 6 follows.

LEMMA 7. There is an element $\bar X$ of $R$ which minimizes $z(X)$ and which is a convex combination of a set of $r \le p$ elements of $R_1 \cap H_2$.

PROOF. The set $R \cap H_3$ is closed, convex and bounded. There exists at least one element $\bar X$ of $R \cap H_3$ which is not a convex combination of any two distinct elements of $R \cap H_3$. By Lemma 6, $z(\bar X) = z(X^{(0)})$, that is, $\bar X$ minimizes $z(X)$ on $R$. The matrix $\bar X$ is an element of $H_2$ which supports $R \cap H_1$. Hence $\bar X$ is a convex combination of elements of $R_1 \cap H_2$. Let $r$ be the least number of elements of $R_1 \cap H_2$ which are required to yield $\bar X$ as a convex combination. Then $\bar X$ is an interior point of $R \cap H_4$ where $H_4$ is an $r - 1$ dimensional sub-hyperplane of $H_2$. Since $\bar X$ was selected so that it is not an interior point of any line segment of $R \cap H_3$, $H_4 \cap H_3$ must have dimension 0 and hence $r - 1 \le p - 1$.

LEMMA 8. Theorem 1 is valid for $s = 1$.

PROOF. Lemma 7 shows the existence of $\bar X$ and the continuity property is given by Lemma 4.

Now that Theorem 1 has been established for $s = 1$, we shall extend the proof for $s > 1$. In that case we change our notation slightly. We let

(25) $z(X) = v_s(X) = x^{11} + x^{22} + \cdots + x^{ss},$

(26) $\delta(X, \Delta) = z(X + \Delta) - z(X),$

(27) $\epsilon_\lambda(X, \Delta) = \epsilon_\lambda^{(1)}(X, \Delta) + \epsilon_\lambda^{(2)}(X, \Delta) + \cdots + \epsilon_\lambda^{(s)}(X, \Delta),$

(28) $\epsilon(X, \Delta) = \lim_{\lambda\to 0^+} \epsilon_\lambda(X, \Delta),$

(29) $\eta_\lambda(X, \Delta) = \eta_\lambda^{(1)}(X, \Delta) + \eta_\lambda^{(2)}(X, \Delta) + \cdots + \eta_\lambda^{(s)}(X, \Delta),$

(30) $\eta(X, \Delta) = \lim_{\lambda\to 0^+} \eta_\lambda(X, \Delta),$

where $\epsilon_\lambda^{(i)}(X, \Delta)$ and $\eta_\lambda^{(i)}(X, \Delta)$ are obtained from the $i$th diagonal terms of the matrices appearing in equations (11) and (13), respectively. Then Lemmas 1, 2, 3, and 4 may be established as in the case $s = 1$. Equations (20), (21), and (22) are slightly modified. To illustrate, equation (20) becomes

(31) $\eta_\lambda^{(i)}(X^{(1)}, \nu\Delta) = p_I^{(i)\prime}(F_I + \lambda I)^{-1}(\nu D_I)(F_I + \lambda I + \nu D_I)^{-1}(\nu D_I)(F_I + \lambda I)^{-1}p_I^{(i)}$


where $p_I^{(i)}$ is the vector whose components are the first $p$ elements of the $i$th column of $P$. It will be useful to note later that the condition $z(X^{(0)}) \ne \infty$ implies that the last $k - p$ elements of the first $s$ columns of $P$ are all zeros. In fact, $p_I^{(1)}, p_I^{(2)}, \ldots, p_I^{(s)}$ are then unit orthogonal vectors.

Lemma 5 follows as before with the restriction defining $H_2$ replaced by

(32) $\epsilon(X^{(0)}, X - X^{(0)}) = \sum_{i=1}^{s} p_I^{(i)\prime}E_I^{-1}D_IE_I^{-1}p_I^{(i)} = 0.$

In Lemma 6, the wording must be modified so that the dimension of $H_2$ minus that of $H_3$ is no more than $p + (p-1) + \cdots + (p-s+1) - 1$. The change is due to the fact that restriction (24) is now replaced by

(33) $P_I'E_I^{-1}D_I = 0$

where $P_I$ is the $(p \times s)$ matrix of rank $s$ consisting of the first $p$ rows and $s$ columns of $P$. It is possible to rearrange the rows and columns of $D_I$ (maintaining symmetry) so that equation (33) may be expressed by

(34) $(Q_{11}\ \ Q_{12})\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix} = 0$

where $Q_{11}$ is nonsingular, $Q_{11}$, $Q_{12}$, $D_{11}$, $D_{12} = D_{21}'$ and $D_{22}$ are of order $s \times s$, $s \times (p-s)$, $s \times s$, $s \times (p-s)$ and $(p-s) \times (p-s)$, respectively, and

$\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}$

is the result of rearranging the rows and columns of $D_I$. But then

$D_{12} = -Q_{11}^{-1}Q_{12}D_{22}, \qquad D_{11} = -Q_{11}^{-1}Q_{12}D_{21}.$

Hence, after the restriction (33), $D_I$ is determined by $D_{22}$ and has only $(p-s)(p-s+1)/2$ linearly independent elements. Hence, equation (33) imposes

$\frac{p(p+1)}{2} - \frac{(p-s)(p-s+1)}{2} = \frac{s(2p-s+1)}{2} = p + (p-1) + \cdots + (p-s+1)$

independent linear restrictions on the symmetric matrix $D_I$. But as before one of these restrictions is lost on $H_2$, for (33) implies $\epsilon(X^{(0)}, \Delta) = 0$. Lemma 7, with $p$ replaced by $p + (p-1) + \cdots + (p-s+1)$, follows as before. Theorem 1 is once more an immediate consequence of Lemmas 4 and 7.

6. Remarks. In many cases, the cost of experimentation depends on the experiment. Then the usual design problem is to maximize information, given a certain amount of money to spend on experimentation. Our results of Section 5 are easily seen to apply in this case, too. Here we identify with each experiment a matrix

(35) $Y(\theta) = X(\theta)/c$

where $c$ is the cost of performing the experiment. The matrix $Y(\theta)$ represents information per unit cost. The matrix which we associate with the mixed experiment when $E_i$ is carried out with probability $p_i$, $i = 1, 2, \ldots, n$, is

(36) $Y(\theta) = \frac{\sum_{i=1}^{n} p_iX_i(\theta)}{\sum_{i=1}^{n} c_ip_i} = \frac{\sum_{i=1}^{n} c_ip_iY_i(\theta)}{\sum_{i=1}^{n} c_ip_i}.$

It is evident that a reasonable criterion for a good mixed experiment is that $v_s[Y(\theta)]$ be minimized.

In [1], Elfving obtained our result (Theorem 1) for $s = 1$ and $s = k$ in the case of linear regression. Elfving also indicated an elegant geometrical method of obtaining the optimal design. The methods used by Elfving depend only on the assumption that for any nonrandomized experiment, $X(\theta)$ may be represented in the form

(37) $X(\theta) = \|x_{ij}(\theta)\| = \|x_i(\theta)x_j(\theta)\|.$

Hence, these methods may be applied in many examples which are not regression problems.
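Equation (36) is straightforward to compute. A small sketch (hypothetical costs attached to the regression example of Section 2):

```python
import numpy as np

def info_per_unit_cost(infos, costs, probs):
    # Eq. (36): information per unit cost of the mixed experiment that
    # carries out experiment i (cost c_i, information X_i) w.p. p_i.
    num = sum(p * Xi for p, Xi in zip(probs, infos))
    return num / sum(p * c for p, c in zip(probs, costs))

X = lambda x: np.array([[x * x, x], [x, 1.0]])
# Suppose, hypothetically, that E_{+1} costs twice as much as E_{-1}.
Y = info_per_unit_cost([X(1.0), X(-1.0)], costs=[2.0, 1.0], probs=[0.5, 0.5])
# Y is the identity information matrix scaled down by the mean cost 1.5.
```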

7. Examples. In this section we shall discuss some examples in order to show how the results of Section 5 may be used to reduce considerably the amount of work required to obtain optimal designs of experiments. The results of Elfving [1] make it unnecessary for us to consider the important and interesting examples from linear regression theory.

EXAMPLE 1. Suppose that

(38) $p_d = e^{-\theta d}, \qquad \theta > 0,\ d \ge 0,$

represents the probability that an insect will survive a dose of $d$ units of a certain insecticide. It is desired to select $n$ dose levels to try on $n$ insects to estimate $\theta$ in an optimal fashion. Here the information matrix corresponding to a particular $d$ is given by

(39) $X_d = \frac{d^2e^{-\theta d}}{1 - e^{-\theta d}}, \qquad d > 0,$

and

(40) $X_d = 0, \qquad d = 0.$

The conditions of Theorem 1 are satisfied and hence it follows that a locally optimal design consists of repeating one dose level $n$ times. Maximizing $X_d$ we find that this dose level satisfies


$2e^{-\theta d} + \theta d = 2,$
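The dose equation above has no closed form; treating $t = \theta d$ as the unknown, a minimal bisection sketch (hypothetical code, not part of the paper) recovers the root:

```python
import math

def bisect(f, lo, hi, tol=1e-10):
    # Standard bisection; f(lo) and f(hi) must bracket the root.
    assert f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t = bisect(lambda t: 2 * math.exp(-t) + t - 2, 1.0, 3.0)
survival = math.exp(-t)  # p_d at the optimal dose
print(round(t, 3), round(survival, 3))  # -> 1.594 0.203
```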

$\theta d \approx 1.6.$

For this locally optimal dose level, the probability of survival $p_d$ is very close to .2. An interesting by-product is that for the general design the maximum likelihood estimates are not too simple to obtain or study. For the optimal design the estimation problem is that of estimating the probability associated with a binomial distribution.²

EXAMPLE 2. Let $A$ and $B$ represent two characteristics that members of a population may or may not have, for example, the habit of smoking and heart disease. Let $\bar A$ and $\bar B$ represent the complementary characteristics. It is desired to estimate the degree of dependence of the two characteristics $A$ and $B$. Five experiments may be performed. These correspond to examining individuals either: (i) at random; (ii) with characteristic $A$; (iii) with characteristic $\bar A$; (iv) with characteristic $B$; or (v) with characteristic $\bar B$. The parameters involved are $p_A$, $p_{\bar A} = 1 - p_A$, $p_B$, $p_{\bar B} = 1 - p_B$, and $\theta = p_{AB} - p_Ap_B$, where $p$ with a subscript indicates the proportion of the population with the characteristics of the subscript. There are three independent parameters.

In the case where $p_A$ and $p_B$ are known, it has been shown by Blackwell [4] that to test for independence an optimal design involves repeating one experiment $n$ times. This experiment is the one which corresponds to the smallest of the four probabilities $p_A$, $p_{\bar A}$, $p_B$, and $p_{\bar B}$. Here, Theorem 1 may be applied to yield the same result if it is desired to estimate $\theta$ when $\theta$ is assumed to be close to zero.

Suppose, now, that our problem is modified so that $p_B$ is only approximately known. Here, Theorem 1 applies with $k = 2$, $s = 1$, and tells us that we should use at most two of the experiments. Furthermore, since selecting an individual at random is equivalent to a mixture of two of the other experiments, we may confine our attention only to pairs of the other four experiments.
Let us now evaluate the information matrix $X_A$ corresponding to examining an individual with characteristic $A$, this information matrix to be evaluated at $\theta = 0$. In this experiment the probability of observing an individual with characteristic $B$ is $p_B + \theta/p_A$. If the individual observed has characteristic $B$,

$L = \log\left(p_B + \frac{\theta}{p_A}\right)$

and

$\frac{\partial L}{\partial \theta}\bigg|_{\theta=0} = \frac{1}{p_Ap_B}.$

² The author wishes to express his thanks to Fred Andrews for his assistance on this example.


If the individual observed has characteristic $\bar B$,

$L = \log\left(p_{\bar B} - \frac{\theta}{p_A}\right)$

and

$\frac{\partial L}{\partial \theta}\bigg|_{\theta=0} = -\frac{1}{p_Ap_{\bar B}}.$

Hence

(41) $X_A = \frac{1}{p_Bp_{\bar B}}\begin{pmatrix} 1/p_A^2 & 1/p_A \\ 1/p_A & 1 \end{pmatrix}.$

Similarly

(42) $X_{\bar A} = \frac{1}{p_Bp_{\bar B}}\begin{pmatrix} 1/p_{\bar A}^2 & -1/p_{\bar A} \\ -1/p_{\bar A} & 1 \end{pmatrix},$

(43) $X_B = \begin{pmatrix} 1/(p_Ap_{\bar A}p_B^2) & 0 \\ 0 & 0 \end{pmatrix},$

and

(44) $X_{\bar B} = \begin{pmatrix} 1/(p_Ap_{\bar A}p_{\bar B}^2) & 0 \\ 0 & 0 \end{pmatrix}.$

From the remarks of the previous section, it follows that Elfving's results [1] may be applied. The geometrical figure that is developed shows immediately that the optimal design consists of using either $\bar B$, or $B$, or $A$ and $\bar A$ each half the time, according as to which of the numbers

$\frac{1}{p_{\bar B}}, \qquad \frac{1}{p_B}, \qquad \frac{1}{2\sqrt{p_Ap_{\bar A}p_Bp_{\bar B}}}$

is greatest. This last result can also be obtained directly without computational difficulty.
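The three-way comparison can be cross-checked numerically. The sketch below takes the information matrices at $\theta = 0$ as derived in this example (parameters $\theta$ and $p_B$; treat the formulas as this sketch's assumptions), approximates the extended inverse of eq. (5) with a small ridge, and selects the design with the smallest asymptotic variance for $\hat\theta$:

```python
import numpy as np

def X_A(pA, pB):
    return (1 / (pB * (1 - pB))) * np.array([[1 / pA**2, 1 / pA],
                                             [1 / pA, 1.0]])

def X_Abar(pA, pB):
    q = 1 - pA
    return (1 / (pB * (1 - pB))) * np.array([[1 / q**2, -1 / q],
                                             [-1 / q, 1.0]])

def X_B(pA, pB):
    return np.array([[1 / (pA * (1 - pA) * pB**2), 0.0], [0.0, 0.0]])

def X_Bbar(pA, pB):
    return np.array([[1 / (pA * (1 - pA) * (1 - pB)**2), 0.0], [0.0, 0.0]])

def theta_variance(M, ridge=1e-12):
    # Extended inverse of Section 3, eq. (5), approximated with Y = I.
    return np.linalg.inv(M + ridge * np.eye(2))[0, 0]

pA, pB = 0.3, 0.1   # p_B is the smallest of the four probabilities
designs = {
    "B alone": X_B(pA, pB),
    "B-bar alone": X_Bbar(pA, pB),
    "A and A-bar, half each": 0.5 * X_A(pA, pB) + 0.5 * X_Abar(pA, pB),
}
best = min(designs, key=lambda k: theta_variance(designs[k]))
assert best == "B alone"  # matches Blackwell's smallest-probability rule
```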

8. Appendices.

APPENDIX A. Extension of the inverse to nonnegative definite symmetric matrices.


Here we extend the notion of the inverse of a matrix to nonnegative definite symmetric matrices and show how this extension has statistical significance. Suppose that $X$ is nonnegative definite and symmetric. Let $Y$ be a symmetric matrix so that $X + \lambda Y$ is positive definite for positive $\lambda$ sufficiently small. Then we define the inverse of $X$ relative to $Y$ by

(45) $X_Y^{-1} = \lim_{\lambda\to 0^+}(X + \lambda Y)^{-1}.$

The usefulness of this definition arises mainly from the following property.

PROPERTY 1. The diagonal elements of $X_Y^{-1}$ are independent of $Y$. Furthermore, if the $i$th and $j$th diagonal elements of $X_Y^{-1}$ are finite, the $(i, j)$ element of $X_Y^{-1}$ is finite and independent of $Y$. If the $i$th diagonal element of $X_Y^{-1}$ is finite, all the elements of the $i$th row of $X_Y^{-1}$ are finite.

PROOF. Corresponding to $X$ there is an orthogonal matrix $P$ such that

(46) $PXP' = E = \begin{pmatrix} E_{11} & 0 \\ 0 & 0 \end{pmatrix}$

where $E_{11}$ is a diagonal $p \times p$ matrix whose diagonal elements $e_1, e_2, \ldots, e_p$ are positive. We define $F$ by

(47) $PYP' = F = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}.$

Then $F_{22}$ is positive definite and

$(X + \lambda Y)^{-1} = P'(E + \lambda F)^{-1}P,$

$(E + \lambda F)^{-1} = \begin{pmatrix} E_{11}^{-1} + O(\lambda) & -[E_{11}^{-1} + O(\lambda)]F_{12}F_{22}^{-1} + O(\lambda) \\ -F_{22}^{-1}F_{21}[E_{11}^{-1} + O(\lambda)] + O(\lambda) & \frac{1}{\lambda}[F_{22} - \lambda F_{21}(E_{11} + \lambda F_{11})^{-1}F_{12}]^{-1} \end{pmatrix}.$

Let $p_{i1}$ and $p_{i2}$ represent the first $p$ and the remaining $k - p$ elements of the $i$th column of $P$. Then

(48) $(X + \lambda Y)^{ij} = p_{i1}'E_{11}^{-1}p_{j1} - p_{i2}'F_{22}^{-1}F_{21}E_{11}^{-1}p_{j1} - p_{i1}'E_{11}^{-1}F_{12}F_{22}^{-1}p_{j2} + \frac{1}{\lambda}\,p_{i2}'F_{22}^{-1}p_{j2} + p_{i2}'F_{22}^{-1}F_{21}E_{11}^{-1}F_{12}F_{22}^{-1}p_{j2} + O(\lambda).$

Suppose that $\lim_{\lambda\to 0^+}(X + \lambda Y)^{ii}$ is finite. Then $p_{i2} = 0$ and

(49) $\lim_{\lambda\to 0^+}(X + \lambda Y)^{ii} = p_{i1}'E_{11}^{-1}p_{i1} = \sum_{h=1}^{p}\frac{p_{hi}^2}{e_h},$

which is independent of $Y$. Also

(50) $\lim_{\lambda\to 0^+}(X + \lambda Y)^{ij} = p_{i1}'E_{11}^{-1}p_{j1} - p_{i1}'E_{11}^{-1}F_{12}F_{22}^{-1}p_{j2},$


which is finite (though it depends on $Y$). Suppose that $(X + \lambda Y)^{ii}$ and $(X + \lambda Y)^{jj}$ are finite in the limit. Then $p_{i2} = 0$, $p_{j2} = 0$, and

(51) $\lim_{\lambda\to 0^+}(X + \lambda Y)^{ij} = p_{i1}'E_{11}^{-1}p_{j1} = \sum_{h=1}^{p}\frac{p_{hi}p_{hj}}{e_h},$

which is independent of $Y$.

Let us now assume that the probability distribution of the data of an experiment depends on $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_a)$ and that the information matrix with respect to $\varphi$ is positive definite. Let us assume that the above distribution is independent of $\psi = (\psi_1, \psi_2, \ldots, \psi_b)$. Suppose now, that the parameters in which we are interested are $\theta = (\theta_1, \theta_2, \ldots, \theta_c)$ and $\eta = (\eta_1, \eta_2, \ldots, \eta_d)$ where $a + b = c + d$ and there is a one to one relationship between $(\varphi, \psi)$ and $(\theta, \eta)$. In fact, suppose that

(53) $\theta = g_1(\varphi),$

(54) $\eta = g_2(\varphi, \psi),$

where the Jacobian of the transformation is not zero and where for each component of $\eta$ the partial derivative with respect to some component of $\psi$ does not vanish. We also suppose that the likelihood may be expressed in terms of $(\theta, \eta)$, that is,

(55) $L = u(\varphi) = w(\theta, \eta).$

We are interested in the following information matrices:

(56) $U = E\{u_\varphi'u_\varphi\},$

(57) $W = E\left\{\begin{pmatrix} w_\theta' \\ w_\eta' \end{pmatrix}\begin{pmatrix} w_\theta & w_\eta \end{pmatrix}\right\} = \begin{pmatrix} W_{\theta\theta} & W_{\theta\eta} \\ W_{\eta\theta} & W_{\eta\eta} \end{pmatrix},$

where $u_\varphi$ is a row vector whose $i$th component is $\partial u/\partial\varphi_i$. We shall also use the notation $\varphi_\theta$ to denote an $a \times c$ matrix whose $(i, j)$ element is $\partial\varphi_i/\partial\theta_j$. We assume that $U$ is positive definite and that $U^{-1}$ represents a covariance matrix $\Sigma_{\varphi\varphi}$. For our extension of the notion of the inverse of a matrix to be suitable, it should yield for us the following property.

PROPERTY 2. The matrix $W^{-1}$ may be decomposed as follows:

(58) $W^{-1} = \begin{pmatrix} W^{\theta\theta} & W^{\theta\eta} \\ W^{\eta\theta} & W^{\eta\eta} \end{pmatrix}$

where $W^{\theta\theta}$ is uniquely defined and is given by

(59) $W^{\theta\theta} = \Sigma_{\theta\theta} = \theta_\varphi\Sigma_{\varphi\varphi}\theta_\varphi'$

and where the diagonal elements of $W^{\eta\eta}$ are infinite.

PROOF.

$W = A'\begin{pmatrix} U & 0 \\ 0 & 0 \end{pmatrix}A$


where

$A = \frac{\partial(\varphi, \psi)}{\partial(\theta, \eta)}$

is nonsingular. Furthermore

$\left[W + \lambda A'\begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix}A\right]^{-1} = A^{-1}\begin{pmatrix} U^{-1} & 0 \\ 0 & \frac{1}{\lambda}I \end{pmatrix}(A')^{-1}.$

Property 1, together with the fact that not all components of $\eta_\psi$ vanish, yields our desired results.

APPENDIX B. Justification for the use of information matrices. We sketch here a brief justification for the use of information matrices in our formulation. This justification presupposes that we are interested in the variances of the asymptotic distribution of the estimate based on our design. Rubin has shown [5] that under mild conditions these variances are greater than or equal to the diagonal elements of the inverse of the information matrix. On the other hand, if the design involves repeating a fixed number of these experiments in certain proportions, one (again under mild conditions) obtains equality. Since the "optimal" design using the information criterion involves repeating a fixed number of experiments in certain proportions, the sum of the variances of the asymptotic distributions of the estimates with this "optimal" design is actually equal to the minimum $v_s$, which is a lower bound for the sum of the variances of the asymptotic distributions of the estimates for all designs.

APPENDIX C. The relevance of sums of variances. If one is interested in the parameters $\theta_1, \theta_2, \ldots, \theta_s$, it may be assumed that for a given estimate $t_1, t_2, \ldots, t_s$, there is a loss represented by a function

(60) $g(t, \theta) = g(t_1, t_2, \ldots, t_s;\ \theta_1, \theta_2, \ldots, \theta_s)$

which as a function of the $t_i$ is a minimum at $t_i = \theta_i$. If we assume that $g$ is sufficiently well-behaved and that the sample is large enough so that the $t_i$ are close to $\theta_i$ with large probability,

$g(t, \theta) = g(\theta, \theta) + \sum_{i,j} a_{ij}(t_i - \theta_i)(t_j - \theta_j) + o(|t - \theta|^2).$

The "value" of our statistic is measured by how small $E\{g(t, \theta)\}$ is. For large samples (size $n$) we have, under mild conditions,

$E\{g(t, \theta)\} = g(\theta, \theta) + \frac{1}{n}\sum_{i,j=1}^{s} a_{ij}\sigma_{ij} + o\left(\frac{1}{n}\right)$


where $\|\sigma_{ij}\|$ is the covariance matrix of the asymptotic distribution of $t$ and $a_{ij} = \tfrac{1}{2}\,\partial^2g/\partial t_i\,\partial t_j$ evaluated at $t = \theta$.