The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis

BRADLEY EFRON*

A random vector x arises from one of two multivariate normal distributions differing in mean but not covariance. A training set x1, x2, ..., xn of previous cases, along with their correct assignments, is known. These can be used to estimate Fisher's discriminant by maximum likelihood and then to assign x on the basis of the estimated discriminant, a method known as the normal discrimination procedure. Logistic regression does the same thing but with the estimation of Fisher's discriminant done conditionally on the observed values of x1, x2, ..., xn. This article computes the asymptotic relative efficiency of the two procedures. Typically, logistic regression is shown to be between one half and two thirds as effective as normal discrimination for statistically interesting values of the parameters.

1. INTRODUCTION AND SUMMARY

Suppose that a random vector x can arise from one of two p-dimensional normal populations differing in mean but not in covariance,

x ~ N_p(μ1, Σ) with prob π1 ,
x ~ N_p(μ0, Σ) with prob π0 , (1.1)

where π1 + π0 = 1. If the parameters π1, π0 = 1 − π1, μ1, μ0, Σ are known, then x can be assigned to a population on the basis of Fisher's linear discriminant function [1],

λ(x) = β0 + β'x ,

β0 ≡ log (π1/π0) − ½(μ1 + μ0)'Σ⁻¹(μ1 − μ0) , β ≡ Σ⁻¹(μ1 − μ0) . (1.2)

The assignment is to population 1 if λ(x) > 0 and to population 0 if λ(x) < 0. This method of assignment minimizes the expected probability of misclassification, as is easily shown by applying Bayes' theorem. There is no loss of generality in assuming Σ nonsingular, as we have done, since singular cases can always be made nonsingular by an appropriate reduction of dimension.

In usual practice, the parameters π1, π0, μ1, μ0, Σ will be unknown to the statistician, but a training set (y1, x1), (y2, x2), ..., (yn, xn) will be available, where yj indicates which population xj comes from,

yj = 1 with prob π1 ,
yj = 0 with prob π0 , (1.3)

and, of course,

xj | yj ~ N_p(μ_{yj}, Σ) . (1.4)

* Bradley Efron is professor, Department of Statistics, Stanford University, Stanford, Calif. 94305. The author is grateful to Gus Haggstrom of the RAND Corporation for helpful comments.
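The model (1.1)-(1.4) and the assignment rule based on (1.2) can be sketched in a few lines of code. Everything numeric below (a two-dimensional example with Δ = 2, π1 = π0 = ½, Σ = I) is our own illustrative choice, not taken from the paper:

```python
import math, random

# Illustrative true parameters for model (1.1): two bivariate normals,
# common covariance Sigma = I, equal prior probabilities.
pi1, pi0 = 0.5, 0.5
mu1, mu0 = [1.0, 0.0], [-1.0, 0.0]

# Fisher's linear discriminant (1.2), specialized to Sigma = I:
#   beta  = mu1 - mu0
#   beta0 = log(pi1/pi0) - 0.5*(mu1 + mu0)'(mu1 - mu0)
beta = [a - b for a, b in zip(mu1, mu0)]
beta0 = math.log(pi1 / pi0) - 0.5 * sum((a + b) * (a - b) for a, b in zip(mu1, mu0))

def discriminant(x):
    """lambda(x) = beta0 + beta'x, the a posteriori log odds of (1.6)."""
    return beta0 + sum(b * xi for b, xi in zip(beta, x))

def assign(x):
    """Assign x to population 1 if lambda(x) > 0, else to population 0."""
    return 1 if discriminant(x) > 0 else 0

# Drawing one training pair (y_j, x_j) as in (1.3)-(1.4).
rng = random.Random(0)
def draw_pair():
    y = 1 if rng.random() < pi1 else 0
    mu = mu1 if y == 1 else mu0
    return y, [m + rng.gauss(0.0, 1.0) for m in mu]
```

With these choices beta works out to (2, 0)' and beta0 to 0, so the rule reduces to assigning to population 1 exactly when x1 > 0 -- the "standard situation" of Section 2.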

The (yj, xj) are assumed independent of each other for j = 1, 2, ..., n. In this case, maximum likelihood estimates of the parameters are available,

π̂1 = n1/n , π̂0 = n0/n ,

μ̂1 = Σ_{yj=1} xj/n1 , μ̂0 = Σ_{yj=0} xj/n0 , and (1.5)

Σ̂ = [Σ_{yj=1} (xj − μ̂1)(xj − μ̂1)' + Σ_{yj=0} (xj − μ̂0)(xj − μ̂0)']/n ,

where n1 ≡ Σ yj and n0 ≡ n − n1 are the numbers of population 1 and population 0 cases observed, respectively. Substituting these into (1.2) gives a version of Anderson's [1] estimated linear discriminant function, say λ̄(x) = β̄0 + β̄'x, and an estimated discrimination procedure which assigns a new x to population 1 or 0 as λ̄(x) is greater than or less than zero. This will be referred to as the normal discrimination procedure.

Bayes' theorem shows that λ(x), as given in (1.2), is actually the a posteriori log odds ratio for Population 1 versus Population 0 having observed x,

λ(xj) = log (π1(xj)/π0(xj)) , πi(xj) ≡ prob {yj = i | xj} , i = 1, 0 . (1.6)

To simplify notation we will also write

πij ≡ πi(xj) and λ ≡ log (π1/π0) . (1.7)

Given the values x1, x2, ..., xn, the yj are conditionally independent binary random variables,

prob {yj = 1 | xj} = π1j = exp (β0 + β'xj)/[1 + exp (β0 + β'xj)] ,
prob {yj = 0 | xj} = π0j = 1/[1 + exp (β0 + β'xj)] . (1.8)

[Journal of the American Statistical Association, December 1975, Volume 70, Number 352, Theory and Methods Section, p. 892. Downloaded from JSTOR (http://www.jstor.org) on Thu, 4 Apr 2013.]

To estimate (β0, β), we can maximize the conditional likelihood

f_{β0,β}(y1, ..., yn | x1, ..., xn)

= Π_{j=1}^n π1j^{yj} π0j^{(1−yj)}

= Π_{j=1}^n exp [(β0 + β'xj)yj] / [1 + exp (β0 + β'xj)] (1.9)

with respect to (β0, β). The maximizing values, call them (β̂0, β̂), give λ̂(x) = β̂0 + β̂'x as an estimate of the linear discriminant function. The discrimination procedure which chooses Population 1 or 0 as λ̂(x) is greater than or less than zero will be referred to as the logistic regression procedure. An excellent discussion of such procedures is given in Cox's monograph [2].

The logistic regression procedure must be less efficient than the normal discrimination procedure under model (1.1), at least asymptotically as n goes to infinity, since the latter is based on the full maximum likelihood estimator of λ(x). This article calculates the asymptotic relative efficiencies (ARE) of the two procedures. The central result is that, under a variety of situations and measures of efficiency, the ARE is given by

ARE = (1 + Δ²π1π0) e^{−Δ²/8} ∫_{−∞}^{∞} [(2π)^{−1/2} e^{−x²/2} / (π1 e^{Δx/2} + π0 e^{−Δx/2})] dx , (1.10)

where

Δ ≡ [(μ1 − μ0)'Σ⁻¹(μ1 − μ0)]^{1/2} , (1.11)

the square root of the Mahalanobis distance. Following is a small tabulation of (1.10) for reasonable values of Δ, with π1 = π0 = ½ (the case most favorable to the logistic regression procedure).

Δ      0     .5     1     1.5    2     2.5    3     3.5    4
ARE  1.000  1.000  .995  .968   .899  .786   .641  .486   .343
                                                           (1.12)

Why use logistic regression at all if it is less efficient (and also more difficult to calculate)? Because it is more robust, at least theoretically, than normal discrimination. The conditional likelihood (1.9) is valid under general exponential family assumptions on the density f(x) of x,

f(x) = g(θ1, η) h(x, η) exp (θ1'x) with prob π1 ,
f(x) = g(θ0, η) h(x, η) exp (θ0'x) with prob π0 , (1.13)

where π1 + π0 = 1. Here, η is an arbitrary nuisance parameter, like Σ in (1.1). Equation (1.13) includes (1.1) as a special case.

Unfortunately, (1.12) shows that the statistician pays a moderately steep price for this added generality, assuming, of course, that (1.1) is actually correct. Just when good discrimination becomes possible, for Δ between 2.5 and 3.5, the ARE of the logistic procedure falls off sharply. The question of how to choose or compromise between the two procedures seems important, but no results are available at this time. Another important unanswered question is the relative efficiency under some model other than (1.1), when we are not playing ball on normal discrimination's home court.

In many situations, the sampling probabilities π1, π0 acting in (1.1) may be systematically distorted from their values in the population of interest. For example, if Population 1 is murder victims and Population 0 is all other people, a study conducted in a morgue would have π1 much larger than in the whole population. Quite often n1 and n0 are set by the experimenter and are not random variables at all. These cases are discussed briefly in Section 5.

Technical details relating to asymptotic normality and consistency are omitted throughout the article. These gaps can be filled in by the application of standard exponential family theory, as presented, say, in [5], to (1.1). For another comparison of normal discrimination and logistic regression, the reader is referred to [4]. In that article, and also in [3], the distributions of x are allowed to have discrete components.

2. EXPECTED ERROR RATE

By means of a linear transformation x* = a + Ax, we can always reduce (1.1) to the case

x ~ N_p((Δ/2)e1, I) with prob π1 ,
x ~ N_p(−(Δ/2)e1, I) with prob π0 , (2.1)

where π1 + π0 = 1; e1' ≡ (1, 0, 0, ..., 0); I is the p × p identity matrix; and Δ = [(μ1 − μ0)'Σ⁻¹(μ1 − μ0)]^{1/2} as before. The boundary B ≡ {x : λ(x) = 0} between Fisher's optimum decision regions for the two populations transforms to the new optimum boundary in the obvious way,

B* ≡ {x* : λ*(x*) = 0} = {x* : x* = a + Ax, x ∈ B} . (2.2)

Moreover, if x1, x2, ..., xn is an iid sample from (1.1), and xj* = a + Axj, j = 1, 2, ..., n, is the transformed sample, then both estimated boundaries B̂ ≡ {x : λ̂(x) = 0} and B̄ ≡ {x : λ̄(x) = 0} also transform as in (2.2). In words, then, for both logistic regression and normal discrimination, the estimated discrimination procedure based on the transformed data is the transform of that based on the original data. All of these statements are easy to verify.

Suppose we have the regions R0 and R1, a partition of the p-dimensional space E^p, and we decide for population 0 or population 1 as x falls into R0 or R1, respectively. The error rate of such a partition is the probability of misclassification under assumptions (1.1),

Error Rate ≡ π1 prob {x ∈ R0 | x ~ N_p(μ1, Σ)} + π0 prob {x ∈ R1 | x ~ N_p(μ0, Σ)} . (2.3)

When the partition is chosen randomly, as it is by the logistic regression and normal discrimination procedures, error rate is a random variable. For either procedure, it follows from the preceding that error rate will have the same distribution under (1.1) and (2.1). Henceforth, we will work with the simpler assumptions (2.1), calling this the standard situation (with the basic random variable referred to as x rather than x* for convenience). For the standard situation, Fisher's linear discriminant function (1.2) becomes

λ(x) = λ + Δx1 . (2.4)

The boundary λ(x) = 0 is the (p − 1)-dimensional plane orthogonal to the x1 axis and intersecting it at the value

τ ≡ −λ/Δ . (2.5)

In the figure, the optimal boundary is labeled B(0, 0). The figure also shows another boundary, labeled B(dτ, dα), intersecting the x1 axis at τ + dτ, with normal vector at an angle dα from the x1 axis. The differential notation dτ and dα indicates small discrepancies from optimal, which will be the case in the large sample theory. The error rate (2.3) of the regions separated by B(dτ, dα) will be denoted by ER (dτ, dα). Letting

D1 ≡ (Δ/2) − τ , D0 ≡ (Δ/2) + τ , (2.6)

we see that the error rate of the optimal boundary B(0, 0) is

ER (0, 0) = π1Φ(−D1) + π0Φ(−D0) , (2.7)

where

Φ(z) ≡ ∫_{−∞}^z φ(t) dt and φ(t) ≡ (2π)^{−1/2} exp (−t²/2) ,

as usual. (We are tacitly assuming that the two regions divided by B(dτ, dα) are assigned to populations 1 and 0, respectively, in the best way.)
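In the standard situation, (2.5)-(2.7) amount to a short computation. A stdlib-only sketch (Φ obtained from math.erf; the numerical examples are our own):

```python
import math

def Phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error_rate(pi1, delta):
    """ER(0,0) of (2.7): error rate of Fisher's optimal boundary
    in the standard situation (2.1)."""
    pi0 = 1.0 - pi1
    lam = math.log(pi1 / pi0)   # lambda = log(pi1/pi0), as in (1.7)
    tau = -lam / delta          # boundary intercept (2.5)
    D1 = delta / 2.0 - tau      # distances (2.6) from the two means
    D0 = delta / 2.0 + tau
    return pi1 * Phi(-D1) + pi0 * Phi(-D0)
```

For π1 = π0 = ½, τ = 0 and the rate reduces to Φ(−Δ/2); with Δ = 2 this gives Φ(−1) ≈ .159.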

[Figure: The Optimum Boundary λ(x) = 0 in the Standard Situation. The boundary B(0, 0) is orthogonal to the x1 axis, which it intersects at τ; the axis is marked at −Δ/2, 0, and τ. Also shown is some other boundary B(dτ, dα), intersecting the x1 axis at τ + dτ and at angle dα.]

Now, define

d1 ≡ (D1 − dτ) cos (dα) , d0 ≡ (D0 + dτ) cos (dα) , (2.8)

the distances from μ1 and μ0 to B(dτ, dα). Then,

ER (dτ, dα) = π1Φ(−d1) + π0Φ(−d0) . (2.9)

From the Taylor expansions,

cos (dα) = 1 − (dα)²/2 + ...

and

Φ(−D + dτ) = Φ(−D) + φ(D) dτ + Dφ(D)(dτ)²/2 + ... ,

we get the following lemma.

Lemma 1: Ignoring differential terms of third and higher orders,

ER (dτ, dα) = ER (0, 0) + (Δ/2)π1φ(D1)[(dτ)² + (dα)²] . (2.10)

Equation (2.10) makes use of the fact that, by Bayes' theorem, π1φ(D1)/π0φ(D0) = 1, or equivalently,

π1φ(D1) = π0φ(D0) . (2.11)
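Lemma 1 is easy to check numerically: for small dτ and dα, the exact error rate (2.9) should exceed ER(0,0) by approximately (Δ/2)π1φ(D1)[(dτ)² + (dα)²]. A stdlib sketch (the particular values π1 = .6, Δ = 2, dτ = .02, dα = .03 are our own illustration):

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

pi1, delta = 0.6, 2.0
pi0 = 1.0 - pi1
tau = -math.log(pi1 / pi0) / delta       # (2.5)
D1, D0 = delta / 2.0 - tau, delta / 2.0 + tau   # (2.6)

def ER(dtau, dalpha):
    """Exact error rate (2.9) of the boundary B(dtau, dalpha)."""
    d1 = (D1 - dtau) * math.cos(dalpha)  # distances (2.8)
    d0 = (D0 + dtau) * math.cos(dalpha)
    return pi1 * Phi(-d1) + pi0 * Phi(-d0)

dtau, dalpha = 0.02, 0.03
exact = ER(dtau, dalpha) - ER(0.0, 0.0)
approx = (delta / 2.0) * pi1 * phi(D1) * (dtau ** 2 + dalpha ** 2)  # (2.10)
```

The identity (2.11) holds exactly for the optimal intercept τ = −λ/Δ, and the quadratic approximation agrees with the exact regret to within the third-order terms the lemma ignores.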

Suppose now that the boundary B(dτ, dα) is given by those x satisfying

(λ + dβ0) + (Δe1 + dβ)'x = 0 , (2.12)

dβ0 and dβ = (dβ1, dβ2, ..., dβp)' indicating small discrepancies from the optimal linear function (2.4). Again ignoring higher-order terms, we have

dτ = (1/Δ)(−dβ0 + (λ/Δ)dβ1) ,

and so

(dτ)² = (1/Δ²)((dβ0)² − (2λ/Δ)dβ0 dβ1 + (λ/Δ)²(dβ1)²) . (2.13)

Similarly, expansion of

dα = arctan [((dβ2)² + ... + (dβp)²)^{1/2} / (Δ + dβ1)]

gives

(dα)² ≐ ((dβ2)² + (dβ3)² + ... + (dβp)²)/Δ² . (2.14)

Finally, suppose that under some method of estimation, the (p + 1) vector of errors (dβ0, dβ')' has a limiting normal distribution with mean vector 0 and covariance matrix Σ/n (Σ here denoting the limiting covariance of the estimation errors, not the covariance matrix in (1.1), which is I in the standard situation),

ℒ: √n (dβ0, dβ')' → N_{p+1}(0, Σ) . (2.15)

The differential term which appears in Lemma 1,

(dτ)² + (dα)² = (1/Δ²)[(dβ0)² − (2λ/Δ)dβ0 dβ1 + (λ/Δ)²(dβ1)² + (dβ2)² + ... + (dβp)²] , (2.16)

will then have the limiting distribution of 1/n times the normal quadratic form

(1/Δ²)[z0² − (2λ/Δ)z0z1 + (λ/Δ)²z1² + z2² + ... + zp²] ,

where z ~ N_{p+1}(0, Σ). Assuming moments converge correctly, which turns out to be the case for the logistic regression and normal discriminant procedures, Lemma 1 gives a simple expression for the expected error rate in terms of the elements σij of Σ.

Theorem 1: Ignoring terms of order less than 1/n,

E{ER (dτ, dα) − ER (0, 0)}

= (π1φ(D1)/(2Δn)) [σ00 − (2λ/Δ)σ01 + (λ/Δ)²σ11 + σ22 + ... + σpp] . (2.17)

The quantity E{ER (dτ, dα) − ER (0, 0)} is a measure of our expected regret, in terms of increased error rate, when using some estimated discrimination procedure. In Section 3, we evaluate Σ for the logistic regression procedure and the normal discriminant procedure and then use Theorem 1 to compare the two procedures.
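The bracketed quadratic in (2.17) is simple to code. The following sketch (our own illustration) evaluates the leading coefficient of the expected regret, n·E{ER(dτ, dα) − ER(0, 0)}, for a given limiting covariance matrix Σ = (σij); the matrix used in the test is the one Lemma 2 below gives for π1 = ½, Δ = 2, p = 3:

```python
import math

def phi(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def regret_coefficient(sigma, pi1, delta):
    """n * E{ER(dtau, dalpha) - ER(0,0)} from Theorem 1, eq. (2.17).
    `sigma` is the (p+1)x(p+1) limiting covariance of sqrt(n)*(dbeta0, dbeta')."""
    pi0 = 1.0 - pi1
    lam = math.log(pi1 / pi0)
    tau = -lam / delta
    D1 = delta / 2.0 - tau
    r = lam / delta
    quad = (sigma[0][0] - 2.0 * r * sigma[0][1] + r * r * sigma[1][1]
            + sum(sigma[i][i] for i in range(2, len(sigma))))
    return pi1 * phi(D1) * quad / (2.0 * delta)
```

At λ = 0 and Δ = 2 the Lemma 2 covariance is diag(8, 12, 8, ..., 8)/1, and the coefficient reduces to p·φ(1), in agreement with the mean of the limit law (4.6) of Section 4.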

3. ASYMPTOTIC ERROR RATES OF THE TWO PROCEDURES

First we consider the normal discriminant procedure described after (1.5).

Lemma 2: In the standard situation, the normal discriminant procedure produces estimates (β̄0, β̄') = (λ, Δe1') + (dβ0, dβ') satisfying

ℒ: √n (dβ0, dβ')' → N_{p+1}(0, Σ̄) , (3.1)

where

Σ̄ = (1/(π1π0)) ×

    | 1 + Δ²/4          (Δ/2)(π1 − π0)    0'                   |
    | (Δ/2)(π1 − π0)    1 + 2Δ²π1π0       0'                   | . (3.2)
    | 0                 0                 (1 + Δ²π1π0) I_{p−1} |

Proof: The density of a single (y, x) pair under (1.3)-(1.4) is

f_{λ,μ1,μ0,Σ}(y, x) = πy (2π)^{−p/2} |Σ|^{−1/2} exp [−½(x − μy)'Σ⁻¹(x − μy)] , (3.3)

λ = log π1/π0, as before. Let us write the distinct elements of Σ⁻¹ as a p(p + 1)/2 vector (σ^{11}, σ^{12}, ..., σ^{1p}, σ^{22}, σ^{23}, ..., σ^{pp}) and indicate this vector as (σ^{(1)}, σ^{(2)}), where

σ^{(1)} ≡ (σ^{11}, σ^{12}, ..., σ^{1p}) , σ^{(2)} ≡ (σ^{22}, σ^{23}, ..., σ^{pp}) . (3.4)

Standard results using Fisher's information matrix then give the following asymptotic distributions for the maximum likelihood estimates in the standard situation:

ℒ: √n (λ̂ − λ) → N(0, 1/(π1π0)) ,
ℒ: √n (μ̂i − μi) → N_p(0, (1/πi) I) , i = 1, 0 , (3.5)
ℒ: √n (σ̂^{(1)} − σ^{(1)}) → N_p(0, I + E11) .

Moreover, λ̂, μ̂1, μ̂0, σ̂^{(1)} and σ̂^{(2)} are asymptotically uncorrelated. (We do not need the limiting distribution of σ̂^{(2)} for the proof of Lemma 2.) Here, λ = log π1/π0, and E11 is the p × p matrix having upper left element one and all others zero.

Differentiating (1.2) gives

∂β0/∂λ = 1 , ∂β0/∂μ1 = −Σ⁻¹μ1 , ∂β0/∂μ0 = Σ⁻¹μ0 , ∂β0/∂σ^{ij} = −(μ1iμ1j − μ0iμ0j)/(1 + δij) ,

∂βi/∂μ1j = σ^{ij} = −∂βi/∂μ0j , ∂β/∂σ^{ij} = [(Eij + Eji)/(1 + δij)] (μ1 − μ0) , (3.6)

μ1i indicating the ith component of μ1, and likewise for μ0i, with derivatives involving vectors taken componentwise in the obvious way. Moreover, δij = 1 or 0 as i = j or i ≠ j, and Eij is the matrix with one in the ijth position and zero elsewhere. In the standard situation, where μ1 = −μ0 = (Δ/2)e1 and Σ = I, we have the differential relationship

| dβ0 |   | 1    −(Δ/2)e1'   −(Δ/2)e1'   0' |   | dλ       |
|     | = |                                 |   | dμ1      |   (3.7)
| dβ  |   | 0        I           −I      ΔI |   | dμ0      |
                                              | dσ^{(1)} |

Letting M be the matrix on the right side of (3.7),

ℒ: √n (dβ0, dβ')' → N_{p+1}(0, M Σ_{λ;μ1,μ0,σ^{(1)}} M') , (3.8)

where

Σ_{λ;μ1,μ0,σ^{(1)}} ≡ diag (1/(π1π0), (1/π1) I, (1/π0) I, I + E11) , (3.9)

the asymptotic covariance matrix of (λ̂, μ̂1', μ̂0', σ̂^{(1)}')' given by (3.5). Carrying out the multiplication in (3.8) gives (3.2), completing the proof.


Next we consider the logistic regression procedure.

Lemma 3: In the standard situation, the logistic regression procedure produces estimates (β̂0, β̂') = (λ, Δe1') + (dβ0, dβ') satisfying

ℒ: √n (dβ0, dβ')' → N_{p+1}(0, Σ̂) ,

where

Σ̂ = (1/(π1π0)) ×

    | A2/(A0A2 − A1²)     −A1/(A0A2 − A1²)    0'             |
    | −A1/(A0A2 − A1²)    A0/(A0A2 − A1²)     0'             | , (3.10)
    | 0                   0                   (1/A0) I_{p−1} |

Ai ≡ Ai(π1, Δ) being defined by

Ai(π1, Δ) ≡ e^{−Δ²/8} ∫_{−∞}^{∞} [x^i φ(x) / (π1 e^{Δx/2} + π0 e^{−Δx/2})] dx , i = 0, 1, 2 . (3.11)
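The integrals (3.11) are one-dimensional and straightforward to evaluate numerically. A stdlib Simpson-rule sketch (the grid limits and step count are our own choices), checked against the A0 ≈ .450, A1 = 0, A2 ≈ .266 entries of Table 1 at π1 = ½, Δ = 2:

```python
import math

def phi(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def A(i, pi1, delta, lo=-12.0, hi=12.0, m=4000):
    """A_i(pi1, Delta) of (3.11), by Simpson's rule on [lo, hi].
    m (the number of subintervals) must be even."""
    pi0 = 1.0 - pi1
    h = (hi - lo) / m
    def f(x):
        return (x ** i) * phi(x) / (pi1 * math.exp(delta * x / 2.0)
                                    + pi0 * math.exp(-delta * x / 2.0))
    s = f(lo) + f(hi)
    s += 4.0 * sum(f(lo + k * h) for k in range(1, m, 2))
    s += 2.0 * sum(f(lo + k * h) for k in range(2, m, 2))
    return math.exp(-delta * delta / 8.0) * (h / 3.0) * s
```

At π1 = ½ the integrand for i = 1 is odd, so A1 = 0 exactly, as the λ = 0 rows of Table 1 show.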

Proof: The density (1.9) can be written in exponential family form as

f_{β0,β}(y1, y2, ..., yn | x1, ..., xn) = exp [(β0, β')T − ψ(β0, β)] ,

T ≡ Σ_{j=1}^n yj (1, xj')' , ψ(β0, β) ≡ Σ_{j=1}^n log (1 + exp (β0 + β'xj)) . (3.12)

The sufficient statistic T has mean vector and covariance matrix

E_{β0,β} T = Σ_{j=1}^n π1j (1, xj')' , Cov_{β0,β} T = Σ_{j=1}^n π1j π0j (1, xj')'(1, xj') . (3.13)

Let F(n) denote the sample cdf of x1, x2, ..., xn, and suppose that ℒ: F(n) → F as n → ∞. Then,

lim_{n→∞} (1/n) Cov_{β0,β} T = ∫_{E^p} (1, x')'(1, x') π1(x)π0(x) dF(x) , (3.14)

where π1(x) ≡ 1 − π0(x) = [1 + exp {−(β0 + β'x)}]⁻¹. Exponential family theory says that the mapping from the expectation vector E_{β0,β}T to the natural parameters (β0, β) has Jacobian matrix [Cov_{β0,β} T]⁻¹. Therefore, the delta method gives

lim_{n→∞} n Cov (dβ0, dβ')' = [∫_{E^p} (1, x')'(1, x') π1(x)π0(x) dF(x)]⁻¹ . (3.15)

Under the sampling scheme (2.1), F will be the mixture of the normal populations N_p((Δ/2)e1, I) and N_p(−(Δ/2)e1, I) in proportions π1, π0. In the standard situation, π1(x) = 1 − π0(x) = [1 + exp {−(λ + Δx1)}]⁻¹. We get

lim_{n→∞} (1/n) Cov_{β0,β} T = π1π0 ×

    | A0   A1   0  ...  0 |
    | A1   A2   0  ...  0 |
    | 0    0    A0        |   (3.16)
    | ...            ...  |
    | 0    0    ...    A0 |

from (3.14). The covariance matrix (3.10) for (dβ0, dβ') follows from (3.15). The fact that (β̂0, β̂') is consistent for (β0, β') and asymptotically normal, which is the remainder of Lemma 3, is not difficult to show, given the structure (2.1). Like most of the other regularity properties, it will not be demonstrated here.

We can now compute the relative efficiency of logistic regression to normal discrimination by Theorem 1. Denote the errors for the two procedures by (dτ̂, dα̂) and (dτ̄, dᾱ), respectively, and define the efficiency measure

Eff_p (λ, Δ) ≡ lim_{n→∞} [E{ER (dτ̄, dᾱ) − ER (0, 0)} / E{ER (dτ̂, dα̂) − ER (0, 0)}] . (3.17)

Theorem 1, and Lemmas 2 and 3 then give

Eff_p = (Q1 + (p − 1)Q2) / (Q3 + (p − 1)Q4) , (3.18)

where

Q1 ≡ 1 + Δ²/4 + λ(π0 − π1) + (λ/Δ)²(1 + 2Δ²π1π0) ,

Q2 ≡ 1 + Δ²π1π0 ,
(3.19)
Q3 ≡ [A2 + 2(λ/Δ)A1 + (λ/Δ)²A0] / (A0A2 − A1²) ,

Q4 ≡ 1/A0 .

Rewriting (3.18) gives a simple expression for Eff_p (λ, Δ) as a weighted average of the relative efficiencies when p = 1 and p = ∞.

Theorem 2: The relative efficiency of logistic regression to normal discrimination is

Eff_p (λ, Δ) = [q(λ, Δ) Eff1 (λ, Δ) + (p − 1) Eff∞ (λ, Δ)] / [q(λ, Δ) + (p − 1)] , (3.20)

where

Eff1 (λ, Δ) ≡ Q1/Q3 and Eff∞ (λ, Δ) ≡ Q2/Q4 , (3.21)

as defined in (3.19), are the relative efficiencies when p = 1 and p = ∞, respectively, and

q(λ, Δ) ≡ Q3/Q4 . (3.22)

It is obvious from (3.18) that Eff∞ (λ, Δ) = Q2/Q4 really is the asymptotic efficiency as p → ∞. For p = 1, (3.18) gives Eff1 (λ, Δ) = Q1/Q3. This follows from Lemma 1 because dα can always be taken equal to zero when p = 1.
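Putting (3.11) and (3.19)-(3.22) together, entries of Table 1 can be reproduced from scratch. A self-contained sketch (the A_i computed by Simpson's rule; the π1 = .75, Δ = 2 row used in the test is from Table 1, and agreement is to the table's rounding):

```python
import math

def phi(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def A(i, pi1, delta, lo=-12.0, hi=12.0, m=4000):
    """A_i(pi1, Delta) of (3.11), by Simpson's rule."""
    pi0 = 1.0 - pi1
    h = (hi - lo) / m
    f = lambda x: (x ** i) * phi(x) / (pi1 * math.exp(delta * x / 2.0)
                                       + pi0 * math.exp(-delta * x / 2.0))
    s = f(lo) + f(hi)
    s += 4.0 * sum(f(lo + k * h) for k in range(1, m, 2))
    s += 2.0 * sum(f(lo + k * h) for k in range(2, m, 2))
    return math.exp(-delta ** 2 / 8.0) * (h / 3.0) * s

def efficiencies(pi1, delta):
    """Eff1, Eff_inf and q from (3.19)-(3.22)."""
    pi0 = 1.0 - pi1
    lam = math.log(pi1 / pi0)
    r = lam / delta
    a0, a1, a2 = (A(i, pi1, delta) for i in range(3))
    q1 = 1 + delta ** 2 / 4 + lam * (pi0 - pi1) + r ** 2 * (1 + 2 * delta ** 2 * pi1 * pi0)
    q2 = 1 + delta ** 2 * pi1 * pi0
    q3 = (a2 + 2 * r * a1 + r ** 2 * a0) / (a0 * a2 - a1 ** 2)
    q4 = 1 / a0
    return q1 / q3, q2 / q4, q3 / q4

def eff_p(p, pi1, delta):
    """Theorem 2, eq. (3.20)."""
    e1, einf, q = efficiencies(pi1, delta)
    return (q * e1 + (p - 1) * einf) / (q + (p - 1))
```

At π1 = ½ the two efficiencies coincide and q = 1, illustrating the corollary below; for π1 = .75, Δ = 2 the sketch returns roughly .914, .854 and 1.18 against the table's .915, .855, 1.177.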

The case λ = 0 gives a particularly simple answer (since then A1 = 0).

Corollary: When λ = 0, i.e., when π1 = π0 = ½,

Eff1 (λ, Δ) = Eff∞ (λ, Δ) = A0(1 + Δ²/4) , (3.23)

for all values of p.

Table 1 gives numerical values for the quantities involved.

Table 1. Relative Efficiencies of Logistic Regression to Normal Discrimination

π1 (or π0)   Δ     Eff∞   Eff1    q       A0     A1     A2
  .5         2     .899   .899   1       .450     0    .266
  .6         2     .892   .906   1.024   .458  -.038   .273
  .667       2     .879   .913   1.070   .465  -.067   .287
  .75        2     .855   .915   1.177   .488  -.108   .319
  .9         2     .801   .804   1.697   .589  -.253   .487
  .95        2     .801   .706   2.233   .674  -.375   .667
  .5         2.5   .786   .786   1       .307     0    .154
  .6         2.5   .778   .794   1.013   .311  -.025   .158
  .667       2.5   .762   .806   1.038   .319  -.044   .167
  .75        2.5   .733   .819   1.096   .337  -.074   .188
  .9         2.5   .660   .750   1.379   .423  -.181   .304
  .95        2.5   .650   .637   1.671   .501  -.282   .441
  .5         3     .641   .641   1       .197     0    .084
  .6         3     .633   .649   1.008   .200  -.016   .087
  .667       3     .618   .662   1.023   .206  -.027   .092
  .75        3     .589   .682   1.057   .219  -.046   .104
  .9         3     .511   .667   1.225   .282  -.117   .175
  .95        3     .492   .588   1.400   .344  -.189   .265
  .5         3.5   .486   .486   1       .120     0    .044
  .6         3.5   .479   .493   1.005   .122  -.009   .045
  .667       3.5   .467   .505   1.014   .125  -.016   .048
  .75        3.5   .442   .526   1.035   .134  -.027   .055
  .9         3.5   .370   .550   1.142   .176  -.070   .095
  .95        3.5   .348   .516   1.252   .220  -.116   .147
  .5         4     .343   .343   1       .069     0    .022
  .6         4     .338   .348   1.003   .070  -.005   .022
  .667       4     .328   .358   1.009   .072  -.009   .024
  .75        4     .309   .375   1.024   .077  -.014   .027
  .9         4     .252   .416   1.094   .103  -.039   .048
  .95        4     .230   .416   1.168   .131  -.065   .076

NOTE: See (3.17), (3.19), (3.20), (3.21) for definitions of terms.

4. ANGLE AND INTERCEPT ERROR

The terms Eff1 (λ, Δ) and Eff∞ (λ, Δ) which appear in (3.20), Theorem 2, have another interpretation. Eff∞ (λ, Δ) is the asymptotic relative efficiency of logistic regression to normal discrimination for estimating the angle of the discriminant boundary,

Eff∞ (λ, Δ) = lim_{n→∞} [Var (dᾱ) / Var (dα̂)] . (4.1)

(See the figure and the definitions preceding (3.17).) Likewise, Eff1 (λ, Δ) is the asymptotic relative efficiency for estimating the intercept of the discriminant boundary,

Eff1 (λ, Δ) = lim_{n→∞} [Var (dτ̄) / Var (dτ̂)] . (4.2)

These results follow immediately from (2.13), (2.14), (3.2), (3.10) and (3.21).

Comparing (2.14) with Lemmas 2 and 3 shows that

ℒ: n·(dᾱ)² → [(1 + Δ²π1π0)/(π1π0Δ²)] χ²_{p−1} ,
(4.3)
ℒ: n·(dα̂)² → [1/(A0π1π0Δ²)] χ²_{p−1} .

The asymptotic relative efficiency of logistic regression to normal discrimination in terms of angular error is thus

ARE = (1 + Δ²π1π0) A0

= (1 + Δ²π1π0) e^{−Δ²/8} ∫_{−∞}^{∞} [(2π)^{−1/2} e^{−x²/2} / (π1 e^{Δx/2} + π0 e^{−Δx/2})] dx , (4.4)

in the strong sense that a sample of size n1 using logistic regression produces asymptotically the same angular error distribution as a sample of size n2 = ARE·n1 using normal discrimination. From (1.12), we see that if λ = 0 and Δ = 2.5, for example, n1 = 1,000 is approximately equivalent to n2 = 786. (Eff∞ in the table is also ARE as given by (4.4).)

The corresponding statement for intercept errors is not true, because the two matrices involved in the definition of Q1 and Q3, (3.19), are not proportional. We have to settle for the weaker second-moment efficiency statement (4.2). However, when λ = 0, i.e., when π1 = π0 = ½, (2.13) and Lemmas 2 and 3 show that

ℒ: n·(dτ̄)² → (4/Δ²)(1 + Δ²/4) χ²_1 ,
(4.5)
ℒ: n·(dτ̂)² → (4/Δ²)(1/A0) χ²_1 .

In this case, (4.4) with π1 = π0 = ½ again gives the ARE in the strong sense of asymptotically equivalent sample sizes.

Combining (4.3) and (4.5) with Lemma 1 shows that when λ = 0 (and so D1 = Δ/2),

ℒ: n{ER (dτ̄, dᾱ) − ER (0, 0)} → (φ(Δ/2)/Δ)(1 + Δ²/4) χ²_p ,
(4.6)
ℒ: n{ER (dτ̂, dα̂) − ER (0, 0)} → (φ(Δ/2)/Δ)(1/A0) χ²_p .

Thus, error rates for samples of size n1 and n2 = ARE·n1 will have asymptotically equivalent distributions, with ARE given by (4.4), π1 = π0 = ½. This is not true for π1, π0 ≠ ½, but as the dimension p gets large, it is. That is, error rates for the two procedures will have the same asymptotic distribution if n2 = ARE·n1, ARE given by (4.4), when p → ∞ and n1/p → ∞. A simple proof of this follows from (2.16) and Lemmas 2 and 3.

The angular error, dα, unlike the error rate, is not invariant under linear transformations. Formulas (4.1), (4.3), and (4.4) refer to a standardized angular error, defined after we have made the linear transformations which take the general model (1.1) into the standard situation (2.1). However, it is easy to show that (4.1) and (4.4) (but not (4.3)) also hold for the true, unstandardized, angular error. This true error will be some quadratic form in the standardized coordinates dβ2, dβ3, ..., dβp, not depending on which procedure is used. The result follows, because for both procedures, (dβ2, ..., dβp) has a limiting normal distribution with covariance matrix proportional to the identity. (Actually, (4.3) holds with χ²_{p−1} replaced by a certain weighted sum of independent χ²_1 variates.)

There are two good reasons to be interested in angular error. First, under the fixed sampling proportion setup of Section 5, it is the only error of interest. Second, there is the well-known fact that minimizing Σ_{j=1}^n [yj − (a + b'xj)]² over all choices of the constant a and vector b gives b̂ proportional to β̄. (But â does not equal β̄0.) This connects normal discrimination with ordinary least squares analysis and provides some justification, or at least rationale, for using β̄ outside the framework (2.1).

Other efficiency comparisons between the two procedures, e.g., in estimating the slope β of the discriminant function, can be obtained from Lemmas 2 and 3.

5. DISTORTED SAMPLING PROPORTIONS

It may happen that the true probabilities π̃1 and π̃0 for populations 1 and 0 are distorted in a known way to different values π1 and π0 by the nature of the sampling scheme employed. Letting λ ≡ log π1/π0 and λ̃ ≡ log π̃1/π̃0, suppose that for some known constant c,

λ = λ̃ + c . (5.1)

For example, experimental constraints might cause the statistician to randomly exclude from his training set nine out of ten population 0 members, in which case c = log 10. The normal discrimination procedure described at (1.5) is then modified in the obvious way: a new x is assigned to Population 1 or 0 as λ̄(x) is greater or less than c. The logistic regression procedure (1.9) is similarly modified.

Theorem 2 remains true as stated except for the following modification. The vector (1, λ/Δ) (and its transpose), which appears in the definitions of Q1 and Q3 in (3.19), is replaced by (1, λ̃/Δ). The constants Ai ≡ Ai(π1, Δ), which appear in Q3, are not changed to Ai(π̃1, Δ). The proof of this is almost exactly the same as the proof of Theorem 2.

Eff∞ (λ, Δ), the angular efficiency, remains unchanged, which is not surprising, since the discrimination boundary for any choice of c is parallel to that for c = 0. Only the intercept is changed. When π1 = π0 = .5, the effect of choosing c ≠ 0 is to reduce Eff1 (λ, Δ), the intercept efficiency of logistic regression compared to normal discrimination, as shown in the following tabulation.

            Δ = 2, π1 = .5              Δ = 3, π1 = .5
c        0    ±1    ±2    ±3         0    ±1    ±2    ±3
Eff1   .899  .869  .836  .819      .641  .604  .550  .516
                                                     (5.2)
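The modification is mechanical to implement: in Q1 and Q3 the ratio λ/Δ is replaced by λ̃/Δ = (λ − c)/Δ, while the Ai keep the sampling value π1. A sketch reproducing the (5.2) tabulation (Simpson quadrature for the Ai, as before):

```python
import math

def phi(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def A(i, pi1, delta, lo=-12.0, hi=12.0, m=4000):
    """A_i(pi1, Delta) of (3.11), by Simpson's rule."""
    pi0 = 1.0 - pi1
    h = (hi - lo) / m
    f = lambda x: (x ** i) * phi(x) / (pi1 * math.exp(delta * x / 2.0)
                                       + pi0 * math.exp(-delta * x / 2.0))
    s = f(lo) + f(hi)
    s += 4.0 * sum(f(lo + k * h) for k in range(1, m, 2))
    s += 2.0 * sum(f(lo + k * h) for k in range(2, m, 2))
    return math.exp(-delta ** 2 / 8.0) * (h / 3.0) * s

def eff1_distorted(pi1, delta, c):
    """Intercept efficiency Eff1 under (5.1): lambda/Delta in Q1 and Q3
    is replaced by lambda_tilde/Delta = (lambda - c)/Delta, while the
    A_i keep the sampling proportion pi1."""
    pi0 = 1.0 - pi1
    lam = math.log(pi1 / pi0)
    r = (lam - c) / delta                 # lambda_tilde / Delta
    a0, a1, a2 = (A(i, pi1, delta) for i in range(3))
    q1 = (1 + delta ** 2 / 4 + (lam - c) * (pi0 - pi1)
          + r ** 2 * (1 + 2 * delta ** 2 * pi1 * pi0))
    q3 = (a2 + 2 * r * a1 + r ** 2 * a0) / (a0 * a2 - a1 ** 2)
    return q1 / q3
```

With π1 = .5 the answer depends only on |c| (the odd terms vanish), and the sketch recovers the (5.2) entries, e.g. .869 at Δ = 2, c = 1 and .550 at Δ = 3, c = 2.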

Eff1 for other values of c, π1, Δ can be obtained using the entries A0, A1, A2 in the table.

Most frequently, the sample sizes n1 and n0 are set by the statistician and are not random variables at all. The usual procedure in this situation is to estimate only the angle, not the intercept, of the discrimination boundary. In terms of the figure, the statistician uses the data to select a family of parallel boundaries B(·, dα). The value of the intercept τ is chosen on a priori grounds, by just guessing what λ is, or may not be formally selected at all.

Either normal discrimination or logistic regression may be used to estimate the vector β in (1.2). It can be shown that dβ̄ and dβ̂ still have the limiting distributions indicated in Lemmas 2 and 3, with π1 and π0 replaced by π̄1 ≡ n1/n and π̄0 ≡ n0/n. In terms of angular error, the ARE (4.4) still gives the asymptotic relative efficiency of logistic regression to normal discrimination in the strong sense of Section 4. The quantities π1, π0 in (4.4) are replaced by π̄1 = n1/n, π̄0 = n0/n, where these proportions are assumed to exist and not equal zero in the limit.

The estimates μ̂1, μ̂0, Σ̂ given in (1.5) are maximum likelihood, whether n1, n0 are fixed or random. It follows that β̄' = (μ̂1 − μ̂0)'Σ̂⁻¹, which we can still call the normal discrimination estimate, is maximum likelihood in either case. Standard maximum likelihood arguments, similar to the proof of Lemma 2, show that dβ̄ is distributed as stated in Lemma 2, with π1, π0 replaced by π̄1, π̄0.

Let T1 ≡ Σ_{j=1}^n yj be the first coordinate of T in (3.12), and let T2 be the remaining p coordinates. Given that T1 = n1, the conditional density of y1, y2, ..., yn is an exponential family with natural parameter β and sufficient statistic T2,

f(y1, y2, ..., yn | T1 = n1, x1, x2, ..., xn) = exp [β'T2 − ψ_{n1}(β)] , (5.3)

where ψ_{n1}(β) is chosen to make (5.3) sum to unity over all choices of y1, ..., yn with Σ_{j=1}^n yj = n1. The analog of the logistic regression procedure is to select β̂ to maximize the likelihood (5.3). A modification of the proof of Lemma 3, which will not be presented, shows that dβ̂ is distributed as stated there, with π1, π0 replaced by π̄1, π̄0.

In practice, the simplest way to apply logistic regression when n1 and n0 are fixed is simply to ignore this fact. The standard programs maximize (1.9) over the possible choices of β0, β, and then present the maximizer β̂ as the estimate of β. This method can be shown to be asymptotically equivalent to the conditional maximum likelihood estimator based on (5.3).

[Received December 1974. Revised March 1975.]

REFERENCES

[1] Anderson, T.W., An Introduction to Multivariate Statistical Analysis, New York: John Wiley & Sons, Inc., 1958.
[2] Cox, D.R., Analysis of Binary Data, London: Chapman and Hall, Ltd., 1970.
[3] Dempster, A., "Aspects of the Multinomial Logit Model," Multivariate Analysis (March 1973), 129-42.
[4] Halperin, M., Blackwelder, W.C. and Verter, J.I., "Estimation of the Multivariate Logistic Risk Function: A Comparison of the Discriminant Function and Maximum Likelihood Approaches," Journal of Chronic Diseases, 24 (January 1971), 125-58.
[5] Lehmann, E., Testing Statistical Hypotheses, New York: John Wiley & Sons, Inc., 1959.
