c1s2

Upload: cupidkhh

Post on 14-Apr-2018

212 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/30/2019 c1s2

    1/6

    March18,20031.2 Decision Theory. Usually in statistics, instead ofjust two possible probabilitydistributionsP,Q,asinthelastsection,thereisaninfinitefamilyP ofsuchdistributions,defined on a sample space, which is a measurable space (X, B), in other words a set Xtogether

    with

    a-algebra

    B

    of

    subsets

    of

    X.

    As

    noted

    previously,

    if

    X

    is

    asubset

    of

    a

    Euclidean space, then B will usually be the -algebra of Borel subsets of X. If X is acountable set, then B will usually be the -algebra of all subsets of X (if also X Rk,then all its subsets are in fact Borel sets). A probability measure on B will be called alaw. ThefamilyP of lawson(X, B) isusuallywrittenas{P, }, where is called aparameter space. Forexample,ifP isthesetofallnormalmeasuresN(, 2) for Rand > 0, we can take = (, ) or (, 2) where in either case is the open upperhalf-plane,thatis,thesetofall(t, u)R2 suchthatu > 0. Weassumethatthefunction P from to laws on B is one-to-one, in other words P = P whenever = in. SothesetsP andarein1-1correspondenceandanystructureononecanbetakenover to the other. We also assume given a -algebra T of subsets of . Most often will be a subset of some Euclidean space and T the family of Borel subsets of . Thefamily{P, }willbecalledmeasurableon(, T) ifandonly ifforeachB B, the function P(B) ismeasurableon. If isfiniteorcountable,then(aswithsamplespaces) T will usually be taken to be the collection of all its subsets. In that case thefamily{P, } isalwaysmeasurable.

    Anobservationwillbeapointx ofX. Givenx,thestatisticiantriestomakeinferencesabout , such as estimating by a function (x). For example, if X =Rn and P =N(, 1)n, so x = (X1, . . . , Xn) where the Xi are i.i.d. with distribution N(, 1), then(x) = X := (X1 + Xn)/n istheclassicalestimatorof.

    Indecisiontheory,there isalsoameasurablespace(D, S),calledthedecision space.Ameasurablefunctiond() from X intoD iscalledadecision rule. Sucharulesaysthatifx isobserved,thenactiond(x)shouldbetaken.

    One possibledecision space D would be the set of all d for , where d is thedecision(estimate)that isthetruevalueoftheparameter. Or, ifwejusthaveasetPoflaws,thendP wouldbethedecisionthatP isthetruelaw. Thusinthelastsectionwehad P = {P, Q} and for non-randomized tests, D = {dP, dQ}. There, a decision rule isequivalenttoameasurablesubsetofX,whichwastakentobethesetwherethedecisionwillbedQ. Forrandomizedrules,stillforP={P, Q},thedecisionspaceDcanbetakenas the interval 0 d 1, where d(x) is the probability that Q will be chosen if x isobserved.

    Another possible decision space is a collection of subsets of the parameter space.SupposeP =N(, 2)n onX =Rn for< < where isfixedandknown. Then[X 1.96/n1/2, X+ 1.96/n1/2]isa95%confidenceintervalfor,meaningthatforall

    .>1.96/n1/2}= 0.05. HerethedecisionspaceDcouldbetakenastheset,P{|X |

    ofallclosed intervals [a, b]R. Givingaconfidence interval(oraconfidencesetmoregenerally)isonekindofdecisionrule.

    In decision theory, we also assume given a loss functionL, which is a measurablefunction: D[0, ]. HereL(, d) isthe losssufferedwhenthedecisiond istakenand is the true value of the parameter, sometimes called the state of nature. The

    1

  • 7/30/2019 c1s2

    2/6

    followingconditionona lossfunctionwillbenotedforlaterreference,thoughnotalwaysassumed:(1.2.1) L(P, dP)=0 forevery P P, inotherwords L(P, d)=0 forall ,thatis,acorrectdecisionincursnoloss. IfthedecisionruleisanestimatorT(x) = (x) for arealparameter,thenone frequentlyused loss function issquared-error loss,L(, t) = (t)2.

    Manyauthorstreatautility functionU(, d)ratherthana lossfunction. Here largervalues of U are more favorable. If L is a loss function, then U = L gives a utilityfunction, but not necessarily conversely: in accord with broad usage of the term amongstatisticiansandeconomists,autility functionmaytakevaluespositive,negativeorzero,and the values U(P, dP) may not be 0 and may be different for different P. To give a mathematicaldefinition,autility functionisameasurable,extendedrealvalued function(i.e.itsvaluesarein [, ])onP D, or equivalently on D. If for P andQinP,dP DanddQ D,then it isassumedthat(1.2.2) U(P, dQ)U(P, dP),that is, its better to make the right decisionthana wrong one. ValuesU(P, dQ) = are allowed, corresponding to the notion that a wrong decision could lead to ruin ordeathofthedecisionmakerandpossiblyothers.

    IfD={dP : P P} andU(P, dP)

  • 7/30/2019 c1s2

    3/6

    suchasdP. Instead, DcontainsprobabilitymeasuresonaspaceAofspecificactions. Ifx isobserved,andd(x) D isa lawx onA,onewillchooseanactiona Aatrandomaccording to x, then take the action a. If A = {dP, dQ}, a law on A is given by anumbery with0 y 1,wherey=(dQ) = 1 (dP).

    InSec.1.1,wesawthatrandomizedrulesgaveusadmissibletestsofallpossiblesizes,butalsothatforBayesrules itwasnotnecessarytorandomize(Remark1.1.9).Adecisionrulee(),whichmayberandomized,iscalledminimaxif itminimizesthemaximumrisk,ormathematically

    supr(,e) = infd() supr(,d),wheretheinfimum isoverallpossibledecisionrules,possiblyrandomized.

    Minimaxandrandomizedrulesarebothimportantinanothersubjectcloselyrelatedto decision theory,game theory. Here there are actions a A, parameters , and autility functionU(,a). Ingametheory,atanyrate inthebasic formof itbeingtreatedhere,thereisnosamplespaceX,andsonoobservationxnorlawsP. Also,anintelligentopponentcanchoose,possiblyevenknowingones(randomized)decisionrule,althoughnot ones specific action a. Thus, if there are minimax decision rules, then such rulesshouldbeused.

    Forexample,inthegameofscissors-stone-paper,eachoftwoplayershasavailablethreepossibleactions,scissors,stoneandpaper,andthetwotakeactionssimulta-neously. Here scissor beats paper, paper beats stone, and stone beats scissors. Supposethe winner of one round gains $1 and the loser loses $1. If both players take the sameaction, the outcome is a draw (no money changes hands). Then for repeated plays, oneshould use a randomized rule: if one always takes the same action, or any predictablesequenceofactions,theopponentcanalwayswin. Therandomizedruleofchoosingeachactionwithprobability1/3hasaveragewinnings0againstanystrategyoftheopponent,andthisstrategy isminimaxand istheuniqueminimaxstrategy: anyotherrandomizedrule, ifknownto theopponent,canbedefeatedandresult inanaveragenet loss(this isleftasaproblem).

    Even when minimax rules are not available, it may be that for each specific actiona A, there is a large loss L(,a) for some . Then, possibly, any non-randomizeddecisionrulechoosinga Ahasalargeriskforsome. Asintheinsurancebusiness,risksaboveacertainsizemaybeunacceptable,sothatonemayprefertochoosearandomizedruled()tokeepsupr(,d) frombeingtoo large. In insurance,unlikesimplerproblems,minimax seems too extreme a requirement: policies would only be sold if the buyerscouldbepersuadedtopaymoreinpremiumsthantheycouldeverreceiveinbenefits!

    Forameasurablespace(A,E)ofspecificactions, letDE bethesetofallprobabilitymeasureson(A,E). OnDE , let SE bethesmallest-algebraforwhichalltheevaluations (B)aremeasurableforB E. ForalossfunctionLon A, and DE , the loss atand willbe(1.2.3) L(,) := L(,a)d(a).

    AThenext factextendsthejointmeasurabilityofL fromsimpletorandomizedstrategies.Recallthat fortwomeasurablespaces(U,U) and (V, V),a functionF onU V iscalled

    3

  • 7/30/2019 c1s2

    4/6

    jointly measurableifandonlyifitismeasurablefortheproduct-algebraU V, which is thesmallest-algebraofsubsetsofU V containingallsetsB C forB UandC V.1.2.4 Proposition. If Lisnonnegativeandjointlymeasurableon A,thenasdefinedon DE by(1.2.3),it isjointlymeasurableinto [0,],forthe-algebraSE onDE .

    Iff isajointlymeasurablereal-valuedfunctiononAandWf isthesetofall(,)suchthatf(,) := f(,a)d(a)iswell-defined(not ),thenWf isameasurablesetfortheproduct-algebraand(,) f(,)isjointlymeasurableonitforT andSEinto [ ,].Proof. Measurabilityholds ifL= 1BC, where 1BC(,a) 1B()1C(a) for B TandC E,sincethenL(,) = 1B()(C),which ismeasurableforT SE . Next,anyfiniteunionW ofsetsBi Ci forBi TandCi E canbewrittenasafinite,disjointunionofsuchsets(RAP,Prop.3.2.2). Addingtheir indicators,wegettheresultforW.

    The set of all such W forms an algebra (RAP, Prop. 3.2.3). If Ln is a uniformlyboundedsequenceofmeasurablefunctionson A forwhichtheconclusionholds,andLnL or LnL, then it also holds for L. Since the smallest monotoneclass including analgebra is a -algebra (RAP, Theorem 4.4.2), the result holds for 1H for all H in theproduct-algebraT E. Then itholds forLsimple(afinite linearcombinationofsuch1H),thenforLnonnegativeandmeasurablebymonotoneconvergence(RAP,Prop.4.1.5andTheorem4.3.2).

    Thenforanymeasurablef on A,wewriteasusualf =f+ f wheref+ :=max(f, 0). ThenWf isthesetof(,)suchthat f+(,a)d(a) and f(,a)d(a) are notboth+,soit isaproductmeasurableset. ApplyingtheresultforL 0 to f+ andf, we get that on Wf, f(,) is a difference of two nonnegative measurable functions,notboth+,sothedifferenceisameasurablefunctioninto[ ,]. Remark. Althoughlossfunctionsareassumednonnegative,utilityfunctionscanbepos-itiveornegative,soProposition1.2.4canbeappliedwhenf isautilityfunction.

    Afunction : x x fromX intoDE willbecalledarandomized decision ruleifitismeasurablefrom(X, B) to (DE ,SE). Theriskof the rule for a given is

    r(,) := L(,x)dP(x),which isameasurable functionof. Thedefinitionsofadmissibleand inadmissiblerulesextenddirectlytoDE inplaceofA. Conversely,arandomizeddecisionrulemaybeviewedasanon-randomizedonewith the spaceDE in placeoftheactionspaceA, and the risk r(,)inplaceofthelossL(,a). So,letsjustconsiderordinarydecisionrulesa().Given a prior distribution on (,T), the riskof a decision rule a() is defined asr(a()) := L(,a(x))dP(x)d(). A decision rule a() will be called a Bayesrule ifit attains the minimum risk (for the given ) over all possible decision rules, and if thisminimumrisk isfinite.

    Evenfornon-Bayesianstatisticians,whodontbelievepriorsshouldbeusedinprac-tice, at least not in all cases, priors can be useful technically in showing that a decisionrule isadmissible:

    4

  • 7/30/2019 c1s2

    5/6

    1.2.5 Theorem. Supposea() isadecisionruleandforsomeprior on,a() is Bayes for and isuniqueBayes,meaningthat foranyotherBayesdecisionruleb() for , and all,a(x) = b(x) for P-almostallx. Then a() isadmissible.Proof. If there were a decision rule c() a(), then clearly c() is also Bayes, but forsome,c(x) =a(x)withpositiveprobabilityforP,acontradiction. 1.2.6 Theorem. Suppose the parameter space is countable. If a() is a decisionruleandthereisaprioron,positiveateverypoint,suchthata() is Bayes for , then a()isadmissible.Proof. Ifb()wereanotherdecisionrulewithba,thentheriskofb() for would besmallerthanthatofa(),acontradiction.

    A set C inaEuclideanspaceRk iscalledconvexifwheneverx,yC and0t1we have tx+ (1 t)yC.1.2.7 Proposition. Let (X,B)bethesamplespaceand(A,E)theactionspacewhereAisaconvex,BorelmeasurablesubsetofaEuclideanspaceRk withBorel-algebraE. Let 2 2a := (a + +ak)1/2. Let x x : X DE bearandomizeddecisionrule. Then1 B

    := {xX : a dx(a) for xB

    , where adx(a)

  • 7/30/2019 c1s2

    6/6

    5. Let P = N(,1)n onRn for < < . Let the action space beR and the lossfunction L(,a) = (a)2. Show that a decision rule (estimator) which is a linearfunction of X, d(x) = bX+c, is inadmissible if |b|>1 but admissible if b= 0. (It is admissibleforb= 1 and c=0,butthisproblemdoesntaskforaproofofthat.)

    NOTESTOSEC.1.2A classic reference on decision theory is Ferguson (1967). A more recent work is

    Berger (1980, 1985). Decision theory was also the subject of the last chapter in each oftwogeneraltextsbyother leadingstatisticians: Bickel andDoksum(1977)andCoxandHinkley (1974). In the second edition of Bickel and Doksum (2001), decision theory ismore integratedintothebook,beginningwiththefirstchapter,asitishere.

    REFERENCESBerger,JamesO.(1980,2ded.1985). Statistical Decision Theory. Springer,NewYork.Bickel,PeterJ.,andKjellA.Doksum(1977,2001). SeeSec.1.1.Cox,DavidR.,andDavidV.Hinkley(1974). Theoretical Statistics. ChapmanandHall,

    London.Ferguson,ThomasS.(1967). SeeSec.1.1.

    6