Brian Skyrms

Three Ways to Give a Probability Assignment a Memory

Consider a model of learning in which we update our probability assignments by conditionalization; i.e., upon learning S, the probability of not-S is set at zero and the probabilities of statements entailing S are increased by a factor of one over the initial probability of S. In such a model, there is a certain peculiar sense in which we lose information every time we learn something. That is, we lose information concerning the initial relative probabilities of statements not entailing S.
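As a small illustration (my own sketch; the worlds and numbers are invented, not from the text), conditionalization over a finite space makes the loss visible: the relative weights of the worlds incompatible with S are simply thrown away.

```python
# Minimal sketch of conditionalization on a finite space of "worlds".
# The worlds and probabilities below are invented for illustration.

def conditionalize(prob, S):
    """Return the assignment obtained by conditionalizing prob on the event S."""
    p_S = sum(p for w, p in prob.items() if w in S)
    if p_S == 0:
        raise ValueError("cannot conditionalize on a zero-probability event")
    # Worlds outside S go to zero; worlds inside S are scaled by 1 / p(S).
    return {w: (p / p_S if w in S else 0.0) for w, p in prob.items()}

prior = {"w1": 0.2, "w2": 0.3, "w3": 0.1, "w4": 0.4}
S = {"w1", "w2"}                      # the proposition learned
posterior = conditionalize(prior, S)  # {'w1': 0.4, 'w2': 0.6, 'w3': 0.0, 'w4': 0.0}

# The 0.1 : 0.4 ratio between w3 and w4 cannot be recovered from `posterior`;
# this is the information loss the text describes.
```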

The loss makes itself felt in various ways. Suppose that learning is meant to be corrigible. After conditionalizing on S, one might wish to be able to decide that this was an error and "deconditionalize." This is impossible if the requisite information has been lost. The missing information may also have other theoretical uses; e.g., in giving an account of the warranted assertability of subjunctive conditionals (Adams 1975, 1976; Skyrms 1980, 1981) or in giving an explication of "evidence E supports hypothesis H" (see the "paradox of old evidence" in Glymour 1980).

It is therefore of some interest to consider the ways in which probability assignments can be given a memory. Here are three of them.

I. Make Like an Ordinal (Tait's Suggestion)¹: A probability assignment will now assign each proposition (measurable set) an ordered pair instead of a single number. The second member of the ordered pair will be the probability; the first member will be the memory. To make the memory work properly, we augment the rule of conditionalization. Upon learning P, we put the current assignment into memory, and put the result of conditionalizing on P as the second component of the ordered pairs in the new distribution. That is, if the pair assigned to a proposition by the initial distribution is (x, y), then the pair assigned by the final distribution is ((x, y), z), where z is the final probability of that proposition gotten by conditionalizing on P. (If P has initial probability zero, we go to the closest state in which it has positive probability to determine the ratios of final probabilities for propositions that entail P.)
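A toy rendering of this bookkeeping (my own sketch; it tracks worlds rather than arbitrary propositions and ignores the zero-probability clause) shows how the old assignment survives one level down in the pair:

```python
def conditionalize(prob, P):
    p_P = sum(p for w, p in prob.items() if w in P)
    return {w: (p / p_P if w in P else 0.0) for w, p in prob.items()}

def tait_update(assignment, P):
    """assignment maps each world to a pair (memory, current probability).
    On learning P, the old pair becomes the memory and the result of
    conditionalizing on P becomes the new current probability."""
    current = {w: pair[1] for w, pair in assignment.items()}
    updated = conditionalize(current, P)
    return {w: (assignment[w], updated[w]) for w in assignment}

# Start with an empty memory (None) for each world.
prior = {"w1": 0.2, "w2": 0.3, "w3": 0.1, "w4": 0.4}
state = {w: (None, p) for w, p in prior.items()}
state = tait_update(state, {"w1", "w2"})
# state["w3"] == ((None, 0.1), 0.0): the lost prior value 0.1 is still on record,
# so "deconditionalizing" is just unwrapping one level of the pair.
```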

This suggestion gives probability assignments a perfect memory. From a practical viewpoint, the price that is paid consists in the enormous amount of detail that is built into an assignment for a relatively old learning system of this kind, and the consequent costs in terms of capacity of the system.

II. Don't Quite Conditionalize (Probability Kinematics with or without Infinitesimals): Upon learning P, one might not quite give P probability one, but instead retain an itty-bitty portion of probability for its negation, distributing that portion among the propositions that entail not-P in proportion to their prior probabilities. There are two versions of this strategy, depending on whether the itty-bitty portion is a positive real magnitude or an infinitesimal one. Let us consider the first alternative. It should be noted that this may simply be a more realistic model of learning for some circumstances. But I am concerned here only with its value as a memory device. As such it has certain drawbacks. In the first place, the itty-bitty probabilities used as memory might be hard to distinguish from genuinely small current probabilities. In the second place, we get at best short-term memory. After a few learning episodes where we learn P1, . . ., Pn, the information as to the relative initial values of propositions that entailed not-P1 & . . . & not-Pn is hopelessly lost. On the other hand, we have used no machinery over and above the probability assignment. This gives us a cheap, dirty, short-term memory.
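The real-valued version is just probability kinematics on the partition {P, not-P} with not-P held at some small ε. A sketch, with invented numbers and an arbitrarily chosen ε:

```python
EPS = 1e-9  # the "itty-bitty" real remainder retained for not-P

def kinematics_update(prob, P, eps=EPS):
    """Probability kinematics on {P, not-P}: P ends up with 1 - eps and not-P
    with eps, each share distributed over its worlds in proportion to priors."""
    p_P = sum(p for w, p in prob.items() if w in P)
    p_notP = 1.0 - p_P
    return {w: (p / p_P * (1.0 - eps) if w in P else p / p_notP * eps)
            for w, p in prob.items()}

prior = {"w1": 0.2, "w2": 0.3, "w3": 0.1, "w4": 0.4}
post = kinematics_update(prior, {"w1", "w2"})
# post["w3"] / post["w4"] is still 0.25, so the old 0.1 : 0.4 ratio can be read
# off the itty-bitty remainders, at least until it becomes hard to distinguish
# from genuinely small current probabilities (the drawback noted in the text).
```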

The drawbacks disappear if we make the itty-bitty portion infinitesimal. The development of nonstandard analysis allows us to pursue this possibility in good mathematical conscience.² There is no danger of confusing an infinitesimal with a standard number. Furthermore, we can utilize orders of infinitesimals to implement long-term memory. (Two nonstandard reals are of the same order if their quotient is finite.) There is some arbitrariness about how to proceed, because there is no largest order of infinitesimals. (But, of course, arbitrariness is already present in the choice of a nonstandard model of analysis.) Pick some order of infinitesimals to function as the largest working order. Pick some infinitesimal i of that order. On learning P, we update by probability kinematics on P, not-P, giving not-P final probability i. Successive updatings do not destroy information, but instead push it down to smaller orders of infinitesimals. For instance, if we now learn Q, the information as to the relative magnitude of the initial probabilities of propositions that entail not-P & not-Q lives in infinitesimals of the order i².
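There is no off-the-shelf nonstandard arithmetic to call on here, but the order-pushing can be mimicked with formal bookkeeping (my own sketch, not genuine nonstandard analysis): carry each probability as a polynomial in i, and let each update shift not-P worlds one power of i down. Normalization constants at each order are dropped, since only the ratios within an order matter for the memory point.

```python
from fractions import Fraction

# Formal bookkeeping sketch: a "probability" is {power_of_i: coefficient},
# e.g. {0: Fraction(2, 5)} is a standard 2/5 and {1: Fraction(1, 10)} is (1/10)*i.

def learn(prob, P):
    """Learn P: rescale the standard parts of P-worlds; push every not-P world
    one order of i down.  (Per-order normalization is deliberately ignored.)"""
    p_P = sum(poly.get(0, Fraction(0)) for w, poly in prob.items() if w in P)
    new = {}
    for w, poly in prob.items():
        if w in P:
            new[w] = {k: c / p_P for k, c in poly.items()}
        else:
            new[w] = {k + 1: c for k, c in poly.items()}
    return new

prior = {"w1": {0: Fraction(2, 10)}, "w2": {0: Fraction(3, 10)},
         "w3": {0: Fraction(1, 10)}, "w4": {0: Fraction(4, 10)}}
step1 = learn(prior, {"w1", "w2"})   # w3, w4 now live at order i
step2 = learn(step1, {"w1"})         # w2 pushed to order i; w3, w4 to order i^2
# step2["w3"] == {2: Fraction(1, 10)} and step2["w4"] == {2: Fraction(4, 10)}:
# the initial 1 : 4 ratio of the worlds entailing not-P & not-Q survives at
# order i^2, as in the text.
```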

This strategy of probability kinematics with infinitesimals gives probability distributions a memory that is almost as good as that supplied by Tait's suggestion.³ It has the advantage of a certain theoretical simplicity. Again it is only the probability assignment that is doing the work. Memory is implicit rather than something tacked on. This theoretical simplicity is bought at the price of taking the range of the probability function to be non-Archimedean in a way that reduces consideration of the practical exemplification of the model to the status of a joke.

III. Keep a Diary: Our system could start with a given probability distribution, and instead of continually updating, simply keep track of what it has learned. At any stage of the game its current probability distribution will be encoded as a pair whose first member is the original prior distribution, and whose second member is the total evidence to date. If it needs a current probability, it computes it by conditionalization on its total evidence. Such a Carnapian system has its memory structured in a way that makes error correction a particularly simple process: one simply deletes from the total evidence.
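A sketch of such a diary-keeping system (my own; the class name, the representation of evidence as sets of worlds, and the numbers are all invented for illustration):

```python
class DiaryAgent:
    """Stores the original prior and the total evidence; current probabilities
    are computed on demand by conditionalizing the prior on that evidence."""

    def __init__(self, prior):
        self.prior = dict(prior)   # the original assignment, never overwritten
        self.diary = []            # the total evidence to date (sets of worlds)

    def learn(self, proposition):
        self.diary.append(set(proposition))

    def retract(self, proposition):
        # Error correction: simply delete the entry from the total evidence.
        self.diary.remove(set(proposition))

    def current(self):
        # Conditionalize the original prior on the conjunction of the evidence.
        E = set(self.prior)
        for prop in self.diary:
            E &= prop
        p_E = sum(p for w, p in self.prior.items() if w in E)
        return {w: (p / p_E if w in E else 0.0) for w, p in self.prior.items()}

agent = DiaryAgent({"w1": 0.2, "w2": 0.3, "w3": 0.1, "w4": 0.4})
agent.learn({"w1", "w2"})
agent.learn({"w2", "w3"})
agent.retract({"w2", "w3"})   # decide the second report was an error
# agent.current() == {"w1": 0.4, "w2": 0.6, "w3": 0.0, "w4": 0.0}
```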

Information storage capacity is still a problem for an old Carnapian robot, although the problem is certainly no worse than on the two preceding suggestions. Another problem is choice of the appropriate prior, providing we do not believe that rationality dictates a unique choice.

In certain tractable cases, however, this storage problem is greatly simplified by the existence of sufficient statistics.⁴ Suppose I am observing a Bernoulli process, e.g., a series of independent flips of a coin with unknown bias. Each experiment or observation will consist of recording the outcome of some finite number of flips. Now instead of writing down the whole outcome sequence for each experiment, I can summarize the experiment by writing down (1) the number of trials and (2) the number of heads observed. The ordered pair of (1) and (2) is a sufficient statistic for the experiment. Conditioning on this summary of the experiment is guaranteed to give you the same results as conditioning on the full description of the experiment. Where we have sufficient statistics, we can save on memory capacity by relying on statistical summaries of experiments rather than exhaustive descriptions of them.
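To make the guarantee concrete (a sketch with an invented discrete prior over the bias; the continuous case works the same way), the posterior computed from the full outcome sequence agrees with the one computed from the (trials, heads) summary alone:

```python
from math import comb

# Invented discrete prior over the coin's bias w.
prior = {0.25: 1/3, 0.50: 1/3, 0.75: 1/3}

def posterior_from_sequence(prior, seq):
    # Likelihood of the exact sequence: w^heads * (1 - w)^tails.
    heads = sum(seq)
    like = {w: w**heads * (1 - w)**(len(seq) - heads) for w in prior}
    z = sum(prior[w] * like[w] for w in prior)
    return {w: prior[w] * like[w] / z for w in prior}

def posterior_from_summary(prior, trials, heads):
    # Likelihood of the summary: C(trials, heads) * w^heads * (1 - w)^(trials - heads).
    like = {w: comb(trials, heads) * w**heads * (1 - w)**(trials - heads) for w in prior}
    z = sum(prior[w] * like[w] for w in prior)
    return {w: prior[w] * like[w] / z for w in prior}

seq = [1, 0, 1, 1, 0]   # an invented record of 5 flips: 3 heads
# The binomial coefficient cancels in the normalization, so (up to rounding)
# posterior_from_sequence(prior, seq) == posterior_from_summary(prior, 5, 3).
```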

In our example, we can do even better. Instead of writing down an ordered pair (x, y) for each trial, we can summarize a totality of n trials by writing down a single ordered pair.

We have here a sufficient statistic of fixed dimension, which can be gotten by component-by-component addition from sufficient statistics for the individual experiments. (Such sufficient statistics of fixed dimension can be shown, under certain regularity conditions, to exist if and only if the common density of the individual outcomes is of exponential form.) Where such sufficient statistics exist, we need only store one vector of fixed dimension as a summary of our evidence.

The existence of sufficient statistics of fixed dimension can also throw some light on the other problem, the choice of an appropriate prior. In our example, the prior can be represented as a probability distribution over the bias of the coin; the actual physical probability of heads. Denote this parameter by "w." Suppose that its prior distribution is a beta distribution; i.e., for some α and β greater than zero, the prior probability density is proportional to w^(α-1)(1 - w)^(β-1). Then the posterior distribution of w will also be a beta distribution. Furthermore, the posterior distribution of w depends on the prior distribution and the summary of the evidence in an exceptionally simple way. Remembering that x is the number of trials and y is the number of heads, we see that the posterior beta distribution has parameters α' and β', where α' = α + y and β' = β + x - y. The family of beta distributions is called a conjugate family of priors for random samples from a Bernoulli distribution. It can be shown that whenever the observations are drawn from a family of distributions for which there is a sufficient statistic of fixed dimension, there exists a corresponding family of conjugate priors. (There are conjugate priors for familiar and ubiquitous distributions such as Poisson, Normal, etc.) Random sampling, where the observation is drawn from an exponential family and where the prior is a member of the conjugate family, offers the ultimate simplification in data storage and data processing. The diary need only include the family of priors, the parameters of the prior, and the current value of the sufficient statistic of fixed dimension. For these reasons, Raiffa and Schlaifer (1961) recommend, in the case of vague knowledge of priors, to choose a member of the relevant family of conjugate priors that fits reasonably well.
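In code, the entire diary for this case collapses to two numbers (a sketch; the function name, the Beta(1, 1) starting point, and the data are invented):

```python
def beta_update(alpha, beta, trials, heads):
    """Posterior parameters for a Beta(alpha, beta) prior on the bias w after
    observing `heads` heads in `trials` flips: with x = trials and y = heads,
    alpha' = alpha + y and beta' = beta + x - y."""
    return alpha + heads, beta + trials - heads

# Invented example: start from a uniform Beta(1, 1) prior, then fold in two
# experiments via their fixed-dimension sufficient statistics.
alpha, beta = 1.0, 1.0
alpha, beta = beta_update(alpha, beta, trials=10, heads=7)    # -> (8.0, 4.0)
alpha, beta = beta_update(alpha, beta, trials=20, heads=11)   # -> (19.0, 13.0)
# The "diary" is now just the name of the family plus the pair (alpha, beta).
```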


We are not always in such a nice situation, where sufficient statistics do so much work for us; but the range of cases covered or approximated by the exponential families is not inconsiderable. In these cases, keeping a diary (in shorthand) is not as hopeless a strategy for a quasi-Carnapian robot as it might first appear.

One might wonder whether these techniques of diary-keeping have some application to an Austinian robot which, on principled grounds, never learns anything with probability one. I think that they do, but this question goes outside the scope of this note. (See Field 1978, Skyrms 1980b, and Skyrms forthcoming.)

Notes

1. Proposed by Bill Tait in conversation with myself and Brian Ellis.
2. For a thumbnail sketch see appendix 4 of Skyrms (1980a). For details see the references listed there.
3. There is this difference. Suppose we think that we learn P, but then decide it was a mistake and in fact learn not-P. On the infinitesimal approach traces of the mistake are wiped out, while on Tait's suggestion they remain on the record.
4. On the Bayesian conception of sufficiency, sufficient statistics of fixed dimension, and conjugate priors, see Raiffa and Schlaifer (1961).

References

Adams, E. 1975. The Logic of Conditionals. Dordrecht: Reidel.
Adams, E. 1976. Prior Probabilities and Counterfactual Conditionals. In Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science, ed. W. Harper and C. Hooker. Dordrecht: D. Reidel.
Field, H. 1978. A Note on Jeffrey Conditionalization. Philosophy of Science 45: 171-85.
Glymour, C. 1980. Theory and Evidence. Princeton: Princeton University Press.
Raiffa, H. and Schlaifer, R. 1961. Applied Statistical Decision Theory. Cambridge, Mass.: Harvard Business School. Paperback ed. Cambridge, Mass.: MIT Press, 1968.
Skyrms, B. 1980a. Causal Necessity. New Haven, Conn.: Yale University Press.
Skyrms, B. 1980b. Higher Order Degrees of Belief. In Prospects for Pragmatism, ed. D. H. Mellor. Cambridge, England: Cambridge University Press.
Skyrms, B. 1981. The Prior Propensity Account of Subjunctive Conditionals. In Ifs, ed. W. Harper, R. Stalnaker, and G. Pearce. Dordrecht: Reidel.
Skyrms, B. Forthcoming. Maximum Entropy Inference as a Special Case of Conditionalization. Synthese.