artificial intelligence - department of theoretical...

ArtificialIntelligence

Roman BartákDepartment of Theoretical Computer Science and Mathematical Logic

Rationaldecisions

Wearedesigningrationalagentsthatmaximizeexpectedutility.Probabilitytheoryisatoolfordealingwithdegreesofbelief(aboutworldstates,actioneffectsetc.).Now,weexploreutilitytheorytorepresentandreasonwithpreferences.

Finally,wecombinepreferences(asexpressedbyutilities)withprobabilitiesinthegeneraltheoryofrationaldecisions– decisiontheory.

Utilitytheory

Theagent‘spreferencesarecapturedbyautilityfunction,U(s),whichassignsasinglenumbertoexpressdesirabilityofastate.Theexpectedutilityofanactiongiventheevidenceisjusttheaveragevalueofoutcomes,weightedbytheirprobabilitiesEU(a|e)=∑s P(Result(a)=s|a,e)U(s)

Arationalagentshouldchoosetheactionthatmaximizes theagent’sexpectedutility(MEU)action=argmaxa EU(a|e)

TheMEUprincipleformalizesthegeneralnotionthattheagentshould“dotherightthing”,butweneedmakeitoperational.

Rationalpreferences

Frequently,itiseasierforanagenttoexpresspreferencesbetweenstates:

– A>B:theagentprefersAoverB– A<B:theagentprefersBoverA– A~B:theagentisindifferentbetweenAandB

WhatsortofthingsareAandB?– Theycouldbestatesofoftheworld,butmoreoftentannotthere

isuncertaintyaboutwhatisreallybeingoffered.– Wecanthinkofthesetofoutcomesforeachactionasalottery

(possibleoutcomesS1,…,Sn thatoccurwithprobabilitiesp1,…,pn)• [p1,S1;…;pn,Sn]

Anexampleoflottery(foodinairplanes)Chickenorpasta?

– [0.8,juicychicken;0.2,overcooked chicken]– [0.7,deliciouspasta;0.3,congealedpasta]

Propertiesofrationalpreferences

Rationalpreferencesshouldleadtomaximizingexpectedutility(iftheagentviolatesthemitwillexhibitpatentlyirrationalbehaviorinsomesituations.Werequireseveralconstraints(theaxiomsofutilitytheory)thatrationalpreferencesshouldobey.

– orderability:exactlyoneof(A>B)or(A<B) or(A~B)holds

– transitivity:(A<B)∧ (B<C)⇒ (A<C)

– continuity:(A>B>C)⇒ ∃p[p,A;1-p,C] ~B

– substitutability:A~B⇒ [p,A;1-p,C] ~ [p,B;1-p,C]

– monotonicity:A>B⇒ (p>q⇔ [p,A;1-p,B] >[q,A;1-q,B]

– decomposability:[p,A;1-p,[q,B;1-q,C]] ~ [p,A;(1-p)q,B;(1-p)(1-q),C]

Preferencesleadtoutility

Teh axiomsofutilitytheoryareaxiomsaboutpreferencesbutwecanderivethefollowingconsequencesformthem..

Existenceofutilityfunctionsuchthat:U(A)<U(B)⇔ A<BU(A)=U(B)⇔ A~B

Expectedutilityofalottery:U([p1,S1;…;pn,Sn] )=∑i pi U(Si)

Autilityfunctionexistsforanyrationalagentbutitisnotunique:

U‘(S)=aU(S)+bExistenceofautilityfunctiondoesnotnecessarilymeanthattheagentisexplicitlymaximizingthatutilityfunction.Byobservingitspreferencesanobservercanconstructtheutilityfunction(eveniftheagentdoesnotknowit).

Utilityfunctions

Utilityisafunctionthatmapsfromlotteriestorealnumbers.Wemustfirstworkoutwhattheagent‘sutilityfunctionis(preferenceelicitation).

– Wewillbelookingforanormalized utilityfunction.– Wefixtheutilityofa“bestpossibleprize”Smax to1,U(Smax)=1.

– Similarly,a“worstpossiblecatastrophe”Smin ismappedto0,U(Smin)=0.

– Now,toassesstheutilityofanyparticularprizeSweasktheagenttochoosebetweenSandastandard lottery[p, Smax;1-p, Smin]

– Theprobabilitypisadjusteduntiltheagentisindifferentbetween Sandthestandardlottery.

– ThentheutilityofSisgivenby,U(S)=p.

Theutilityofmoney

Universalexchangeabilityofmoneyforallkindsofgoodsandservicessuggeststhatmoneyplaysasignificantroleinhumanutilityfunctions.

– Anagentprefersmoremoney toless,allotherthingsbeingequal.

Butthisdoesnotmeanthatmoneybehavesasautilityfunction(becauseitsaysnothingaboutpreferencesbetweenlotteriesinvolvingmoney).Assumethatyouwonacompetitionandthehostoffersyouachoice:eitheryoucantakethe1mil.USDpriceoryoucangambleitontheflipofcoin.Ifthecoincomesupheads, youendupwithnothing, butifitcomesuptails,youget2.5mil.USD.Whatisyourchoice?

– Expectedmonetary valueofthegambleis1.250.000USD.– Mostpeopledeclinethegambleandpocketthemillion.Are

theybeingirrational?

the utility of money

Theutilityofmoney

Thedecisioninthepreviousgamedoesnotdependontheprizeonlybutalsoonthewealthoftheplayer!LetSn denoteastateofpossessingtotalwealthnUSD,andthecurrentwealthiskUSD.Thetheexpectedutilitiesoftwoactionsare:

– EU(Accept)=½U(Sk)+½U(Sk+2.500.000)– EU(Decline)=U(Sk+1.000.000)

SupposeweassignU(Sk)=5,U(Sk+1.000.000)=8,U(Sk+2.500.000)=9.Thentherationaldecisionwouldbetodecline!

risk-averse area (prefer a sure thing with a payoff less than the expected monetary value of a gamble)

risk-seeking behavior in this area

an agent that has a linear curve is said to be risk-neutral.

Humanjudgment(certaintyeffect)

Theevidencesuggeststhathumansare“predictableirrational“.Allaisparadox

– A:80%chanceof4000USD– B:100%chanceof3000USD

Whatisyourchoice?• MostpeopleconsistentlypreferBoverA(takingthesurething!)

– C:20%chanceof4000USD– D:25%chanceof3000USD

Whatisyourchoice?• MostpeoplepreferCoverD(higherexpectedmonetaryvalue)

Certaintyeffect– peoplearestronglyattractedtogainsthatarecertain

Humanjudgment(ambiguityaversion)

EllsbergparadoxTheurncontains1/3redballs,and2/3eitherblackoryellowballs.– A:100USDforaredball– B:100USDforablackball

Whatisyourchoice?• MostpeoplepreferAoverB(Agivesa1/3chanceofwinning,whileBcouldbeanywherebetween0and2/3)

– C:100USDforaredoryellowball– D:100USDforablackoryellowball

Whatisyourchoice?• MostpeoplepreferDoverC(Dgivesyoua2/3chance,whileCcouldanywherebetween1/3and3/3)

However,ifyouthinktherearemoreredthanblackballsthenyourshouldpreferAoverBandCoverD.

Ambiguityaversion– mostpeopleelecttheknownprobabilityratherthantheunknownunknown.

Humanjudgment

Framingeffect– theexactwordingofadecisionproblemcanhaveabigimpactontheagent‘schoices

– medicalprocedureAhas90%survivalrate– medicalprocedureBhas10%deathrate

Whatisyourchoice?• MostpeoplepreferAoverBthoughbothchoicesareidentical

Anchoringeffect– peoplefeelmorecomfortablemakingrelativeutilityjudgmentsratherthanabsoluteones

– Therestauranttakesadvantageofthisbyofferinga$200bottlethatitknowsnobodywillbuy,butwhichservestoskewupwardthecustomer’sestimateofthevalueofallwinesandmakethe$55bottleseemlikeabargain.

Multi-attributeutilitytheory

Inreal-lifetheoutcomesarecharacterizedbytwoormoreattributessuchascostandsafetyissues–multi-attributeutilitytheory.Wewillassumethathighervaluesofanattributecorrespondtohigherutilities.Thequestionishowtogetpreferencesformoreattributes– withoutcombiningtheattributevaluesintoasingleutilityvalue– dominance

– combiningtheattributevaluesintoasingleutilityvalue

Dominance

Ifanoptionisoflowervalueonallattributesthatsomeotheroption,itneednotbeconsideredfurther– strictdominance.

Strictdominancecanbedefinedforuncertainoutcomestoo.– ifallpossibleoutcomesofBstrictlydominateallpossibleoutcomesofA

Strictdominancewillprobablyoccurlessoftenthaninthedeterministiccase.

Stochasticdominance

Stochasticdominanceoccursmorefrequentlyinrealproblems.Itiseasiertounderstandinthecontextofasinglevariable.Stochasticdominance isbestseenbyexaminingthecumulativedistribution thatmeasurestheprobabilitythatthatthecostislessthanorequalanygivenamount(itintegratestheoriginaldistribution).

probability distribution cumulative distribution

Preferencestructure

Tospecifythecompleteutilityfunctionfornattributeseachhavingdvalues,weneeddnvaluesintheworstcase.– Thiscorrespondstoasituationinwhichagent‘spreferenceshavenoregularityatall.

Preferencesoftypicalagentshavemuchmorestructuresothetheutilityfunctioncanbeexpressedas:

U(x1,…,xn)=F[f1(x1),…,fn(xn)]

Preferencestructure(withoutuncertainty)

Thebasicregularityiscalledpreferenceindependence.TwoattributesX1 andX2 arepreferentiallyindependentofathirdattributeX3 ifthepreferencebetweenoutcomes� x1,x2,x3�and� x’1,x’2,x3�doesnotdependontheparticularvaluex3.Ifeachpairofattributesispreferentiallyindependentofanyotherattribute,wetalkaboutmutualpreferentialindependence(MPI).Ifattributesaremutuallypreferentiallyindependentthentheagent’spreferencebehaviorcanbedescribedasmaximizingthefunction:

U(x1,…,xn)=∑i Ui(xi)Avaluefunctionofthistypeiscalledanadditivevaluefunction.

Preferencestructure(withuncertainty)

Whenuncertaintyispresentweneedtoconsiderthestructureofpreferencesbetweenlotteries.Formutuallyutilityindependent(MUI)attributeswecanusemultiplicativeutilityfunction:U=k1U1 +k2U2 +k3U3+k1k2U1U2 +k2k3U2U3 +k1k3U1U3+k1k2k3U1U2U3

FornattributesexhibitingMUIwecanrepresenttheutilityfunctionusingnconstantsandnsingle-attributeutilities.

Thevalueofinformation

Sofarwehaveassumedthatallrelevantinformationisprovidedtotheagentbeforeitmakesitsdecision.Inpractice,thisishardlyeverthecase.Forexample,adoctorcannotexpecttobeprovidedwiththeresultofallpossiblediagnostictests.Oneofthemostimportantpartsofdecisionmakingisknowingwhatquestionstoask.

Wewillnowlookatinformationvaluetheory,whichenablesanagenttochoosewhichinformationtoacquire.

Thevalueofinformation(example)

Supposeanoilcompanyishopingtobuyoneofthenindistinguishableblocksofocean-drillingrights.Letusassumefurther thatexactlyoneoftheblockscontainoilworthCdollars,whileothersareworthless.TheaskingpriceofeachblockisC/n.

ExpectedmonetaryvalueofbuyingoneblockisC/n – C/n =0.Nowsuppose thataseismologistoffers thecompanytheresultofasurveyofonespecificblock,whichindicatesdefinitelywhethertheblockcontainsoil.

Howmuchshouldthecompanytopayforthatinformation?– Withprobability1/n,thesurveywillindicateoilinagivenblockandthethe

companywillbuyitandmakeaprofitC– C/n.– Withprobability(n-1)/n,thesurveywillshowthattheblockcontainsnooil,

inwhichcasethecompanywillbuyanotherblock.Nowtheprobabilityoffindingoilinthatotherblockis1/(n-1),sotheexpectedprofitisC/(n-1)–C/n.

– Togethertheexpectedprofitgiventhesurveyinformationis:1/n(C– C/n)+(n-1)/n(C/(n-1)– C/n)=C/n

ThereforethecompanyshouldbewillingtopaytheseismologistuptoC/n dollarsfortheinformation:theinformationisworthasmuchastheblockitself.

Thevalueofinformation(ageneralformula)

WeassumethatexactevidencecanbeobtainedaboutthevalueofsomerandomvariableEj – thisiscalledvalueofperfectinformation (VPI).Thevalueofthecurrentbestaction& (withtheinitialevidencee)isdefinedby:

EU(&|e)=maxa ∑s‘ P(Result(a)=s‘|a,e)U(s‘)

Thevalueofthebestaction&jk afterthenewevidenceEj =ejk isobtainedisdefinedby:

EU(&jk|e,Ej=ejk)=maxa∑s‘ P(Result(a)=s‘|a,e, Ej=ejk)U(s‘)

ButthevalueofEj iscurrentlyunknownsowemustaverageoverallpossiblevaluesthatwemightdiscoverforEj:

VPIe(Ej)=(∑k P(Ej = ejk|e)EU(&jk|e,Ej=ejk))- EU(&|e)

Thevalueofinformation(qualitatively)

Whenisitbeneficialtoobtainnewinformation?

Informationhasvaluetotheextendthat– itislikelytocauseachangeofaplanand– thenewplanwillbesignificantlybetterthattheoldplan.

Clear choice, the information is not needed

The choice is unclear and the information is crucial

The choice is unclear, the information is less valuable

Propertiesofthevalueofinformation

Isitpossibleforinformationtobedeleterious?Theexpectedvalueofinformationisnonnegative.∀e,Ej VPIe(Ej)≥ 0

Thevalueofinformationisnotadditive.VPIe(Ej,Ek)≠ VPIe(Ej)+VPIe(Ek)

Theexpectedvalueofinformationisorderindependent.

VPIe(Ej,Ek)=VPIe(Ej)+VPIe,ej(Ek)=VPIe(Ek)+VPIe,ek(Ej)

Informationgathering

Asensibleagentshould– askquestionsinareasonableorder– avoidaskingquestionsthatareirrelevant– takeintoaccounttheimportanceofeachpieceofinformationinrelation

itscost– stopaskingquestionswhenthatisappropriate

Weassumethatwitheachobservableevidence variableEj,thereisanassociatedcost,Cost(Ej).Information-gatheringagentcanselect(greedily)themostefficientobservationuntilnoobservationworthitscosts(myopicapproach)

bartak@ktiml.mff.cuni.cz

artificial intelligence - department of theoretical...

Documents

theoretical immunology - theoretical biology &...

artificial and bio artificial liver

theoretical studies on photophysics and photochemistry of...

artificial life and artificial intelligence

artificial skin and artificial cartilage

artificial...

mineral potential mapping using artificial neural networks...

oxford foundation for theoretical neuroscience and ... ·...

theoretical basis of supply management: theoretical and...

constraint (logic) programming - ijcai...

theoretical studies of biexciton binding energy in cdse/cdte...

alan turing - math genius, father of theoretical computer...

b111 artificial insemination. lesson outline advantages of...

deep learning, echo state networks and the edge of chaos...

confused about theoretical sampling? engaging theoretical

artificial intelligence -...

optical lattices and artificial gauge potentials...part 1...

artificial morality and artificial law

theoretical studies of biexciton binding energy in cdse...

seminar on artificial intelligence...