artificial intelligence - department of theoretical...
Post on 14-Aug-2019
216 Views
Preview:
TRANSCRIPT
ArtificialIntelligence
Roman BartákDepartment of Theoretical Computer Science and Mathematical Logic
Rationaldecisions
Wearedesigningrationalagentsthatmaximizeexpectedutility.Probabilitytheoryisatoolfordealingwithdegreesofbelief(aboutworldstates,actioneffectsetc.).Now,weexploreutilitytheorytorepresentandreasonwithpreferences.
Finally,wecombinepreferences(asexpressedbyutilities)withprobabilitiesinthegeneraltheoryofrationaldecisions– decisiontheory.
Utilitytheory
Theagent‘spreferencesarecapturedbyautilityfunction,U(s),whichassignsasinglenumbertoexpressdesirabilityofastate.Theexpectedutilityofanactiongiventheevidenceisjusttheaveragevalueofoutcomes,weightedbytheirprobabilitiesEU(a|e)=∑s P(Result(a)=s|a,e)U(s)
Arationalagentshouldchoosetheactionthatmaximizes theagent’sexpectedutility(MEU)action=argmaxa EU(a|e)
TheMEUprincipleformalizesthegeneralnotionthattheagentshould“dotherightthing”,butweneedmakeitoperational.
Rationalpreferences
Frequently,itiseasierforanagenttoexpresspreferencesbetweenstates:
– A>B:theagentprefersAoverB– A<B:theagentprefersBoverA– A~B:theagentisindifferentbetweenAandB
WhatsortofthingsareAandB?– Theycouldbestatesofoftheworld,butmoreoftentannotthere
isuncertaintyaboutwhatisreallybeingoffered.– Wecanthinkofthesetofoutcomesforeachactionasalottery
(possibleoutcomesS1,…,Sn thatoccurwithprobabilitiesp1,…,pn)• [p1,S1;…;pn,Sn]
Anexampleoflottery(foodinairplanes)Chickenorpasta?
– [0.8,juicychicken;0.2,overcooked chicken]– [0.7,deliciouspasta;0.3,congealedpasta]
Propertiesofrationalpreferences
Rationalpreferencesshouldleadtomaximizingexpectedutility(iftheagentviolatesthemitwillexhibitpatentlyirrationalbehaviorinsomesituations.Werequireseveralconstraints(theaxiomsofutilitytheory)thatrationalpreferencesshouldobey.
– orderability:exactlyoneof(A>B)or(A<B) or(A~B)holds
– transitivity:(A<B)∧ (B<C)⇒ (A<C)
– continuity:(A>B>C)⇒ ∃p[p,A;1-p,C] ~B
– substitutability:A~B⇒ [p,A;1-p,C] ~ [p,B;1-p,C]
– monotonicity:A>B⇒ (p>q⇔ [p,A;1-p,B] >[q,A;1-q,B]
– decomposability:[p,A;1-p,[q,B;1-q,C]] ~ [p,A;(1-p)q,B;(1-p)(1-q),C]
Preferencesleadtoutility
Teh axiomsofutilitytheoryareaxiomsaboutpreferencesbutwecanderivethefollowingconsequencesformthem..
Existenceofutilityfunctionsuchthat:U(A)<U(B)⇔ A<BU(A)=U(B)⇔ A~B
Expectedutilityofalottery:U([p1,S1;…;pn,Sn] )=∑i pi U(Si)
Autilityfunctionexistsforanyrationalagentbutitisnotunique:
U‘(S)=aU(S)+bExistenceofautilityfunctiondoesnotnecessarilymeanthattheagentisexplicitlymaximizingthatutilityfunction.Byobservingitspreferencesanobservercanconstructtheutilityfunction(eveniftheagentdoesnotknowit).
Utilityfunctions
Utilityisafunctionthatmapsfromlotteriestorealnumbers.Wemustfirstworkoutwhattheagent‘sutilityfunctionis(preferenceelicitation).
– Wewillbelookingforanormalized utilityfunction.– Wefixtheutilityofa“bestpossibleprize”Smax to1,U(Smax)=1.
– Similarly,a“worstpossiblecatastrophe”Smin ismappedto0,U(Smin)=0.
– Now,toassesstheutilityofanyparticularprizeSweasktheagenttochoosebetweenSandastandard lottery[p, Smax;1-p, Smin]
– Theprobabilitypisadjusteduntiltheagentisindifferentbetween Sandthestandardlottery.
– ThentheutilityofSisgivenby,U(S)=p.
Theutilityofmoney
Universalexchangeabilityofmoneyforallkindsofgoodsandservicessuggeststhatmoneyplaysasignificantroleinhumanutilityfunctions.
– Anagentprefersmoremoney toless,allotherthingsbeingequal.
Butthisdoesnotmeanthatmoneybehavesasautilityfunction(becauseitsaysnothingaboutpreferencesbetweenlotteriesinvolvingmoney).Assumethatyouwonacompetitionandthehostoffersyouachoice:eitheryoucantakethe1mil.USDpriceoryoucangambleitontheflipofcoin.Ifthecoincomesupheads, youendupwithnothing, butifitcomesuptails,youget2.5mil.USD.Whatisyourchoice?
– Expectedmonetary valueofthegambleis1.250.000USD.– Mostpeopledeclinethegambleandpocketthemillion.Are
theybeingirrational?
the utility of money
Theutilityofmoney
Thedecisioninthepreviousgamedoesnotdependontheprizeonlybutalsoonthewealthoftheplayer!LetSn denoteastateofpossessingtotalwealthnUSD,andthecurrentwealthiskUSD.Thetheexpectedutilitiesoftwoactionsare:
– EU(Accept)=½U(Sk)+½U(Sk+2.500.000)– EU(Decline)=U(Sk+1.000.000)
SupposeweassignU(Sk)=5,U(Sk+1.000.000)=8,U(Sk+2.500.000)=9.Thentherationaldecisionwouldbetodecline!
risk-averse area (prefer a sure thing with a payoff less than the expected monetary value of a gamble)
risk-seeking behavior in this area
an agent that has a linear curve is said to be risk-neutral.
Humanjudgment(certaintyeffect)
Theevidencesuggeststhathumansare“predictableirrational“.Allaisparadox
– A:80%chanceof4000USD– B:100%chanceof3000USD
Whatisyourchoice?• MostpeopleconsistentlypreferBoverA(takingthesurething!)
– C:20%chanceof4000USD– D:25%chanceof3000USD
Whatisyourchoice?• MostpeoplepreferCoverD(higherexpectedmonetaryvalue)
Certaintyeffect– peoplearestronglyattractedtogainsthatarecertain
Humanjudgment(ambiguityaversion)
EllsbergparadoxTheurncontains1/3redballs,and2/3eitherblackoryellowballs.– A:100USDforaredball– B:100USDforablackball
Whatisyourchoice?• MostpeoplepreferAoverB(Agivesa1/3chanceofwinning,whileBcouldbeanywherebetween0and2/3)
– C:100USDforaredoryellowball– D:100USDforablackoryellowball
Whatisyourchoice?• MostpeoplepreferDoverC(Dgivesyoua2/3chance,whileCcouldanywherebetween1/3and3/3)
However,ifyouthinktherearemoreredthanblackballsthenyourshouldpreferAoverBandCoverD.
Ambiguityaversion– mostpeopleelecttheknownprobabilityratherthantheunknownunknown.
Humanjudgment
Framingeffect– theexactwordingofadecisionproblemcanhaveabigimpactontheagent‘schoices
– medicalprocedureAhas90%survivalrate– medicalprocedureBhas10%deathrate
Whatisyourchoice?• MostpeoplepreferAoverBthoughbothchoicesareidentical
Anchoringeffect– peoplefeelmorecomfortablemakingrelativeutilityjudgmentsratherthanabsoluteones
– Therestauranttakesadvantageofthisbyofferinga$200bottlethatitknowsnobodywillbuy,butwhichservestoskewupwardthecustomer’sestimateofthevalueofallwinesandmakethe$55bottleseemlikeabargain.
Multi-attributeutilitytheory
Inreal-lifetheoutcomesarecharacterizedbytwoormoreattributessuchascostandsafetyissues–multi-attributeutilitytheory.Wewillassumethathighervaluesofanattributecorrespondtohigherutilities.Thequestionishowtogetpreferencesformoreattributes– withoutcombiningtheattributevaluesintoasingleutilityvalue– dominance
– combiningtheattributevaluesintoasingleutilityvalue
Dominance
Ifanoptionisoflowervalueonallattributesthatsomeotheroption,itneednotbeconsideredfurther– strictdominance.
Strictdominancecanbedefinedforuncertainoutcomestoo.– ifallpossibleoutcomesofBstrictlydominateallpossibleoutcomesofA
Strictdominancewillprobablyoccurlessoftenthaninthedeterministiccase.
Stochasticdominance
Stochasticdominanceoccursmorefrequentlyinrealproblems.Itiseasiertounderstandinthecontextofasinglevariable.Stochasticdominance isbestseenbyexaminingthecumulativedistribution thatmeasurestheprobabilitythatthatthecostislessthanorequalanygivenamount(itintegratestheoriginaldistribution).
probability distribution cumulative distribution
Preferencestructure
Tospecifythecompleteutilityfunctionfornattributeseachhavingdvalues,weneeddnvaluesintheworstcase.– Thiscorrespondstoasituationinwhichagent‘spreferenceshavenoregularityatall.
Preferencesoftypicalagentshavemuchmorestructuresothetheutilityfunctioncanbeexpressedas:
U(x1,…,xn)=F[f1(x1),…,fn(xn)]
Preferencestructure(withoutuncertainty)
Thebasicregularityiscalledpreferenceindependence.TwoattributesX1 andX2 arepreferentiallyindependentofathirdattributeX3 ifthepreferencebetweenoutcomes� x1,x2,x3�and� x’1,x’2,x3�doesnotdependontheparticularvaluex3.Ifeachpairofattributesispreferentiallyindependentofanyotherattribute,wetalkaboutmutualpreferentialindependence(MPI).Ifattributesaremutuallypreferentiallyindependentthentheagent’spreferencebehaviorcanbedescribedasmaximizingthefunction:
U(x1,…,xn)=∑i Ui(xi)Avaluefunctionofthistypeiscalledanadditivevaluefunction.
Preferencestructure(withuncertainty)
Whenuncertaintyispresentweneedtoconsiderthestructureofpreferencesbetweenlotteries.Formutuallyutilityindependent(MUI)attributeswecanusemultiplicativeutilityfunction:U=k1U1 +k2U2 +k3U3+k1k2U1U2 +k2k3U2U3 +k1k3U1U3+k1k2k3U1U2U3
FornattributesexhibitingMUIwecanrepresenttheutilityfunctionusingnconstantsandnsingle-attributeutilities.
Thevalueofinformation
Sofarwehaveassumedthatallrelevantinformationisprovidedtotheagentbeforeitmakesitsdecision.Inpractice,thisishardlyeverthecase.Forexample,adoctorcannotexpecttobeprovidedwiththeresultofallpossiblediagnostictests.Oneofthemostimportantpartsofdecisionmakingisknowingwhatquestionstoask.
Wewillnowlookatinformationvaluetheory,whichenablesanagenttochoosewhichinformationtoacquire.
Thevalueofinformation(example)
Supposeanoilcompanyishopingtobuyoneofthenindistinguishableblocksofocean-drillingrights.Letusassumefurther thatexactlyoneoftheblockscontainoilworthCdollars,whileothersareworthless.TheaskingpriceofeachblockisC/n.
ExpectedmonetaryvalueofbuyingoneblockisC/n – C/n =0.Nowsuppose thataseismologistoffers thecompanytheresultofasurveyofonespecificblock,whichindicatesdefinitelywhethertheblockcontainsoil.
Howmuchshouldthecompanytopayforthatinformation?– Withprobability1/n,thesurveywillindicateoilinagivenblockandthethe
companywillbuyitandmakeaprofitC– C/n.– Withprobability(n-1)/n,thesurveywillshowthattheblockcontainsnooil,
inwhichcasethecompanywillbuyanotherblock.Nowtheprobabilityoffindingoilinthatotherblockis1/(n-1),sotheexpectedprofitisC/(n-1)–C/n.
– Togethertheexpectedprofitgiventhesurveyinformationis:1/n(C– C/n)+(n-1)/n(C/(n-1)– C/n)=C/n
ThereforethecompanyshouldbewillingtopaytheseismologistuptoC/n dollarsfortheinformation:theinformationisworthasmuchastheblockitself.
Thevalueofinformation(ageneralformula)
WeassumethatexactevidencecanbeobtainedaboutthevalueofsomerandomvariableEj – thisiscalledvalueofperfectinformation (VPI).Thevalueofthecurrentbestaction& (withtheinitialevidencee)isdefinedby:
EU(&|e)=maxa ∑s‘ P(Result(a)=s‘|a,e)U(s‘)
Thevalueofthebestaction&jk afterthenewevidenceEj =ejk isobtainedisdefinedby:
EU(&jk|e,Ej=ejk)=maxa∑s‘ P(Result(a)=s‘|a,e, Ej=ejk)U(s‘)
ButthevalueofEj iscurrentlyunknownsowemustaverageoverallpossiblevaluesthatwemightdiscoverforEj:
VPIe(Ej)=(∑k P(Ej = ejk|e)EU(&jk|e,Ej=ejk))- EU(&|e)
Thevalueofinformation(qualitatively)
Whenisitbeneficialtoobtainnewinformation?
Informationhasvaluetotheextendthat– itislikelytocauseachangeofaplanand– thenewplanwillbesignificantlybetterthattheoldplan.
Clear choice, the information is not needed
The choice is unclear and the information is crucial
The choice is unclear, the information is less valuable
Propertiesofthevalueofinformation
Isitpossibleforinformationtobedeleterious?Theexpectedvalueofinformationisnonnegative.∀e,Ej VPIe(Ej)≥ 0
Thevalueofinformationisnotadditive.VPIe(Ej,Ek)≠ VPIe(Ej)+VPIe(Ek)
Theexpectedvalueofinformationisorderindependent.
VPIe(Ej,Ek)=VPIe(Ej)+VPIe,ej(Ek)=VPIe(Ek)+VPIe,ek(Ej)
Informationgathering
Asensibleagentshould– askquestionsinareasonableorder– avoidaskingquestionsthatareirrelevant– takeintoaccounttheimportanceofeachpieceofinformationinrelation
itscost– stopaskingquestionswhenthatisappropriate
Weassumethatwitheachobservableevidence variableEj,thereisanassociatedcost,Cost(Ej).Information-gatheringagentcanselect(greedily)themostefficientobservationuntilnoobservationworthitscosts(myopicapproach)
© 2016 Roman BartákDepartment of Theoretical Computer Science and Mathematical Logic
bartak@ktiml.mff.cuni.cz
top related