cs 343: artificial intelligencesniekum/classes/343-f18/lectures/lectur… · cs 343: artificial...
CS 343: Artificial Intelligence
Decision Networks and Value of Perfect Information
Prof. Scott Niekum, The University of Texas at Austin
[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Decision Networks
Decision Networks
[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]
Decision Networks
▪ MEU: choose the action which maximizes the expected utility given the evidence
▪ Can directly operationalize this with decision networks
  ▪ Bayes nets with nodes for utility and actions
  ▪ Lets us calculate the expected utility for each action
▪ New node types:
  ▪ Chance nodes (just like BNs)
  ▪ Actions (rectangles, cannot have parents, act as observed evidence)
  ▪ Utility node (diamond, depends on action and chance nodes)
[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]
Decision Networks
[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]
▪ Action selection
  ▪ Instantiate all evidence
  ▪ Set action node(s) each possible way
  ▪ Calculate posterior for all parents of utility node, given the evidence
  ▪ Calculate expected utility for each action
  ▪ Choose maximizing action
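The action-selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: it assumes the posterior over the utility node's chance parent has already been computed, and the names `expected_utility`, `choose_action`, `UMBRELLA_UTILITY`, and `WEATHER_PRIOR` are hypothetical. The numbers are the umbrella tables from the lecture's example.

```python
from typing import Dict, Iterable, Tuple

Belief = Dict[str, float]                # P(outcome | evidence)
Utility = Dict[Tuple[str, str], float]   # U(action, outcome)


def expected_utility(action: str, belief: Belief, utility: Utility) -> float:
    """EU(a | e) = sum over outcomes w of P(w | e) * U(a, w)."""
    return sum(p * utility[(action, w)] for w, p in belief.items())


def choose_action(actions: Iterable[str], belief: Belief, utility: Utility):
    """Set the action node each possible way and keep the maximizer.

    Returns (best action, its expected utility), i.e. the MEU decision.
    """
    scores = {a: expected_utility(a, belief, utility) for a in actions}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Umbrella tables from the lecture's example (no forecast observed).
UMBRELLA_UTILITY = {
    ("leave", "sun"): 100, ("leave", "rain"): 0,
    ("take", "sun"): 20, ("take", "rain"): 70,
}
WEATHER_PRIOR = {"sun": 0.7, "rain": 0.3}

if __name__ == "__main__":
    print(choose_action(["leave", "take"], WEATHER_PRIOR, UMBRELLA_UTILITY))
```

In a full decision network the belief passed in would come from Bayes-net inference given the instantiated evidence; here it is simply supplied as a table.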
Decision Networks
[Figure: decision network with nodes Weather, Umbrella, and U]
W P(W)
sun 0.7
rain 0.3
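Umbrella = leave: $EU(\text{leave}) = \sum_w P(w)\,U(\text{leave}, w) = 0.7 \cdot 100 + 0.3 \cdot 0 = 70$
Umbrella = take: $EU(\text{take}) = \sum_w P(w)\,U(\text{take}, w) = 0.7 \cdot 20 + 0.3 \cdot 70 = 35$
Optimal decision = leave: $MEU(\varnothing) = \max_a EU(a) = 70$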
A W U(A,W)
leave sun 100
leave rain 0
take sun 20
take rain 70
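Decisions as Outcome Trees
▪ Almost exactly like expectimax / MDPs
▪ What's changed?
[Figure: outcome tree rooted at the evidence set {}. The actions take and leave each lead to a chance node Weather | {}, whose sun and rain branches end in the leaves U(t,s), U(t,r), U(l,s), U(l,r). Inset: decision network with nodes Weather, Umbrella, and U.]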
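To make the expectimax analogy concrete, here is a small sketch that evaluates the umbrella outcome tree: a max node over actions whose children are chance nodes over the weather. The tuple-based tree encoding and the names `tree_value` and `umbrella_tree` are assumptions made for this illustration. One way to read "what's changed": the chance-node probabilities are the weather distribution conditioned on the current evidence set, so they must be supplied by inference rather than read off the tree.

```python
def tree_value(node):
    """Evaluate a tiny outcome tree.

    A node is one of:
      ("leaf", utility)
      ("max", {action: child})            # decision node
      ("chance", [(prob, child), ...])    # chance node
    """
    kind, body = node
    if kind == "leaf":
        return body
    if kind == "max":
        return max(tree_value(child) for child in body.values())
    if kind == "chance":
        return sum(p * tree_value(child) for p, child in body)
    raise ValueError(f"unknown node kind: {kind}")


def umbrella_tree(p_sun):
    """Build the umbrella tree for a given P(sun | evidence)."""
    p_rain = 1.0 - p_sun

    def weather(u_sun, u_rain):
        return ("chance", [(p_sun, ("leaf", u_sun)), (p_rain, ("leaf", u_rain))])

    return ("max", {"leave": weather(100, 0), "take": weather(20, 70)})


if __name__ == "__main__":
    print(tree_value(umbrella_tree(0.7)))    # evidence set {}: prior P(sun)
    print(tree_value(umbrella_tree(0.34)))   # evidence set {bad}: P(sun | F=bad)
```

The same tree shape is reused for both evidence sets; only the probabilities on the chance nodes differ.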
Example: Decision Networks
[Figure: decision network with nodes Weather, Forecast = bad (observed), Umbrella, and U]
A W U(A,W)
leave sun 100
leave rain 0
take sun 20
take rain 70
W P(W|F=bad)
sun 0.34
rain 0.66
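Umbrella = leave: $EU(\text{leave} \mid \text{bad}) = \sum_w P(w \mid \text{bad})\,U(\text{leave}, w) = 0.34 \cdot 100 + 0.66 \cdot 0 = 34$
Umbrella = take: $EU(\text{take} \mid \text{bad}) = \sum_w P(w \mid \text{bad})\,U(\text{take}, w) = 0.34 \cdot 20 + 0.66 \cdot 70 = 53$
Optimal decision = take: $MEU(F = \text{bad}) = \max_a EU(a \mid \text{bad}) = 53$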
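Decisions as Outcome Trees
[Figure: outcome tree rooted at the evidence set {b}. The actions take and leave each lead to a chance node W | {b}, whose sun and rain branches end in the leaves U(t,s), U(t,r), U(l,s), U(l,r). Inset: decision network with nodes Weather, Forecast = bad, Umbrella, and U.]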
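Ghostbusters Decision Network
[Figure: decision network with chance node Ghost Location, sensor nodes Sensor(1,1), Sensor(1,2), Sensor(1,3), …, Sensor(1,n), Sensor(2,1), …, Sensor(m,1), …, Sensor(m,n), action node Bust, and utility node U]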
Ghostbusters: Where to measure?
Value of Information
Value of Information
▪ Idea: compute value of acquiring evidence
  ▪ Can be done directly from decision network
▪ Example: buying oil drilling rights
  ▪ Two blocks A and B, exactly one has oil, worth k
  ▪ You can drill in one location
  ▪ Prior probabilities 0.5 each, and mutually exclusive
  ▪ Drilling in either A or B has EU = k/2, MEU = k/2
▪ Question: what's the value of information of O?
  ▪ Value of knowing which of A or B has oil
  ▪ Value is expected gain in MEU from new info
  ▪ Survey may say "oil in a" or "oil in b," prob 0.5 each
  ▪ If we know OilLoc, MEU is k (either way)
  ▪ Gain in MEU from knowing OilLoc?
  ▪ VPI(OilLoc) = k/2
  ▪ Fair price of information: k/2
[Figure: decision network with nodes OilLoc, DrillLoc, and U]
D O U
a a k
a b 0
b a 0
b b k
O P
a 1/2
b 1/2
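The oil-rights arithmetic can be checked mechanically. The sketch below uses exact fractions so the result comes out as exactly k/2; the function names `meu` and `vpi_oil` are hypothetical, and any concrete value passed for k is only an example, since the slide leaves k symbolic.

```python
from fractions import Fraction


def meu(belief, utility, actions):
    """Maximum expected utility under a belief over OilLoc."""
    return max(
        sum(p * utility[(a, o)] for o, p in belief.items())
        for a in actions
    )


def vpi_oil(k):
    """VPI(OilLoc) for the two-block drilling example with payoff k."""
    actions = ["a", "b"]
    # U(DrillLoc, OilLoc): payoff k only when we drill where the oil is.
    utility = {("a", "a"): k, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): k}
    prior = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

    meu_now = meu(prior, utility, actions)
    # After observing OilLoc = o the belief is a point mass on o; weight each
    # resulting MEU by the prior chance of seeing that observation.
    meu_after = sum(
        p * meu({o: Fraction(1)}, utility, actions) for o, p in prior.items()
    )
    return meu_after - meu_now


if __name__ == "__main__":
    print(vpi_oil(10))  # k = 10 is an arbitrary example value
```

Acting on the prior is worth k/2 and acting after the survey is worth k, so the difference is the fair price k/2 stated on the slide.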
VPI Example: Weather
[Figure: decision network with nodes Weather, Forecast, Umbrella, and U]
A W U
leave sun 100
leave rain 0
take sun 20
take rain 70
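MEU with no evidence: $MEU(\varnothing) = \max_a EU(a) = 70$
MEU if forecast is bad: $MEU(F = \text{bad}) = \max_a EU(a \mid \text{bad}) = 53$
MEU if forecast is good: $MEU(F = \text{good}) = \max_a EU(a \mid \text{good}) = 95$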
Forecast distribution
F P(F)
good 0.59
bad 0.41
W P(W)
sun 0.7
rain 0.3
W P(W|F=bad)
sun 0.34
rain 0.66
W P(W|F=good)
sun 0.95
rain 0.05
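Putting the tables on this slide together gives the value of observing the forecast. The sketch below is one way to compute it, assuming the forecast distribution and the two posteriors are taken as given rather than derived from a sensor model; `meu`, `vpi`, and the table names are hypothetical.

```python
def meu(belief, utility, actions):
    """Maximum expected utility of acting under a belief over Weather."""
    return max(
        sum(p * utility[(a, w)] for w, p in belief.items())
        for a in actions
    )


def vpi(p_evidence, posteriors, current_belief, utility, actions):
    """Expected MEU after revealing the evidence, minus MEU of acting now."""
    meu_now = meu(current_belief, utility, actions)
    meu_after = sum(
        p_e * meu(posteriors[e], utility, actions)
        for e, p_e in p_evidence.items()
    )
    return meu_after - meu_now


ACTIONS = ["leave", "take"]
UTILITY = {
    ("leave", "sun"): 100, ("leave", "rain"): 0,
    ("take", "sun"): 20, ("take", "rain"): 70,
}
PRIOR = {"sun": 0.7, "rain": 0.3}
P_FORECAST = {"good": 0.59, "bad": 0.41}
POSTERIOR = {
    "good": {"sun": 0.95, "rain": 0.05},
    "bad": {"sun": 0.34, "rain": 0.66},
}

if __name__ == "__main__":
    value = vpi(P_FORECAST, POSTERIOR, PRIOR, UTILITY, ACTIONS)
    print(round(value, 2))  # 0.59 * 95 + 0.41 * 53 - 70
```

With these tables the expected MEU after seeing the forecast is 0.59 · 95 + 0.41 · 53 = 77.78, against 70 for acting immediately, so the forecast is worth about 7.8 utility units.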
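Value of Information
▪ Assume we have evidence E = e. Value if we act now: $MEU(e) = \max_a \sum_s P(s \mid e)\,U(s, a)$
▪ Assume we see that E' = e'. Value if we act then: $MEU(e, e') = \max_a \sum_s P(s \mid e, e')\,U(s, a)$
▪ BUT E' is a random variable whose value is unknown, so we don't know what e' will be
▪ Expected value if E' is revealed and then we act: $MEU(e, E') = \sum_{e'} P(e' \mid e)\,MEU(e, e')$
▪ Value of information: how much MEU goes up by revealing E' first then acting, over acting now: $VPI(E' \mid e) = MEU(e, E') - MEU(e)$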
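VPI Properties
▪ Nonnegative: $\forall E', e:\ VPI(E' \mid e) \ge 0$
▪ Nonadditive. Typically (but not always): $VPI(E_j, E_k \mid e) \ne VPI(E_j \mid e) + VPI(E_k \mid e)$
▪ Order-independent: $VPI(E_j, E_k \mid e) = VPI(E_j \mid e) + VPI(E_k \mid e, E_j) = VPI(E_k \mid e) + VPI(E_j \mid e, E_k)$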
Quick VPI Questions
▪ The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
▪ There are two kinds of plastic forks at a picnic. One kind is slightly sturdier. What's the value of knowing which?
▪ You're playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number?
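As a rough worked estimate for the lottery question, assume (these are assumptions, not stated on the slide) that playing is free, that you play exactly one number, and that utility equals dollars. Then the value of learning the winning number is the gain in MEU:

```latex
\begin{align*}
MEU(\varnothing) &= 0.01 \cdot 100 + 0.99 \cdot 0 = 1 \\
MEU(\text{winning number known}) &= 100 \\
VPI(\text{WinningNumber}) &= 100 - 1 = 99
\end{align*}
```

By the same reasoning, the soup question plausibly comes out to 0, since no answer changes what you order, and the fork question to a small positive amount, since knowing which fork is sturdier changes your choice but improves the outcome only slightly.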
Value of Imperfect Information?
▪ No such thing
▪ Information corresponds to the observation of a node in the decision network
▪ If data is "noisy" that just means we don't observe the original variable, but another variable which is a noisy version of the original one
VPI Question
▪ VPI(OilLoc) = k/2
▪ VPI(ScoutingReport)?
▪ VPI(Scout)?
▪ VPI(Scout | ScoutingReport)?
[Figure: decision network with nodes OilLoc, DrillLoc, U, ScoutingReport, and Scout]
POMDPs
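POMDPs
▪ MDPs have:
  ▪ States S
  ▪ Actions A
  ▪ Transition function P(s'|s,a) (or T(s,a,s'))
  ▪ Rewards R(s,a,s')
▪ POMDPs add:
  ▪ Observations O
  ▪ Observation function P(o|s) (or O(s,o))
▪ POMDPs are MDPs over belief states b (distributions over S)
[Figure: MDP search tree with nodes s, a, (s,a), (s,a,s'), s' beside the corresponding POMDP tree over belief states with nodes b, a, (b,a), o, b']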
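Treating a POMDP as an MDP over belief states requires a rule for moving from b to b' after taking action a and seeing observation o. The slides do not spell this out; the sketch below uses the standard Bayes-filter update, b'(s') ∝ P(o|s') Σ_s P(s'|s,a) b(s). The function name `belief_update`, the dictionary layouts, and the two-cell sensor numbers are all hypothetical.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Bayes-filter belief update for a discrete POMDP.

    belief:            {s: P(s)}
    transition:        {(s, a): {s2: P(s2 | s, a)}}
    observation_model: {s2: {o: P(o | s2)}}
    """
    states = list(belief)
    # Prediction step: push the belief through the transition function.
    predicted = {
        s2: sum(transition[(s, action)].get(s2, 0.0) * belief[s] for s in states)
        for s2 in states
    }
    # Correction step: weight each state by how well it explains o.
    unnormalized = {
        s2: observation_model[s2].get(observation, 0.0) * predicted[s2]
        for s2 in states
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / total for s2, v in unnormalized.items()}


# Hypothetical two-cell static ghost: sensing never moves the ghost.
TRANSITION = {("left", "sense"): {"left": 1.0}, ("right", "sense"): {"right": 1.0}}
SENSOR = {"left": {"red": 0.8, "green": 0.2}, "right": {"red": 0.2, "green": 0.8}}

if __name__ == "__main__":
    prior = {"left": 0.5, "right": 0.5}
    print(belief_update(prior, "sense", "red", TRANSITION, SENSOR))
```

With a uniform prior and a "red" reading, the posterior puts roughly 0.8 on the left cell; a planner would then treat that new belief as its current state.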
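Example: Ghostbusters
▪ In (static) Ghostbusters:
  ▪ Belief state determined by evidence to date {e}
  ▪ Tree really over evidence sets
  ▪ Probabilistic reasoning needed to predict what new evidence will be gained, given past evidence and the action taken
▪ Solving POMDPs
  ▪ One way: use truncated expectimax to compute approximate value of actions
  ▪ What if you only considered busting or one sense followed by a bust?
  ▪ You get a VPI-based agent!
[Figure: three trees. Evidence-set tree with nodes {e}, a, (e,a), e', {e,e'}; belief-state tree with nodes b, a, (b,a), b'; truncated tree rooted at {e} with one branch a_bust ending in U(a_bust, {e}) and another branch a_sense leading through ({e}, a_sense) and e' to {e,e'}, followed by a_bust ending in U(a_bust, {e,e'}).]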
More Generally
▪ General solutions map belief functions to actions
  ▪ Can divide regions of belief space (set of belief functions) into policy regions (gets complex quickly)
  ▪ Can build approximate policies using discretization methods
  ▪ Can factor belief functions in various ways
▪ Overall, POMDPs are very (actually PSPACE-) hard
▪ Most real problems are POMDPs, but we can rarely solve them in general!