Decision Making Under Uncertainty
Russell and Norvig: ch 16
CMSC421 – Fall 2006
Non-deterministic vs. Probabilistic Uncertainty
[Diagram: two decision trees, one per uncertainty model, each with a decision node branching to outcomes a, b, c]

• Non-deterministic model: an action may lead to any outcome in {a, b, c}; choose the decision that is best for the worst case (~ adversarial search)
• Probabilistic model: an action leads to outcomes {a (pa), b (pb), c (pc)} with known probabilities; choose the decision that maximizes expected utility value
Expected Utility

• Random variable X with n values x1,…,xn and distribution (p1,…,pn)
  – E.g.: Xi is Resulti(A), the state reached after doing action A given E, what we know about the current state
• Function U of X
  – E.g., U is the utility of a state
• The expected utility of A is
  EU[A|E] = Σ i=1,…,n p(xi|A) U(xi)
          = Σ i=1,…,n P(Resulti(A)|Do(A),E) U(Resulti(A))
One State/One Action Example

[Diagram: from state s0, action A1 leads to s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70]

U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62
One State/Two Actions Example

[Diagram: from state s0, action A1 as before; action A2 leads to s2 and s4 with probabilities 0.2 and 0.8, where U(s4) = 80]

• U1(S0) = 62
• U2(S0) = 50 x 0.2 + 80 x 0.8 = 74
• U(S0) = max{U1(S0), U2(S0)} = 74
Introducing Action Costs

[Diagram: the same two actions, now with costs: -5 for A1, -25 for A2]

• U1(S0) = 62 - 5 = 57
• U2(S0) = 74 - 25 = 49
• U(S0) = max{U1(S0), U2(S0)} = 57
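As a sanity check on the arithmetic above, here is a minimal Python sketch of the MEU computation with action costs. The distributions, utilities, and costs are the ones from the example; the names (outcomes, costs, expected_utility) are illustrative, and the exact outcome split for A2 (0.2 to s2, 0.8 to s4) is an assumption consistent with U2(S0) = 74.

# Expected utility of each action from the example, minus action costs.
# Each action maps to a list of (probability, utility-of-resulting-state) pairs.
outcomes = {
    "A1": [(0.2, 100), (0.7, 50), (0.1, 70)],  # s1, s2, s3
    "A2": [(0.2, 50), (0.8, 80)],              # assumed split: s2, s4
}
costs = {"A1": 5, "A2": 25}

def expected_utility(action):
    # EU[A] = sum_i p(x_i | A) * U(x_i), minus the cost of taking A.
    return sum(p * u for p, u in outcomes[action]) - costs[action]

for a in outcomes:
    print(a, expected_utility(a))           # A1 57.0, A2 49.0

best = max(outcomes, key=expected_utility)  # the MEU choice: A1
print("U(S0) =", expected_utility(best))    # 57.0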
MEU Principle
• A rational agent should choose the action that maximizes the agent's expected utility
• This is the basis of the field of decision theory
• A normative criterion for rational choice of action
Not quite…
• Must have a complete model of:
  – Actions
  – Utilities
  – States
• Even with a complete model, the computation may be intractable
• In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality)
• Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before
We’ll look at
• Decision-theoretic reasoning
  – Simple decision making (ch. 16)
  – Sequential decision making (ch. 17)
Preferences
An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes
Lottery L = [p, A; (1 – p), B]
Notation:
• A > B : A preferred to B
• A ~ B : indifference between A and B
• A ≥ B : B not preferred to A
Rational Preferences
Idea: preferences of a rational agent must obey constraints
Axioms of Utility Theory
1. Orderability: (A > B) ∨ (B > A) ∨ (A ~ B)
2. Transitivity: (A > B) ∧ (B > C) ⇒ (A > C)
3. Continuity: A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B
4. Substitutability: A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
5. Monotonicity: A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B])
Rational Preferences
Violating the constraints leads to irrational behavior
E.g.: an agent with intransitive preferences can be induced to give away all its money:
• if B > C, then an agent who has C would pay some amount, say $1, to get B
• if A > B, then an agent who has B would pay, say, $1 to get A
• if C > A, then an agent who has A would pay, say, $1 to get C
• …the agent now holds C again, $3 poorer, and the cycle repeats until it is broke
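A tiny simulation of this money pump, assuming the cyclic preferences above (all names illustrative): the agent keeps trading "up" its cycle of preferences, paying $1 per swap, until it runs out of money.

# Money pump: cyclic (intransitive) preferences B > C, A > B, C > A.
# prefers[x] is the item the agent would pay $1 to swap x for.
prefers = {"C": "B", "B": "A", "A": "C"}

holding, money = "C", 10   # start with item C and $10
trades = 0
while money > 0:
    holding = prefers[holding]   # swap for the "preferred" item...
    money -= 1                   # ...paying $1 each time
    trades += 1
print(holding, money, trades)    # B 0 10: broke after 10 trades, no lasting gain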
Rational Preferences → Utility

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function U such that
• U(A) ≥ U(B) ⇔ A ≥ B
• U([p1, S1; …; pn, Sn]) = Σ i pi U(Si)
MEU principle: Choose the action that maximizes expected utility
Utility Assessment
Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lp that has
• the best possible prize with probability p
• the worst possible catastrophe (e.g., instant death) with probability (1 - p)
Adjust the lottery probability p until A ~ Lp, and continue as before.

[Diagram: A ~ Lp, where Lp yields the best possible prize with probability p and instant death with probability 1 - p]
Aside: Money Utility function
Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)); i.e., people are risk-averse
Would you rather have $1,000,000 for sure, or a lottery with [0.5, $0; 0.5, $3,000,000]?
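The risk-aversion claim is easy to check numerically with any concave utility of money; the square-root utility below is purely an illustrative assumption, not something the slides prescribe.

from math import sqrt

U = sqrt  # an assumed concave (risk-averse) utility of money

lottery = [(0.5, 0), (0.5, 3_000_000)]         # [0.5, $0; 0.5, $3,000,000]
EMV = sum(p * x for p, x in lottery)           # expected monetary value: 1,500,000
U_lottery = sum(p * U(x) for p, x in lottery)  # expected utility: ~866

print(U_lottery < U(EMV))        # True: U(L) < U(EMV(L))
print(U_lottery < U(1_000_000))  # True: even $1M for sure beats the lottery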
Decision Networks
• Extend BNs to handle actions and utilities
• Also called influence diagrams
• Make use of BN inference
• Can do Value of Information calculations
Decision Networks cont.
• Chance nodes: random variables, as in BNs
• Decision nodes: actions that the decision maker can take
• Utility/value nodes: the utility of the outcome state
Umbrella Network

[Diagram: decision node "Take Umbrella" (take/don't take) → chance node "umbrella"; chance node "rain"; both feed the utility node "happiness"]

P(rain) = 0.4
P(umb|take) = 1.0, P(~umb|~take) = 1.0 (the umbrella node is deterministic)

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25
Evaluating Decision Networks
1. Set the evidence variables for the current state
2. For each possible value of the decision node:
   a. Set the decision node to that value
   b. Calculate the posterior probability of the parent nodes of the utility node, using BN inference
   c. Calculate the resulting utility for the action
3. Return the action with the highest utility
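A minimal Python sketch of this evaluation loop, specialized to the umbrella network as it appears on the following slides (P(umb|take) = 0.8, P(umb|~take) = 0.1); the names are illustrative. Since umbrella and rain are independent given the decision, the "BN inference" step reduces to multiplying the two marginals.

# Evaluate the umbrella decision network by enumerating outcomes.
P_rain = 0.4
P_umb_given_take = {True: 0.8, False: 0.1}      # noisy umbrella node
U = {(False, False): 100, (False, True): -100,  # U[(umbrella, rain)]
     (True, False): 0,    (True, True): -25}

def eu(take):
    # Weight each (umbrella, rain) outcome's utility by its probability
    # given the decision; umbrella and rain are conditionally independent.
    total = 0.0
    for umb in (True, False):
        p_umb = P_umb_given_take[take] if umb else 1 - P_umb_given_take[take]
        for rain in (True, False):
            p_rain = P_rain if rain else 1 - P_rain
            total += p_umb * p_rain * U[(umb, rain)]
    return total

print(eu(True), eu(False))         # -4.0 17.0, matching the next slides
print(max((True, False), key=eu))  # False: the MEU action is ~take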
Umbrella Network

[Diagram: the same network, now with a noisy umbrella node]

P(rain) = 0.4
P(umb|take) = 0.8, P(umb|~take) = 0.1

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

umb  rain  P(umb, rain | take)
0    0     0.2 x 0.6 = 0.12
0    1     0.2 x 0.4 = 0.08
1    0     0.8 x 0.6 = 0.48
1    1     0.8 x 0.4 = 0.32

#1: EU(take) = 100 x 0.12 - 100 x 0.08 + 0 x 0.48 - 25 x 0.32 = 12 - 8 + 0 - 8 = -4
Umbrella Network

(same network and utilities as above)

umb  rain  P(umb, rain | ~take)
0    0     0.9 x 0.6 = 0.54
0    1     0.9 x 0.4 = 0.36
1    0     0.1 x 0.6 = 0.06
1    1     0.1 x 0.4 = 0.04

#2: EU(~take) = 100 x 0.54 - 100 x 0.36 + 0 x 0.06 - 25 x 0.04 = 54 - 36 + 0 - 1 = 17

So, in this case I would… not take the umbrella: EU(~take) = 17 > EU(take) = -4.
Value of Information
Idea: Compute the expected value of acquiring possible evidence
Example: buying oil drilling rights
• Two blocks, A and B; exactly one of them has oil, worth k
• Prior probability 0.5 for each block
• Current price of each block is k/2

What is the value of getting a survey of A done?
• The survey will say 'oil in A' or 'no oil in A', each with probability 0.5

Compute the expected value of information (VOI): the expected value of the best action given the information, minus the expected value of the best action without the information:
VOI(Survey) = [0.5 x value of "buy A" given oil in A] + [0.5 x value of "buy B" given no oil in A] - 0
            = 0.5 x k/2 + 0.5 x k/2 = k/2
(either way, the agent buys a block worth k for k/2; without the survey, the expected profit of buying either block is 0)
Value of Information (VOI)

Suppose the agent's current knowledge is E. The value of the current best action α is

EU(α | E) = max_A Σ i U(Resulti(A)) P(Resulti(A) | E, Do(A))

The value of the new best action α' (after new evidence E' is obtained) is

EU(α' | E, E') = max_A Σ i U(Resulti(A)) P(Resulti(A) | E, E', Do(A))

The value of information for E' is

VOI(E') = ( Σ k P(ek | E) EU(α_ek | E, ek) ) - EU(α | E)
Umbrella Network

[Diagram: chance node "forecast" added as a child of "rain"; the agent observes the forecast before deciding]

P(rain) = 0.4
P(umb|take) = 0.8, P(umb|~take) = 0.1

U(~umb, ~rain) = 100
U(~umb, rain) = -100
U(umb, ~rain) = 0
U(umb, rain) = -25

R    P(F=rainy|R)
0    0.2
1    0.7

To evaluate the decision for each forecast value we need P(rain | F), obtained from P(F | R) by Bayes' rule; the inverted model is shown on the next slide, where the four conditional expected utilities are worked out.
Umbrella Network

[Diagram: same network, with the forecast model inverted]

P(F=rainy) = 0.7 x 0.4 + 0.2 x 0.6 = 0.4

F    P(R=rain|F)
0    0.2    (= 0.3 x 0.4 / 0.6, by Bayes' rule)
1    0.7    (= 0.7 x 0.4 / 0.4, by Bayes' rule)

P(umb|take) = 0.8, P(umb|~take) = 0.1
U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25
umb  rain  P(umb, rain | take, rainy)
0    0     0.2 x 0.3 = 0.06
0    1     0.2 x 0.7 = 0.14
1    0     0.8 x 0.3 = 0.24
1    1     0.8 x 0.7 = 0.56

#1: EU(take|rainy) = 100 x 0.06 - 100 x 0.14 + 0 x 0.24 - 25 x 0.56 = -22

umb  rain  P(umb, rain | take, ~rainy)
0    0     0.2 x 0.8 = 0.16
0    1     0.2 x 0.2 = 0.04
1    0     0.8 x 0.8 = 0.64
1    1     0.8 x 0.2 = 0.16

#3: EU(take|~rainy) = 100 x 0.16 - 100 x 0.04 + 0 x 0.64 - 25 x 0.16 = 8

umb  rain  P(umb, rain | ~take, rainy)
0    0     0.9 x 0.3 = 0.27
0    1     0.9 x 0.7 = 0.63
1    0     0.1 x 0.3 = 0.03
1    1     0.1 x 0.7 = 0.07

#2: EU(~take|rainy) = 100 x 0.27 - 100 x 0.63 + 0 x 0.03 - 25 x 0.07 = -37.75

umb  rain  P(umb, rain | ~take, ~rainy)
0    0     0.9 x 0.8 = 0.72
0    1     0.9 x 0.2 = 0.18
1    0     0.1 x 0.8 = 0.08
1    1     0.1 x 0.2 = 0.02

#4: EU(~take|~rainy) = 100 x 0.72 - 100 x 0.18 + 0 x 0.08 - 25 x 0.02 = 53.5

So the best action is take if the forecast is rainy (-22 > -37.75) and ~take otherwise (53.5 > 8), giving
VOI(F) = 0.4 x (-22) + 0.6 x 53.5 - EU(~take) = -8.8 + 32.1 - 17 = 6.3
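Finally, a sketch that reproduces the four conditional expected utilities above and then computes the value of the forecast information using the VOI formula from earlier; all names are illustrative, and the numbers come straight from the slides.

# Value of the forecast information in the umbrella network.
P_F_rainy = 0.4
P_rain_given_F = {"rainy": 0.7, "~rainy": 0.2}  # inverted CPT from the slide
P_umb_given_take = {True: 0.8, False: 0.1}
U = {(False, False): 100, (False, True): -100,  # U[(umbrella, rain)]
     (True, False): 0,    (True, True): -25}

def eu(take, p_rain):
    # Expected utility of the decision given the posterior probability of rain.
    total = 0.0
    for umb in (True, False):
        p_umb = P_umb_given_take[take] if umb else 1 - P_umb_given_take[take]
        for rain in (True, False):
            p_r = p_rain if rain else 1 - p_rain
            total += p_umb * p_r * U[(umb, rain)]
    return total

# Best expected utility for each forecast value, and with no forecast at all:
eu_rainy   = max(eu(t, P_rain_given_F["rainy"])  for t in (True, False))  # -22.0 (take)
eu_sunny   = max(eu(t, P_rain_given_F["~rainy"]) for t in (True, False))  # 53.5 (~take)
eu_no_info = max(eu(t, 0.4) for t in (True, False))                       # 17.0 (~take)

voi = P_F_rainy * eu_rainy + (1 - P_F_rainy) * eu_sunny - eu_no_info
print(round(voi, 2))  # 6.3: observing the forecast is worth 6.3 utility units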