Intelligent Environments
Intelligent Environments 1
Intelligent Environments
Computer Science and Engineering
University of Texas at Arlington
Intelligent Environments 2
Decision-Making for Intelligent Environments: Motivation, Techniques, Issues
Intelligent Environments 3
Motivation An intelligent environment
acquires and applies knowledge about you and your surroundings in order to improve your experience
“acquires” → prediction
“applies” → decision making
Intelligent Environments 4
Motivation Why do we need decision-making?
“Improve our experience” Usually alternative actions
Which one to take?
Example (Bob scenario: Bob is in the bedroom; what next?)
Turn on bathroom light? Turn on kitchen light? Turn off bedroom light?
Intelligent Environments 5
Example Should I turn on the bathroom light? Issues
Inhabitant’s location (current and future) Inhabitant’s task Inhabitant’s preferences Energy efficiency Security Other inhabitants
Intelligent Environments 6
Qualities of a Decision Maker Ideal
Complete: always makes a decision Correct: decision is always right Natural: knowledge easily expressed Efficient
Rational Decisions made to maximize
performance
Intelligent Environments 7
Agent-based Decision Maker Russell & Norvig “AI: A Modern
Approach” Rational agent
Agent chooses an action to maximize its performance based on percept sequence
Intelligent Environments 8
Agent Types Reflex agent Reflex agent with state Goal-based agent Utility-based agent
Intelligent Environments 9
Reflex Agent
Intelligent Environments 10
Reflex Agent with State
Intelligent Environments 11
Goal-based Agent
Intelligent Environments 12
Utility-based Agent
Intelligent Environments 13
Intelligent Environments
Decision-Making Techniques
Intelligent Environments 14
Decision-Making Techniques Logic Planning Decision theory Markov decision process Reinforcement learning
Intelligent Environments 15
Logical Decision Making
If Equal(?Day,Monday) & GreaterThan(?CurrentTime,0600) & LessThan(?CurrentTime,0700) & Location(Bob,bedroom,?CurrentTime) & Increment(?CurrentTime,?NextTime)
Then Location(Bob,bathroom,?NextTime)
Query: Location(Bob,?Room,0800)
Intelligent Environments 16
Logical Decision Making Rules and facts
First-order predicate logic Inference mechanism
Deduction: {A, A → B} ⊢ B
Systems
Prolog (PROgramming in LOGic) OTTER Theorem Prover
Intelligent Environments 17
Prolog
location(bob, bathroom, NextTime) :-
    dayofweek(Day), Day = monday,
    currenttime(CurrentTime),
    CurrentTime > 0600, CurrentTime < 0700,
    location(bob, bedroom, CurrentTime),
    increment(CurrentTime, NextTime).
Facts: dayofweek(monday), ...
Query: location(bob, Room, 0800).
Intelligent Environments 18
OTTER
(all d all t1 all t2
  ((DayofWeek(d) & Equal(d,Monday) &
    CurrentTime(t1) & GreaterThan(t1,0600) & LessThan(t1,0700) &
    NextTime(t1,t2) & Location(Bob,Bedroom,t1))
   -> Location(Bob,Bathroom,t2))).
Facts: DayofWeek(Monday), ...
Query: (exists r (Location(Bob,r,0800)))
Intelligent Environments 19
Actions
If Location(Bob,Bathroom,t1) Then Action(TurnOnBathRoomLight,t1)
Preferences among actions
If RecommendedAction(a1,t1) & RecommendedAction(a2,t1) & ActionPriority(a1) > ActionPriority(a2) Then Action(a1,t1)
Intelligent Environments 20
Persistence Over Time If Location(Bob,room1,t1) & not Move(Bob,t1) & NextTime(t1,t2) Then Location(Bob,room1,t2)
One for each attribute of Bob!
Intelligent Environments 21
Logical Decision Making Assessment
Complete? Yes Correct? Yes Efficient? No Natural? No Rational?
Intelligent Environments 22
Decision Making as Planning Search for a sequence of actions to
achieve some goal Requires
Initial state of the environment Goal state Actions (operators)
Conditions Effects (implied connection to effectors)
Intelligent Environments 23
Example
Initial: location(Bob,Bathroom) & light(Bathroom,off)
Goal: happy(Bob)
Action 1: Condition: location(Bob,?r) & light(?r,on); Effect: Add: happy(Bob)
Action 2: Condition: light(?r,off); Effect: Delete: light(?r,off), Add: light(?r,on)
Plan: Action 2, Action 1
Intelligent Environments 24
Requirements Where do goals come from?
System design Users
Where do actions come from? Device “drivers” Learned macros
E.g., SecureHome action
Intelligent Environments 25
Planning Systems UCPOP (Univ. of Washington)
Partial Order Planner with Universal quantification and Conditional effects
GraphPlan (CMU) Builds and prunes graph of possible
plans
Intelligent Environments 26
GraphPlan Example
(:action lighton
  :parameters (?r)
  :precondition (light ?r off)
  :effect (and (light ?r on) (not (light ?r off))))
Intelligent Environments 27
Planning Assessment
Complete? Yes Correct? Yes Efficient? No Natural? Better Rational?
Intelligent Environments 28
Decision Theory Logical and planning approaches
typically assume no uncertainty Decision theory = probability theory +
utility theory Maximum Expected Utility principle
Rational agent chooses actions yielding highest expected utility
Averaged over all possible action outcomes Weight utility of an outcome by its probability of
occurring
Intelligent Environments 29
Probability Theory Random variables: X, Y, … Prior probability: P(X) Conditional probability: P(X|Y) Joint probability distribution
P(X1,…,Xn) is an n-dimensional table of probabilities
Complete table allows computation of any probability
Complete table typically infeasible
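A minimal Python sketch of this point (the two variables and the table entries are invented for illustration): once the full joint distribution is available, any marginal or conditional probability follows by summing table entries.

# Sketch: computing probabilities from a full joint distribution.
# The two binary variables (rain, wet) and their joint probabilities
# are made-up numbers used only for illustration.
joint = {  # P(rain, wet)
    (True,  True):  0.27,
    (True,  False): 0.03,
    (False, True):  0.07,
    (False, False): 0.63,
}

def p(rain=None, wet=None):
    """Sum the joint entries consistent with the given settings."""
    return sum(pr for (r, w), pr in joint.items()
               if (rain is None or r == rain) and (wet is None or w == wet))

print(p(rain=True))                          # marginal P(rain) = 0.30
print(p(rain=True, wet=True) / p(wet=True))  # conditional P(rain|wet) ≈ 0.79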
Intelligent Environments 30
Probability Theory Bayes rule:
P(X|Y) = P(Y|X) * P(X) / P(Y)
Example:
P(rain|wet) = P(wet|rain) * P(rain) / P(wet)
More likely to know P(wet|rain)
In general, P(X|Y) = α * P(Y|X) * P(X), with α chosen so that Σ_X P(X|Y) = 1
Intelligent Environments 31
Bayes Rule (cont.) How to compute P(rain | wet & thunder)?
P(r | w & t) = P(w & t | r) * P(r) / P(w & t)
Could estimate P(w & t | r) directly, but this becomes tedious as the evidence grows
Conditional independence of evidence: thunder does not cause wet, and vice versa
P(r | w & t) = α * P(w|r) * P(t|r) * P(r)
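A minimal Python sketch of this computation (all probabilities below are invented for illustration): each conditionally independent evidence variable contributes one factor, and α is just the normalization constant that makes the posterior sum to 1.

# Sketch: P(rain | wet & thunder) with conditionally independent evidence.
# P(rain), P(wet | rain) and P(thunder | rain) are made-up numbers.
p_rain               = {True: 0.30, False: 0.70}
p_wet_given_rain     = {True: 0.90, False: 0.10}
p_thunder_given_rain = {True: 0.60, False: 0.05}

# Unnormalized score for each value of rain: P(w|r) * P(t|r) * P(r)
score = {r: p_wet_given_rain[r] * p_thunder_given_rain[r] * p_rain[r]
         for r in (True, False)}
alpha = 1.0 / sum(score.values())      # alpha makes the posterior sum to 1
posterior = {r: alpha * s for r, s in score.items()}
print(posterior[True])                 # P(rain | wet & thunder) ≈ 0.98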
Intelligent Environments 32
Where Do Probabilities Come From?
Statistical sampling Universal principles Individual beliefs
Intelligent Environments 33
Representation of Uncertain Knowledge
Complete joint probability distribution
Conditional probabilities and Bayes rule Assuming conditional independence
Belief networks
Intelligent Environments 34
Belief Networks Nodes represent random variables Directed link between X and Y implies
that X “directly influences” Y Each node has a conditional
probability table (CPT) quantifying the effects that the parents (incoming links) have on the node
Network is a DAG (no directed cycles)
Intelligent Environments 35
Belief Networks: Example
Intelligent Environments 36
Belief Networks: Semantics Network represents the joint probability
distribution
Network encodes conditional independence knowledge
Each node is conditionally independent of its non-descendants given its parents
E.g., MaryCalls and Earthquake are conditionally independent given Alarm
P(X_1 = x_1, ..., X_n = x_n) = P(x_1, ..., x_n) = ∏_{i=1..n} P(x_i | Parents(X_i))
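A minimal Python sketch of this factorization for the burglary network used on these slides; the CPT entries follow the standard textbook example but are quoted from memory, so treat the numbers as illustrative.

# Sketch: joint probability as a product of CPT entries,
# P(x_1,...,x_n) = prod_i P(x_i | Parents(X_i)),
# for the Burglary/Earthquake/Alarm/JohnCalls/MaryCalls network.
P_B = 0.001                                           # P(Burglary)
P_E = 0.002                                           # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of the local probabilities."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

# Both neighbors call and the alarm sounds, but there is
# neither a burglary nor an earthquake:
print(joint(False, False, True, True, True))          # ≈ 0.00063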
Intelligent Environments 37
Belief Networks: Inference Given network, compute
P(Query | Evidence) Evidence obtained from sensory percepts
Possible inferences
Diagnostic: P(Burglary | JohnCalls) = 0.016
Causal: P(JohnCalls | Burglary)
Intercausal: P(Burglary | Alarm & Earthquake)
Intelligent Environments 38
Belief Network Construction Choose variables
Discretize continuous variables Order variables from causes to effects CPTs
Specify each table entry Define as a function (e.g., sum, Gaussian)
Learning Variables (evidential and hidden) Links (causation) CPTs
Intelligent Environments 39
Combining Beliefs with Desires Maximum expected utility
Rational agent chooses action maximizing expected utility
Expected utility EU(A|E) of action A given evidence E:
EU(A|E) = Σ_i P(Result_i(A) | E, Do(A)) * U(Result_i(A))
Result_i(A) are the possible outcome states after executing action A
U(S) is the agent’s utility for state S
Do(A) is the proposition that action A is executed in the current state
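A minimal Python sketch of the maximum-expected-utility choice; the candidate actions, outcome probabilities, and utilities are invented for illustration (loosely modeled on the bathroom-light decision from the motivation slides).

# Sketch: choose the action with maximum expected utility,
# EU(A|E) = sum_i P(Result_i(A) | E, Do(A)) * U(Result_i(A)).
# Outcomes, probabilities and utilities below are made up.
actions = {
    "turn_on_bathroom_light": [(0.8, 10.0),    # Bob heads to the bathroom: helpful
                               (0.2, -2.0)],   # he does not: wasted energy
    "do_nothing":             [(1.0,  0.0)],
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))   # turn_on_bathroom_light 7.6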
Intelligent Environments 40
Maximum Expected Utility Assumptions
Knowing evidence E completely requires significant sensory information
P(Result | E, Do(A)) requires complete causal model of the environment
U(Result) requires complete specification of state utilities
One-shot vs. sequential decisions
Intelligent Environments 41
Utility Theory Any set of preferences over possible outcomes
can be expressed by a utility function
Lottery L = [p_1,S_1; p_2,S_2; ...; p_n,S_n]
p_i is the probability of possible outcome S_i
S_i can be another lottery
Utility principle: U(A) > U(B) ⇔ A preferred to B; U(A) = U(B) ⇔ agent indifferent between A and B
Maximum expected utility principle: U([p_1,S_1; p_2,S_2; ...; p_n,S_n]) = Σ_i p_i * U(S_i)
Intelligent Environments 42
Utility Functions Possible outcomes
[1.0, $1000; 0.0, $0] vs. [0.5, $3000; 0.5, $0]
Expected monetary value: $1000 vs. $1500
But the choice depends on current wealth $k
S_k = state of possessing wealth $k
EU(accept) = 0.5 * U(S_{k+3000}) + 0.5 * U(S_k)
EU(decline) = U(S_{k+1000})
Will decline for some values of U, accept for others
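A short worked Python sketch of why the choice flips: assuming the logarithmic utility of wealth mentioned on the next slide, a poor agent declines the gamble while a wealthy one accepts it. The wealth levels are arbitrary illustration values.

# Sketch: accept the lottery [0.5, $3000; 0.5, $0] or take the sure $1000?
# Assumes a logarithmic utility of total wealth, U(S_k) = log2(k).
from math import log2

def eu_accept(k):        # 0.5 * U(S_{k+3000}) + 0.5 * U(S_k)
    return 0.5 * log2(k + 3000) + 0.5 * log2(k)

def eu_decline(k):       # U(S_{k+1000})
    return log2(k + 1000)

for k in (500, 100_000):
    print(k, "accept" if eu_accept(k) > eu_decline(k) else "decline")
# With wealth $500 the agent declines; with $100,000 it accepts.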
Intelligent Environments 43
Utility Functions (cont.)
Studies show U(S_{k+n}) = log2 n
Risk-averse agents in the positive part of the curve
Risk-seeking agents in the negative part of the curve
Intelligent Environments 44
Decision Networks Also called influence diagrams Decision networks = belief
networks + actions and utilities Describes agent’s
Current state Possible actions State resulting from agent’s action Utility of resulting state
Intelligent Environments 45
Example Decision Network
Intelligent Environments 46
Decision Network Chance node (oval)
Random variable and CPT Same as belief network node
Decision node (rectangle) Can take on a value for each possible action
Utility node (diamond) Parents are those chance nodes affecting utility Contains utility function mapping parents to
utility value or lottery
Intelligent Environments 47
Evaluating Decision Networks Set evidence variables according to
current state For each action value of decision
node Set value of decision node to action Use belief-net inference to calculate
posteriors for parents of utility node Calculate utility for action
Return action with highest utility
Intelligent Environments 48
Sequential Decision Problems No intermediate utility on the way to the goal
Transition model M^a_ij
Probability of reaching state j after taking action a in state i
Policy = complete mapping from states to actions
Want policy maximizing expected utility
Computed from transition model and state utilities
Intelligent Environments 49
Example
P(intended direction) = 0.8 P(right angle to intended) = 0.1 U(sequence) = terminal state’s value -
(1/25)*length(sequence)
Intelligent Environments 50
Example (cont.)
Optimal Policy Utilities
Intelligent Environments 51
Markov Decision Process (MDP) Calculating the optimal policy in a fully-observable, stochastic environment with a known transition model M^a_ij
Markov property satisfied: M^a_ij depends only on i (and a), not on previous states
Partially-observable environments addressed by POMDPs
Intelligent Environments 52
Value Iteration for MDPs Iterate the following for each state i until little change:
U(i) ← R(i) + max_a Σ_j M^a_ij * U(j)
R(i) is the reward for entering state i: -0.04 for all states except (4,3) and (4,2); +1 for (4,3); -1 for (4,2)
Best policy: policy*(i) = argmax_a Σ_j M^a_ij * U(j)
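A compact Python sketch of value iteration for the 4x3 grid world of the earlier example. The (column, row) coordinates, the wall at (2,2), and the stopping threshold are assumptions about the figure; the update itself follows the equation above (no discount factor).

# Sketch: value iteration for the 4x3 grid world.
# Reward -0.04 per step, terminals +1 at (4,3) and -1 at (4,2); a move goes
# in the intended direction with prob 0.8 and slips to each side with 0.1.
ACTIONS = {'up': (0, 1), 'down': (0, -1), 'left': (-1, 0), 'right': (1, 0)}
WALL, TERMINALS = (2, 2), {(4, 3): +1.0, (4, 2): -1.0}
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != WALL]

def reward(s):
    return TERMINALS.get(s, -0.04)

def move(s, step):
    """Deterministic move; bumping into the wall or the grid edge stays put."""
    nxt = (s[0] + step[0], s[1] + step[1])
    return nxt if nxt in STATES else s

def transitions(s, a):
    """(probability, next state) pairs, i.e. M^a_ij for state s and action a."""
    dx, dy = ACTIONS[a]
    return [(0.8, move(s, (dx, dy))),
            (0.1, move(s, (dy, dx))),       # slip to one side
            (0.1, move(s, (-dy, -dx)))]     # slip to the other side

def value_iteration(eps=1e-4):
    U = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                new = reward(s)
            else:
                new = reward(s) + max(sum(p * U[t] for p, t in transitions(s, a))
                                      for a in ACTIONS)
            delta, U[s] = max(delta, abs(new - U[s])), new
        if delta < eps:
            return U

U = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: sum(p * U[t] for p, t in transitions(s, a)))
          for s in STATES if s not in TERMINALS}
print(round(U[(1, 1)], 3), policy[(1, 1)])   # roughly 0.705 and 'up'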
Intelligent Environments 53
Reinforcement Learning Basically MDP, but learns policy without
the need for transition model Q-learning with temporal difference
Assigns values Q(a,i) to action-state pairs
Utility U(i) = max_a Q(a,i)
Update Q(a,i) after each observed transition from state i to state j:
Q(a,i) ← Q(a,i) + α * (R(i) + max_a' Q(a',j) - Q(a,i))
Action in state i = argmax_a Q(a,i)
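A minimal Python sketch of the temporal-difference Q-learning update above; the learning rate and the sample transition are assumed values, and no transition model is needed.

# Sketch: temporal-difference Q-learning,
# Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a',j) - Q(a,i)).
from collections import defaultdict

ALPHA = 0.1                    # learning rate (an assumed value)
Q = defaultdict(float)         # Q[(action, state)], initially 0

def update(action, i, j, reward_i, actions):
    """Apply one TD update after observing a transition from state i to j."""
    best_next = max(Q[(a2, j)] for a2 in actions)
    Q[(action, i)] += ALPHA * (reward_i + best_next - Q[(action, i)])

def choose(i, actions):
    """Greedy action in state i: argmax_a Q(a, i)."""
    return max(actions, key=lambda a: Q[(a, i)])

# One made-up observed transition in the grid world above:
actions = ['up', 'down', 'left', 'right']
update('right', (3, 3), (4, 3), -0.04, actions)
print(Q[('right', (3, 3))])    # -0.004 after a single update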
Intelligent Environments 54
Decision-Theoretic Agent Given
Percept (sensor) information Maintain
Decision network with beliefs, actions and utilities
Do Update probabilities for current state Compute outcome probabilities for actions Select action with highest expected utility
Return action
Intelligent Environments 55
Decision-Theoretic Agent Modeling sensors
Intelligent Environments 56
Sensor Modeling Combining evidence from multiple
sensors
Intelligent Environments 57
Sensor Modeling Detailed model of lane-position
sensor
Intelligent Environments 58
Dynamic Belief Network (DBN)
Reasoning over time Big for lots of states But really only need two slices at a time
Intelligent Environments 59
Dynamic Belief Network (DBN)
Intelligent Environments 60
DBN for Lane Positioning
Intelligent Environments 61
Dynamic Decision Network (DDN)
Intelligent Environments 62
DDN-based Agent Capabilities
Handles uncertainty Handles unexpected events (no fixed plan) Handles noisy and failed sensors Acts to obtain relevant information
Needs Properties from first-order logic
DDNs are propositional Goal directedness
Intelligent Environments 63
Decision-Theoretic Agent Assessment
Complete? No Correct? No Efficient? Better Natural? Yes Rational? Yes
Intelligent Environments 64
Netica www.norsys.com Decision network simulator
Chance nodes Decision nodes Utility nodes
Learns probabilities from cases
Intelligent Environments 65
Bob Scenario in Netica
Intelligent Environments 66
Issues in Decision Making Rational agent design
Dynamic decision-theoretic agent Knowledge engineering effort Efficiency vs. completeness
Monolithic vs. distributed intelligence
Degrees of autonomy