
Markov Models

Introduction to Artificial Intelligence

COS302

Michael L. Littman

Fall 2001

Administration

Hope your midterm week is going well.

Breaking the Bank

Assistant professor: $20k/year.

How much total cash?

20 + 20 + 20 + 20 + … = infinity!

Nice, eh?

Discounted Rewards

Idea: The promise of future payment is not worth quite as much as payment now.

• Inflation / investing
• Chance of game “ending”

Ex. $10k next year might be worth $10k × 0.9 today.

Infinite Sum

Assuming a discount rate of 0.9, how much does the assistant professor get in total?

x = 20 + .9·20 + .9²·20 + .9³·20 + …
  = 20 + .9 (20 + .9·20 + .9²·20 + …)

The parenthesized sum is just x again, so:

x = 20 + .9 x
x = 20 / (.1) = 200
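A quick numerical check of this geometric series, as a minimal Python sketch (the 20 and 0.9 are just the numbers from this slide):

```python
# Truncated discounted sum: 20 + .9*20 + .9^2*20 + ...
reward, gamma = 20.0, 0.9
total = sum(reward * gamma**t for t in range(1000))
print(total)                 # ~200.0 (truncation error is negligible)
print(reward / (1 - gamma))  # closed form: 20 / 0.1 = 200.0
```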

Academic Life

[Markov chain diagram. States and rewards: A. Assistant Prof.: 20; B. Associate Prof.: 60; S. Out on the Street: 10; T. Tenured Prof.: 100; D. Dead: 0. Transitions: A→A 0.6, A→B 0.2, A→S 0.2; B→B 0.6, B→S 0.2, B→T 0.2; S→S 0.7, S→D 0.3; T→T 0.7, T→D 0.3; D→D 1.0.]

Solving for Total Reward

L(i) is the expected total reward received starting in state i.

How could we compute L(A)?

Would it help to compute L(B), L(T), L(S), and L(D) also?

Working Backwards

[Same chain as above, with discount factor 0.9. Working backwards from the absorbing state gives L(D) = 0, L(S) ≈ 27, L(T) ≈ 270, L(B) ≈ 247, L(A) ≈ 151.]
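Those values are easy to check with a few lines of numpy; a sketch, with the rewards and transition probabilities transcribed from the diagram above (the state ordering A, B, S, T, D is my own choice):

```python
import numpy as np

# Academic-life chain, no reincarnation (D is absorbing).
# State order: A, B, S, T, D.
R = np.array([20.0, 60.0, 10.0, 100.0, 0.0])
P = np.array([
    [0.6, 0.2, 0.2, 0.0, 0.0],  # A -> A, B, S
    [0.0, 0.6, 0.2, 0.2, 0.0],  # B -> B, S, T
    [0.0, 0.0, 0.7, 0.0, 0.3],  # S -> S, D
    [0.0, 0.0, 0.0, 0.7, 0.3],  # T -> T, D
    [0.0, 0.0, 0.0, 0.0, 1.0],  # D -> D
])
gamma = 0.9

# L = R + gamma P L  =>  (I - gamma P) L = R
L = np.linalg.solve(np.eye(5) - gamma * P, R)
print(dict(zip("ABSTD", L.round().tolist())))
# {'A': 151.0, 'B': 247.0, 'S': 27.0, 'T': 270.0, 'D': 0.0}
```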

Reincarnation?

[Same states, rewards, and transitions as before, except D is no longer absorbing: D→D 0.5, D→A 0.5. Discount factor 0.9.]

System of Equations

L(A) = 20 + .9 (.6 L(A) + .2 L(B) + .2 L(S))
L(B) = 60 + .9 (.6 L(B) + .2 L(S) + .2 L(T))
L(S) = 10 + .9 (.7 L(S) + .3 L(D))
L(T) = 100 + .9 (.7 L(T) + .3 L(D))
L(D) = 0 + .9 (.5 L(D) + .5 L(A))

Transition Matrix

Let P be the matrix of probs: Pij = Pr(next = j | current = i).

From \ To    A     B     S     T     D
    A       0.6   0.2   0.2    0     0
    B        0    0.6   0.2   0.2    0
    S        0     0    0.7    0    0.3
    T        0     0     0    0.7   0.3
    D       0.5    0     0     0    0.5

Matrix Equation

L = R + γ P L

[L(A)]   [ 20]        [.6  .2  .2   0   0] [L(A)]
[L(B)]   [ 60]        [ 0  .6  .2  .2   0] [L(B)]
[L(S)] = [ 10]  + .9  [ 0   0  .7   0  .3] [L(S)]
[L(T)]   [100]        [ 0   0   0  .7  .3] [L(T)]
[L(D)]   [  0]        [.5   0   0   0  .5] [L(D)]

Solving the Equation

L = R + γ P L
L - γ P L = R
I L - γ P L = R   (introduce identity)
(I - γ P) L = R
(I - γ P)⁻¹ (I - γ P) L = (I - γ P)⁻¹ R
L = (I - γ P)⁻¹ R

Matrix inversion, matrix-vector multiplication.
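In code, the last line is a single linear solve. A numpy sketch for the reincarnation chain; using a solve rather than an explicit inverse is my choice here (equivalent, but cheaper and numerically safer):

```python
import numpy as np

R = np.array([20.0, 60.0, 10.0, 100.0, 0.0])  # rewards for A, B, S, T, D
P = np.array([                                # transition matrix from the slide
    [0.6, 0.2, 0.2, 0.0, 0.0],
    [0.0, 0.6, 0.2, 0.2, 0.0],
    [0.0, 0.0, 0.7, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.5, 0.0, 0.0, 0.0, 0.5],
])
gamma = 0.9

# L = (I - gamma P)^(-1) R
L = np.linalg.solve(np.eye(5) - gamma * P, R)
print(dict(zip("ABSTD", L.round(1).tolist())))
```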

Markov Chain

Set of states, transitions from state to state.

Transitions only depend on the current state, not the history: the Markov property.

What Does a MC Do?

[Markov chain diagram with the same shape as the academic-life chain, relabeled: A: 20, B: 80, C: 10, D: 100, E: 0. Transitions: A→A 0.6, A→B 0.2, A→C 0.2; B→B 0.6, B→C 0.2, B→D 0.2; C→C 0.7, C→E 0.3; D→D 0.7, D→E 0.3; E→E 0.5, E→A 0.5.]

MC Problems

• Probability of going from s to s’ in t steps (see the sketch after this list).
• Probability of going from s to s’ in t or fewer steps.
• Averaged over t steps (in the limit), how often in state s’ starting at s?
• How many steps from s to s’, on average?
• Given reward values, expected discounted reward starting at s.
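The first problem, for instance, reduces to a matrix power: the (s, s′) entry of Pᵗ is the t-step transition probability. A sketch, reusing the chain above:

```python
import numpy as np

P = np.array([                  # chain from the previous slide (A, B, C, D, E)
    [0.6, 0.2, 0.2, 0.0, 0.0],
    [0.0, 0.6, 0.2, 0.2, 0.0],
    [0.0, 0.0, 0.7, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.5, 0.0, 0.0, 0.0, 0.5],
])

t = 10
Pt = np.linalg.matrix_power(P, t)  # (P^t)[i, j] = Pr(in j after t steps | start in i)
print(Pt[0, 3])                    # Pr(in D after 10 steps | start in A)
```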

Examples

Queuing system: Expected queue length, time until queue fills up.
Chutes & Ladders: Avg. game time.
Genetic Algs: Time to find opt.
Statistics: Time to “mix”.
Blackjack: Expected winnings.
Robot: Prob. of crash given controller.
Gambler’s ruin: Time until money gone.

Gambler’s Ruin

Gambler has 3 dollars.
Win a dollar with prob. 1/3.
Lose a dollar with prob. 2/3.
Fail: no dollars.
Succeed: Have 5 dollars.

Probability of success?
Average time until success or failure?
Set up Markov chains.

Ruin Chain

Solve for L(3) using γ = 1. Gives probability of success.

[Chain on states 0 1 2 3 4 5: from each of states 1–4, go up one state with prob. 1/3 and down one with prob. 2/3. States 0 and 5 absorb with prob. 1, with reward +1 attached to reaching state 5.]

Gambling Time Chain

Solve for L(3) using γ = 1. L(3) will be the number of steps spent in states 1, 2, 3, 4 before finishing.

[Same chain, but the +1 reward is now attached to each of the non-absorbing states 1–4.]
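Both chains boil down to a small linear system over the transient states 1–4. A numpy sketch of the two solves (the state encoding and index bookkeeping are mine, not from the slides):

```python
import numpy as np

# Transient states 1..4 (indices 0..3); states 0 and 5 are absorbing.
Q = np.zeros((4, 4))
for i, s in enumerate(range(1, 5)):
    if s + 1 <= 4:
        Q[i, i + 1] = 1 / 3     # win a dollar
    if s - 1 >= 1:
        Q[i, i - 1] = 2 / 3     # lose a dollar

A = np.eye(4) - Q
b_win = np.array([0, 0, 0, 1 / 3])  # one-step prob. of hitting state 5 (+1 reward)

success = np.linalg.solve(A, b_win)     # ruin chain: Pr(reach 5)
steps = np.linalg.solve(A, np.ones(4))  # time chain: +1 reward per transient step

print("Pr(success | $3):", success[2])  # 7/31, about 0.226
print("E[steps | $3]:", steps[2])
```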

Stationary Distribution

What fraction of the time is spent in D?

L(D) = 0.7 L(D) + 0.2 L(B)
Also: L(A) + L(B) + L(C) + L(D) + L(E) = 1

[Same five-state chain as on the “What Does a MC Do?” slide.]
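One way to solve the full system numerically: the stationary distribution π satisfies π = πP, and since those balance equations are redundant, replace one of them with the normalization constraint. A sketch (the row-swap trick is a standard numerical device, not from the slides):

```python
import numpy as np

P = np.array([                  # chain from the "What Does a MC Do?" slide
    [0.6, 0.2, 0.2, 0.0, 0.0],
    [0.0, 0.6, 0.2, 0.2, 0.0],
    [0.0, 0.0, 0.7, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.5, 0.0, 0.0, 0.0, 0.5],
])

A = (np.eye(5) - P).T           # balance equations: (I - P)^T pi = 0
A[-1, :] = 1.0                  # swap the last one for sum(pi) = 1
b = np.zeros(5)
b[-1] = 1.0

pi = np.linalg.solve(A, b)
print(dict(zip("ABCDE", pi.round(3).tolist())))
```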

Other Uses: Uncertainty

Can add a notion of “actions” to create a Markov decision process. Useful for AI planning.

Can add a notion of “observation” to create a hidden Markov model (we’ll see these later).

Add both to get a partially observable Markov decision process (POMDP).

What to Learn

Solving for the value of Markov processes using matrix equations.

Setting up problems as Markov chains.

Homework 5 (due 11/7)

1. The value iteration algorithm from the Games of Chance lecture can be applied to deterministic games with loops. Argue that it produces the same answer as the “Loopy” algorithm from the Game Tree lecture.

2. Write the matrix form of the game tree below.

Game Tree

[Game tree diagram for problem 2: max nodes X-1 and X-4, min nodes Y-2 and Y-3, with moves L and R at each internal node and leaf values +2, -1, +4, +5, and +2.]

Continued

3. How many times (on average) do you need to flip a coin before you flip 3 heads in a row? (a) Set this up as a Markov chain, and (b) solve it.

4. More soon…