ECE-517: Reinforcement Learning in Artificial Intelligence – Lecture 4: Discrete-Time Markov Chains

ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains
Dr. Itamar Arel
College of Engineering, Department of Electrical Engineering & Computer Science
The University of Tennessee
Fall 2010 (September 1, 2010)

TRANSCRIPT

Page 1: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains

ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains

Dr. Itamar Arel

College of Engineering
Department of Electrical Engineering & Computer Science
The University of Tennessee
Fall 2010

September 1, 2010

Page 2: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Simple DTMCs

"States" can be labeled (0,) 1, 2, 3, …

At every time slot, a "jump" decision is made randomly based on the current state

[Figures: a two-state chain (states 0 and 1) with jump probabilities p (0 → 1) and q (1 → 0) and self-loop probabilities 1-p and 1-q; and a three-state chain (states 0, 1, 2) with transitions labeled a through f]

(Sometimes the arrow pointing back to the same state is omitted)
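As a sketch of the jump rule above, the two-state chain in the figure can be simulated in a few lines of Python (the function name and parameter values here are illustrative, not from the slides):

```python
import random

def simulate_two_state(p, q, steps, seed=0):
    """Simulate the two-state chain: from state 0 jump to 1 w.p. p
    (else stay); from state 1 jump to 0 w.p. q (else stay).
    Returns the fraction of slots spent in each state."""
    rng = random.Random(seed)
    state, visits = 0, [0, 0]
    for _ in range(steps):
        visits[state] += 1
        if state == 0:
            state = 1 if rng.random() < p else 0
        else:
            state = 0 if rng.random() < q else 1
    return [v / steps for v in visits]

print(simulate_two_state(p=0.3, q=0.6, steps=100_000))
```

Run long enough, the occupancy fractions approach the stationary distribution derived later in the lecture, (q/(p+q), p/(p+q)).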

Page 3: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


1-D Random Walk

Time is slotted

The walker flips a coin every time slot to decide which way to go

[Figure: sample path X(t) of the walk; each slot the position moves up w.p. p or down w.p. 1-p]
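A minimal sketch of the walk (assuming, as is conventional, a step of +1 w.p. p and -1 otherwise; names are illustrative):

```python
import random

def random_walk(p, steps, seed=1):
    """1-D random walk X(t): each slot the walker steps +1 w.p. p, else -1."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(steps):
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path

print(random_walk(p=0.5, steps=10))
```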

Page 4: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Single Server Queue

Consider a queue at a supermarket

In every time slot:
  A customer arrives with probability p
  The head-of-line (HoL) customer leaves with probability q

We'd like to learn about the behavior of such a system

[Figure: queue with Bernoulli(p) arrivals and Geom(q) service times]

Page 5: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Birth-Death Chain

Our queue can be modeled by a Birth-Death Chain (a.k.a. Geom/Geom/1 queue)

Want to know:
  Queue size distribution
  Average waiting time, etc.

Markov Property
  The "Future" is independent of the "Past" given the "Present"
  In other words: Memoryless
  We've mentioned memoryless distributions: Exponential and Geometric
  Useful for modeling and analyzing real systems

[Figure: birth-death chain over states 0, 1, 2, 3, …]
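The memoryless property can be checked directly for the Geometric distribution: P(X > m+n | X > m) = P(X > n). A small numeric check (parameter values are arbitrary):

```python
def geom_tail(q, n):
    """P(X > n) when X ~ Geometric(q): n consecutive failures."""
    return (1 - q) ** n

q, m, n = 0.4, 3, 5
conditional = geom_tail(q, m + n) / geom_tail(q, m)  # P(X > m+n | X > m)
print(conditional, geom_tail(q, n))  # the two values coincide
```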

Page 6: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Discrete Time Random Process (DTRP)

Random Process: an indexed family of random variables

Let X_n be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common cdf F_X(x). This sequence is called the i.i.d. random process.
  Example: a sequence of Bernoulli trials (flips of a coin)
  In networking: traffic may obey a Bernoulli i.i.d. arrival pattern

In reality, some degree of dependency/correlation exists between consecutive elements in a DTRP
  Example: correlated packet arrivals (video/audio stream)

Page 7: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Discrete Time Markov Chains

A sequence of random variables {X_n} is called a Markov Chain if it has the Markov property:

  P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, …, X_0 = i_0) = P(X_{n+1} = j | X_n = i)

States are usually labeled {(0,) 1, 2, …}
State space can be finite or infinite

Transition Probability Matrix

Probability of transitioning from state i to state j: p_ij = P(X_{n+1} = j | X_n = i)
We will assume the MC is homogeneous/stationary: p_ij is independent of time n
Transition probability matrix: P = {p_ij}
Two-state MC:

  P = [ 1-p    p  ]
      [  q    1-q ]
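The two-state matrix above can be written out and sanity-checked in code (a sketch; the values of p and q are arbitrary):

```python
p, q = 0.3, 0.6

# Row i holds the probabilities of moving from state i to each state j;
# every row of a transition probability matrix must sum to 1.
P = [[1 - p, p],
     [q, 1 - q]]

for row in P:
    assert abs(sum(row) - 1.0) < 1e-12  # stochastic-matrix check
print(P)
```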

Page 8: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Stationary Distribution

Define π_k(i) = P(X_k = i); then π_{k+1} = π_k P (π_k is a row vector)

Stationary distribution: π = lim_{k→∞} π_k, if the limit exists

If π exists, we can solve for it from
  π = π P,  Σ_i π_i = 1

These are called balance equations
  Transitions in and out of state i are balanced
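The balance equations can be solved numerically: one equation of π = πP is redundant, so it is replaced by the normalization constraint. A sketch using NumPy (the function name is illustrative):

```python
import numpy as np

def stationary(P):
    """Solve pi = pi * P together with sum(pi) = 1.
    The last row of the linear system (P^T - I) pi^T = 0 is replaced
    by the normalization constraint, since one equation is redundant."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A = P.T - np.eye(n)   # (P^T - I) pi^T = 0  <=>  pi P = pi
    A[-1, :] = 1.0        # replace last equation with sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

p, q = 0.3, 0.6
print(stationary([[1 - p, p], [q, 1 - q]]))  # [q/(p+q), p/(p+q)]
```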

Page 9: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


General Comment & Conditions for π to Exist (I)

If we partition all the states into two sets, then transitions between the two sets must be "balanced"
  This can be easily derived from the balance equations

Definitions:
  State j is reachable from state i if p_ij^(n) > 0 for some n
  States i and j communicate if each is reachable from the other
  The Markov chain is irreducible if all states communicate
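Irreducibility can be tested mechanically: all states communicate exactly when every entry of (I + P)^(n-1) is strictly positive. A sketch (the example matrices are illustrative):

```python
import numpy as np

def is_irreducible(P):
    """A chain on n states is irreducible iff every entry of
    (I + P)^(n-1) is positive, i.e. every state j is reachable
    from every state i in at most n-1 steps."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    R = np.linalg.matrix_power(np.eye(n) + P, n - 1)
    return bool((R > 0).all())

print(is_irreducible([[0.7, 0.3], [0.6, 0.4]]))  # True: states communicate
print(is_irreducible([[0.7, 0.3], [0.0, 1.0]]))  # False: state 1 is absorbing
```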

Page 10: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Conditions for π to Exist (I) (cont'd)

Condition: The Markov chain is irreducible
  Counter-examples:

Aperiodic Markov chain …
  Counter-example:

[Figures (counter-examples): a reducible chain over states 1–4 whose two halves do not communicate, and a periodic chain cycling 0 → 1 → 2 → 0 with probability 1]

Page 11: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Conditions for π to Exist (II)

For the Markov chain to be recurrent …
  All states i must be recurrent, i.e. P(T_i < ∞) = 1, where T_i is the time between visits to state i
  Otherwise the chain is transient

With regard to a recurrent MC …
  State i is positive recurrent if E(T_i) < ∞
  Otherwise the state is considered null-recurrent
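For a positive recurrent chain, E(T_i) = 1/π_i. This can be checked by simulation on the two-state chain from earlier slides (a sketch; the function name and parameter values are illustrative):

```python
import random

def mean_return_time(p, q, returns, seed=3):
    """Estimate E(T_0): average number of steps between successive
    visits to state 0 in the two-state chain (0->1 w.p. p, 1->0 w.p. q)."""
    rng = random.Random(seed)
    state, steps, total, count = 0, 0, 0, 0
    while count < returns:
        if state == 0:
            state = 1 if rng.random() < p else 0
        else:
            state = 0 if rng.random() < q else 1
        steps += 1
        if state == 0:  # one return to state 0 completed
            total, steps, count = total + steps, 0, count + 1
    return total / returns

p, q = 0.3, 0.6
print(mean_return_time(p, q, returns=50_000))  # theory: 1/pi_0 = (p+q)/q
```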

Page 12: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Solving for π: Example for the two-state Markov Chain

[Figure: two-state chain with P(0→1) = p, P(1→0) = q, self-loops 1-p and 1-q]

Balancing transitions across the cut between the two states gives π_0 p = π_1 q; with π_0 + π_1 = 1 this yields
  π_0 = q/(p+q),  π_1 = p/(p+q)

Page 13: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Birth-Death Chain

Arrival w.p. p; departure w.p. q

Let u = p(1-q), d = q(1-p), ρ = u/d
Balance equations:

[Figure: birth-death chain over states 0, 1, 2, 3, …; each up-transition w.p. u, each down-transition w.p. d, self-loops w.p. 1-u at state 0 and 1-u-d at states ≥ 1]

Page 14: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Birth-Death Chain (cont'd)

Continuing this pattern, we observe that:
  π(i-1) u = π(i) d

Equivalently, we can draw a bi-section between state i and state i-1

Therefore, we have
  π(i) = ρ^i π(0),  where ρ = u/d

What we are interested in is the stationary distribution of the states, so …
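Following the geometric pattern π(i) = ρ^i π(0), normalizing with Σ_i π(i) = 1 gives π(0) = 1 - ρ when ρ < 1 (a stable queue). A numeric sketch of this distribution (function name and parameter values are illustrative):

```python
def birth_death_pi(p, q, n_terms):
    """First n_terms of the stationary distribution of the Geom/Geom/1
    birth-death chain: pi(i) = (1 - rho) * rho**i with rho = u/d < 1."""
    u, d = p * (1 - q), q * (1 - p)
    rho = u / d
    assert rho < 1, "queue is unstable when rho >= 1"
    return [(1 - rho) * rho ** i for i in range(n_terms)]

pi = birth_death_pi(p=0.3, q=0.5, n_terms=40)
print(pi[0], sum(pi))  # pi(0) = 1 - rho; total mass tends to 1
```

Note that each term satisfies the balance equation π(i-1) u = π(i) d from the previous slide.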

Page 15: ECE-517: Reinforcement Learning in  Artificial Intelligence Lecture 4: Discrete-Time Markov Chains


Birth-Death Chain (cont'd)