ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains

DESCRIPTION
ECE-517: Reinforcement Learning in Artificial Intelligence, Lecture 4: Discrete-Time Markov Chains. September 1, 2010. Dr. Itamar Arel, College of Engineering, Department of Electrical Engineering & Computer Science, The University of Tennessee, Fall 2010.

TRANSCRIPT
ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains
Dr. Itamar Arel
College of Engineering
Department of Electrical Engineering & Computer Science
The University of Tennessee, Fall 2010
September 1, 2010
ECE 517 – Reinforcement Learning in AI
Simple DTMCs
“States” can be labeled (0,)1,2,3,…
At every time slot a “jump” decision is made randomly based on the current state
[Figure: a two-state chain (states 0, 1) with cross-transition probabilities p and q and self-loops 1-p and 1-q, and a three-state chain (states 0, 1, 2) with transitions labeled a through f]
(Sometimes the arrow pointing back to the same state is omitted)
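A per-slot jump decision like this can be sketched in a few lines of Python. The transition probabilities below (p, q) are illustrative placeholders, not values from the lecture:

```python
import random

def dtmc_step(state, P):
    """Jump to a random next state; row P[state] gives the jump probabilities."""
    r = random.random()
    cumulative = 0.0
    for next_state, prob in enumerate(P[state]):
        cumulative += prob
        if r < cumulative:
            return next_state
    return len(P) - 1  # guard against floating-point round-off

# Two-state chain from the figure: leave 0 w.p. p, leave 1 w.p. q
p, q = 0.3, 0.6
P = [[1 - p, p],
     [q, 1 - q]]

state = 0
for _ in range(5):
    state = dtmc_step(state, P)  # each jump depends only on the current state
```

Note that `dtmc_step` consults only the current state, which is exactly the Markov property discussed later in this lecture.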
1-D Random Walk
Time is slotted
The walker flips a coin every time slot to decide which way to go
[Figure: sample path of X(t); each slot the walker steps right w.p. p and left w.p. 1-p]
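The coin-flipping walk above can be simulated directly; this is a minimal sketch, with p = 0.5 (a fair coin) chosen for illustration:

```python
import random

def random_walk(n_steps, p=0.5, x0=0, rng=None):
    """Simulate X(t): step +1 w.p. p, step -1 w.p. 1-p in every time slot."""
    rng = rng or random.Random()
    x = x0
    path = [x]
    for _ in range(n_steps):
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path

path = random_walk(1000, p=0.5, rng=random.Random(42))
```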
Single Server Queue
Consider a queue at a supermarket
In every time slot:
A customer arrives with probability p
The head-of-line (HoL) customer leaves with probability q
We’d like to learn about the behavior of such a system
[Figure: single-server queue with Bernoulli(p) arrivals and Geom(q) service]
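One way to get a first feel for this system is simulation. The sketch below uses one possible slot ordering (arrival processed before departure) and illustrative values of p and q; both choices are assumptions, not from the lecture:

```python
import random

def simulate_queue(n_slots, p, q, seed=0):
    """Slotted single-server queue: arrival w.p. p, HoL departure w.p. q."""
    rng = random.Random(seed)
    length = 0
    history = []
    for _ in range(n_slots):
        if rng.random() < p:                  # Bernoulli(p) arrival
            length += 1
        if length > 0 and rng.random() < q:   # HoL customer finishes service
            length -= 1
        history.append(length)
    return history

hist = simulate_queue(100_000, p=0.3, q=0.5)
avg_len = sum(hist) / len(hist)  # empirical average queue length
```

The empirical distribution of `hist` can later be compared against the birth-death chain analysis that follows.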
Birth-Death Chain
Our queue can be modeled by a Birth-Death chain (a.k.a. Geom/Geom/1 queue)
Want to know:
Queue size distribution
Average waiting time, etc.
Markov Property
The “Future” is independent of the “Past” given the “Present”
In other words: memoryless
We’ve mentioned memoryless distributions: Exponential and Geometric
Useful for modeling and analyzing real systems
[Figure: birth-death chain over states 0, 1, 2, 3, …]
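The memorylessness of the Geometric distribution mentioned above can be checked numerically: for T ~ Geometric(q), P(T > s+t | T > s) = P(T > t). A small sketch with an illustrative q:

```python
q = 0.4  # illustrative success probability

def tail(n):
    """P(T > n) for T ~ Geometric(q): n consecutive failures."""
    return (1 - q) ** n

s, t = 3, 5
lhs = tail(s + t) / tail(s)  # P(T > s+t | T > s)
rhs = tail(t)                # P(T > t)
# Having already waited s slots tells us nothing about the remaining wait
assert abs(lhs - rhs) < 1e-12
```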
Discrete Time Random Process (DTRP)
Random Process: an indexed family of random variables
Let X_n be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common cdf F_X(x). This sequence is called the i.i.d. random process.
Example: sequence of Bernoulli trials (flip of a coin)
In networking: traffic may obey a Bernoulli i.i.d. arrival pattern
In reality, some degree of dependency/correlation exists between consecutive elements in a DTRP
Example: correlated packet arrivals (video/audio stream)
Discrete Time Markov Chains
A sequence of random variables {X_n} is called a Markov Chain if it has the Markov property:
P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, …, X_0 = i_0) = P(X_{n+1} = j | X_n = i)
States are usually labeled {(0,)1,2,…}
State space can be finite or infinite
Transition Probability Matrix
Probability of transitioning from state i to state j: p_ij = P(X_{n+1} = j | X_n = i)
We will assume the MC is homogeneous/stationary: p_ij independent of time
Transition probability matrix: P = {p_ij}
Two-state MC:
P = [ 1-p   p
       q   1-q ]
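As a concrete sketch, the two-state transition matrix can be written out in NumPy and used to push a state distribution forward one step. The values of p and q are illustrative:

```python
import numpy as np

p, q = 0.3, 0.6
# Two-state chain: from state 0 jump to 1 w.p. p; from state 1 jump to 0 w.p. q
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Each row is a probability distribution over next states
assert np.allclose(P.sum(axis=1), 1.0)

pi0 = np.array([1.0, 0.0])  # start in state 0 with certainty
pi1 = pi0 @ P               # distribution over states after one step
```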
Stationary Distribution
Define π_k as the state distribution at time k; then π_{k+1} = π_k P (π is a row vector)
Stationary distribution: π = lim_{k→∞} π_k, if the limit exists
If π exists, we can solve for it by π = π P, Σ_i π_i = 1
These are called balance equations
Transitions in and out of state i are balanced
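The iteration π_{k+1} = π_k P suggests a direct numerical method: repeatedly multiply until the distribution stops changing. A sketch with an illustrative two-state matrix:

```python
import numpy as np

def stationary(P, tol=1e-12, max_iter=100_000):
    """Power iteration: apply pi_{k+1} = pi_k P until the change is below tol."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi

P = np.array([[0.7, 0.3],
              [0.6, 0.4]])
pi = stationary(P)  # satisfies pi = pi P and sums to 1
```

This only converges when the limit exists, which is exactly what the conditions on the following slides (irreducibility, aperiodicity, positive recurrence) guarantee.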
General Comment & Conditions for π to Exist (I)
If we partition all the states into two sets, then transitions between the two sets must be “balanced”
This can be easily derived from the balance equations
Definitions:
State j is reachable from state i if P(X_n = j | X_0 = i) > 0 for some n ≥ 0
States i and j communicate if they are reachable from each other
The Markov chain is irreducible if all states communicate
Conditions for π to Exist (I) (cont’d)
Condition: the Markov chain is irreducible
Counter-examples:
Condition: the Markov chain is aperiodic
Counter-example:
[Figures: counter-examples, including a reducible chain whose states do not all communicate, and a periodic chain cycling through its states with p = 1]
Conditions for π to Exist (II)
For the Markov chain to be recurrent, all states i must be recurrent, i.e. the chain returns to state i with probability 1
Otherwise transient
With regards to a recurrent MC:
State i is positive recurrent if E(T_i) < ∞, where T_i is the time between visits to state i
Otherwise the state is considered null-recurrent
Solving for π
Example for a two-state Markov Chain
[Figure: two-state chain (states 0, 1) with cross-transition probabilities p and q and self-loops 1-p and 1-q]
Balance equation: π_0 p = π_1 q
With normalization π_0 + π_1 = 1, this gives π_0 = q/(p+q), π_1 = p/(p+q)
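The closed form for the two-state chain is easy to verify against the transition matrix; the values of p and q below are illustrative:

```python
import numpy as np

p, q = 0.3, 0.6
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Closed form from the balance equation pi_0 * p = pi_1 * q with pi_0 + pi_1 = 1
pi = np.array([q / (p + q), p / (p + q)])

assert np.allclose(pi @ P, pi)        # pi is unchanged by one more step
assert abs(pi.sum() - 1.0) < 1e-12    # pi is a probability distribution
```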
Birth-Death Chain
Arrival w.p. p; departure w.p. q
Let u = p(1-q), d = q(1-p), ρ = u/d
Balance equations:
[Figure: birth-death chain over states 0, 1, 2, 3, …; up-transitions w.p. u, down-transitions w.p. d, self-loop 1-u at state 0 and 1-u-d elsewhere]
Birth-Death Chain (cont’d)
Continuing this pattern, we observe that: π_(i-1) u = π_i d
Equivalently, we can draw a bi-section between state i and state i-1
Therefore, we have π_i = ρ^i π_0, where ρ = u/d
What we are interested in is the stationary distribution of the states, so …
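The relation π_i = ρ^i π_0 can be checked numerically against the balance equations. The sketch below also uses the normalization Σ_i π_i = 1 over the infinite state space, which gives π_0 = 1 - ρ when ρ < 1; the values of p and q are illustrative:

```python
p, q = 0.3, 0.5
u, d = p * (1 - q), q * (1 - p)  # effective up/down probabilities
rho = u / d                      # assume rho < 1 so the chain is stable

# pi_i = rho**i * pi_0; summing the geometric series to 1 gives pi_0 = 1 - rho
pi = [(1 - rho) * rho**i for i in range(20)]

# Verify the balance equation pi_{i-1} * u = pi_i * d on the first few states
for i in range(1, 20):
    assert abs(pi[i - 1] * u - pi[i] * d) < 1e-12
```

Comparing `pi` against the empirical queue-length distribution from the earlier simulation sketch is a useful sanity check on the model.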
Birth-Death Chain (cont’d)