ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains
Dr. Itamar Arel
College of Engineering, Department of Electrical Engineering & Computer Science
The University of Tennessee, Fall 2015
September 1, 2015
Simple DTMCs
“States” can be labeled 0, 1, 2, 3, …
At every time slot a “jump” decision is made randomly based on the current state
[Figures: a two-state chain (states 0, 1) with transition probabilities p, q and self-loops 1-p, 1-q, and a three-state chain (states 0, 1, 2) with transitions labeled a through f]
(Sometimes the arrow pointing back to the same state is omitted)
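To make the per-slot "jump" decision concrete, here is a minimal Python sketch (not from the lecture; the states and probabilities below are made up) in which the next state is drawn at random from the current state's transition probabilities:

```python
# Illustrative two-state DTMC; probabilities are arbitrary, not from the slides.
import random

transitions = {
    0: {0: 0.7, 1: 0.3},   # from state 0: stay w.p. 0.7, jump to 1 w.p. 0.3
    1: {0: 0.4, 1: 0.6},   # from state 1: jump to 0 w.p. 0.4, stay w.p. 0.6
}

def jump(state):
    """Pick the next state according to the current state's transition probabilities."""
    r = random.random()
    cumulative = 0.0
    for next_state, prob in transitions[state].items():
        cumulative += prob
        if r < cumulative:
            return next_state
    return state  # guard against floating-point round-off

state = 0
for _ in range(5):
    state = jump(state)
print("state after 5 slots:", state)
```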
1-D Random Walk
Time is slotted
The walker flips a coin every time slot to decide which way to go
[Figure: walker position X(t) over time; step right w.p. p, step left w.p. 1-p]
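A short simulation sketch of the slotted random walk; the fair coin (p = 0.5) and the 1000-slot horizon are illustrative choices, not values from the lecture:

```python
# Slotted 1-D random walk: each slot the walker steps +1 w.p. p and -1 w.p. 1-p.
import random

def random_walk(p=0.5, n_slots=1000, x0=0):
    x = x0
    trajectory = [x]
    for _ in range(n_slots):
        x += 1 if random.random() < p else -1   # the coin flip decides the direction
        trajectory.append(x)
    return trajectory

walk = random_walk(p=0.5, n_slots=1000)
print("final position X(1000):", walk[-1])
```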
Single Server Queue
Consider a queue at a supermarket
In every time slot: a customer arrives with probability p; the head-of-line (HoL) customer leaves with probability q
We’d like to learn about the behavior of such a system
[Figure: single-server queue with Bernoulli(p) arrivals and Geom(q) service times]
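A rough simulation of this queue, assuming the arrival in a slot is processed before the possible departure (a modeling convention chosen here, not stated in the lecture) and with illustrative values of p and q:

```python
# Slotted single-server queue: Bernoulli(p) arrivals, HoL departure w.p. q.
import random

def simulate_queue(p=0.3, q=0.5, n_slots=100_000):
    length = 0
    total = 0
    for _ in range(n_slots):
        if random.random() < p:                  # arrival this slot
            length += 1
        if length > 0 and random.random() < q:   # HoL customer departs this slot
            length -= 1
        total += length
    return total / n_slots                       # time-average queue length

print("average queue length:", simulate_queue())
```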
Birth-Death Chain
Our queue can be modeled by a Birth-Death Chain (a.k.a. Geom/Geom/1 queue)
Want to know: Queue size distribution
Average waiting time, etc.
Markov Property: the “Future” is independent of the “Past” given the “Present”
In other words: Memoryless
We’ve mentioned memoryless distributions: Exponential and Geometric
Useful for modeling and analyzing real systems
[Figure: birth-death chain with states 0, 1, 2, 3, …]
Discrete Time Random Process (DTRP)
Random Process: An indexed family of random variables
Let Xn be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common cdf F_X(x). This sequence is called the i.i.d. random process.
Example: sequence of Bernoulli trials (flips of a coin)
In networking: traffic may obey a Bernoulli i.i.d. arrival pattern
In reality, some degree of dependency/correlation exists between consecutive elements in a DTRP
Example: correlated packet arrivals (video/audio stream)
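To illustrate the difference, here is a small sketch (with made-up parameters) contrasting i.i.d. Bernoulli arrivals with a correlated on/off arrival process in which a slot tends to repeat the previous slot's behaviour:

```python
# i.i.d. Bernoulli arrivals vs. correlated (two-state on/off) arrivals.
import random

def bernoulli_iid(p, n):
    """i.i.d. Bernoulli(p) arrivals: each slot is independent of the past."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def on_off(p_on_to_off, p_off_to_on, n):
    """Correlated arrivals: long runs of arrival/no-arrival slots."""
    state = 0                 # 0 = "off" (no arrival), 1 = "on" (arrival)
    out = []
    for _ in range(n):
        if state == 1 and random.random() < p_on_to_off:
            state = 0
        elif state == 0 and random.random() < p_off_to_on:
            state = 1
        out.append(state)
    return out

print(sum(bernoulli_iid(0.3, 10_000)) / 10_000)   # roughly 0.3
print(sum(on_off(0.1, 0.05, 10_000)) / 10_000)    # similar rate, but bursty
```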
Discrete Time Markov Chains
A sequence of random variables {Xn} is called a Markov Chain if it has the Markov property:
P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, …, X_0 = i_0) = P(X_{n+1} = j | X_n = i)
States are usually labeled {0, 1, 2, …}
State space can be finite or infinite
Transition Probability Matrix
p_ij = P(X_{n+1} = j | X_n = i): probability of transitioning from state i to state j
We will assume the MC is homogeneous/stationary: independent of time
Transition probability matrix: P = {pij}
Two-state MC: P = [[1-p, p], [q, 1-q]]
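A brief numpy sketch (p and q are illustrative) of building this two-state transition matrix, checking that each row sums to 1, and simulating the chain by sampling the next state from the current state's row:

```python
# Two-state transition matrix and a short simulated trajectory.
import numpy as np

p, q = 0.2, 0.5
P = np.array([[1 - p, p],
              [q, 1 - q]])
assert np.allclose(P.sum(axis=1), 1.0)     # each row is a probability distribution

rng = np.random.default_rng(0)
state = 0
for _ in range(10):
    state = rng.choice(2, p=P[state])      # jump according to the current state's row
print("state after 10 slots:", state)
```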
Stationary Distribution
Define p_k(i) = P(X_k = i) and collect these into the row vector p_k; then p_{k+1} = p_k P
(p is a row vector)
Stationary distribution: p = lim_{k→∞} p_k, if the limit exists
If p exists, we can solve for it from p = p P together with Σ_i p(i) = 1
These are called balance equations
Transitions in and out of state i are balanced
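One way to find p numerically is to iterate p_{k+1} = p_k P until it stops changing, then verify the balance condition p = p P and the normalization. The 3-state matrix below is an arbitrary example, not one from the lecture:

```python
# Stationary distribution by iterating p_{k+1} = p_k P.
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

pi = np.array([1.0, 0.0, 0.0])             # arbitrary initial distribution
for _ in range(10_000):
    nxt = pi @ P
    if np.allclose(nxt, pi, atol=1e-12):   # stop once the distribution settles
        break
    pi = nxt

print("stationary distribution:", pi)
print("balance check p = p P:", np.allclose(pi, pi @ P))
print("normalization:", pi.sum())
```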
General Comment & Conditions for p to Exist (I)
If we partition all the states into two sets, then transitions between the two sets must be “balanced”. This can be easily derived from the balance equations
Definitions:
State j is reachable from state i if p_ij^(n) > 0 for some n ≥ 0, i.e. j can be reached from i in a finite number of steps
States i and j communicate if each is reachable from the other
The Markov chain is irreducible if all states communicate
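A rough way to test irreducibility in code: j is reachable from i when some power of P has a positive (i, j) entry, so summing the first n powers and checking that every entry is positive verifies that all states communicate. The reducible matrix below is a made-up counter-example:

```python
# Reachability-based irreducibility check for a finite DTMC.
import numpy as np

def is_irreducible(P):
    n = P.shape[0]
    reach = np.eye(n)                      # I + P + P^2 + ... + P^n
    power = np.eye(n)
    for _ in range(n):
        power = power @ P
        reach = reach + power
    return bool(np.all(reach > 0))         # positive everywhere => all states communicate

# Reducible example: once the chain enters state 1 it can never return to state 0.
P_reducible = np.array([[0.5, 0.5],
                        [0.0, 1.0]])
print(is_irreducible(P_reducible))         # False
```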
Conditions for p to Exist (I) (cont’d)
Condition: the Markov chain is irreducible (counter-examples below)
Condition: the Markov chain is aperiodic (counter-example below)
[Figures: counter-examples; reducible chains in which some states cannot be reached from others, and a periodic chain that cycles through its states with probability 1]
Conditions for p to Exist (II)
For the Markov chain to be recurrent …
All states i must be recurrent, i.e. starting from state i the chain returns to i with probability 1
Otherwise the state is transient
With regard to a recurrent MC …
State i is positive recurrent if E(Ti) < ∞, where Ti is the time between visits to state i
Otherwise the state is considered null-recurrent
Solving for p: Example for two-state Markov Chain
[Figure: the two-state chain with transition probabilities p (0 to 1), q (1 to 0) and self-loops 1-p, 1-q]
Balance equation: p(0) · p = p(1) · q, together with normalization p(0) + p(1) = 1
Solution: p(0) = q / (p + q), p(1) = p / (p + q)
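A quick numerical check, with illustrative values of p and q, that this closed-form distribution is indeed stationary:

```python
# Verify p(0) = q/(p+q), p(1) = p/(p+q) satisfies the balance equation p = p P.
import numpy as np

p, q = 0.2, 0.5
P = np.array([[1 - p, p],
              [q, 1 - q]])
pi = np.array([q / (p + q), p / (p + q)])

print(pi)                                  # [0.714..., 0.285...]
print(np.allclose(pi, pi @ P))             # True: pi is stationary
print(pi.sum())                            # 1.0
```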
Birth-Death Chain
Arrival w.p. p ; departure w.p. q
Let u = p(1-q), d = q(1-p), r = u/d
Balance equations: p(0) · u = p(1) · d, p(1) · u = p(2) · d, …
[Figure: birth-death chain on states 0, 1, 2, 3, …; up-transitions w.p. u, down-transitions w.p. d, self-loops w.p. 1-u at state 0 and 1-u-d elsewhere]
Birth-Death Chain (cont’d)
Continuing this pattern, we observe that:
p(i-1)u = p(i)d
Equivalently, we can draw a cut (bisection) between state i-1 and state i and balance the transitions across it
Therefore, we have p(i) = r^i · p(0),
where r = u/d.
What we are interested in is the stationary distribution of the states, so …
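As a sketch of where this leads (the p and q values below are illustrative): assuming r = u/d < 1, normalizing p(i) = r^i · p(0) over all i gives p(0) = 1 - r, so the stationary queue size is geometrically distributed:

```python
# Geom/Geom/1 queue-size distribution implied by p(i) = r^i * p(0) and normalization.
import numpy as np

p, q = 0.3, 0.5
u, d = p * (1 - q), q * (1 - p)
r = u / d
assert r < 1                               # needed for the queue to be stable

i = np.arange(50)                          # truncate the (infinite) state space
pi = (1 - r) * r**i                        # stationary queue-size distribution
print("P(queue empty):", pi[0])
print("mean queue size ~", (i * pi).sum()) # close to r / (1 - r)
```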
Birth-Death Chain (cont’d)