
Page 1: Information and Communication Theory Lecture 3

Information and Communication Theory

Lecture 3

Markov Sources

Mário A. T. Figueiredo

DEEC, Instituto Superior Técnico, University of Lisbon, Portugal

2021


Page 2: Information and Communication Theory Lecture 3

Discrete Sources

Memoryless assumption is dropped.

Sequence of random variables: discrete-time stochastic process.

Full characterization: for any L ∈ ℕ and any {t_1, ..., t_L},

f_{X_{t_1},...,X_{t_L}}(x_1, ..., x_L) = P(X_{t_1} = x_1, ..., X_{t_L} = x_L)

must be known.

Without some additional structure, this is essentially impossible in general.


Page 3: Information and Communication Theory Lecture 3

Stationary Sources

Stationary source: for any L ∈ ℕ and any {t_1, ..., t_L},

f_{X_{t_1},...,X_{t_L}}(x_1, ..., x_L) = f_{X_{t_1+s},...,X_{t_L+s}}(x_1, ..., x_L),

for any shift s ∈ ℤ such that t_1 + s ≥ 1, ..., t_L + s ≥ 1.

Example, with X = {a, b, c, d} and L = 3:

f_{X_2,X_5,X_7}(b, c, a) = f_{X_{32},X_{35},X_{37}}(b, c, a) = f_{X_1,X_4,X_6}(b, c, a)

...in other notation:

P(X_2 = b, X_5 = c, X_7 = a) = P(X_{32} = b, X_{35} = c, X_{37} = a) = P(X_1 = b, X_4 = c, X_6 = a)


Page 4: Information and Communication Theory Lecture 3

Memoryless Sources

Memoryless source: for any L ∈ ℕ and any {t_1, ..., t_L},

f_{X_{t_1},...,X_{t_L}}(x_1, ..., x_L) = ∏_{i=1}^{L} f_{X_{t_i}}(x_i)

...that is, symbols are independent.

Example: f_{X_2,X_5,X_7}(b, c, a) = f_{X_2}(b) f_{X_5}(c) f_{X_7}(a)

Memoryless stationary source:

f_{X_{t_1},...,X_{t_L}}(x_1, ..., x_L) = ∏_{i=1}^{L} f_{X_{t_i}}(x_i)   (memoryless)
                                       = ∏_{i=1}^{L} f_{X_1}(x_i)   (stationary)

Example: f_{X_2,X_5,X_7}(b, c, a) = f_{X_1}(b) f_{X_1}(c) f_{X_1}(a)


Page 5: Information and Communication Theory Lecture 3

Markov Sources

Markov source: for any t ∈ ℕ,

f_{X_{t+1}|X_t,...,X_1}(x_{t+1} | x_t, ..., x_1) = f_{X_{t+1}|X_t}(x_{t+1} | x_t)

In other notation

P(X_{t+1} = x_{t+1} | X_t = x_t, ..., X_1 = x_1) = P(X_{t+1} = x_{t+1} | X_t = x_t)

Time-invariant Markov source: for any t and any a, b ∈ X,

f_{X_{t+1}|X_t}(b|a) = f_{X_2|X_1}(b|a)

Also required: the initial distribution f_{X_1}(x) = P(X_1 = x).


Page 6: Information and Communication Theory Lecture 3

Markov Sources: Probability Transition Matrix

Time-invariant Markov source: for any t and any a, b ∈ X,

f_{X_{t+1}|X_t}(b|a) = f_{X_2|X_1}(b|a) = P_{a,b},   where

P = [ P_{1,1} ··· P_{1,N} ]
    [   ...   ...   ...   ]
    [ P_{N,1} ··· P_{N,N} ]

Stochastic matrix (a.k.a. Markov matrix):

P_{a,b} ≥ 0, for all a, b ∈ {1, ..., N}, and ∑_{b=1}^{N} P_{a,b} = 1.

Non-consecutive conditionals: Chapman-Kolmogorov equations,

f_{X_{t+1}|X_{t-1}}(b|a) = ∑_{x_t} f_{X_{t+1}|X_t}(b|x_t) f_{X_t|X_{t-1}}(x_t|a) = ∑_{x_t} P_{a,x_t} P_{x_t,b} = (P²)_{a,b}

...generalizing: f_{X_{t+L}|X_t}(b|a) = (P^L)_{a,b}

Page 7: Information and Communication Theory Lecture 3

Markov Sources: Graph Representation

Consider the probability transition matrix

P = [ 0.4  0.6  0.0 ]
    [ 0.5  0.3  0.2 ]
    [ 0.2  0.7  0.1 ]

...its graph representation (node = symbol = state) has one node per symbol and an edge from a to b labeled P_{a,b} for each nonzero entry. [Graph figure not reproduced in this transcript.]

Exercise: compute P(X_4 = 3 | X_2 = 1) = f_{X_4|X_2}(3|1).
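As a numerical illustration of the Chapman-Kolmogorov relation from the previous slide (not part of the original slides), here is a minimal NumPy sketch that forms P² for this matrix; note that NumPy uses 0-based indexing, so symbol a corresponds to row/column a − 1.

import numpy as np

# Transition matrix from this slide: P[a-1, b-1] = P(X_{t+1} = b | X_t = a)
P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]])

# Chapman-Kolmogorov: two-step transition probabilities are the entries of P @ P
P2 = np.linalg.matrix_power(P, 2)

print(P2)
print(P2[0, 2])   # f_{X_{t+2}|X_t}(3 | 1) = (P^2)_{1,3}, with indices shifted by 1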


Page 8: Information and Communication Theory Lecture 3

Markov Sources: Computing Probabilities

The pair (P, f_{X_1}) provides a complete characterization of the source.

Probability of a sequence of consecutive symbols starting at t = 1:

f_{X_1,X_2,...,X_L}(x_1, x_2, ..., x_L) = f_{X_1}(x_1) P_{x_1,x_2} P_{x_2,x_3} ··· P_{x_{L-1},x_L}

Example: P(X_1 = 4, X_2 = 1, X_3 = 8, X_4 = 5) = P(X_1 = 4) P_{4,1} P_{1,8} P_{8,5}.
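A minimal NumPy sketch of this product formula (the helper seq_prob is my own, not from the slides), assuming symbols are numbered 1..N and f_{X_1} is given as a vector:

import numpy as np

def seq_prob(P, f1, seq):
    """Probability of a consecutive symbol sequence (x1, x2, ..., xL),
    with symbols numbered 1..N: f1[x1-1] * product of P[x_i - 1, x_{i+1} - 1]."""
    p = f1[seq[0] - 1]
    for a, b in zip(seq[:-1], seq[1:]):
        p *= P[a - 1, b - 1]
    return p

# Example with the 3-symbol chain of slide 7 and a uniform initial distribution
P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]])
f1 = np.ones(3) / 3
print(seq_prob(P, f1, (1, 2, 3)))   # f_{X1,X2,X3}(1, 2, 3) = (1/3) * 0.6 * 0.2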

For non-consecutive symbols, just marginalize. Example:

f_{X_2,X_5,X_7}(8, 3, 9) = ∑_{x_1,x_3,x_4,x_6} f_{X_1,X_2,X_3,X_4,X_5,X_6,X_7}(x_1, 8, x_3, x_4, 3, x_6, 9)

                         = ∑_{x_1,x_3,x_4,x_6} f_{X_1}(x_1) P_{x_1,8} P_{8,x_3} P_{x_3,x_4} P_{x_4,3} P_{3,x_6} P_{x_6,9}

Exercise: considering P in slide 7 and f_{X_1} uniform, compute f_{X_1,X_2,X_3}(2, 3, 1), f_{X_1,X_2,X_3}(2, 1, 3), and f_{X_2,X_4}(1, 3).


Page 9: Information and Communication Theory Lecture 3

Markov Sources: Symbol/State Distribution

Distribution at time t + 1:

f_{X_{t+1}}(x_{t+1}) = ∑_{x_t ∈ X} f_{X_{t+1},X_t}(x_{t+1}, x_t)   (marginalization)

                     = ∑_{x_t ∈ X} f_{X_{t+1}|X_t}(x_{t+1}|x_t) f_{X_t}(x_t)   (Bayes)

                     = ∑_{x_t ∈ X} P_{x_t,x_{t+1}} f_{X_t}(x_t)

In matrix notation (recall that (Av)_j = ∑_i A_{j,i} v_i),

f_{X_{t+1}} = [ f_{X_{t+1}}(1) ]   [ P_{1,1} ··· P_{N,1} ] [ f_{X_t}(1) ]
              [      ...      ] = [   ...   ...   ...   ] [     ...    ] = P' f_{X_t}
              [ f_{X_{t+1}}(N) ]   [ P_{1,N} ··· P_{N,N} ] [ f_{X_t}(N) ]

where P' denotes the transpose of P.

Generalizing: f_{X_{t+1}} = P' P' ··· P' f_{X_1} (t factors) = (P')^t f_{X_1} = (P^t)' f_{X_1}
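A quick numerical check of this recursion (my own sketch, not from the slides), using the slide 7 matrix and an initial distribution concentrated on symbol 1:

import numpy as np

P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]])

f = np.array([1.0, 0.0, 0.0])        # initial distribution f_{X_1}
for t in range(1, 6):
    f = P.T @ f                      # f_{X_{t+1}} = P' f_{X_t}
    print(t + 1, f)

# Equivalently, f_{X_{t+1}} = (P^t)' f_{X_1}; here t = 5:
print(np.linalg.matrix_power(P, 5).T @ np.array([1.0, 0.0, 0.0]))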

Exercise: taking P in slide 7 and f_{X_1} uniform, compute f_{X_2} and f_{X_3}.


Page 10: Information and Communication Theory Lecture 3

Markov Sources: Stationary Distribution

Consider P from slide 7 and three different initial distributions. [Convergence plots not reproduced in this transcript.] Clearly, the distribution f_{X_t} converges to the same limit in all three cases.

Stationary distribution: a fixed point of the evolution (f_{X_{t+1}} = f_{X_t}):

f_{X_{t+1}} = P' f_{X_t} = f_{X_t}  ⇔  f_{X_t} is an eigenvector of P' with eigenvalue 1

Notation: µ, where µ = P'µ.

Example: for the matrix P in slide 7, µ = [49, 54, 12]^T / 115.
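Numerically, µ can be obtained as the eigenvector of P' associated with eigenvalue 1, normalized to sum to 1. A minimal NumPy sketch (not from the slides) that reproduces the value above:

import numpy as np

P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]])

# Eigenvector of P' with eigenvalue 1, normalized to be a probability vector
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))        # index of the eigenvalue closest to 1
mu = np.real(vecs[:, k])
mu = mu / mu.sum()

print(mu)                                # should match [49, 54, 12] / 115
print(np.array([49, 54, 12]) / 115)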


Page 11: Information and Communication Theory Lecture 3

Irreducible and Aperiodic Sources

Irreducible Markov process: for any x, y ∈ X, there exists L ∈ ℕ such that (P^L)_{x,y} > 0,

...i.e., it is possible to go from any state to any state, in a finite number of steps, with non-zero probability.

Aperiodic Markov process: for any x, gcd{L : (P^L)_{x,x} > 0} = 1.

Examples: a non-irreducible and a non-aperiodic source. [Transition graphs not reproduced in this transcript.]

Exercise: is the source with P in slide 7 irreducible and aperiodic?
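One practical way to check both properties numerically (my own sketch, not from the slides): a finite chain is irreducible and aperiodic exactly when its transition matrix is primitive, and by Wielandt's theorem primitivity can be tested by checking whether the (N−1)²+1 power of the zero pattern of P is strictly positive.

import numpy as np

def irreducible_and_aperiodic(P):
    """True iff the chain with transition matrix P is irreducible and aperiodic,
    i.e., P is primitive; Wielandt's bound says the power (N-1)**2 + 1 suffices."""
    N = P.shape[0]
    A = (P > 0).astype(float)                      # only the zero pattern matters
    m = (N - 1) ** 2 + 1
    return bool(np.all(np.linalg.matrix_power(A, m) > 0))

P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]])
print(irreducible_and_aperiodic(P))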


Page 12: Information and Communication Theory Lecture 3

Perron-Frobenius Theorem

If a Markov process is irreducible and aperiodic, then

✓ the matrix P' has a simple eigenvalue 1;

✓ for any initial distribution f_{X_1},

  lim_{t→∞} f_{X_t} = lim_{t→∞} (P')^t f_{X_1} = µ, where µ = P'µ.

An irreducible and aperiodic source is stationary if and only if f_{X_1} = µ.

Exercise: find the stationary distribution of the following Markov process (transition graph not reproduced in this transcript), where α, β ≥ 0:

Is it irreducible and aperiodic for any α, β?


Page 13: Information and Communication Theory Lecture 3

Entropy Rate

Random source/process X = (X_1, X_2, ..., X_t, ...)

The entropy rate is (if the limit exists)

H(X) = lim_{t→∞} H(X_1, X_2, ..., X_t) / t

Particular case: stationary memoryless source:

H(X) = lim_{t→∞} H(X_1, X_2, ..., X_t) / t = lim_{t→∞} t H(X_1) / t = H(X_1)

Exercise: determine H(X) for the process in the exercise of slide 12.


Page 14: Information and Communication Theory Lecture 3

Conditional Entropy Rate

The conditional entropy rate is (if the limit exists)

H'(X) = lim_{t→∞} H(X_t | X_{t-1}, ..., X_1)

Particular case: memoryless (a) and stationary (b) source:

H'(X) = lim_{t→∞} H(X_t | X_{t-1}, ..., X_1)
      = lim_{t→∞} H(X_t)   (a)
      = H(X_1)   (b)

Time-invariant (b), irreducible, aperiodic Markov (a) source:

H'(X) = lim_{t→∞} H(X_t | X_{t-1}, ..., X_1)
      = lim_{t→∞} H(X_t | X_{t-1})   (a)
      = lim_{t→∞} ∑_x H(X_2 | X_1 = x) f_{X_t}(x)   (b)
      = ∑_x H(X_2 | X_1 = x) µ_x
      = −∑_x ∑_y µ_x P_{x,y} log P_{x,y}
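A minimal NumPy sketch of the final formula (the function names are my own, not from the slides); it reuses the eigenvector computation of µ from slide 10 and assumes base-2 logarithms with the convention 0 log 0 = 0, giving bits/symbol:

import numpy as np

def stationary_distribution(P):
    # eigenvector of P' for eigenvalue 1, normalized to sum to 1
    vals, vecs = np.linalg.eig(P.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return mu / mu.sum()

def conditional_entropy_rate(P):
    """H'(X) = -sum_x sum_y mu_x P_{x,y} log2 P_{x,y}, with 0 log 0 = 0."""
    mu = stationary_distribution(P)
    terms = P * np.log2(np.where(P > 0, P, 1.0))   # zero entries contribute 0
    return float(-(mu @ terms.sum(axis=1)))

P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]])
print(conditional_entropy_rate(P))   # bits/symbol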

Exercise: determine H ′(X) for the process in the exercise of slide 12.


Page 15: Information and Communication Theory Lecture 3

Entropy Rates of Stationary Processes

If X is stationary, H'(X) exists:

H(X_t | X_{t-1}, ..., X_2, X_1) ≤ H(X_t | X_{t-1}, ..., X_2) = H(X_{t-1} | X_{t-2}, ..., X_1)

(the inequality because conditioning does not increase entropy, the equality by stationarity), i.e., H(X_t | X_{t-1}, ..., X_1) is a non-increasing, non-negative sequence, thus it converges.

Cesàro mean theorem: lim_{t→∞} a_t = a  ⇒  lim_{n→∞} (1/n) ∑_{t=1}^{n} a_t = a

If X is stationary, H(X) = H'(X):

H(X) = lim_{t→∞} H(X_1, X_2, ..., X_t) / t

     = lim_{t→∞} (1/t) ∑_{n=1}^{t} H(X_n | X_{n-1}, ..., X_1)   (chain rule)

     = lim_{n→∞} H(X_n | X_{n-1}, ..., X_1)   (Cesàro mean)

     = H'(X)
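The equality H(X) = H'(X) can also be checked numerically for a small stationary chain by brute force. The sketch below (mine, not from the slides) starts the slide 7 chain from its stationary distribution µ, enumerates all length-t sequences to compute H(X_1, ..., X_t) exactly, and compares H(X_1, ..., X_t)/t with the conditional entropy rate:

import itertools
import numpy as np

P = np.array([[0.4, 0.6, 0.0],
              [0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]])
N = P.shape[0]

# stationary distribution (eigenvector of P' with eigenvalue 1)
vals, vecs = np.linalg.eig(P.T)
mu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
mu = mu / mu.sum()

# conditional entropy rate H'(X) = -sum_x sum_y mu_x P_{x,y} log2 P_{x,y}
Hprime = -(mu @ (P * np.log2(np.where(P > 0, P, 1.0))).sum(axis=1))

# per-symbol block entropies H(X_1,...,X_t)/t for the stationary chain (f_{X_1} = mu)
for t in (1, 2, 4, 6, 8):
    Ht = 0.0
    for seq in itertools.product(range(N), repeat=t):
        p = mu[seq[0]]
        for a, b in zip(seq[:-1], seq[1:]):
            p *= P[a, b]
        if p > 0:
            Ht -= p * np.log2(p)
    print(t, Ht / t, Hprime)   # H(X_1..X_t)/t decreases toward H'(X)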

Page 16: Information and Communication Theory Lecture 3

Exercises

Consider a time-invariant Markov source producing symbols from the alphabet X = {1, 2, 3}, with probability transition matrix:

P = [ 1/2  1/4  1/4 ]
    [ 1/4  1/4  1/2 ]
    [  0   1/2  1/2 ]

Find the stationary distribution of this source and its conditional entropy rate. Compare with the entropy of the stationary distribution and comment on the result.

Repeat the previous problem, now with the following transition matrix:

P = [ 1/2  1/4  1/4 ]
    [  0    0    1  ]
    [  1    0    0  ]

Explain why a Markov source is stationary if and only if f_{X_1} = µ.


Page 17: Information and Communication Theory Lecture 3

Higher-Order Markov Sources

This slide uses a more compact notation: simply p(·) = f_X(·).

Order-n Markov source: for any t ∈ ℕ,

p(x_{t+1} | x_t, x_{t-1}, ..., x_1) = p(x_{t+1} | x_t, ..., x_{t-n+1}),

i.e., conditioning on all the past reduces to conditioning on the n previous symbols.

Consider p(x_{t+1}, x_t, ..., x_{t-n+2} | x_t, x_{t-1}, ..., x_{t-n+1}); given the conditioning block, the components x_t, ..., x_{t-n+2} are conditionally deterministic.

Lifting: defining z_t = (x_t, x_{t-1}, ..., x_{t-n+1}) ∈ X^n,

p(x_{t+1}, x_t, ..., x_{t-n+2} | x_t, x_{t-1}, ..., x_{t-n+1}) = p(z_{t+1} | z_t)

...the lifted source is order-1 Markov.

Probability transition matrix of the lifted source: P is an N^n × N^n matrix.


Page 18: Information and Communication Theory Lecture 3

Higher-Order Markov Sources

Example of an order-2 Markov source, with X = {1, 2}, and the following conditional probabilities p(x_{t+1} | x_t, x_{t-1}):

(x_t, x_{t-1})   x_{t+1} = 1   x_{t+1} = 2
(1, 1)               0.1           0.9
(1, 2)               0.6           0.4
(2, 1)               0.3           0.7
(2, 2)                1             0

After lifting, z_t = (x_t, x_{t-1}); the transition probabilities p(z_{t+1} | z_t), with z_{t+1} = (x_{t+1}, x_t), are:

z_t = (x_t, x_{t-1})   (1,1)   (1,2)   (2,1)   (2,2)
(1, 1)                  0.1      0      0.9      0
(1, 2)                  0.6      0      0.4      0
(2, 1)                   0      0.3      0      0.7
(2, 2)                   0       1       0       0

Probability transition matrix of the lifted source: P is 2² × 2² = 4 × 4.
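As an illustration of the lifting construction (not part of the original slides), a minimal NumPy sketch that builds the 4 × 4 lifted transition matrix directly from the order-2 conditional probabilities; the dictionary cond and the state ordering (1,1), (1,2), (2,1), (2,2) follow the tables above.

import itertools
import numpy as np

# order-2 conditionals: cond[(x_t, x_{t-1})][x_{t+1}], with symbols in {1, 2}
cond = {(1, 1): {1: 0.1, 2: 0.9},
        (1, 2): {1: 0.6, 2: 0.4},
        (2, 1): {1: 0.3, 2: 0.7},
        (2, 2): {1: 1.0, 2: 0.0}}

# lifted states z_t = (x_t, x_{t-1}); transition (x_t, x_{t-1}) -> (x_{t+1}, x_t)
states = list(itertools.product((1, 2), repeat=2))   # (1,1), (1,2), (2,1), (2,2)
idx = {z: i for i, z in enumerate(states)}

P_lift = np.zeros((4, 4))
for (xt, xtm1) in states:
    for xtp1, p in cond[(xt, xtm1)].items():
        P_lift[idx[(xt, xtm1)], idx[(xtp1, xt)]] = p

print(P_lift)   # matches the 4 x 4 table above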


Page 19: Information and Communication Theory Lecture 3

Markov Models of English

Uniform distribution over X = {A, B, ..., Z, space} (N = 27): H(X) = log₂ 27 ≈ 4.75 bits/symbol

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD

QPAAMKBZAACIBZLHJQD

Memoryless model w/ estimated probabilities: H(X) ≈ 4.07 bits/symbol

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA

OOBTTVA NAH BRL

Order-1 Markov model: H(X) ≈ 3.36 bits/symbol

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D

ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TO BE SEACE

Order-2 Markov model: H(X) ≈ 2.77 bits/symbol

IN NO IS LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF

DEMONSTRURES OF THE REPTAGIN IS REGOACTIONA OF CRE


Page 20: Information and Communication Theory Lecture 3

Exercises

Consider a source with alphabet X = {a, b, c}, restricted not to generate the same symbol at two consecutive times; apart from that, the symbols are equiprobable. Is this source memoryless? Write a model for the source, and compute its stationary distribution µ and conditional entropy rate H'(X).

Repeat the previous exercise, but now the restriction is that the source does not generate the same symbol at three consecutive times.

What are the entropies of memoryless sources with distributions equal to the stationary distributions of the sources above?

Compute the stationary distribution of a source with transition matrix

P = [ 1/3  1/2  1/6 ]
    [ 1/6  1/3  1/2 ]
    [ 1/2  1/6  1/3 ],

as well as its conditional entropy rate.


Page 21: Information and Communication Theory Lecture 3

Recommended Reading

T. Cover and J. Thomas, “Elements of Information Theory”, John Wiley & Sons, 2006 (Chapter 3).
