
Information Theory Exercise

    Anil Mengi, M.Sc.

Problem 1: Source Coding

    We consider 64 squares on a chess board.

    (a) How many bits do you need to represent each square?

(b) In a game on a chessboard one player has to guess where his opponent has placed the Queen. You are allowed to ask six questions which must be answered truthfully by a yes/no reply. Design a strategy by which you can always find the Queen. Show that you cannot ensure the exact position when you are allowed to ask only five questions.

    (c) How do you interpret your result in (b) together with your result in (a)?

Suggested solution:
a. Since ld 64 = 6, we need at least 6 bits to represent each square.
b. Consider the worst case, in which every guess is wrong. Each yes/no reply still eliminates half of the remaining positions, so after 6 questions only 64/2^6 = 1 position is left. When only 5 questions are allowed, 64/2^5 = 2 possible positions remain, so the exact position cannot be guaranteed.
c. Since each square is represented by 6 bits, the 6 questions correspond exactly to the 6 bits; this is also why we cannot be sure with only 5 questions.
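As an illustration of the bit-by-bit questioning strategy in (b), here is a minimal Python sketch (the function and variable names are my own):

```python
def find_queen(answer):
    """Locate the queen's square (numbered 0..63) with 6 yes/no questions.

    `answer(squares)` must truthfully return True if the queen's square
    is in the set `squares`.
    """
    position = 0
    for bit in range(5, -1, -1):                       # one question per bit
        half = {s for s in range(64) if (s >> bit) & 1}
        if answer(half):                               # "Is the square in this half?"
            position |= 1 << bit
    return position

# Example: the opponent hides the queen on square 42.
queen = 42
assert find_queen(lambda squares: queen in squares) == queen
```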

    Problem 2: Source Coding

A language has an alphabet of five letters, xi, i = 1, 2, ..., 5, each occurring with probability 1/5. Find the number of bits needed for a fixed-length binary code in which

    (a) Each letter is encoded separately into a binary sequence.

    (b) Two letters at a time are encoded into a binary sequence.

    (c) Three letters at a time are encoded into a binary sequence.

Which method is the most efficient in terms of bits per letter?

Suggested solution:
a. There are 5 letters to encode, so we need ⌈ld 5⌉ = 3 bits per codeword, i.e. 3/1 = 3 bits per letter.
b. There are 5^2 = 25 possible letter pairs, so we need ⌈ld 25⌉ = 5 bits per codeword, i.e. 5/2 = 2.5 bits per letter.
c. There are 5^3 = 125 possible letter triples, so we need ⌈ld 125⌉ = 7 bits per codeword, i.e. 7/3 ≈ 2.33 bits per letter.
Method (c) is the most efficient.
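A quick numerical check of these rates in Python:

```python
import math

# Fixed-length coding of blocks of k letters from a 5-letter alphabet:
# bits per block = ceil(ld(5^k)), rate = bits per block / k letters.
for k in (1, 2, 3):
    bits = math.ceil(math.log2(5 ** k))
    print(f"k={k}: {bits} bits per block, {bits / k:.3f} bits per letter")
# prints 3.000, 2.500 and 2.333 bits per letter
```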


Problem 3: Entropy

    Let p(x, y) be given by the following figure

[Figure: joint pmf table of (X, Y) with X, Y ∈ {0, 1, 2}; three entries equal 1/4, two entries equal 1/8, and the remaining entries are 0. The table is symmetric in X and Y, with marginals P(X = 0) = P(X = 2) = 3/8 and P(X = 1) = 1/4.]

    Find

H(X), H(Y), H(X|Y), H(Y|X), H(X,Y), and H(Y) - H(Y|X).

    Draw a Venn diagram for the quantities you found.

Suggested solution:

H(X) = -P(X=0) ld P(X=0) - P(X=1) ld P(X=1) - P(X=2) ld P(X=2)
     = -(3/8) ld(3/8) - (1/4) ld(1/4) - (3/8) ld(3/8)
     = 11/4 - (3/4) ld 3 ≈ 1.561 bit,

where P(X=0) = P(X=0,Y=0) + P(X=0,Y=2) = 3/8, P(X=1) = P(X=1,Y=1) = 1/4, and P(X=2) = P(X=2,Y=0) + P(X=2,Y=2) = 3/8.

Since the given table is symmetric, exchanging X and Y in the formula above gives the same result:
H(Y) = H(X) = 11/4 - (3/4) ld 3.

The pair (X, Y) takes 5 values with nonzero probability, so
H(X,Y) = 3 · (1/4) ld 4 + 2 · (1/8) ld 8 = 9/4.

Because H(X) = H(Y),
H(X|Y) = H(Y|X) = H(X,Y) - H(X) = (3/4) ld 3 - 1/2 ≈ 0.689 bit.

H(Y) - H(Y|X) = (11/4 - (3/4) ld 3) - ((3/4) ld 3 - 1/2) = 13/4 - (3/2) ld 3 ≈ 0.873 bit.
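For a numerical cross-check, the sketch below uses one placement of the table entries that is consistent with the figure and the marginals above (the exact placement is an assumption; any placement with the same marginals yields the same entropies):

```python
import math

# One joint pmf consistent with the figure: three cells of 1/4, two cells of 1/8,
# symmetric in X and Y (the exact placement is assumed for illustration).
p = {(0, 0): 1/4, (1, 1): 1/4, (2, 2): 1/4, (0, 2): 1/8, (2, 0): 1/8}

def H(dist):
    """Entropy in bits of a probability dictionary."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

px = {x: sum(q for (a, _), q in p.items() if a == x) for x in range(3)}
py = {y: sum(q for (_, b), q in p.items() if b == y) for y in range(3)}

print("H(X)   =", H(px))                 # ~1.561
print("H(Y)   =", H(py))                 # ~1.561
print("H(X,Y) =", H(p))                  # 2.25
print("H(X|Y) =", H(p) - H(py))          # ~0.689
print("I(X;Y) =", H(px) + H(py) - H(p))  # ~0.873
```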


Problem 4: Source Coding

A language has an alphabet of eight letters, xi, i = 1, 2, ..., 8, with probabilities 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, and 0.05.

    (a) Determine a binary code for the source output.

    (b) Determine the average number of binary digits per source letter.

    (c) Calculate the entropy for the language given above.

(d) Check your result in (b) with the entropy calculated in (c). Is the code determined in (a) optimum?

Suggested solution:
a. Many codes are possible; not all of them are necessarily optimum. For example, assign each letter a fixed 3-bit codeword.
b. 3 bits per letter.
c. H(X) = -Σ p ld p ≈ 2.798 bits.
d. Obviously not: the average length of 3 exceeds the entropy, and it can be improved by using a Huffman code.

    Problem 5: Entropy

Consider the outcome of a fair dice throw and let D be the number of dots on the top face. The random variables X, Y, and Z are defined over this sample space as follows:

X ∈ {1, 2, 3, 4, 5, 6} with {X = i} = {the number is i},

Y ∈ {e, o} with {Y = e/o} = {the number is even / odd},

Z ∈ {s, b} with {Z = s/b} = {the number is small / big},

where small means ≤ 3 and big means > 3. Make a table showing the mapping of the random variables.

    Determine H(X),H(Y ), H(Z), H(X, Y ), H(Y, Z).

    Determine H(X|Y ), H(Y |X), H(Y |Z).

Suggested solution:

H(X) = -Σ P ld P = 6 · (1/6) ld 6 = ld 6.

H(Y) = H(Z) = 2 · (1/2) ld 2 = 1.

Because both Y and Z are determined by X: H(X,Y) = H(X) = ld 6.

H(Y,Z) = -Σ P ld P = 2 · ((1/3) ld 3 + (1/6) ld 6) = ld 3 + 1/3.

H(X|Y) = H(X,Y) - H(Y) = ld 3.
H(Y|X) = H(X,Y) - H(X) = 0.
H(Y|Z) = H(Y,Z) - H(Z) = ld 3 - 2/3.
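These values can also be checked by brute force over the six equally likely outcomes, for example with the following Python sketch (helper names are my own):

```python
import math
from collections import Counter

def H(counts):
    """Entropy in bits of a distribution given as outcome counts."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Map each equally likely die outcome to (X, Y, Z).
triples = [(d, 'e' if d % 2 == 0 else 'o', 's' if d <= 3 else 'b')
           for d in range(1, 7)]

HX  = H(Counter(x for x, y, z in triples))
HY  = H(Counter(y for x, y, z in triples))
HZ  = H(Counter(z for x, y, z in triples))
HYZ = H(Counter((y, z) for x, y, z in triples))

print(HX, HY, HYZ)             # ld 6, 1.0, ld 3 + 1/3
print("H(Y|Z) =", HYZ - HZ)    # ld 3 - 2/3
```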

    Problem 6: Entropy


[Figure: the six points below plotted in the (x, y) plane.]

Consider the figure. The random pair (X, Y) can only take the values (0,0), (3,3), (4,0), (1,1), (1,3), and (3,2), each equally likely. Determine H(X), H(Y), H(X|Y), and H(X,Y).


Problem 7: Entropy

[Figure: block diagram combining the inputs X1, X2 and the noise Z into the output Y.]

X1 ∈ {0, 1} and X2 ∈ {0, 1} are the inputs, where P(X1 = 1) = 1 - P(X1 = 0) = p1 and P(X2 = 1) = 1 - P(X2 = 0) = p2. Z represents the noise, where Z ∈ {0, 1} and Pr{Z = 1} = 1 - Pr{Z = 0} = 1/3. Multiplication denotes real multiplication and ⊕ denotes the XOR operation. The channel output Y ∈ {0, 1} is given as

Y = (X1 ⊕ Z) · X2.

    (Q1) Compute H(X2|Y ) and H(Y ).

(Q2) Let p(X1 = 1) = p(X1 = 0) = p(X2 = 0) = p(X2 = 1) = 1/2. Calculate the mutual information I(X2;Y).

    Problem 8: Entropy

    Consider the following transmission channel

[Figure: X → BSC → Y, with an observer producing Z.]

A binary symmetric channel (BSC) with crossover probability p has input X and output Y. The input X = 0 is used with probability q. The observer indicates Z = 0 whenever X = Y and Z = 1 otherwise.

    (Q1) What is the uncertainty H(Z) in the observer output?

(Q2) What is the capacity and the capacity-achieving input distribution if the receiver is provided with both Y and Z?


Problem 9: Fano Inequality

Let the following channel from X to Y be given, see figure. Pr{X = 0} = 1 - Pr{X = 1} = 2/3.

[Figure: transition diagram from X ∈ {0, 1} to Y ∈ {0, 1, ?}; the transition probabilities shown are 1/2, 3/4, 1/4, and 1/2.]

    1. Compute H(X|Y ).

    2. Calculate the error probability Pe.

    3. Compute the Fano inequality for H(X|Y ).

    Problem 10: Connection probabilities

Consider the relation between height L and weight W shown in the figure, indicating that tall people tend to be heavier than short people.

[Figure: transition diagram from height (L) to weight (W); the transition probabilities shown are values of 1/2 and 1/4.]

Height categories and probabilities: Very tall 1/8, Tall 1/4, Average 1/4, Short 1/4, Very short 1/8.
Weight categories: Very heavy, Heavy, Average, Light, Very light.

    (a) What is the entropy of L?

    (b) What is the conditional entropy H(W |L)?

(c) Find the probabilities of the weight categories.

(d) Flip the channel to find the reverse transition probabilities.

(e) Find the mutual information I(L;W), which is how much information, on average, about a person's height is given by his or her weight.


Problem 11: Mutual Information

    Consider the points A, B, C, and D in the figure below.

[Figure: the four points in 3D space with coordinate axes X, Y, Z: A = (1,0,0), B = (0,0,0), C = (0,1,0), D = (0,0,1).]

A point P = (X, Y, Z) is selected with probabilities Pr{P = A} = Pr{P = B} = Pr{P = C} = Pr{P = D} = 1/4, so X, Y, and Z are the coordinates of P. Compute I(X;Y), I(X;Y|Z), and I(X;Y,Z).
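A brute-force numerical check of these quantities (a Python sketch; the helper names are my own):

```python
import math
from collections import Counter

points = [(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)]  # A, B, C, D, each prob 1/4

def H(counts):
    """Entropy in bits of a distribution given as outcome counts."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

HX   = H(Counter(x for x, y, z in points))
HY   = H(Counter(y for x, y, z in points))
HZ   = H(Counter(z for x, y, z in points))
HXY  = H(Counter((x, y) for x, y, z in points))
HXZ  = H(Counter((x, z) for x, y, z in points))
HYZ  = H(Counter((y, z) for x, y, z in points))
HXYZ = H(Counter(points))

print("I(X;Y)   =", HX + HY - HXY)              # mutual information
print("I(X;Y|Z) =", HXZ + HYZ - HZ - HXYZ)      # conditional mutual information
print("I(X;Y,Z) =", HX + HYZ - HXYZ)            # information about X in (Y, Z)
```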

    Problem 12: Inequalities

    Let X, Y and Z be joint random variables. Prove the following inequalities and findconditions for equality.

H(X, Y|Z) ≥ H(X|Z).

I(X, Y;Z) ≥ I(X;Z).

H(X, Y, Z) - H(X, Y) ≤ H(X, Z) - H(X).

I(X;Z|Y) = I(Z;Y|X) - I(Z;Y) + I(X;Z).
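For the last relation, expanding each term by the definition of (conditional) mutual information shows that it is in fact an identity:

```latex
\begin{align*}
I(Z;Y|X) - I(Z;Y) + I(X;Z)
  &= \bigl[H(Z|X) - H(Z|X,Y)\bigr] - \bigl[H(Z) - H(Z|Y)\bigr] + \bigl[H(Z) - H(Z|X)\bigr] \\
  &= H(Z|Y) - H(Z|X,Y) \\
  &= I(X;Z|Y).
\end{align*}
```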

    Problem 13: Fano Inequality

Let the random variables X and Y denote the input and the output of a channel, where X, Y ∈ {0, 1, 2, 3, 4}. All input values X are equally probable. The channel is characterized by the transition probabilities

pY|X(y|x) = 1/2 if y = x, and 1/8 if y ≠ x,    (1)

for all x. Apply Fano's inequality to this example and interpret the result as an asking strategy.


Problem 14: Typical Sequences

A binary memoryless source U ∈ {0, 1} defined by the probabilities pU(0) = 0.98 and pU(1) = 0.02 generates sequences of length L. Let L = 100 and ε = ld(7)/50.

    Which sequence is the most likely sequence?

Is the most likely sequence an element of the typical set A? Prove your result numerically.

    How many typical sequences exist?
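A possible numerical check, assuming ε = ld(7)/50 as read from the problem statement and the standard typicality test -(1/L) ld p(u^L) ∈ [H(U) - ε, H(U) + ε]:

```python
import math

p0, p1, L = 0.98, 0.02, 100
eps = math.log2(7) / 50                                  # assumed reading of epsilon
HU = -(p0 * math.log2(p0) + p1 * math.log2(p1))          # entropy per source symbol

# The most likely sequence is the all-zero sequence.
rate = -math.log2(p0 ** L) / L                           # its empirical bits per symbol
print("H(U) =", HU, "  -(1/L) ld p(all-zero) =", rate)
print("all-zero sequence typical?", HU - eps <= rate <= HU + eps)

# Count the typical sequences: a sequence with k ones has probability p0^(L-k) * p1^k.
typical = sum(
    math.comb(L, k) for k in range(L + 1)
    if HU - eps <= -((L - k) * math.log2(p0) + k * math.log2(p1)) / L <= HU + eps
)
print("number of typical sequences:", typical)
```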

Problem 15: Given the following channel with two inputs X1 and X2 and the output Y.

[Figure: channel combining X1 and X2 into the output Y.]
X1 ∈ {0, 1}, X2 ∈ {0, 1}, Y ∈ {0, 1, 2}; + denotes real addition and * denotes real multiplication.

    Also we have

Pr(X1 = 1) = 1 - Pr(X1 = 0) = p1, 0 ≤ p1 ≤ 1,
Pr(X2 = 1) = 1 - Pr(X2 = 0) = p2, 0 ≤ p2 ≤ 1.

    (a) Compute H(Y ), H(Y |X1), H(Y |X2), I(X1;Y |X2) in bits.

(b) Determine the input probabilities for X1 and X2 that maximize H(Y).

    Problem 16: Typical sequences

An information source produces independent binary symbols with p(0) = p and p(1) = 1 - p, where p > 0.5, and generates an information sequence of 16 binary symbols. A typical sequence is defined to have two or fewer symbols equal to 1.

(Q1) What is the most probable sequence that can be generated by this source and what is its probability?

    (Q2) What is the number of typical sequences that can be generated by this source?

    We assign a unique binary codeword for each typical sequence and neglect the non-typical sequences.

(Q3) If the assigned codewords are all of the same length, find the minimum codeword length required to provide the above set with distinct codewords.

(Q4) Determine the probability that a sequence is not assigned a codeword.
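A small Python sketch of the counting involved in (Q2) and (Q3); the value p = 0.9 used for (Q4) is only an illustrative assumption, since p is left general in the problem:

```python
import math

n = 16
# (Q2) Typical sequences: at most two 1s among 16 symbols.
num_typical = sum(math.comb(n, k) for k in range(3))     # C(16,0) + C(16,1) + C(16,2)
print("number of typical sequences:", num_typical)

# (Q3) Minimum fixed codeword length needed for that many distinct codewords.
print("minimum codeword length:", math.ceil(math.log2(num_typical)))

# (Q4) Probability that a sequence gets no codeword (more than two 1s),
# evaluated for an assumed example value p = p(0) = 0.9.
p = 0.9
p_uncovered = 1 - sum(math.comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(3))
print("P(no codeword) for p = 0.9:", p_uncovered)
```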


Problem 17: Channel Capacity

    Determine the channel capacity of the following channels.

Channel 1: [Figure: transition diagram from X ∈ {0, 1, 2} to Y ∈ {0, 1}; the transition probabilities shown are 1/2, 1/2, 1, and 1.]

Channel 2: [Figure: transition diagram from X ∈ {0, 1} to Y ∈ {0, 1}; the transition probabilities shown are 1, 1/2, and 1/2.]


Problem 18: Cascaded Channel Capacity

Consider the two given discrete memoryless channel (DMC) models. Two channels can be cascaded such that the output of the first one is the input of the second one. Let X denote the input of the first channel, Y the output of the first and the input of the second channel, and Z the output of the second channel.

[Figure: DMC1 maps X ∈ {0, 1} to Y ∈ {0, 1, 2} with transition probabilities including 1/2 and 1/2; DMC2 maps Y ∈ {0, 1, 2} to Z ∈ {0, 1} with transition probabilities including p; the remaining transition probabilities are left blank in the figure.]

    Determine the missing transition probabilities.

Determine the transition probabilities of the concatenated DMC with input X and output Z.

DMC1 and DMC2 are split up again and a channel encoder is used between DMC1 and DMC2, which controls the input distribution of DMC2 such that the channel capacity of DMC2 is achieved.

    Determine the channel capacity of this system.

Problem 19: Which of the following models has/have a channel capacity different from C = 1 bit/channel symbol?

[Figures: five channel models, each given as a transition diagram between input X and output Y with the alphabets and transition probabilities shown.]

Problem 20: Capacity

    Given the following channel.

[Figure: transmitter → channel → receiver.]
X ∈ {0, 1}; noise Z ∈ {-1, 0, 1} with p(z = -1) = p(z = 0) = p(z = 1) = 1/3; Y = X + Z (real addition); the side information |Z| ∈ {0, 1} is available at the receiver.

(Q1) If the receiver uses side information, i.e. the absolute value of Z, what is the capacity C1 of the channel in bits per transmission?

(Q2) If the receiver cannot access side information, i.e. the receiver does not know the absolute value of Z, what is the capacity C2 of the channel in bits per transmission?

(Q3) Now, let the transmitter change its alphabet to {0, 2}. Determine again C1 and C2 in this case.

    Problem 21: Huffman Code

Consider a random source with statistically independent source symbols qi, 1 ≤ i ≤ 8. The distribution of the source is given as follows:

Q      q1    q2    q3    q4    q5    q6    q7     q8
p(q)   0.5   0.1   0.1   0.1   0.1   0.05  0.025  0.025

a) Determine the entropy of the source and compare the result to a source with eight identically distributed symbols. (Hint: ld 10 ≈ 3.32.)

b) Construct an optimal binary prefix-free code for the given source.

c) Determine the average code word length of the constructed code by means of the path length lemma. Compare the result to the entropy.

d) Determine the sequence of code bits for the following sequence of source symbols: q = [q1 q4 q8 q6 q1 q7].

e) Determine the code word length for the given sequence. Compare the result to the average code word length.
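For checking parts (b) and (c), a compact Python sketch of the binary Huffman construction (it returns optimal codeword lengths; the actual codeword labels may differ from a hand-built tree):

```python
import heapq
import math

def huffman_lengths(probs):
    """Return optimal prefix-code codeword lengths for the given probabilities."""
    # Each heap entry: (probability, tie-breaker, list of symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every symbol in the merged subtree gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.5, 0.1, 0.1, 0.1, 0.1, 0.05, 0.025, 0.025]
lengths = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
entropy = -sum(p * math.log2(p) for p in probs)
print("codeword lengths:", lengths)
print("average length:", avg_len, " entropy:", entropy)
```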


Problem 22: Lempel-Ziv

    An alphabet {a, b, c} is given. The code table is indexed with (#1, #2, #3).

    a) Decode the message #1#2#3#4#5 and construct a code table.

b) Encode the string aacbacaca of an alphabet {a, b, c} with the Lempel-Ziv algorithm. Show the construction of the code table and the coded string in detail.

c) Decode the code which you have gained and show how the code table is built up dynamically.
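A minimal Python sketch of an LZ78-style encoder whose dictionary is pre-initialized with the alphabet; the output format, index numbering, and other details are assumptions and may differ from the variant used in the lecture:

```python
def lz78_encode(message, alphabet):
    """LZ78-style encoding with a code table pre-initialized with the alphabet.

    Returns a list of table indices (1-based, as in #1, #2, ...) and the
    dynamically built code table.
    """
    table = {sym: i + 1 for i, sym in enumerate(alphabet)}    # #1, #2, #3, ...
    output = []
    phrase = ""
    for ch in message:
        if phrase + ch in table:
            phrase += ch                          # extend the current phrase
        else:
            output.append(table[phrase])          # emit index of longest known phrase
            table[phrase + ch] = len(table) + 1   # add the new phrase to the table
            phrase = ch
    if phrase:
        output.append(table[phrase])
    return output, table

indices, table = lz78_encode("aacbacaca", "abc")
print(indices)   # emitted sequence of table indices
print(table)     # resulting code table
```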

    Problem 23: Shannon Code

Consider the following method for generating a code for a random variable X which takes on m values {1, 2, ..., m} with probabilities p1, p2, ..., pm. Assume that the probabilities are ordered so that p1 ≥ p2 ≥ ... ≥ pm. Define

F_i = Σ_{k=1}^{i-1} p_k,    (2)

the sum of the probabilities of all symbols less than i. Then the codeword for i is the number F_i ∈ [0, 1] rounded off to l_i bits, where l_i = ⌈log(1/p_i)⌉.

a) Show that the code constructed by this process is prefix-free and that the average length satisfies

H(X) ≤ L < H(X) + 1.    (3)

    b) Construct the code for the probability distribution (0.5, 0.25, 0.125, 0.125).
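A short Python sketch of this construction, which can be used to check part (b) (the helper name is my own):

```python
import math

def shannon_code(probs):
    """Shannon code: codeword i is F_i truncated to ceil(log2(1/p_i)) bits."""
    codewords = []
    F = 0.0
    for p in probs:                      # probabilities assumed sorted in decreasing order
        length = math.ceil(math.log2(1 / p))
        bits, frac = "", F               # expand F_i in binary to `length` bits
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        codewords.append(bits)
        F += p
    return codewords

print(shannon_code([0.5, 0.25, 0.125, 0.125]))   # ['0', '10', '110', '111']
```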

    Problem 24: Enumerative Coding

Let S be a set of binary sequences of length 13 with 3 ones. What is the sequence for index 95 in the lower lexicographical ordering of S? Hint: Apply enumerative decoding.


Problem 25: Quantization

Let X denote a source which produces the given values with the following distribution:

x      0.3   0.7   0.5   1.8   1.1   0.45  1.2   0.1
p(x)   1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8

Assume that the source X is followed by a quantizer which uses four levels of quantization given as

quantized value   interval
0.45              0 < x ≤ 0.5
0.7               0.5 < x ≤ 1
1.15              1 < x ≤ 1.5
1.8               1.5 < x ≤ 2

Find the entropy of the quantized source.
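A short Python sketch of the computation, implementing the quantizer directly from the interval table above:

```python
import math
from collections import Counter

values = [0.3, 0.7, 0.5, 1.8, 1.1, 0.45, 1.2, 0.1]   # each with probability 1/8

def quantize(x):
    """Map x to its reconstruction level according to the interval table."""
    if x <= 0.5:
        return 0.45
    if x <= 1.0:
        return 0.7
    if x <= 1.5:
        return 1.15
    return 1.8

counts = Counter(quantize(x) for x in values)
H = -sum(c / 8 * math.log2(c / 8) for c in counts.values())
print(counts)                                         # occurrences of each level
print("entropy of the quantized source:", H, "bits")
```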

    Problem 26: Data Reduction

Recall the chapter on data reduction. Apply the given system to the input:

b b g g
b b g g
b b o o
b b o o

where b represents blue, g represents green and o represents orange. Use the transform matrix T given as:

T = (1/2) ·
[ 1 1 1 1 ]
[ 1 1 1 1 ]
[ 1 1 1 1 ]
[ 1 1 1 1 ]

    Use the same quantization levels given in the slide and show all your steps.

    Problem 27: Error Detection

A binary code has block length 6 and is given as:

A: 000000
B: 001111
C: 111100
D: 111111

The information is transmitted over a binary symmetric channel with crossover probability given as p. Calculate the probability of a detection error for A, B, C, and D.

    Problem 28: Data Reduction

Check slide number 17 in the chapter on error detection. Why is the number of 1s in C(x) even?


Problem 29: Error Detection

The information packet (1 0 1 1) is written as A(x) = 1 + x^2 + x^3. Given that A(x) divides x^i + 1, what is the smallest i?
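One way to check this numerically is to test divisibility over GF(2) for increasing i; in the Python sketch below, polynomials are represented as integer bit masks, so A(x) = 1 + x^2 + x^3 becomes 0b1101:

```python
def gf2_mod(dividend, divisor):
    """Remainder of polynomial division over GF(2), polynomials as integer bit masks."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        shift = dividend.bit_length() - dlen
        dividend ^= divisor << shift        # subtract (XOR) the shifted divisor
    return dividend

A = 0b1101                                  # A(x) = 1 + x^2 + x^3
i = 1
while gf2_mod((1 << i) | 1, A) != 0:        # test whether A(x) divides x^i + 1
    i += 1
print("smallest i:", i)
```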

    Problem 30: Huffman Code

Consider a random source with statistically independent source symbols qi, 1 ≤ i ≤ 8. The distribution of the source is given as follows:

Q      q1    q2    q3    q4    q5    q6    q7     q8
p(q)   0.5   0.1   0.1   0.1   0.1   0.05  0.025  0.025

a) Determine the entropy of the source and compare the result to a source with eight identically distributed symbols. (Hint: ld 10 ≈ 3.32.)

b) Construct an optimal binary prefix-free code for the given source.

c) Determine the average code word length of the constructed code by means of the path length lemma. Compare the result to the entropy.

d) Determine the sequence of code bits for the following sequence of source symbols: q = [q1 q4 q8 q6 q1 q7].

e) Determine the code word length for the given sequence. Compare the result to the average code word length.

    Problem 31: Huffman Code

    Let Q denote a source with the following distribution:

Q      q1    q2    q3    q4    q5    q6    q7
p(q)   0.3   0.2   0.1   0.1   0.1   0.1   0.1

    a) Construct a binary Huffman code.

b) Determine the sequence of code bits for the following sequence of source symbols: q = [q2 q1 q4 q1 q1 q3 q7].

    c) Decode the resulting code bit sequence.

d) Introduce a bit error in the sequence of code bits by flipping the 4th code bit. Decode the resulting code bit sequence.

    Problem 32: Lempel-Ziv Code

    A source bit sequence is given as

    [00101010011001001100111111100100]

Assume that the codebook is initialized with 0 and 1 and is limited to 16 entries at the transmitter as well as at the receiver.

Encode this sequence according to the LZ78 Lempel-Ziv algorithm.


The coded bits are transmitted error-free. Recover the original sequence of source bits back from the sequence of code bits.

Introduce a bit error in the code bit sequence by flipping the 5th code bit. Decode the resulting (erroneous) code bit sequence.
