TRANSCRIPT
ECE 544 – Digital Communications
Besma SMIDA
Chapter 5
Fall 2014
B. Smida (ECE) ECE 544 Fall 2014 1 / 39
Differential Entropy
Definition:
The differential entropy h(X) of a continuous random variable X with density f(x) is defined as

h(X) = −∫_S f(x) log f(x) dx = −E[log f(X)],

where S is the support set of the random variable.
Remember that we use the convention 0 log 0 = 0.
Properties of Differential Entropy
h(X) does not give the amount of information in X
h(X) is not necessarily positive
h(X) changes with a change of coordinate system
Examples:
1. Uniform distribution: X ∼ U(a, b)

   h(X) = −∫_a^b (1/(b − a)) log(1/(b − a)) dx = log(b − a)

2. Gaussian distribution: X ∼ N(µ, σ²), f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

   h(X) = −log(e) ∫_{−∞}^{∞} f(x) ln f(x) dx
        = −log(e) ∫_{−∞}^{∞} f(x) (−ln(2πσ²)/2 − (x − µ)²/(2σ²)) dx
        = (1/2) log(e)(ln(2πσ²) + 1) = (1/2) log(2πeσ²) bits
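These two closed forms can be checked numerically; the sketch below (grid limits and step count are arbitrary choices, not from the lecture) integrates −f(x) log₂ f(x) with a midpoint rule:

```python
import math

def diff_entropy_bits(f, lo, hi, n=200000):
    """Riemann (midpoint) approximation of h(X) = -integral f(x) log2 f(x) dx."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        fx = f(lo + (i + 0.5) * dx)
        if fx > 0:                      # convention 0 log 0 = 0
            total -= fx * math.log2(fx) * dx
    return total

# Uniform U(a, b): h(X) = log2(b - a)
a, b = 0.0, 8.0
h_unif = diff_entropy_bits(lambda x: 1.0 / (b - a), a, b)

# Gaussian N(mu, sigma^2): h(X) = (1/2) log2(2*pi*e*sigma^2)
mu, sigma = 0.0, 2.0
gauss = lambda x: math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
h_gauss = diff_entropy_bits(gauss, mu - 10 * sigma, mu + 10 * sigma)

print(round(h_unif, 4))    # ≈ 3.0 bits
print(round(h_gauss, 4))   # ≈ 0.5*log2(2*pi*e*4) ≈ 3.0471 bits
```

Note that h(X) for the uniform density is negative whenever b − a < 1, illustrating the second property above.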
Two Statements
Consider information transmission over an additive white Gaussian noise (AWGN) channel. The average transmitted signal power is P, the noise power spectral density is N0/2, and the bandwidth is W. Two statements are:
1. For each transmission rate rb, there is a corresponding limit on the probability of bit error one can achieve.
2. For some appropriate signalling rate rb, there is no limit on the probability of bit error one can achieve, i.e., one can achieve error-free transmission.
Which statement sounds reasonable to you?
Shannon’s Channel Capacity
Shannon proved that it is theoretically possible to transmit information at any rate rb, where rb < C, with an arbitrarily small error probability by using a sufficiently complicated coding scheme. For rb > C, it is not possible to achieve an arbitrarily small error probability.
Shannon's work showed that the values of P, N0 and W set a limit on transmission rate, not on error probability!
Discrete-time Gaussian Channel Capacity
Definition of Gaussian Channel
[Diagram: additive noise channel, input Xi plus noise Zi gives output Yi]
Definition:
(Gaussian channel) Discrete-time channel with input Xi, noise Zi, and output Yi at time i. This is

Yi = Xi + Zi,

where the noise Zi is drawn i.i.d. from N(0, N) and assumed to be independent of the signal Xi.
Average power constraint: (1/n) Σ_{i=1}^n x_i² ≤ P ⇒ E[X²] ≤ P.
E[Y²] = E[(X + Z)²] = E[X²] + 2E[X]E[Z] + E[Z²] ≤ P + N
Discrete-time Gaussian Channel Capacity
Information Capacity
Definition:
The information capacity with power constraint P is
C = max_{E[X²]≤P} I(X; Y).

I(X; Y) = h(Y) − h(Y|X) = h(Y) − h(X + Z|X)
        = h(Y) − h(Z|X) = h(Y) − h(Z)
        ≤ (1/2) log(2πe(P + N)) − (1/2) log(2πeN)
        = (1/2) log(1 + P/N)

The optimum input is Gaussian and the worst noise is Gaussian.
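As a quick numeric sketch of the final formula (the power values here are arbitrary examples, not from the lecture):

```python
import math

def awgn_capacity_bits(P, N):
    """Capacity of the discrete-time Gaussian channel, bits per channel use:
    C = (1/2) log2(1 + P/N)."""
    return 0.5 * math.log2(1 + P / N)

# At P/N = 15, C = (1/2) log2(16) = 2 bits per channel use.
print(awgn_capacity_bits(15.0, 1.0))   # 2.0
# Zero power gives zero capacity.
print(awgn_capacity_bits(0.0, 1.0))    # 0.0
```

Doubling the SNR adds at most half a bit per use, which is why capacity grows only logarithmically with power.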
Discrete-time Gaussian Channel Capacity
Sphere Packing
Each transmitted x_i is received as a probabilistic cloud y_i
Cloud 'radius' = √Var(Y|X) = √(nN)
Energy of y_i constrained to n(P + N), so clouds must fit into a hypersphere of radius √(n(P + N))
Volume of a hypersphere ∝ r^n
Max number of non-overlapping clouds:

(n(P + N))^{n/2} / (nN)^{n/2} = 2^{n (1/2) log(1 + P/N)}

Max rate is (1/2) log(1 + P/N)
Bandlimited Channel
Bandlimited continuous-time channel
Definition:
A bandlimited continuous-time channel with white noise:

Y(t) = (X(t) + Z(t)) ∗ h(t),

where X(t) is the signal waveform, Z(t) is the waveform for the white Gaussian noise, and h(t) is the impulse response of an ideal bandpass filter, which cuts out all frequencies greater than W.
Theorem:
Suppose that a function f(t) is bandlimited to W, namely, the spectrum of the function is 0 for all frequencies greater than W. Then the function is completely determined by samples of the function spaced 1/(2W) seconds apart.
Bandlimited Channel
Bandlimited Channel
Channel bandlimited to f ∈ [−W, W] and signal duration T
Nyquist: signal is completely defined by n = 2WT samples
Can represent as a 2WT-dimensional vector space with prolate spheroidal functions as an orthogonal basis
White noise with double-sided psd N0/2 becomes i.i.d. Gaussian N(0, N0/2)
Signal power constraint P ⇒ signal energy ≤ PT; energy constraint per sample = PT/(2WT) = P/(2W)
Capacity:

C = (1/2) log(1 + (P/(2W)) / (N0/2)) · 2W = W log(1 + P/(N0 W)) bits/second
Bandlimited Channel
Summary
Capacity of discrete-time Gaussian channel:

C = (1/2) log(1 + P/N) bits per transmission.

Capacity per sample:

C = (1/2) log(1 + (P/(2W))/(N0/2)) = (1/2) log(1 + P/(N0 W)) bits per sample.

Capacity of a bandlimited continuous-time channel with white noise:

C = W log(1 + P/(N0 W)) bits per second.

W → ∞ gives

C = (P/N0) log2 e bits per second.
Bandlimited Channel
Shannon Capacity
Bandwidth W Hz, signal variance per sample P/(2W), noise variance N0/2
Signal power = P, noise power = N0 W, minimum bit energy Eb = P/C
Capacity: C = W log(1 + P/(W N0))
Define W0 = P/N0 ⇒ C/W0 = (W/W0) log(1 + W0/W) → log e as W → ∞
Eb/N0 = W0/C → ln 2 = −1.6 dB as W → ∞
Bandlimited Channel
Shannon Capacity
Since C = W log(1 + P/(W N0)) and P = Eb C at capacity, then C/W = log(1 + (C/W)(Eb/N0)) and

Eb/N0 = (2^{C/W} − 1) / (C/W)
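Evaluating this expression numerically shows the required Eb/N0 falling toward the −1.6 dB limit as the spectral efficiency C/W goes to zero (a sketch; the sample efficiencies are arbitrary):

```python
import math

def ebno_required(spectral_eff):
    """Minimum Eb/N0 (linear scale) to operate at spectral efficiency C/W bits/s/Hz:
    Eb/N0 = (2^(C/W) - 1) / (C/W)."""
    r = spectral_eff
    return (2 ** r - 1) / r

for r in (2.0, 1.0, 0.1, 0.001):
    print(r, round(10 * math.log10(ebno_required(r)), 3), "dB")

# As C/W -> 0, Eb/N0 -> ln 2, i.e. 10*log10(ln 2) ≈ -1.59 dB.
print(round(10 * math.log10(math.log(2)), 3), "dB")
```

At C/W = 1 bit/s/Hz the required Eb/N0 is exactly 1 (0 dB); each doubling of spectral efficiency costs progressively more energy per bit.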
Practical Coding Schemes
Formal definition of Channel Coding
Definition: Channel code
An (M, n) code for the channel (X, p(y|x), Y) consists of the following:
1. An index set {1, 2, . . . , M} over messages W.
2. An encoding function X^n : {1, 2, . . . , M} → X^n, yielding codewords x^n(1), x^n(2), . . . , x^n(M). (This set is called the codebook C.) x^n(W) passes through the channel and is received as a random sequence Y^n ∼ p(y^n | x^n).
3. A (deterministic) decoding function

   g : Y^n → {1, 2, . . . , M},

which is an estimator Ŵ = g(Y^n) of W ∈ {1, 2, . . . , M}. It declares an error if Ŵ ≠ W.
Practical Coding Schemes
Practical Coding Schemes
From a constructive point of view, the object of coding is to introduce redundancy in such a way that errors can be detected and corrected.
Repetition code: the rate of the code goes to zero with block length.
Error-detecting code: adding one parity bit enables the detection of an odd number of errors.
Linear error-correcting code: the vector space structure allows for a parity-check matrix to detect, locate, and correct multiple errors.
Practical Coding Schemes
Linear Block Codes
The encoder is a mapping

f : {0, 1}^k → {0, 1}^n

called an (n, k) code.
A channel code maps 2^k messages into a larger space of cardinality 2^n.
A good channel code has its 2^k codewords evenly and sparsely distributed in the space.
A block code is called a linear block code iff its 2^k codewords form a k-dimensional subspace of the n-dimensional vector space.
Linear block codes: x = f(s) = s · G, where G is a k × n matrix with binary entries.
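A minimal sketch of the encoding map x = s · G over GF(2); the generator matrix below is an illustrative (7,4) Hamming-style choice in standard form, an assumption rather than a matrix taken from the lecture:

```python
def gf2_matvec(s, G):
    """Encode message bits s with generator matrix G over GF(2): x = s . G."""
    n = len(G[0])
    return [sum(s[i] * G[i][j] for i in range(len(s))) % 2 for j in range(n)]

# Illustrative (7,4) generator in standard form G = [I4 | A].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
]
print(gf2_matvec([1, 0, 1, 0], G))   # [1, 0, 1, 0, 0, 0, 1]
```

Because the map is linear, the modulo-2 sum of any two outputs is itself a valid codeword.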
Practical Coding Schemes
Properties of Linear Codes
The codebook of a linear code is

C = {x ∈ {0, 1}^n : x = s · G for some s ∈ {0, 1}^k}

The all-zero word 0_n = 0_k · G is always a codeword.
If x1, x2 ∈ C, then x1 + x2 = (s1 + s2) · G is also a codeword.
For any (n, k) linear code, there exists a parity check matrix H of dimension (n − k) × n such that

x · H^T = 0_{1×(n−k)} iff x ∈ C

This implies that G · H^T = 0_{k×(n−k)}.
Practical Coding Schemes
Properties of Linear Codes
If C is a code with a generator matrix G in standard form,

G = [I_k | A],

where I_k denotes the identity matrix and A is some k × (n − k) matrix, then

H = [A^T | I_{n−k}]

is a check matrix for C.
d_min = min_{x∈C, x≠0} |x|, the minimum weight over nonzero codewords, is closely related to the error-correcting ability of the code.
If a code has a minimum distance d_min, it is guaranteed to correct ⌊(d_min − 1)/2⌋ errors.
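Since d_min is the minimum weight over nonzero codewords, it can be found for small codes by brute-force enumeration of all 2^k codewords; a sketch using an illustrative (7,4) Hamming-style generator (an assumed example, not a matrix from the lecture):

```python
from itertools import product

def codewords(G):
    """Enumerate all 2^k codewords of the linear code generated by G over GF(2)."""
    k, n = len(G), len(G[0])
    for s in product([0, 1], repeat=k):
        yield tuple(sum(s[i] * G[i][j] for i in range(k)) % 2 for j in range(n))

def d_min(G):
    """Minimum Hamming weight over the nonzero codewords."""
    return min(sum(c) for c in codewords(G) if any(c))

# Illustrative (7,4) generator in standard form G = [I4 | A].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
]
d = d_min(G)
print(d, "-> guaranteed to correct", (d - 1) // 2, "error(s)")   # 3 -> 1 error
```

For linear codes, minimum distance equals minimum nonzero weight because the difference of two codewords is itself a codeword.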
Practical Coding Schemes
Syndrome: Definition and properties
The generator matrix G is used in the encoder at the transmitter. On the other hand, the parity check matrix H is used in the decoding operation at the receiver.
If we express the received vector r as the sum of the original codeword x and an error vector e (r = x + e), then

r · H^T = x · H^T + e · H^T = 0_{1×(n−k)} + e · H^T = s

The syndrome s depends only on the error pattern, and not on the transmitted codeword.
All error patterns that differ by a codeword have the same syndrome.
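A sketch of this syndrome property; H below is the standard-form parity-check matrix of an illustrative (7,4) Hamming-style code (an assumed example, not from the lecture):

```python
def syndrome(r, H):
    """s = r . H^T over GF(2); depends only on the error pattern e."""
    return tuple(sum(r[j] * H[i][j] for j in range(len(r))) % 2 for i in range(len(H)))

# Illustrative H = [A^T | I3] for a standard-form (7,4) generator G = [I4 | A].
H = [
    [1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
x = (1, 0, 1, 0, 0, 0, 1)            # a codeword of that code
e = (0, 0, 0, 0, 1, 0, 0)            # single-bit error pattern
r = tuple((xi + ei) % 2 for xi, ei in zip(x, e))

print(syndrome(x, H))                 # (0, 0, 0): x is a codeword
print(syndrome(r, H), syndrome(e, H)) # equal: the syndrome ignores x
```

The syndrome of r equals the syndrome of e alone, which is exactly what makes table-based (standard array) decoding possible.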
Practical Coding Schemes
Reed-Muller Codes
Reed-Muller codes are denoted RM(d, r), where d is the order of the code, and r is a parameter related to the length of the code, n = 2^r.
Reed-Muller codes can be defined very simply in terms of Boolean functions. To define codes of length n = 2^r, we need r vectors (v1, v2, . . . , vr).
A generator matrix for a Reed-Muller code of length n and order d can be constructed with rows:

1, v1, v2, . . . , vr, v1v2, . . . , v_{r−1}v_r, . . . (all products up to degree d)

RM(d, r) codes are parity-check codes of length n = 2^r, rate Rc = k/n and minimum distance 2^{r−d}, where

k = 1 + (r choose 1) + . . . + (r choose d)
Practical Coding Schemes
Reed-Muller Codes: Examples
RM(1, r) codes are parity-check codes of length n = 2^r, rate Rc = (r + 1)/n and minimum distance n/2.
RM(1,3) has the following generator matrix:

G = [ 1 1 1 1 1 1 1 1
      0 0 0 0 1 1 1 1
      0 0 1 1 0 0 1 1
      0 1 0 1 0 1 0 1 ]

RM(2,3) has the following generator matrix:

G = [ 1 1 1 1 1 1 1 1
      0 0 0 0 1 1 1 1
      0 0 1 1 0 0 1 1
      0 1 0 1 0 1 0 1
      0 0 0 0 0 0 1 1
      0 0 0 0 0 1 0 1
      0 0 0 1 0 0 0 1 ]
Practical Coding Schemes
Construction of new codes from old
Dual code: If C is an (n, k) linear code, its dual or orthogonal code C⊥ is the set of vectors

C⊥ = {u | u · v = 0 for all v ∈ C}.

If C has generator matrix G and parity check matrix H, then C⊥ has generator matrix H and parity check matrix G.
Product codes: A simple method of combining two codes. If we have two systematic linear block codes C1(n1, k1) and C2(n2, k2), the product of these codes is an (n1 n2, k1 k2) linear block code. The minimum distance of the product code is d_min = d_min1 · d_min2.
Concatenated codes: In concatenated coding, two codes, one binary and one non-binary, are concatenated such that the codewords of the binary code are treated as symbols of the non-binary code.
Practical Coding Schemes
A first definition for cyclic codes
Definition: Cyclic code
An (n, k) linear code is called cyclic if any cyclic shift of a codeword is a codeword, i.e.

(c0, . . . , c_{n−1}) ∈ C ⇒ (c_{n−1}, c0, . . . , c_{n−2}) ∈ C

Examples:
C = {(0000), (1111)}
C = {(000), (110), (101), (011)}
C = {(000), (100), (011), (111)}
Practical Coding Schemes
Cyclic codes: an algebraic correspondence
There is a bijective correspondence between the codewords and residue classes of polynomials:

(c0, . . . , c_{n−1}) ↔ c(x) = c0 + c1 x + . . . + c_{n−1} x^{n−1}

Multiplying by x modulo x^n − 1 corresponds to a cyclic shift:

x(c0 + c1 x + . . . + c_{n−1} x^{n−1}) mod (x^n − 1) = c_{n−1} + c0 x + . . . + c_{n−2} x^{n−1}

hence, we say that

c^{(1)}(x) = x c(x) mod (x^n − 1)
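A minimal sketch of this identity over GF(2): multiplying by x and reducing with x^n ≡ 1 lands exactly on the cyclic shift of the coefficient vector (the example polynomial is arbitrary):

```python
def xc_mod(c):
    """Compute x*c(x) mod (x^n - 1) over GF(2) by explicit shift-and-reduce.
    Coefficient lists are indexed by degree: c[i] is the coefficient of x^i."""
    n = len(c)
    shifted = [0] + c[:]                  # x*c(x), degree can reach n
    shifted[0] = (shifted[0] + shifted[n]) % 2   # reduce: x^n ≡ 1 (mod x^n - 1)
    return shifted[:n]

c = [0, 1, 1, 0, 1]                       # c(x) = x + x^2 + x^4, n = 5
print(xc_mod(c))                          # [1, 0, 1, 1, 0]
print(xc_mod(c) == [c[-1]] + c[:-1])      # True: equals the cyclic shift
```

The reduction step folds the overflow coefficient of x^n back onto the constant term, which is precisely why the top coefficient "wraps around" to position 0.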
Practical Coding Schemes
Cyclic codes: an algebraic correspondence
In a similar manner, if c(x) represents a codeword then x^i c(x) modulo x^n − 1 is also a codeword:

c^{(i)}(x) = x^i c(x) mod (x^n − 1) ⇒ x^i c(x) = q(x)(x^n − 1) + c^{(i)}(x)   (1)

The polynomial x^n − 1 and its factors play a major role in the generation of cyclic codes. Let g(x) be a polynomial of degree n − k that is a factor of x^n − 1:

g(x) = 1 + Σ_{i=1}^{n−k−1} g_i x^i + x^{n−k}

A cyclic code is uniquely determined by the generator polynomial g(x), in that each code polynomial in the code can be expressed in the form

c(x) = a(x) g(x)

c(x) satisfies the condition of (1) since g(x) is a factor of x^n − 1.
Practical Coding Schemes
Generator Polynomial
Let the message polynomial m(x) = m0 + m1 x + . . . + m_{k−1} x^{k−1} and the parity polynomial b(x) = b0 + b1 x + . . . + b_{n−k−1} x^{n−k−1}.
We want the code polynomial to be in the (systematic) form

c(x) = b(x) + x^{n−k} m(x)

hence, a(x) g(x) = b(x) + x^{n−k} m(x), and we may write:

x^{n−k} m(x) = a(x) g(x) + b(x)
Practical Coding Schemes
Systematic Cyclic Encoder
1. Multiply the message polynomial m(x) by x^{n−k}.
2. Divide x^{n−k} m(x) by g(x) to obtain the remainder r(x).
3. Add r(x) to x^{n−k} m(x).
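These three steps can be sketched directly with GF(2) polynomial division; the example values reproduce the (7,4) case with g(x) = 1 + x² + x³ and m = (1010) worked in the lecture example that follows:

```python
def gf2_polydiv_rem(dividend, divisor):
    """Remainder of GF(2) polynomial division. Coefficient lists, index = degree."""
    rem = dividend[:]
    dd, dg = len(rem) - 1, len(divisor) - 1
    for i in range(dd, dg - 1, -1):       # cancel leading terms from the top down
        if rem[i]:
            for j in range(dg + 1):
                rem[i - dg + j] ^= divisor[j]
    return rem[:dg]

def cyclic_encode(m, g, n):
    """Systematic cyclic encoding: c(x) = r(x) + x^{n-k} m(x)."""
    k = n - (len(g) - 1)
    shifted = [0] * (n - k) + m           # step 1: x^{n-k} m(x)
    r = gf2_polydiv_rem(shifted, g)       # step 2: remainder modulo g(x)
    return r + m                          # step 3: parity bits followed by message

# g(x) = 1 + x^2 + x^3, m = (1010), n = 7
g = [1, 0, 1, 1]
print(cyclic_encode([1, 0, 1, 0], g, 7))  # [0, 1, 1, 1, 0, 1, 0]
```

Because the parity bits are the remainder modulo g(x), the resulting c(x) is divisible by g(x), i.e. it is a valid codeword.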
Practical Coding Schemes
Polynomial encoding: Systematic method
Let C be a (7, 4) cyclic code with g(x) = x³ + x² + 1 and m = (1010).
Find the associated polynomial: m(x) = x² + 1.
Multiply it by x^{n−k}: (x² + 1) x³ = x⁵ + x³.
Divide this result by g(x) and take the remainder r(x):

x⁵ + x³ = (x² + x)(x³ + x² + 1) + x² + x

r(x) as an (n − k)-vector: r = (011).
Encoding uses r, followed by the message (1010):

(1010) ⇒ (0111010)
Practical Coding Schemes
Parity-Check Polynomial
An (n, k) cyclic code is uniquely specified by its generator polynomial g(x) of degree n − k; it is also uniquely specified by the parity-check polynomial of degree k:

h(x) = 1 + Σ_{i=1}^{k−1} h_i x^i + x^k

The generator polynomial g(x) is equivalent to the generator matrix G as a description of the code. The parity-check polynomial h(x) is an equivalent representation of the parity-check matrix H.
The matrix relation G H^T = 0 for linear block codes corresponds to the relationship

g(x) h(x) mod (x^n − 1) = 0, i.e., g(x) h(x) = x^n − 1
Practical Coding Schemes
Generator and Parity-check Matrices
The generator polynomial g(x) is equivalent to the generator matrix G as a description of the code.
We may construct the generator matrix G from the k polynomials:

g(x), x g(x), x² g(x), . . . , x^{k−1} g(x)

The parity-check polynomial h(x) is an equivalent representation of the parity-check matrix H.
We define the reciprocal of the parity-check polynomial as:

x^k h(x^{−1}) = 1 + Σ_{i=1}^{k−1} h_{k−i} x^i + x^k

which is also a factor of x^n − 1.
The following (n − k) polynomials may now be used as rows of the H matrix:

x^k h(x^{−1}), x^{k+1} h(x^{−1}), . . . , x^{n−1} h(x^{−1})
Practical Coding Schemes
Encoder for cyclic codes: feedback register
Practical Coding Schemes
Calculation of the Syndrome
Let the received word be represented by the polynomial r(x).
We may express r(x) as:

r(x) = q(x) g(x) + s(x)

where the remainder s(x), of degree at most n − k − 1, is the syndrome polynomial.
Practical Coding Schemes
Hamming code revisited
For n = 7 we start by factorizing x⁷ − 1:

x⁷ − 1 = (1 + x)(1 + x² + x³)(1 + x + x³)

For example, let us take

g(x) = 1 + x + x³  and  h(x) = 1 + x + x² + x⁴

Find G and H.
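One way to answer this exercise in code: take x^i g(x) as the rows of G and shifts of the reciprocal polynomial x^k h(x⁻¹) as the rows of H, then check G · H^T = 0 over GF(2) (a sketch; reduction to systematic form is omitted):

```python
def shifts_as_rows(coeffs, num_rows, n):
    """Rows: coeffs, x*coeffs, ..., x^{num_rows-1}*coeffs, as length-n vectors.
    (Degrees stay below n for these parameters, so no wraparound occurs.)"""
    rows = []
    for i in range(num_rows):
        row = [0] * n
        for d, c in enumerate(coeffs):
            row[(d + i) % n] = c
        rows.append(row)
    return rows

n, k = 7, 4
g = [1, 1, 0, 1]                     # g(x) = 1 + x + x^3
h = [1, 1, 1, 0, 1]                  # h(x) = 1 + x + x^2 + x^4
h_rec = h[::-1]                      # reciprocal x^k h(1/x): coefficients reversed

G = shifts_as_rows(g, k, n)          # k rows: g(x), x g(x), ..., x^{k-1} g(x)
H = shifts_as_rows(h_rec, n - k, n)  # n-k rows: shifts of the reciprocal polynomial

# Verify G . H^T = 0 over GF(2)
ok = all(sum(G[i][t] * H[j][t] for t in range(n)) % 2 == 0
         for i in range(k) for j in range(n - k))
print(ok)   # True
```

The check G · H^T = 0 is the matrix form of g(x) h(x) = x⁷ − 1 from the previous slide.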
Practical Coding Schemes
Polynomial decoding: Systematic method
Suppose that the decoder receives exactly the sent word (but it does not know it!):
v = (1010011) ⇒ v(x) = x⁶ + x⁵ + x² + 1
The decoder divides v(x) by g(x):

x⁶ + x⁵ + x² + 1 = (x³ + x² + 1) × (x³ + 1) + 0

The remainder is zero, so (probably) no errors have occurred.
The message is formed by the last 4 components of the received word, i.e. m = (0011).
Practical Coding Schemes
Probability of Block Error with/without coding
The probability of m errors in a block of n bits is:

P(m, n) = (n choose m) p^m (1 − p)^{n−m}

The probability of block error for an uncoded system:

P_e^{uncoded} = 1 − (1 − P_e)^n

where P_e is the probability of bit error.
The probability of block error for a coded system (if we can correct m bits only):

P_e^{coded} = 1 − (1 − P'_e)^n − Σ_{k=1}^{m} (n choose k) (P'_e)^k (1 − P'_e)^{n−k}

Note that the energy for coded bits is lower: E_c = (k/n) E_b, therefore P_e ≠ P'_e.
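A sketch comparing the two block-error expressions; the code parameters and the value of P'_e below are illustrative assumptions (in practice P'_e follows from E_c = (k/n) E_b and the modulation, and is larger than P_e):

```python
from math import comb

def p_block_uncoded(pe, n):
    """Probability that an uncoded n-bit block contains at least one error."""
    return 1 - (1 - pe) ** n

def p_block_coded(pe_prime, n, t):
    """Block error probability when up to t errors per block are corrected:
    1 - sum_{k=0}^{t} C(n,k) p'^k (1-p')^(n-k)."""
    return 1 - sum(comb(n, k) * pe_prime ** k * (1 - pe_prime) ** (n - k)
                   for k in range(t + 1))

# Illustrative numbers: a (7,4) code correcting t = 1 error, raw bit error
# rate Pe = 1e-2; P'e = 1.5e-2 is an assumed (higher) coded-bit error rate.
pu = p_block_uncoded(1e-2, 7)
pc = p_block_coded(1.5e-2, 7, 1)
print(pu, pc)   # the coded block error rate is much smaller here
```

Even though each coded bit is noisier, correcting a single error per block wins by more than an order of magnitude at these (assumed) operating points.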
Joint source and channel coding
Source and Channel Coding
[Block diagram: In → Compress → Encode → Noisy Channel → Decode → Decompress → Out; source coding (compress/decompress) wraps around channel coding (encode/decode)]
Source Coding
Source coding is the process of compressing the data using fewer bits to remove redundancy. Shannon's source coding theorem establishes the limits to possible data compression.
Channel Coding
Channel coding adds redundancy to protect against channel errors. The Shannon limit or Shannon capacity of a communications channel is the theoretical maximum information transfer rate of the channel, for a particular noise level.
Joint source and channel coding
Source-Channel Separation Theorem
So far we have R > H (compression) and R < C (transmission). It is natural to ask whether the condition H < C is necessary and sufficient for transmitting a source over a channel:
Theorem: Source-channel coding theorem
If V1, V2, . . . , Vn is a finite-alphabet stochastic process satisfying the AEP and the condition H(V) < C, then there exists a source-channel code with probability of error Pr(V̂^n ≠ V^n) → 0.
Conversely, if for any stationary stochastic process we have H(V) > C, then the probability of error is bounded away from zero.
Important result: source coding and channel coding might as well be done separately, since the two-stage design achieves the same limit as any joint scheme.