
EE 5345/7345 Medical Signal Analysis

Introduction to Information Theory and Entropy Coding

Carlos E. Davila

[email protected]

Electrical Engineering Dept.

Southern Methodist University

EE 5345/7345, Medical Signal Analysis, Southern Methodist University


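Data Compression System

[Block diagram: source, source encoder, channel encoder, storage medium (channel), channel decoder, source decoder. The source encoder and channel encoder together form the encoder; the channel decoder and source decoder together form the decoder.]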


source: output of a quantizer; generates a finite number of symbols. Each symbol may be represented by a binary code (as in the output of an A/D converter).

source encoder: generates a code word for each of the symbols. Each code word consists of binary digits and is generated so as to minimize the average number of bits per symbol, i.e. the bit rate.

channel encoder: introduces redundancy in the code words. This redundancy compensates for bit errors that may result from noise or distortion in the channel.


How do we measure information?

Suppose $X$ and $Y$ are discrete random variables having outcomes $x_i$, $i = 1, \ldots, n$, and $y_j$, $j = 1, \ldots, m$.

If we observe $Y = y_j$, we wish to determine the amount of information this observation provides about the event $X = x_i$, $i = 1, \ldots, n$.

If $X$ and $Y$ are statistically independent, then the event $Y = y_j$ provides no information about the event $X = x_i$.

However, if $X$ and $Y$ are completely dependent, then the event $Y = y_j$ determines the event $X = x_i$. Observing $Y = y_j$ gives us the information in the event $X = x_i$.

A measure satisfying these conditions is the mutual information.


Mutual Information

The mutual information between $x_i$ and $y_j$:

$$I(x_i; y_j) = \log \frac{P(x_i \mid y_j)}{P(x_i)}$$

where

$$P(x_i \mid y_j) \equiv P(X = x_i \mid Y = y_j) = \frac{P(x_i, y_j)}{P(y_j)}, \qquad P(x_i) \equiv P(X = x_i)$$


Units for $I(x_i; y_j)$

The units depend on the base of the logarithm (usually 2 or $e$).

When the base is 2, the units for $I(x_i; y_j)$ are bits.

When the base is $e$, the units are nats (natural units).

The information measured in nats is $\ln 2$ times the information measured in bits:

$$\ln a = \ln 2 \, \log_2 a = 0.69315 \, \log_2 a$$

i.e.

$$I_{\text{nats}}(x_i; y_j) = \ln 2 \cdot I_{\text{bits}}(x_i; y_j) = 0.69315 \, I_{\text{bits}}(x_i; y_j)$$
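The unit conversion can be checked numerically. The following is a minimal sketch; the function names and the example probabilities are illustrative assumptions, not part of the original slides.

```python
import math

def pointwise_mutual_information(p_x_given_y, p_x, base=2.0):
    """I(x; y) = log_base( P(x|y) / P(x) ); base 2 gives bits, base e gives nats."""
    return math.log(p_x_given_y / p_x, base)

def bits_to_nats(bits):
    """Convert information in bits to nats by multiplying by ln 2."""
    return bits * math.log(2.0)

# Hypothetical event pair: P(x|y) = 1 and P(x) = 1/4.
i_bits = pointwise_mutual_information(1.0, 0.25, base=2.0)
i_nats = pointwise_mutual_information(1.0, 0.25, base=math.e)
print(i_bits, i_nats, bits_to_nats(i_bits))
```

The value computed directly in nats agrees with the value in bits multiplied by $\ln 2$.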


Independent Events

When $X$ and $Y$ are independent random variables,

$$P(x_i \mid y_j) = P(x_i)$$

So

$$I(x_i; y_j) = 0$$


ex) Consider an experiment involving two successive coin tosses. Let $X$ = outcome of first toss and $Y$ = outcome of second toss. Suppose $P(X = H) = P(X = T) = \frac{1}{2}$, i.e. the coin is "fair". Also assume $X$ and $Y$ are independent events. Then $P(Y = H) = P(Y = T) = \frac{1}{2}$ is independent of the outcome of the first toss and the mutual information is zero for all possibilities,

$$I(H; H) = I(H; T) = I(T; H) = I(T; T) = 0$$


Dependent Events

If $Y = y_j$ uniquely determines the event $X = x_i$, then

$$P(x_i \mid y_j) = 1$$

and the mutual information is

$$I(x_i; y_j) = \log \frac{1}{P(x_i)} = -\log P(x_i)$$

In this case, the mutual information is the information in the event $X = x_i$, called the self-information:

$$I(x_i) = -\log P(x_i)$$

Low probability events have more information than high probability events.

Events having a probability of 1 have zero information.


ex) Consider an experiment involving a single roll of a fair die. Let $X = 1$ if either a "1", "3" or "4" was rolled, otherwise $X = 0$. Let $Y = 1$ if either a "1" or "3" was rolled, otherwise $Y = 0$. Then

$$P(X = 1) = \frac{3}{6} = \frac{1}{2}, \qquad P(Y = 1) = \frac{2}{6} = \frac{1}{3}$$

and

$$I(X = 1; Y = 1) = \log_2 \frac{P(X = 1 \mid Y = 1)}{P(X = 1)} = \log_2 \frac{1}{1/2} = 1 \text{ bit} \tag{1}$$
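The die example can be verified by enumerating the six faces. This is a small sketch; the variable names and the 0/1 labelling of the events are assumptions made for illustration.

```python
from fractions import Fraction
import math

faces = range(1, 7)
x_event = {1, 3, 4}          # faces for which the first indicator is "on"
y_event = {1, 3}             # faces for which the second indicator is "on"
p_face = Fraction(1, 6)      # fair die

p_x = sum(p_face for f in faces if f in x_event)
p_y = sum(p_face for f in faces if f in y_event)
p_xy = sum(p_face for f in faces if f in x_event and f in y_event)
p_x_given_y = p_xy / p_y     # conditional probability P(x | y)

info_bits = math.log2(float(p_x_given_y / p_x))
print(p_x, p_y, p_x_given_y, info_bits)
```

Because every face in the second event also lies in the first, observing the second event determines the first, and the mutual information equals the self-information of the first event.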


ex) Suppose a source produces a binary digit (0 or 1) with equal probability every $\tau_s$ seconds. The information content of the source is

$$I(x_i) = -\log_2 P(x_i) = -\log_2 \frac{1}{2} = 1 \text{ bit}, \qquad x_i = 0, 1$$

The information rate is $1/\tau_s$ bits/sec.

Assume the source generates a block of $k$ independent binary digits every $k\tau_s$ seconds. There are $2^k$ possible blocks, each equally likely, i.e. the probability of each block is $P(x_i') = 2^{-k}$. The information content for each block is:

$$I(x_i') = -\log_2 2^{-k} = k \text{ bits}$$


It is straightforward to show that:

$$\frac{P(x_i \mid y_j)}{P(x_i)} = \frac{P(x_i, y_j)}{P(x_i) P(y_j)} = \frac{P(y_j \mid x_i)}{P(y_j)}$$

therefore

$$I(x_i; y_j) = I(y_j; x_i)$$


ex) Let $X$ and $Y$ be binary random variables (0, 1); $X$ is the input and $Y$ is the output of a channel. The inputs are equally likely and the outputs depend on the inputs according to:

$$\begin{aligned}
P(Y = 0 \mid X = 0) &= 1 - p_0 \\
P(Y = 1 \mid X = 0) &= p_0 \\
P(Y = 1 \mid X = 1) &= 1 - p_1 \\
P(Y = 0 \mid X = 1) &= p_1
\end{aligned} \tag{2}$$


First, let's find $I(0; 0) = \log_2 \dfrac{P(Y = 0 \mid X = 0)}{P(Y = 0)}$:

$$P(Y = 0) = P(Y = 0 \mid X = 0) P(X = 0) + P(Y = 0 \mid X = 1) P(X = 1) = \frac{1}{2}(1 - p_0 + p_1) \tag{3}$$

Then

$$I(0; 0) = \log_2 \frac{P(Y = 0 \mid X = 0)}{P(Y = 0)} = \log_2 \frac{2(1 - p_0)}{1 - p_0 + p_1} \tag{4}$$


If $p_0 = p_1 = 0$, then $I(0; 0) = \log_2 2 = 1$ bit; the channel is noiseless.

If $p_0 = p_1 = \frac{1}{2}$, then $I(0; 0) = \log_2 1 = 0$ bits; the channel is useless.

It can similarly be shown that:

$$P(Y = 1) = \frac{1}{2}(1 + p_0 - p_1)$$

$$I(0; 1) = \log_2 \frac{P(Y = 1 \mid X = 0)}{P(Y = 1)} = \log_2 \frac{2 p_0}{1 + p_0 - p_1}$$
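The channel example lends itself to a quick numerical check. The sketch below assumes equally likely inputs, as in the example; the function name and the dictionary layout are illustrative choices, not from the slides.

```python
import math

def channel_mutual_information(x, y, p0, p1):
    """I(x; y) in bits for a binary channel with equally likely inputs.

    p0 = P(Y=1 | X=0) and p1 = P(Y=0 | X=1) are the two error probabilities.
    """
    # Transition probabilities keyed by (input, output).
    p_y_given_x = {(0, 0): 1 - p0, (0, 1): p0, (1, 1): 1 - p1, (1, 0): p1}
    # Output probability by total probability, with P(X=0) = P(X=1) = 1/2.
    p_y = 0.5 * p_y_given_x[(0, y)] + 0.5 * p_y_given_x[(1, y)]
    return math.log2(p_y_given_x[(x, y)] / p_y)

print(channel_mutual_information(0, 0, 0.0, 0.0))    # noiseless channel
print(channel_mutual_information(0, 0, 0.5, 0.5))    # useless channel
print(channel_mutual_information(0, 1, 0.25, 0.25))  # negative: a 1 makes input 0 less likely
```

Note that the mutual information between a pair of events can be negative, which happens when the observation makes the event less likely than it was a priori.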


Average Mutual Information

Mutual information is defined for a pair of events.

The average mutual information is defined for two random variables:

$$I(X; Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \, I(x_i; y_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i) P(y_j)} \tag{5}$$


Average Self-Information: Entropy

$$H(X) = \sum_{i=1}^{n} P(x_i) \, I(x_i) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) \tag{6}$$

Maximum entropy occurs when all symbols are equally probable:

$$H(X) = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n \tag{7}$$


Example

Suppose that $X$ is a discrete random variable. Let $P(X = 0) = q$ and $P(X = 1) = 1 - q$. Then

$$H(X) \equiv H(q) = -q \log q - (1 - q) \log (1 - q)$$

If $q = \frac{1}{2}$, then $H(q) = 1$ bit (base-2 logarithm) is maximized.
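A short sketch of the binary entropy function makes the maximum at one half easy to confirm. The function name and the grid of test points are assumptions for illustration only.

```python
import math

def binary_entropy(q):
    """Entropy in bits of a binary variable with probabilities q and 1 - q."""
    if q in (0.0, 1.0):
        return 0.0           # limit of -q*log2(q) as q -> 0 is zero
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# Evaluate on a coarse grid and locate the maximizer.
grid = [k / 100 for k in range(101)]
best_q = max(grid, key=binary_entropy)
print(best_q, binary_entropy(best_q))
```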


Average Mutual Information

Suppose $X$ takes on values $x_i$, $i = 1, \ldots, n$, with probability $P(x_i)$ and $Y$ takes on values $y_j$, $j = 1, \ldots, m$, with probability $P(y_j)$. Then the average mutual information is defined as:

$$I(X; Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \, I(x_i; y_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i) P(y_j)} \tag{8}$$

$I(X; Y) \ge 0$ and is zero only when $X$ and $Y$ are independent.


Conditional Entropy

Suppose $X$ takes on values $x_i$, $i = 1, \ldots, n$, with probability $P(x_i)$ and $Y$ takes on values $y_j$, $j = 1, \ldots, m$, with probability $P(y_j)$. Then the conditional entropy is defined as:

$$H(X \mid Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log \frac{1}{P(x_i \mid y_j)} \tag{9}$$

Expanding (8), then using (9) and the entropy definition (6) gives:

$$I(X; Y) = H(X) - H(X \mid Y)$$

Since $I(X; Y) \ge 0$, we have $H(X) \ge H(X \mid Y)$.
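The identity relating average mutual information, entropy, and conditional entropy can be checked on a small joint distribution. The joint probabilities below are hypothetical values chosen for illustration; they do not come from the slides.

```python
import math

# Hypothetical joint distribution P(x, y) over binary X and Y.
joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}

# Marginals P(x) and P(y).
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Entropy H(X), conditional entropy H(X|Y), and average mutual information I(X;Y).
h_x = -sum(p * math.log2(p) for p in p_x.values())
h_x_given_y = sum(p * math.log2(p_y[y] / p) for (x, y), p in joint.items())
i_xy = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items())

print(h_x, h_x_given_y, i_xy)
```

Here the conditional entropy uses $1 / P(x \mid y) = P(y) / P(x, y)$, and the three printed quantities satisfy $I(X; Y) = H(X) - H(X \mid Y)$ up to rounding.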


Entropy of a block of random variables

Assume there are $k$ random variables, $X_1, X_2, \ldots, X_k$. Each $X_i$ takes values in a finite alphabet, $x_{j_i}$, $j_i = 1, \ldots, n_i$. The joint probability is $P(x_{j_1}, x_{j_2}, \ldots, x_{j_k}) \equiv P(X_1 = x_{j_1}, X_2 = x_{j_2}, \ldots, X_k = x_{j_k})$. The entropy of the block of random variables is defined as:

$$H(X_1 X_2 \cdots X_k) = -\sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_k=1}^{n_k} P(x_{j_1}, x_{j_2}, \ldots, x_{j_k}) \log P(x_{j_1}, x_{j_2}, \ldots, x_{j_k}) \tag{10}$$


The joint probability can be factored as (writing $x_i$ for the value taken by $X_i$)

$$P(x_1, x_2, \ldots, x_k) = P(x_1) \, P(x_2 \mid x_1) \, P(x_3 \mid x_1 x_2) \cdots P(x_k \mid x_1 x_2 \cdots x_{k-1}) \tag{11}$$

Therefore

$$H(X_1 X_2 \cdots X_k) = H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_k \mid X_1 \cdots X_{k-1}) = \sum_{i=1}^{k} H(X_i \mid X_1 X_2 \cdots X_{i-1}) \tag{12}$$


Since

$$H(X) \ge H(X \mid Y)$$

if we let $X = X_m$ and $Y = X_1 X_2 \cdots X_{m-1}$ we get

$$H(X_1 X_2 \cdots X_k) \le \sum_{m=1}^{k} H(X_m)$$

with equality when the $X_1, X_2, \ldots, X_k$ are independent RV's.


Discrete Memoryless Source (DMS)

A source is a discrete random variable.

The values this random variable can take on are called the alphabet or source symbols.

In a DMS, successive symbols are statistically independent.

Few actual sources fit this idealized model.


Sampling and Quantizing

Given a continuous-time band-limited random process, $X(t)$, if we sample it at the Nyquist rate or higher, there is no loss of information.

In order to be able to express each sample using a finite alphabet, we must quantize the samples.

This quantization introduces distortion but also leads to signal compression, since we can represent the samples using a finite number of bits, rather than an infinite number of bits for the unquantized samples.

In an A/D converter, the number of quantization levels is set by the number of bits in the A/D converter: a $b$-bit converter has $2^b$ levels.


Encoding

In the encoding problem, we seek binary codes for each possible quantizer output (symbol).

We may wish to minimize the source bit rate for a given level of tolerable distortion (beyond the distortion due to quantization error).

Or, we can minimize the distortion for a given bit rate.


Entropy of a DMS

Assume a DMS $X$ generates a symbol every $\tau_s$ seconds.

Each symbol comes from a finite alphabet, $x_i$, $i = 1, \ldots, L$, and has probability $P(x_i)$.

The entropy of the DMS is:

$$H(X) = -\sum_{i=1}^{L} P(x_i) \log_2 P(x_i) \le \log_2 L$$

Equality holds when the symbols are equally probable.

Average information per symbol is $H(X)$ bits.

Information rate is $H(X)/\tau_s$ bits per second.


Fixed-Length Encoding

Suppose we use the same number of bits for each symbol. Since there are $L$ symbols, the number of bits $R$ for each code word is:

When $L$ is a power of 2: $R = \log_2 L$.

When $L$ is not a power of 2: $R = \lfloor \log_2 L \rfloor + 1$.

"$\lfloor x \rfloor$" means the largest integer contained in $x$.

Bit rate is $R$ bits per symbol. Since $H(X) \le \log_2 L$,

$$R \ge H(X)$$


Coding Efficiency of a DMS

The efficiency of the encoding is defined as $H(X)/R$.

When $L$ is a power of 2 and the symbols are equally probable, then $R = H(X)$, and the code is 100% efficient.

When $L$ is not a power of 2 and the symbols are equally probable, then $R$ differs from $H(X)$ by at most 1 bit per symbol.

ex) Suppose $L$ = […]; then $R = \lfloor \log_2 L \rfloor + 1$ = […], but the entropy is $\log_2 L$ = […]. So the coding efficiency is […].


If $\log_2 L \gg 1$ then the coding efficiency is high.

Otherwise, if $L$ is small, we can increase the coding efficiency by encoding a block of $J$ symbols at a time.

We then require $L^J$ unique code words.

If we use $N$ binary digits to encode each block, then we need $N \ge J \log_2 L$.

Minimum number of bits required is $N = \lfloor J \log_2 L \rfloor + 1$.

Since the entropy of a block of $J$ independent symbols is $J$ times the entropy of one symbol, while $N$ exceeds $J \log_2 L$ by at most 1 bit per block rather than per symbol, the efficiency increases.


Example

Suppose $L$ = […]; then $R = \lfloor \log_2 L \rfloor + 1$ = […]. If we encode one symbol at a time, as we saw above the coding efficiency is […].

Now suppose we find codes for blocks of 2 symbols at a time. Then we must come up with $L^2$ = […] possible code words, requiring $N = \lfloor \log_2 L^2 \rfloor + 1$ = […] bits to encode.

Since the symbols are independent, the entropy is now $2 \log_2 L$ = […].

So the coding efficiency is now […].
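The effect of block encoding on efficiency can be sketched numerically. The alphabet size of 5 used below is a hypothetical value chosen for illustration (the numbers in the example above are not legible in this transcript), and the function name is likewise an assumption.

```python
import math

def fixed_length_efficiency(num_symbols, block_len=1):
    """Efficiency J*log2(L) / N of a fixed-length code for equally likely symbols.

    num_symbols is L, block_len is J; N is the smallest number of bits
    that can label all L**J blocks.
    """
    num_blocks = num_symbols ** block_len
    bits = (num_blocks - 1).bit_length()      # smallest N with 2**N >= L**J
    entropy = block_len * math.log2(num_symbols)
    return entropy / bits

# Hypothetical alphabet of 5 equally likely symbols.
for block_len in (1, 2, 3):
    print(block_len, round(fixed_length_efficiency(5, block_len), 3))
```

For an alphabet size that is not a power of 2, the integer expression for the bit count agrees with $\lfloor J \log_2 L \rfloor + 1$; for a power of 2 it reduces to $J \log_2 L$.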


Entropy Coding: Variable-Length Codes

Variable-length coding is useful when the symbol probabilities are not equal.

Symbols that occur more often are given shorter codes than symbols occurring less frequently.

This reduces the average number of bits per symbol.

Morse code is an example of this.

Codes must be chosen so that there are no ambiguities in selecting code words.


Example (Proakis)

Symbol   P(a_k)   Code 1   Code 2   Code 3
a1       1/2      1        0        0
a2       1/4      00       10       01
a3       1/8      01       110      011
a4       1/8      10       111      111

Code 1 has a flaw: if we see 001001, this could mean a2 a4 a3 or a2 a1 a2 a1.

Viewing additional bits may remove this ambiguity, but we are only interested in codes that are instantaneously decodable.


Code 2 is uniquely and instantaneously decodable.

Code 3 is uniquely decodable, but is not instantaneously decodable (we have to wait until all the bits come in before we can resolve ambiguities).


Code Tree for Code 1

[Figure: code tree for Code 1, with branches labeled 0 or 1 and the code words a1 to a4 at nodes of the tree.]


Code Tree for Code 2

[Figure: code tree for Code 2, with branches labeled 0 or 1 and the code words a1 to a4 at nodes of the tree.]


Code Tree for Code 3

[Figure: code tree for Code 3, with branches labeled 0 or 1 and the code words a1 to a4 at nodes of the tree.]


Prefix Condition

The key to having a code that is uniquely and instantaneously decodable is that no code word be a prefix of another code word: the prefix condition.

Code 1 does not satisfy the prefix condition and is not uniquely decodable.

Code 3 does not satisfy the prefix condition either and is not instantaneously decodable.

Code 2 does satisfy the prefix condition and is both uniquely and instantaneously decodable.
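The three codes from the table can be tested mechanically. The sketch below checks the prefix condition and enumerates every way a bit string splits into code words; the function names are illustrative assumptions, and the brute-force enumeration is only meant for short strings.

```python
def satisfies_prefix_condition(codewords):
    """True if no code word is a prefix of a different code word."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def all_parses(bits, code):
    """List every way to split `bits` into code words; `code` maps symbol -> code word."""
    if not bits:
        return [[]]
    parses = []
    for symbol, word in code.items():
        if bits.startswith(word):
            parses += [[symbol] + rest for rest in all_parses(bits[len(word):], code)]
    return parses

code1 = {"a1": "1", "a2": "00", "a3": "01", "a4": "10"}
code2 = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
code3 = {"a1": "0", "a2": "01", "a3": "011", "a4": "111"}

for name, code in (("Code 1", code1), ("Code 2", code2), ("Code 3", code3)):
    print(name, satisfies_prefix_condition(code.values()))

print(all_parses("001001", code1))       # two parses: the ambiguity noted for Code 1
print(all_parses("010110111", code2))    # a single parse
```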


The Kraft Inequality

A binary code that satisfies the prefix condition, with code words having lengths $n_1 \le n_2 \le \cdots \le n_L$, exists if and only if

$$\sum_{k=1}^{L} 2^{-n_k} \le 1$$

A note about proofs:

"$A$ if and only if $B$" is equivalent to "$A$ implies and is implied by $B$" or "$A \Leftrightarrow B$".

To prove "$A \Leftrightarrow B$", we must prove both "$A \Rightarrow B$" and "$A \Leftarrow B$".


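In order to satisfy the prefix condition, each time an order-$n_k$ node is selected as a code word in a code tree, we eliminate the fraction $2^{-n_k}$ of the total number of terminal nodes:

[Figure: code tree with branches labeled 0 or 1 and code words a1 to a5 at nodes of the tree.]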


Proof of $\Leftarrow$

Assuming we have assigned code words of lengths $n_1 \le n_2 \le \cdots \le n_j$, $j < L$, the fraction of the total number of terminal nodes that has been eliminated is

$$\sum_{k=1}^{j} 2^{-n_k}$$

By the Kraft inequality:

$$\sum_{k=1}^{j} 2^{-n_k} < \sum_{k=1}^{L} 2^{-n_k} \le 1$$

So we've not yet eliminated all possible code words satisfying the prefix condition.


Proof of $\Rightarrow$

Once all $L$ code words that satisfy the prefix condition have been chosen, the total number of terminal nodes that have been eliminated is:

$$\sum_{k=1}^{L} 2^{n - n_k} \le 2^n$$

where $2^n$ is the total number of terminal nodes.

Therefore:

$$\sum_{k=1}^{L} 2^{-n_k} \le 1$$
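The sufficiency argument is constructive, and a sketch of one such construction follows. It assigns code words in order of increasing length, always taking the next unused node at the required depth; this particular ordering and the function names are assumptions for illustration, not necessarily the construction intended in the slides.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2**(-n_k) over the code word lengths, computed exactly."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

def prefix_code_from_lengths(lengths):
    """Build code words with the given lengths that satisfy the prefix condition."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codewords, value, prev_len = [], 0, 0
    for n in sorted(lengths):
        value <<= (n - prev_len)                      # descend to depth n in the code tree
        codewords.append(format(value, "0{}b".format(n)))
        value += 1                                    # next unused node at this depth
        prev_len = n
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))   # lengths of Code 2
print(kraft_sum([1, 2, 2, 2]))                  # lengths of Code 1 exceed 1
```

With lengths 1, 2, 3, 3 the construction reproduces Code 2, while the lengths of Code 1 give a Kraft sum above 1, so no prefix code with those lengths exists.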


Source Coding Theorem

Let $X$ be a DMS with finite entropy $H(X)$. The source symbols are $x_k$, $k = 1, \ldots, L$, and have corresponding probabilities $p_k$, $k = 1, \ldots, L$. It is possible to construct a code that satisfies the prefix condition and has an average length $\bar{R}$ that satisfies:

$$H(X) \le \bar{R} < H(X) + 1$$


Proof of lower bound:

We have:

$$H(X) - \bar{R} = \sum_{k=1}^{L} p_k \log_2 \frac{1}{p_k} - \sum_{k=1}^{L} p_k n_k = \sum_{k=1}^{L} p_k \log_2 \frac{2^{-n_k}}{p_k}$$


Using the fact that $\ln x \le x - 1$:

$$H(X) - \bar{R} \le (\log_2 e) \sum_{k=1}^{L} p_k \left( \frac{2^{-n_k}}{p_k} - 1 \right) = (\log_2 e) \left( \sum_{k=1}^{L} 2^{-n_k} - 1 \right) \le 0$$

The last inequality is due to the Kraft inequality.

Equality holds when $p_k = 2^{-n_k}$, $k = 1, \ldots, L$.


Proof of upper bound:

Since the $n_k$, $k = 1, \ldots, L$, are integers, we must select the $n_k$ such that:

$$2^{-n_k} \le p_k < 2^{-n_k + 1}$$

This simply says that $n_k$ is the smallest integer satisfying $2^{-n_k} \le p_k$.

Taking the logarithm of both sides of $p_k < 2^{-n_k + 1}$ gives:

$$\log_2 p_k < -n_k + 1$$

or

$$n_k < 1 - \log_2 p_k \tag{13}$$

Multiplying both sides of (13) by $p_k$ and summing from $k = 1, \ldots, L$ gives the upper bound.
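The length selection used in the upper-bound proof can be tried on a concrete distribution. The probabilities below are hypothetical values chosen for illustration, and the function name is an assumption; the sketch only checks the bounds, it does not build the code words.

```python
import math

def shannon_lengths(probs):
    """Smallest integer n_k with 2**(-n_k) <= p_k, for each probability p_k."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]                 # hypothetical source probabilities
lengths = shannon_lengths(probs)

entropy = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * n for p, n in zip(probs, lengths))
kraft = sum(2.0 ** -n for n in lengths)

print(lengths, round(entropy, 3), round(avg_len, 3), kraft)
```

The chosen lengths satisfy the Kraft inequality, so a prefix code with these lengths exists, and its average length falls between the entropy and the entropy plus one bit.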


Huffman Encoding

Huffman in 1952 devised an algorithm that assigns variable-length code words that minimize the average number of bits per symbol, subject to the code words satisfying the prefix condition.
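A compact sketch of the Huffman procedure, using a priority queue, is given here. It is one common way to implement the algorithm rather than the specific presentation used in these slides; the symbol names x1 to x7 are hypothetical labels, and the probabilities are those of Example 1. Which of the two merged branches gets 0 or 1, and how ties are broken, are arbitrary choices.

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Return {symbol: code word} for a dict {symbol: probability}."""
    counter = itertools.count()               # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_a, _, code_a = heapq.heappop(heap)  # least probable node
        p_b, _, code_b = heapq.heappop(heap)  # second least probable node
        merged = {s: "0" + w for s, w in code_a.items()}
        merged.update({s: "1" + w for s, w in code_b.items()})
        heapq.heappush(heap, (p_a + p_b, next(counter), merged))
    return heap[0][2]

probs = {"x1": 0.35, "x2": 0.30, "x3": 0.20, "x4": 0.10,
         "x5": 0.04, "x6": 0.005, "x7": 0.005}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
print(code)
print(round(avg_len, 2), round(entropy, 2))
```

Different tie-breaking rules can produce different code words, but every Huffman code for the same probabilities has the same average length.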


Example 1

[Figure: step-by-step construction of a Huffman tree for symbol probabilities 0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005. The two least probable nodes are merged repeatedly, producing nodes of probability 0.01, 0.05, 0.15, 0.35, and 0.65 before the root; the two branches of each merge are labeled 0 and 1.]


Huffman Codes

Symbol   Probability   Code
x1       0.35          0
x2       0.30          10
x3       0.20          110
x4       0.10          1110
x5       0.04          11110
x6       0.005         111110
x7       0.005         111111

$H(X) = 2.11$, $\bar{R} = 2.21$


Codes are not unique

[Figure: alternative Huffman tree for the same probabilities 0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005. The nodes 0.01, 0.05, 0.15, and 0.35 are formed as before, but the tie between the two nodes of probability 0.35 is broken the other way: the 0.30 symbol is merged with the 0.35 symbol, rather than with the merged 0.35 node, to form the 0.65 node.]

EE 5345/7345, Medical Signal Analysis, Southern Methodist Unversity – p. 50

Page 60: EE 5345/7345 Medical Signal Analysis Introduction to ...lyle.smu.edu/~cd/EE5345/lectures/it.pdfGiven a continuous-time band-limited random process,, if we sample it at the Nyquist

Symbol   Probability   Code
$x_1$    0.35          00
$x_2$    0.30          01
$x_3$    0.20          10
$x_4$    0.10          110
$x_5$    0.04          1110
$x_6$    0.005         11110
$x_7$    0.005         11111

$H(X) = 2.11$ bits/symbol, $\bar{R} = 2.21$ bits/symbol
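As a quick numerical check, offered as a sketch rather than as part of the original slides, the two sets of codeword lengths for this source can be compared directly. The lengths are read off the two trees; the names `lengths_chain`, `lengths_alt`, `average_length` and `kraft_sum` are illustrative.

```python
probs = [0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005]

# Codeword lengths read off the two Huffman trees for this source.
lengths_chain = [1, 2, 3, 4, 5, 6, 6]  # every merge joins the previous merged node
lengths_alt = [2, 2, 2, 3, 4, 5, 5]    # lengths of 00, 01, 10, 110, 1110, 11110, 11111

def average_length(probs, lengths):
    """Average number of bits per symbol."""
    return sum(p * n for p, n in zip(probs, lengths))

def kraft_sum(lengths):
    """Sum of 2^(-length); it equals 1 when the binary code tree is full."""
    return sum(2.0 ** -n for n in lengths)

R_chain = average_length(probs, lengths_chain)
R_alt = average_length(probs, lengths_alt)
print(round(R_chain, 4), round(R_alt, 4))
print(kraft_sum(lengths_chain), kraft_sum(lengths_alt))
```

Both codes use the same average number of bits per symbol, which is why either tree is an acceptable Huffman code for this source.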


Example 2

[Figure: Huffman tree for Example 2. The source probabilities are 0.36, 0.14, 0.13, 0.12, 0.10, 0.09, 0.04, 0.02. Successive merges: 0.04 + 0.02 = 0.06, 0.09 + 0.06 = 0.15, 0.12 + 0.10 = 0.22, 0.14 + 0.13 = 0.27, 0.22 + 0.15 = 0.37, 0.36 + 0.27 = 0.63, and finally 0.63 + 0.37 = 1; the two branches of each merge are labeled 0 and 1.]

Symbol   Probability   Code
$x_1$    0.36          00
$x_2$    0.14          010
$x_3$    0.13          011
$x_4$    0.12          100
$x_5$    0.10          101
$x_6$    0.09          110
$x_7$    0.04          1110
$x_8$    0.02          1111

$H(X) \approx 2.62$ bits/symbol, $\bar{R} = 2.70$ bits/symbol
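Because no codeword is a prefix of another, a bit stream produced with this code can be decoded one bit at a time without separators. The Python sketch below illustrates this; it is not from the original slides, the helper names `is_prefix_free`, `encode` and `decode` are hypothetical, and it takes the codeword for the 0.13 symbol to be 011, which is what the tree for Example 2 gives.

```python
CODE2 = {"x1": "00", "x2": "010", "x3": "011", "x4": "100",
         "x5": "101", "x6": "110", "x7": "1110", "x8": "1111"}

def is_prefix_free(code):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # After sorting, a prefix relation always shows up between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def encode(symbols, code):
    """Concatenate the codewords of a symbol sequence."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Decode a bit string by matching codewords from left to right."""
    inverse = {w: s for s, w in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:       # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return out

message = ["x1", "x7", "x3", "x8", "x2"]
bits = encode(message, CODE2)
recovered = decode(bits, CODE2)
print(bits, recovered == message)
```

Replacing 011 by 111 in `CODE2` makes `is_prefix_free` return `False`, since 111 is a prefix of both 1110 and 1111.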


Increasing Coding Efficiency for a DMS

Instead of encoding one symbol at a time, we can encode blocks of $J$ symbols at a time. The source coding theorem for the average number of bits per block, $\bar{R}_J$, becomes:

$J H(X) \le \bar{R}_J < J H(X) + 1$

since the entropy of a block of $J$ symbols from a DMS is $J H(X)$.

Dividing through by $J$ gives

$H(X) \le \frac{\bar{R}_J}{J} < H(X) + \frac{1}{J}$

The average number of bits per source symbol is $\bar{R} = \bar{R}_J / J$.
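The effect of the block length can be seen numerically. The following Python sketch is an added illustration under stated assumptions: the binary source with probabilities 0.9 and 0.1 is hypothetical, and the names `huffman_lengths` and `bits_per_symbol` are chosen for the example. It builds a Huffman code for all blocks of $J$ symbols and reports the average number of bits per source symbol, $\bar{R}_J / J$.

```python
import heapq
import math
from itertools import count, product

def huffman_lengths(probs):
    """Codeword length of each entry of probs under a binary Huffman code."""
    tie = count()  # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p0, _, a = heapq.heappop(heap)
        p1, _, b = heapq.heappop(heap)
        for i in a + b:
            lengths[i] += 1          # every leaf under the new node gains one bit
        heapq.heappush(heap, (p0 + p1, next(tie), a + b))
    return lengths

def bits_per_symbol(probs, J):
    """Average bits per source symbol when blocks of J symbols are Huffman coded."""
    # For a DMS the probability of a block is the product of symbol probabilities.
    block_probs = [math.prod(block) for block in product(probs, repeat=J)]
    lengths = huffman_lengths(block_probs)
    R_J = sum(p * n for p, n in zip(block_probs, lengths))
    return R_J / J

source = [0.9, 0.1]  # hypothetical binary DMS, not from the slides
H = -sum(p * math.log2(p) for p in source)
rates = {J: bits_per_symbol(source, J) for J in (1, 2, 3)}
print(round(H, 3), {J: round(r, 3) for J, r in rates.items()})
```

For this source the rate per symbol falls as $J$ grows and stays within the bounds $H(X) \le \bar{R}_J / J < H(X) + 1/J$.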


Encoding Discrete Stationary Sources

If the source does not produce independent symbols, then the source is not a DMS.

The source coding theorem for a DMS does not apply directly.

In this case, if we encode a block of $J$ symbols, the average bit rate for the block, $\bar{R}_J$, satisfies:

$H(X_1 X_2 \cdots X_J) \le \bar{R}_J < H(X_1 X_2 \cdots X_J) + 1$


The average number of bits per source symbol, $\bar{R} = \bar{R}_J / J$, is bounded by

$H_J(X) \le \bar{R} < H_J(X) + \frac{1}{J}$

where

$H_J(X) = \frac{1}{J} H(X_1 X_2 \cdots X_J)$

By making $J$ sufficiently large we can tighten the bounds arbitrarily:

$H_\infty(X) \le \bar{R} < H_\infty(X) + \epsilon$

The limit $H_\infty(X) = \lim_{J \to \infty} H_J(X)$ can be shown to exist.
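To see the per-symbol block entropy settle toward its limit, the sketch below evaluates $H_J(X) = \frac{1}{J} H(X_1 X_2 \cdots X_J)$ by direct enumeration. It is an added illustration: the two-state stationary Markov source (transition matrix `P`, stationary distribution `PI`) is a hypothetical example, not taken from the slides, and the function names are chosen here. For such a source the limit equals the conditional entropy of one symbol given the previous one, computed as `entropy_rate`.

```python
import math
from itertools import product

PI = [0.5, 0.5]                 # stationary distribution of the hypothetical chain
P = [[0.9, 0.1], [0.1, 0.9]]    # P[a][b] = Pr(next symbol = b | current symbol = a)

def block_probability(block):
    """Joint probability of a block of symbols from the stationary chain."""
    p = PI[block[0]]
    for a, b in zip(block, block[1:]):
        p *= P[a][b]
    return p

def per_symbol_block_entropy(J):
    """H_J(X) = (1/J) * H(X_1 ... X_J), found by enumerating all 2**J blocks."""
    H = 0.0
    for block in product(range(2), repeat=J):
        p = block_probability(block)
        if p > 0:
            H -= p * math.log2(p)
    return H / J

# Conditional entropy H(X_2 | X_1), the limit of H_J(X) for this Markov source.
entropy_rate = -sum(PI[a] * P[a][b] * math.log2(P[a][b])
                    for a in range(2) for b in range(2))
H_J = {J: per_symbol_block_entropy(J) for J in (1, 2, 4, 8)}
print(round(entropy_rate, 3), {J: round(h, 3) for J, h in H_J.items()})
```

The values decrease with $J$ and stay above the limit, so coding longer blocks lowers the achievable bit rate for a source with memory.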


Efficient encoding of stationary sources is done by encoding large blocks of symbols.

This requires that the joint probability be known for each $J$-symbol block.
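The practical cost of this requirement is the number of joint probabilities involved. As a small added illustration (the function name `num_block_probabilities` and the alphabet size of 8, matching Example 2, are assumptions made for the example), a source with $L$ symbols has $L^J$ distinct $J$-symbol blocks:

```python
def num_block_probabilities(alphabet_size, J):
    """Number of distinct J-symbol blocks, i.e. joint probabilities to specify."""
    return alphabet_size ** J

# Table sizes for an 8-symbol source and block lengths 1 to 5.
sizes = {J: num_block_probabilities(8, J) for J in range(1, 6)}
print(sizes)
```

The table grows exponentially with the block length, which limits how large $J$ can be made in practice.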


Matlab Functions

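huffmanenco: encodes a signal using a Huffman code dictionary

huffmandeco: decodes a Huffman-encoded signal using the same dictionary

huffmandict: generates a Huffman code dictionary from the source symbols and their probabilities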
