Data Compression Basics
TRANSCRIPT
-
One-Minute Survey Result
Thank you for your responses: Kristen, Anusha, Ian, Christofer, Bernard, Greg, Michael, Shalini, Brian, and Justin.
Valentine's challenge
Min: 30-45 minutes, Max: 5 hours, Ave: 2-3 hours
Muddiest points
Regular tree grammar (CS410: Compilers or CS422: Automata)
Fractal geometry (The Fractal Geometry of Nature by Mandelbrot)
Seeing the Connection
Remember the first story in Steve Jobs' speech "Staying Hungry, Staying Foolish"?
In addition to Jobs and Shannon, I have two more examples: Charles Darwin and Bruce Lee.
-
Data Compression Basics
Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy
Variable length codes
  Motivation
  Prefix condition
  Huffman coding algorithm
-
Information
What do we mean by information?
"A numerical measure of the uncertainty of an experimental outcome" (Webster's Dictionary)
How do we quantitatively measure and represent information?
Shannon proposed a statistical-mechanics-inspired approach.
Let us first look at how we assess the amount of information in our daily lives using common sense.
-
Information = Uncertainty
Zero information
The Pittsburgh Steelers won Super Bowl XL (past news, no uncertainty)
Yao Ming plays for the Houston Rockets (celebrity fact, no uncertainty)
Little information
It will be very cold in Chicago tomorrow (not much uncertainty, since this is wintertime)
It is going to rain in Seattle next week (not much uncertainty, since it rains nine months a year in the Northwest)
Large information
An earthquake is going to hit CA in July 2006 (are you sure? an unlikely event)
Someone has shown P=NP (Wow! Really? Who did it?)
-
Shannon's Picture of Communication (1948)

source → source encoder → channel encoder → channel → channel decoder → source decoder → destination
(the channel encoder, channel, and channel decoder together form the "super-channel")

Examples of source: human speech, photos, text messages, computer programs
Examples of channel: storage media, telephone lines, wireless transmission

The goal of communication is to move information from here to there and from now to then.
-
Source-Channel Separation Principle*
The role of source coding (data compression): facilitate storage and transmission by eliminating source redundancy. Our goal is to maximally remove the source redundancy by intelligently designing the source encoder/decoder.
The role of channel coding: fight against channel errors for reliable transmission of information (the design of the channel encoder/decoder is covered in EE461). We simply assume that the super-channel achieves error-free transmission.
-
Discrete Source
A discrete source is characterized by a discrete random variable X.
Examples
Coin flipping: P(X=H) = P(X=T) = 1/2
Dice tossing: P(X=k) = 1/6, k = 1, ..., 6
Playing-card drawing: P(X=S) = P(X=H) = P(X=D) = P(X=C) = 1/4
What is the redundancy with a discrete source?
-
Two Extreme Cases
Case 1: tossing a fair coin (Head or Tail?) → source encoder → channel → source decoder
P(X=H) = P(X=T) = 1/2 (maximum uncertainty): minimum (zero) redundancy, compression impossible.
Case 2: tossing a coin with two identical sides; the "encoder" is mere duplication (HHHH... or TTTT...)
P(X=H) = 1, P(X=T) = 0 (minimum uncertainty): maximum redundancy, compression trivial (1 bit is enough).
Redundancy is the opposite of uncertainty.
-
Quantifying Uncertainty of an Event
Self-information: $I(p) = -\log_2 p$, where p is the probability of the event x (e.g., x can be X=H or X=T)

p = 1: I(p) = 0 (must happen, no uncertainty)
p → 0: I(p) → ∞ (unlikely to happen, infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty of event x.
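As a quick illustration (my addition, not from the slides), a few values of the self-information formula in Python:

```python
from math import log2

def self_information(p):
    # I(p) = -log2(p), in bits; p is the probability of the event.
    return -log2(p)

print(self_information(1.0))    # 0.0 bits: a certain event carries no information
print(self_information(0.5))    # 1.0 bit:  a fair-coin outcome
print(self_information(2**-8))  # 8.0 bits: a rare event carries much more
```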
-
Weighted Self-information
$I_w(p) = p \cdot I(p) = -p \log_2 p$

p = 0:   I(p) = ∞,  I_w(p) = 0
p = 1/2: I(p) = 1,  I_w(p) = 1/2
p = 1:   I(p) = 0,  I_w(p) = 0

As p evolves from 0 to 1, the weighted self-information first increases and then decreases.
Question: Which value of p maximizes I_w(p)?
-
Maximum of Weighted Self-information*
$I_w(p)$ is maximized at $p = 1/e$, where $I_w(1/e) = \frac{1}{e\ln 2} \approx 0.53$.
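A short derivation of this maximum (added for completeness; it follows by basic calculus):

```latex
\[
\frac{d}{dp} I_w(p)
  = \frac{d}{dp}\left(-\frac{p \ln p}{\ln 2}\right)
  = -\frac{\ln p + 1}{\ln 2} = 0
  \quad\Longrightarrow\quad \ln p = -1
  \quad\Longrightarrow\quad p = \frac{1}{e},
\]
\[
I_w\!\left(\tfrac{1}{e}\right)
  = -\frac{1}{e}\log_2 \frac{1}{e}
  = \frac{\log_2 e}{e}
  = \frac{1}{e \ln 2} \approx 0.531 .
\]
```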
-
Quantification of Uncertainty of a Discrete Source
A discrete source (random variable) is a collection (set) of individual events whose probabilities sum to 1:
X is a discrete random variable, $x \in \{1, 2, \ldots, N\}$
$p_i = \mathrm{prob}(x = i), \; i = 1, 2, \ldots, N, \qquad \sum_{i=1}^{N} p_i = 1$
To quantify the uncertainty of a discrete source, we simply take the summation of weighted self-information over the whole set.
-
Shannon's Source Entropy Formula
$H(X) = \sum_{i=1}^{N} I_w(p_i) = -\sum_{i=1}^{N} p_i \log_2 p_i$ (bits/sample, or bps)
The probabilities $p_i$ serve as the weighting coefficients.
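A minimal Python sketch of this formula (my addition, not part of the slides):

```python
from math import log2

def entropy(probs):
    # Shannon entropy H(X) = -sum(p * log2(p)), in bits per sample (bps).
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * log2(p) for p in probs if p > 0)  # treat 0*log(0) as 0

print(entropy([0.5, 0.5]))    # 1.0 bps: fair coin, maximum uncertainty
print(entropy([0.25] * 4))    # 2.0 bps: uniform 4-symbol source
print(entropy([0.99, 0.01]))  # ~0.08 bps: highly skewed, little uncertainty
```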
-
Source Entropy Examples
Example 1: (binary Bernoulli source)
Flipping a coin with the probability of a head being p (0 < p < 1):
$p = \mathrm{prob}(x = 0), \quad q = 1 - p = \mathrm{prob}(x = 1)$
$H(X) = -(p \log_2 p + q \log_2 q)$
-
Entropy of Binary Bernoulli Source
[Figure: plot of $H(p) = -(p \log_2 p + (1-p)\log_2(1-p))$ versus p; the curve is concave, equals 0 at p = 0 and p = 1, and peaks at 1 bit for p = 1/2.]
-
Source Entropy Examples
Example 2: (4-way random walk, directions N, E, S, W)
$\mathrm{prob}(x = S) = \frac{1}{2}, \quad \mathrm{prob}(x = N) = \frac{1}{4}, \quad \mathrm{prob}(x = E) = \mathrm{prob}(x = W) = \frac{1}{8}$
$H(X) = -\left(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{8}\log_2\frac{1}{8}\right) = 1.75$ bps
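A quick numerical check of this value (my addition):

```python
from math import log2
p = [1/2, 1/4, 1/8, 1/8]          # S, N, E, W
print(-sum(q * log2(q) for q in p))  # 1.75 bps
```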
-
Source Entropy Examples (Cont'd)
Example 3: (source with geometric distribution)
A jar contains the same number of balls of two different colors: blue and red. Each time, a ball is randomly picked out of the jar and then put back. Consider the event that the k-th pick is the first time a red ball is seen. What is the probability of such an event?
$p = \mathrm{prob}(x = \mathrm{red}) = \frac{1}{2}, \quad 1 - p = \mathrm{prob}(x = \mathrm{blue}) = \frac{1}{2}$
Prob(event) = Prob(blue in the first k-1 picks) × Prob(red in the k-th pick) = $(1/2)^{k-1} \cdot (1/2) = (1/2)^k$
-
Source Entropy Calculation
If we consider all possible events, the sum of their probabilities will be one. Then we can define a discrete random variable X with
$P(x = k) = \left(\frac{1}{2}\right)^k, \quad k = 1, 2, \ldots$
Check: $\sum_{k=1}^{\infty} P(x = k) = \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^k = 1$
Entropy: $H(X) = -\sum_{k=1}^{\infty} p_k \log_2 p_k = \sum_{k=1}^{\infty} k \left(\frac{1}{2}\right)^k = 2$ bps
Problem 1 in HW3 is slightly more complex than this example
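A numerical check of the infinite sum (my addition; the series is truncated at k = 60, which is harmless since the tail is tiny):

```python
from math import log2
p = [(1/2) ** k for k in range(1, 61)]  # P(x=k) = (1/2)**k
print(sum(p))                           # ~1.0: probabilities sum to one
print(-sum(q * log2(q) for q in p))     # ~2.0 bps, matching the closed form
```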
-
Properties of Source Entropy
Nonnegative and concave
Achieves its maximum when the source observes a uniform distribution (i.e., P(x=k) = 1/N, k = 1, ..., N)
Goes to zero (its minimum) as the source becomes more and more skewed (i.e., P(x=k) → 1, P(x≠k) → 0)
-
History of Entropy
Origin: Greek root for "transformation content"
First created by Rudolf Clausius to study thermodynamical systems in 1862
Developed by Ludwig Eduard Boltzmann in the 1870s-1880s (the first serious attempt to understand nature in a statistical language)
Borrowed by Shannon in his landmark work "A Mathematical Theory of Communication" in 1948
-
A Little Bit of Mathematics*
Entropy S is proportional to log P (P is the relative probability of a state).
Consider an ideal gas of N identical particles, of which N_i are in the i-th microscopic condition (range) of position and momentum. Using Stirling's formula, $\log N! \approx N \log N - N$, and noting that $p_i = N_i/N$, one gets $S \sim -\sum_i p_i \log p_i$.
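A sketch of the counting argument (my reconstruction of the standard derivation the slide alludes to):

```latex
% Number of microstates with occupation numbers N_1, ..., N_m, with S ∝ log W:
\[
W = \frac{N!}{N_1!\, N_2! \cdots N_m!}, \qquad S \propto \log W .
\]
% Apply Stirling's formula log N! ≈ N log N - N, with p_i = N_i/N and sum_i N_i = N:
\[
\log W \approx N\log N - N - \sum_i \left(N_i \log N_i - N_i\right)
       = -\sum_i N_i \log \frac{N_i}{N}
       = -N \sum_i p_i \log p_i .
\]
```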
-
Entropy-related Quotes
"My greatest concern was what to call it. I thought of calling it 'information', but the word was overly used, so I decided to call it 'uncertainty'. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.'"
--Conversation between Claude Shannon and John von Neumann regarding what name to give to the measure of uncertainty, or attenuation, in phone-line signals (1949)
-
Other Uses of Entropy
In biology
"The order produced within cells as they grow and divide is more than compensated for by the disorder they create in their surroundings in the course of growth and division." (A. Lehninger)
Ecological entropy is a measure of biodiversity in the study of biological ecology.
In cosmology
"Black holes have the maximum possible entropy of any object of equal size." (Stephen Hawking)
-
What is the use of H(X)?
Shannon's first theorem (noiseless coding theorem)
For a memoryless discrete source X, its entropy H(X) defines the minimum average code length required to noiselessly code the source.
Notes:
1. Memoryless means that the events are independently generated (e.g., the outcomes of flipping a coin N times are independent events).
2. Source redundancy can then be understood as the difference between the raw data rate and the source entropy.
-
Code Redundancy*
Average code length: $\bar{l} = \sum_{i=1}^{N} p_i l_i$, where $l_i$ is the length of the codeword assigned to the i-th symbol (practical performance).
Theoretical bound: $H(X) = \sum_{i=1}^{N} p_i \log_2 \frac{1}{p_i}$
Redundancy: $r = \bar{l} - H(X) \ge 0$
Note: if we represent each symbol by q bits (fixed-length codes), then the redundancy is simply $q - H(X)$ bps.
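A small Python illustration of code redundancy (my addition; the 4-symbol source and code lengths match the toy example used later in these slides):

```python
from math import log2

def redundancy(probs, lengths):
    # r = (average code length) - H(X), in bps.
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    H = -sum(p * log2(p) for p in probs)
    return avg_len - H

probs = [0.5, 0.25, 0.125, 0.125]
print(redundancy(probs, [2, 2, 2, 2]))  # 0.25 bps: 2-bit fixed-length code, q - H(X)
print(redundancy(probs, [1, 2, 3, 3]))  # 0.0 bps: this VLC achieves the entropy
```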
-
How to achieve source entropy?
discrete source X → entropy coding (driven by P(X)) → binary bit stream
Note: The above entropy-coding problem is based on the simplifying assumptions that the discrete source X is memoryless and that P(X) is completely known. These assumptions often do not hold for real-world data such as images, and we will revisit them later.
-
Data Compression Basics
Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy
Variable length codes
  Motivation
  Prefix condition
  Huffman coding algorithm
-
Recall:
Variable Length Codes (VLC)
Self-information: $I(p) = -\log_2 p$
Assign a long codeword to an event with small probability.
Assign a short codeword to an event with large probability.
It follows from the above formula that a small-probability event contains much information and is therefore worth many bits to represent. Conversely, if some event occurs frequently, it is probably a good idea to use as few bits as possible to represent it. This observation leads to the idea of varying the code lengths based on the events' probabilities:
$l(x) \approx -\log_2 p(x)$
-
4-way Random Walk Example

symbol k | p_k   | fixed-length codeword | variable-length codeword
S        | 0.5   | 00                    | 0
N        | 0.25  | 01                    | 10
E        | 0.125 | 10                    | 110
W        | 0.125 | 11                    | 111

symbol stream:   S  S  N  W  S  E  N  N  W  S  S  S  N  E  S  S
fixed length:    00 00 01 11 00 10 01 01 11 00 00 00 01 10 00 00  (32 bits)
variable length: 0  0  10 111 0 110 10 10 111 0  0  0  10 110 0 0  (28 bits)

4 bits of savings achieved by VLC (redundancy eliminated)
-
Toy Example (Cont'd)
average code length: $\bar{l} = \frac{N_b}{N_s}$ (total number of bits / total number of symbols), in bps
source entropy: $H(X) = -\sum_{k=1}^{4} p_k \log_2 p_k = 1.75$ bps
fixed length: $\bar{l} = 2$ bps $> H(X)$
variable length: $\bar{l} = 0.5 \times 1 + 0.25 \times 2 + 0.125 \times 3 + 0.125 \times 3 = 1.75$ bps $= H(X)$
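A runnable check of the whole toy example (my addition; the codebooks are the two tables above):

```python
flc = {"S": "00", "N": "01", "E": "10", "W": "11"}    # fixed-length code
vlc = {"S": "0",  "N": "10", "E": "110", "W": "111"}  # variable-length code
stream = "S S N W S E N N W S S S N E S S".split()

fixed = "".join(flc[s] for s in stream)
var   = "".join(vlc[s] for s in stream)
print(len(fixed), len(var))  # 32 28 -> 4 bits saved by the VLC

print(sum(len(vlc[s]) for s in stream) / len(stream))  # 1.75 bits/symbol = H(X)
```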
-
Problems with VLC
When codewords have fixed lengths, the boundary of codewords is always identifiable. For codewords with variable lengths, their boundary can become ambiguous.

symbol | VLC
S      | 0
W      | 1
N      | 10
E      | 11

Encoding "S S N W S E" gives 0 0 10 1 0 11, i.e., the bit stream 001010011. Because 1 is a prefix of both 10 and 11, the decoder can also segment this stream as 0|0|1|0|1|0|0|1|1 = "S S W S W S S W W" (among other parsings): a decoding error.
-
Uniquely Decodable Codes
To avoid ambiguity in decoding, we need to enforce certain conditions on a VLC to make it uniquely decodable.
Since ambiguity arises when some codeword becomes the prefix of another, it is natural to consider the prefix condition.
Example: p, pr, pre, pref, prefi, prefix
a ≺ b: a is a prefix of b
-
Prefix Condition
No codeword is allowed to be the prefix of any other codeword.
We will graphically illustrate this condition with the aid of a binary codeword tree.
-
Binary Codeword Tree

Level 1: 1, 0 (2 codewords)
Level 2: 11, 10, 01, 00 (2^2 codewords)
...
Level k: 2^k codewords

[Figure: a binary tree grown from the root; the nodes at level k are the 2^k binary strings of length k.]
-
Prefix Condition Examples

symbol x | codeword 1 | codeword 2
S        | 0          | 0
W        | 1          | 111
N        | 10         | 10
E        | 11         | 110

Codeword set 1 violates the prefix condition: 1 is a prefix of both 10 and 11, so some codewords sit on interior nodes of the tree. Codeword set 2 satisfies it: 0, 10, 110, and 111 are all leaves of the codeword tree.
-
How to satisfy the prefix condition?
Basic rule: if a node is used as a codeword, then none of its descendants can be used as codewords.
Example: the set {0, 10, 110, 111}. Once 0 is a codeword, no other codeword may start with 0; once 10 is a codeword, no other codeword may start with 10; and so on.
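Because of this rule, a greedy left-to-right scan decodes a prefix code unambiguously. A minimal Python sketch (my addition, using the toy codebook from the random-walk example):

```python
def decode_prefix_code(bits, codebook):
    # Greedy left-to-right decoding; unambiguous because no codeword
    # is a prefix of another.
    inverse = {cw: sym for sym, cw in codebook.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:  # a complete codeword has been matched
            out.append(inverse[cur])
            cur = ""
    assert cur == "", "bit stream ended in the middle of a codeword"
    return out

vlc = {"S": "0", "N": "10", "E": "110", "W": "111"}
bits = "".join(vlc[s] for s in ["S", "S", "N", "W", "S", "E"])
print(bits)                           # 00101110110
print(decode_prefix_code(bits, vlc))  # ['S', 'S', 'N', 'W', 'S', 'E']
```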
-
Property of Prefix Codes
Kraft's inequality: $\sum_{i=1}^{N} 2^{-l_i} \le 1$, where $l_i$ is the length of the i-th codeword (proof skipped).

Example:
symbol x | VLC-1 | VLC-2
S        | 0     | 0
W        | 1     | 111
N        | 10    | 10
E        | 11    | 110

VLC-1: $\sum_{i=1}^{4} 2^{-l_i} = \frac{1}{2} + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} = \frac{3}{2} > 1$ (violates the inequality: not a prefix code)
VLC-2: $\sum_{i=1}^{4} 2^{-l_i} = \frac{1}{2} + \frac{1}{8} + \frac{1}{4} + \frac{1}{8} = 1 \le 1$ (satisfies the inequality)
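A one-function check of Kraft's inequality (my addition):

```python
def kraft_sum(lengths):
    # A prefix code with these codeword lengths exists iff the sum is <= 1.
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 1, 2, 2]))  # 1.5 > 1: VLC-1 cannot be a prefix code
print(kraft_sum([1, 3, 2, 3]))  # 1.0 <= 1: VLC-2 is a valid prefix code
```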
-
Two Goals of VLC design
1. Achieve the optimal code length (i.e., minimal redundancy). For an event x with probability p(x), the optimal code length is $\lceil -\log_2 p(x) \rceil$, where $\lceil x \rceil$ denotes the smallest integer not less than x (e.g., $\lceil 3.4 \rceil = 4$).
2. Satisfy the prefix condition.
code redundancy: $r = \bar{l} - H(X) \ge 0$
Unless the probabilities of the events are all powers of 2, we often have r > 0.
-
Solutions:
Huffman Coding (Huffman, 1952)
we will cover it later while studying JPEG
Arithmetic Coding (1980s)
not covered by EE465, but by EE565 (F2008)
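For the curious, a minimal sketch of the Huffman construction (my addition; this is the generic textbook algorithm, not the JPEG-specific treatment the course covers later). It repeatedly merges the two least probable nodes; each merge adds one bit to the codeword of every symbol inside the merged nodes:

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    # Returns the Huffman code length of each symbol (index into probs).
    tiebreak = count()  # keeps heap comparisons away from the symbol lists
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # two least probable nodes
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:             # one more bit for every member
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

print(huffman_lengths([0.5, 0.25, 0.125, 0.125]))  # [1, 2, 3, 3], as in the toy example
```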
-
Golomb Codes for Geometric Distribution
Optimal VLC for a geometric source: $P(X = k) = (1/2)^k, \; k = 1, 2, \ldots$

k | codeword
1 | 0
2 | 10
3 | 110
4 | 1110
5 | 11110
6 | 111110
7 | 1111110
8 | 11111110

[Figure: the corresponding codeword tree, a chain in which each internal node has a leaf child (0) and a child that continues the chain (1).]
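The pattern in the table is the unary code: k-1 ones followed by a terminating zero. A tiny sketch (my addition):

```python
def golomb_unary(k):
    # Codeword for the geometric source P(X=k) = (1/2)**k.
    return "1" * (k - 1) + "0"

for k in range(1, 5):
    print(k, golomb_unary(k))  # 0, 10, 110, 1110, matching the table

# Average length: sum over k of k * (1/2)**k = 2 bits, which equals H(X),
# so this code achieves the source entropy.
```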
-
Summary of Data Compression Basics
Shannon's source entropy formula (theory)
Entropy (uncertainty) is quantified by weighted self-information:
$H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i$ bps
VLC thumb rule (practice)
Long codeword ↔ small-probability event; short codeword ↔ large-probability event:
$l(x) \approx -\log_2 p(x)$