Data Compression Basics
TRANSCRIPT
-
One-Minute Survey Result
Thank you for your responses: Kristen, Anusha, Ian, Christofer, Bernard, Greg, Michael, Shalini, Brian, and Justin.
Valentine's challenge
Min: 30-45 minutes, Max: 5 hours, Ave: 2-3 hours
Muddiest points
Regular tree grammar (CS410: Compilers or CS422: Automata)
Fractal geometry (The Fractal Geometry of Nature by Mandelbrot)
Seeing the Connection
Remember the first story in Steve Jobs' speech "Staying Hungry, Staying Foolish"?
In addition to Jobs and Shannon, I have two more examples: Charles Darwin and Bruce Lee.
-
Data Compression Basics
Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy
Variable length codes
  Motivation
  Prefix condition
  Huffman coding algorithm
-
Information
What do we mean by information?
"A numerical measure of the uncertainty of an experimental outcome" (Webster's Dictionary)
How do we quantitatively measure and represent information?
Shannon proposed a statistical-mechanics-inspired approach.
Let us first look at how we assess the amount of information in our daily lives using common sense.
-
Information = Uncertainty
Zero information
The Pittsburgh Steelers won Super Bowl XL (past news, no uncertainty)
Yao Ming plays for the Houston Rockets (celebrity fact, no uncertainty)
Little information
It will be very cold in Chicago tomorrow (not much uncertainty, since this is wintertime)
It is going to rain in Seattle next week (not much uncertainty, since it rains nine months a year in the Northwest)
Large information
An earthquake is going to hit CA in July 2006 (are you sure? an unlikely event)
Someone has shown P=NP (Wow! Really? Who did it?)
-
Shannon's Picture of Communication (1948)

source → source encoder → channel encoder → channel → channel decoder → source decoder → destination
(the channel encoder, channel, and channel decoder together form the "super-channel")

Examples of source: human speech, photos, text messages, computer programs
Examples of channel: storage media, telephone lines, wireless transmission

The goal of communication is to move information from here to there and from now to then.
-
Source-Channel Separation Principle*
The role of source coding (data compression): facilitate storage and transmission by eliminating source redundancy. Our goal is to maximally remove the source redundancy by intelligently designing the source encoder/decoder.
The role of channel coding: fight against channel errors for reliable transmission of information (the design of the channel encoder/decoder is covered in EE461). We simply assume that the super-channel achieves error-free transmission.
-
Discrete Source
A discrete source is characterized by a discrete random variable X.
Examples
Coin flipping: P(X=H) = P(X=T) = 1/2
Dice tossing: P(X=k) = 1/6, k = 1, ..., 6
Playing-card drawing: P(X=S) = P(X=H) = P(X=D) = P(X=C) = 1/4
What is the redundancy with a discrete source?
-
Two Extreme Cases
Case 1: tossing a fair coin (Head or Tail?) → source encoder → channel → source decoder
P(X=H) = P(X=T) = 1/2 (maximum uncertainty): minimum (zero) redundancy, compression impossible.
Case 2: tossing a coin with two identical sides; the "encoder" is mere duplication (HHHH... or TTTT...)
P(X=H) = 1, P(X=T) = 0 (minimum uncertainty): maximum redundancy, compression trivial (1 bit is enough).
Redundancy is the opposite of uncertainty.
-
Quantifying Uncertainty of an Event
Self-information: $I(p) = -\log_2 p$, where p is the probability of the event x (e.g., x can be X=H or X=T)

p = 1: I(p) = 0 (must happen, no uncertainty)
p → 0: I(p) → ∞ (unlikely to happen, infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty of event x.
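As a quick illustration (my addition, not from the slides), a few values of the self-information formula in Python:

```python
from math import log2

def self_information(p):
    # I(p) = -log2(p), in bits; p is the probability of the event.
    return -log2(p)

print(self_information(1.0))    # 0.0 bits: a certain event carries no information
print(self_information(0.5))    # 1.0 bit:  a fair-coin outcome
print(self_information(2**-8))  # 8.0 bits: a rare event carries much more
```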
-
Weighted Self-information
$I_w(p) = p \cdot I(p) = -p \log_2 p$

p = 0:   I(p) = ∞,  I_w(p) = 0
p = 1/2: I(p) = 1,  I_w(p) = 1/2
p = 1:   I(p) = 0,  I_w(p) = 0

As p evolves from 0 to 1, the weighted self-information first increases and then decreases.
Question: Which value of p maximizes I_w(p)?
-
Maximum of Weighted Self-information*
$I_w(p)$ is maximized at $p = 1/e$, where $I_w(1/e) = \frac{1}{e\ln 2} \approx 0.53$.
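A short derivation of this maximum (added for completeness; it follows by basic calculus):

```latex
\[
\frac{d}{dp} I_w(p)
  = \frac{d}{dp}\left(-\frac{p \ln p}{\ln 2}\right)
  = -\frac{\ln p + 1}{\ln 2} = 0
  \quad\Longrightarrow\quad \ln p = -1
  \quad\Longrightarrow\quad p = \frac{1}{e},
\]
\[
I_w\!\left(\tfrac{1}{e}\right)
  = -\frac{1}{e}\log_2 \frac{1}{e}
  = \frac{\log_2 e}{e}
  = \frac{1}{e \ln 2} \approx 0.531 .
\]
```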
-
Quantification of Uncertainty of a Discrete Source
A discrete source (random variable) is a collection (set) of individual events whose probabilities sum to 1:
X is a discrete random variable, $x \in \{1, 2, \ldots, N\}$
$p_i = \mathrm{prob}(x = i), \; i = 1, 2, \ldots, N, \qquad \sum_{i=1}^{N} p_i = 1$
To quantify the uncertainty of a discrete source, we simply take the summation of weighted self-information over the whole set.
-
Shannon's Source Entropy Formula
$H(X) = \sum_{i=1}^{N} I_w(p_i) = -\sum_{i=1}^{N} p_i \log_2 p_i$ (bits/sample, or bps)
The probabilities $p_i$ serve as the weighting coefficients.
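A minimal Python sketch of this formula (my addition, not part of the slides):

```python
from math import log2

def entropy(probs):
    # Shannon entropy H(X) = -sum(p * log2(p)), in bits per sample (bps).
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * log2(p) for p in probs if p > 0)  # treat 0*log(0) as 0

print(entropy([0.5, 0.5]))    # 1.0 bps: fair coin, maximum uncertainty
print(entropy([0.25] * 4))    # 2.0 bps: uniform 4-symbol source
print(entropy([0.99, 0.01]))  # ~0.08 bps: highly skewed, little uncertainty
```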
-
Source Entropy Examples
Example 1: (binary Bernoulli source)
Flipping a coin with the probability of a head being p (0 < p < 1):
$p = \mathrm{prob}(x = 0), \quad q = 1 - p = \mathrm{prob}(x = 1)$
$H(X) = -(p \log_2 p + q \log_2 q)$
-
Entropy of Binary Bernoulli Source
[Figure: plot of $H(p) = -(p \log_2 p + (1-p)\log_2(1-p))$ versus p; the curve is concave, equals 0 at p = 0 and p = 1, and peaks at 1 bit for p = 1/2.]
-
Source Entropy Examples
Example 2: (4-way random walk, directions N, E, S, W)
$\mathrm{prob}(x = S) = \frac{1}{2}, \quad \mathrm{prob}(x = N) = \frac{1}{4}, \quad \mathrm{prob}(x = E) = \mathrm{prob}(x = W) = \frac{1}{8}$
$H(X) = -\left(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{8}\log_2\frac{1}{8}\right) = 1.75$ bps
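A quick numerical check of this value (my addition):

```python
from math import log2
p = [1/2, 1/4, 1/8, 1/8]          # S, N, E, W
print(-sum(q * log2(q) for q in p))  # 1.75 bps
```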
-
Source Entropy Examples (Cont'd)
Example 3: (source with geometric distribution)
A jar contains the same number of balls of two different colors: blue and red. Each time, a ball is randomly picked out of the jar and then put back. Consider the event that the k-th pick is the first time a red ball is seen. What is the probability of such an event?
$p = \mathrm{prob}(x = \mathrm{red}) = \frac{1}{2}, \quad 1 - p = \mathrm{prob}(x = \mathrm{blue}) = \frac{1}{2}$
Prob(event) = Prob(blue in the first k-1 picks) × Prob(red in the k-th pick) = $(1/2)^{k-1} \cdot (1/2) = (1/2)^k$
-
Source Entropy Calculation
If we consider all possible events, the sum of their probabilities will be one. Then we can define a discrete random variable X with
$P(x = k) = \left(\frac{1}{2}\right)^k, \quad k = 1, 2, \ldots$
Check: $\sum_{k=1}^{\infty} P(x = k) = \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^k = 1$
Entropy: $H(X) = -\sum_{k=1}^{\infty} p_k \log_2 p_k = \sum_{k=1}^{\infty} k \left(\frac{1}{2}\right)^k = 2$ bps
Problem 1 in HW3 is slightly more complex than this example
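A numerical check of the infinite sum (my addition; the series is truncated at k = 60, which is harmless since the tail is tiny):

```python
from math import log2
p = [(1/2) ** k for k in range(1, 61)]  # P(x=k) = (1/2)**k
print(sum(p))                           # ~1.0: probabilities sum to one
print(-sum(q * log2(q) for q in p))     # ~2.0 bps, matching the closed form
```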
-
Properties of Source Entropy
Nonnegative and concave
Achieves its maximum when the source observes a uniform distribution (i.e., P(x=k) = 1/N, k = 1, ..., N)
Goes to zero (its minimum) as the source becomes more and more skewed (i.e., P(x=k) → 1, P(x≠k) → 0)
-
History of Entropy
Origin: Greek root for "transformation content"
First created by Rudolf Clausius to study thermodynamical systems in 1862
Developed by Ludwig Eduard Boltzmann in the 1870s-1880s (the first serious attempt to understand nature in a statistical language)
Borrowed by Shannon in his landmark work "A Mathematical Theory of Communication" in 1948
-
A Little Bit of Mathematics*
Entropy S is proportional to log P (P is the relative probability of a state).
Consider an ideal gas of N identical particles, of which N_i are in the i-th microscopic condition (range) of position and momentum. Using Stirling's formula, $\log N! \approx N \log N - N$, and noting that $p_i = N_i/N$, one gets $S \sim -\sum_i p_i \log p_i$.
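A sketch of the counting argument (my reconstruction of the standard derivation the slide alludes to):

```latex
% Number of microstates with occupation numbers N_1, ..., N_m, with S ∝ log W:
\[
W = \frac{N!}{N_1!\, N_2! \cdots N_m!}, \qquad S \propto \log W .
\]
% Apply Stirling's formula log N! ≈ N log N - N, with p_i = N_i/N and sum_i N_i = N:
\[
\log W \approx N\log N - N - \sum_i \left(N_i \log N_i - N_i\right)
       = -\sum_i N_i \log \frac{N_i}{N}
       = -N \sum_i p_i \log p_i .
\]
```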
-
Entropy-related Quotes
"My greatest concern was what to call it. I thought of calling it 'information', but the word was overly used, so I decided to call it 'uncertainty'. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.'"
--Conversation between Claude Shannon and John von Neumann regarding what name to give to the measure of uncertainty, or attenuation, in phone-line signals (1949)
-
Other Uses of Entropy
In biology
"The order produced within cells as they grow and divide is more than compensated for by the disorder they create in their surroundings in the course of growth and division." (A. Lehninger)
Ecological entropy is a measure of biodiversity in the study of biological ecology.
In cosmology
"Black holes have the maximum possible entropy of any object of equal size." (Stephen Hawking)
-
What is the use of H(X)?
Shannon's first theorem (noiseless coding theorem)
For a memoryless discrete source X, its entropy H(X) defines the minimum average code length required to noiselessly code the source.
Notes:
1. Memoryless means that the events are independently generated (e.g., the outcomes of flipping a coin N times are independent events).
2. Source redundancy can then be understood as the difference between the raw data rate and the source entropy.
-
Code Redundancy*
Average code length: $\bar{l} = \sum_{i=1}^{N} p_i l_i$, where $l_i$ is the length of the codeword assigned to the i-th symbol (practical performance).
Theoretical bound: $H(X) = \sum_{i=1}^{N} p_i \log_2 \frac{1}{p_i}$
Redundancy: $r = \bar{l} - H(X) \ge 0$
Note: if we represent each symbol by q bits (fixed-length codes), then the redundancy is simply $q - H(X)$ bps.
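A small Python illustration of code redundancy (my addition; the 4-symbol source and code lengths match the toy example used later in these slides):

```python
from math import log2

def redundancy(probs, lengths):
    # r = (average code length) - H(X), in bps.
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    H = -sum(p * log2(p) for p in probs)
    return avg_len - H

probs = [0.5, 0.25, 0.125, 0.125]
print(redundancy(probs, [2, 2, 2, 2]))  # 0.25 bps: 2-bit fixed-length code, q - H(X)
print(redundancy(probs, [1, 2, 3, 3]))  # 0.0 bps: this VLC achieves the entropy
```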
-
How to achieve source entropy?
discrete source X → entropy coding (driven by P(X)) → binary bit stream
Note: The above entropy-coding problem is based on the simplifying assumptions that the discrete source X is memoryless and that P(X) is completely known. These assumptions often do not hold for real-world data such as images, and we will revisit them later.
-
Data Compression Basics
Discrete source
  Information = uncertainty
  Quantification of uncertainty
  Source entropy
Variable length codes
  Motivation
  Prefix condition
  Huffman coding algorithm
-
Recall:
Variable Length Codes (VLC)
Self-information: $I(p) = -\log_2 p$
Assign a long codeword to an event with small probability.
Assign a short codeword to an event with large probability.
It follows from the above formula that a small-probability event contains much information and is therefore worth many bits to represent. Conversely, if some event occurs frequently, it is probably a good idea to use as few bits as possible to represent it. This observation leads to the idea of varying the code lengths based on the events' probabilities:
$l(x) \approx -\log_2 p(x)$
-
4-way Random Walk Example

symbol k | p_k   | fixed-length codeword | variable-length codeword
S        | 0.5   | 00                    | 0
N        | 0.25  | 01                    | 10
E        | 0.125 | 10                    | 110
W        | 0.125 | 11                    | 111

symbol stream:   S  S  N  W  S  E  N  N  W  S  S  S  N  E  S  S
fixed length:    00 00 01 11 00 10 01 01 11 00 00 00 01 10 00 00  (32 bits)
variable length: 0  0  10 111 0 110 10 10 111 0  0  0  10 110 0 0  (28 bits)

4 bits of savings achieved by VLC (redundancy eliminated)
-
Toy Example (Cont'd)
average code length: $\bar{l} = \frac{N_b}{N_s}$ (total number of bits / total number of symbols), in bps
source entropy: $H(X) = -\sum_{k=1}^{4} p_k \log_2 p_k = 1.75$ bps
fixed length: $\bar{l} = 2$ bps $> H(X)$
variable length: $\bar{l} = 0.5 \times 1 + 0.25 \times 2 + 0.125 \times 3 + 0.125 \times 3 = 1.75$ bps $= H(X)$
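A runnable check of the whole toy example (my addition; the codebooks are the two tables above):

```python
flc = {"S": "00", "N": "01", "E": "10", "W": "11"}    # fixed-length code
vlc = {"S": "0",  "N": "10", "E": "110", "W": "111"}  # variable-length code
stream = "S S N W S E N N W S S S N E S S".split()

fixed = "".join(flc[s] for s in stream)
var   = "".join(vlc[s] for s in stream)
print(len(fixed), len(var))  # 32 28 -> 4 bits saved by the VLC

print(sum(len(vlc[s]) for s in stream) / len(stream))  # 1.75 bits/symbol = H(X)
```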
-
Problems with VLC
When codewords have fixed lengths, the boundary of codewords is always identifiable. For codewords with variable lengths, their boundary can become ambiguous.

symbol | VLC
S      | 0
W      | 1
N      | 10
E      | 11

Encoding "S S N W S E" gives 0 0 10 1 0 11, i.e., the bit stream 001010011. Because 1 is a prefix of both 10 and 11, the decoder can also segment this stream as 0|0|1|0|1|0|0|1|1 = "S S W S W S S W W" (among other parsings): a decoding error.
-
Uniquely Decodable Codes
To avoid ambiguity in decoding, we need to enforce certain conditions on a VLC to make it uniquely decodable.
Since ambiguity arises when some codeword becomes the prefix of another, it is natural to consider the prefix condition.
Example: p, pr, pre, pref, prefi, prefix
a ≺ b: a is a prefix of b
-
Prefix Condition
No codeword is allowed to be the prefix of any other codeword.
We will graphically illustrate this condition with the aid of a binary codeword tree.
-
Binary Codeword Tree

Level 1: 1, 0 (2 codewords)
Level 2: 11, 10, 01, 00 (2^2 codewords)
...
Level k: 2^k codewords

[Figure: a binary tree grown from the root; the nodes at level k are the 2^k binary strings of length k.]
-
Prefix Condition Examples

symbol x | codeword 1 | codeword 2
S        | 0          | 0
W        | 1          | 111
N        | 10         | 10
E        | 11         | 110

Codeword set 1 violates the prefix condition: 1 is a prefix of both 10 and 11, so some codewords sit on interior nodes of the tree. Codeword set 2 satisfies it: 0, 10, 110, and 111 are all leaves of the codeword tree.
-
How to satisfy the prefix condition?
Basic rule: if a node is used as a codeword, then none of its descendants can be used as codewords.
Example: the set {0, 10, 110, 111}. Once 0 is a codeword, no other codeword may start with 0; once 10 is a codeword, no other codeword may start with 10; and so on.
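Because of this rule, a greedy left-to-right scan decodes a prefix code unambiguously. A minimal Python sketch (my addition, using the toy codebook from the random-walk example):

```python
def decode_prefix_code(bits, codebook):
    # Greedy left-to-right decoding; unambiguous because no codeword
    # is a prefix of another.
    inverse = {cw: sym for sym, cw in codebook.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:  # a complete codeword has been matched
            out.append(inverse[cur])
            cur = ""
    assert cur == "", "bit stream ended in the middle of a codeword"
    return out

vlc = {"S": "0", "N": "10", "E": "110", "W": "111"}
bits = "".join(vlc[s] for s in ["S", "S", "N", "W", "S", "E"])
print(bits)                           # 00101110110
print(decode_prefix_code(bits, vlc))  # ['S', 'S', 'N', 'W', 'S', 'E']
```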
-
Property of Prefix Codes
Kraft's inequality: $\sum_{i=1}^{N} 2^{-l_i} \le 1$, where $l_i$ is the length of the i-th codeword (proof skipped).

Example:
symbol x | VLC-1 | VLC-2
S        | 0     | 0
W        | 1     | 111
N        | 10    | 10
E        | 11    | 110

VLC-1: $\sum_{i=1}^{4} 2^{-l_i} = \frac{1}{2} + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} = \frac{3}{2} > 1$ (violates the inequality: not a prefix code)
VLC-2: $\sum_{i=1}^{4} 2^{-l_i} = \frac{1}{2} + \frac{1}{8} + \frac{1}{4} + \frac{1}{8} = 1 \le 1$ (satisfies the inequality)
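A one-function check of Kraft's inequality (my addition):

```python
def kraft_sum(lengths):
    # A prefix code with these codeword lengths exists iff the sum is <= 1.
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 1, 2, 2]))  # 1.5 > 1: VLC-1 cannot be a prefix code
print(kraft_sum([1, 3, 2, 3]))  # 1.0 <= 1: VLC-2 is a valid prefix code
```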
-
Two Goals of VLC design
1. Achieve the optimal code length (i.e., minimal redundancy). For an event x with probability p(x), the optimal code length is $\lceil -\log_2 p(x) \rceil$, where $\lceil x \rceil$ denotes the smallest integer not less than x (e.g., $\lceil 3.4 \rceil = 4$).
2. Satisfy the prefix condition.
code redundancy: $r = \bar{l} - H(X) \ge 0$
Unless the probabilities of the events are all powers of 2, we often have r > 0.
-
Solutions:
Huffman Coding (Huffman, 1952)
we will cover it later while studying JPEG
Arithmetic Coding (1980s)
not covered by EE465, but by EE565 (F2008)
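For the curious, a minimal sketch of the Huffman construction (my addition; this is the generic textbook algorithm, not the JPEG-specific treatment the course covers later). It repeatedly merges the two least probable nodes; each merge adds one bit to the codeword of every symbol inside the merged nodes:

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    # Returns the Huffman code length of each symbol (index into probs).
    tiebreak = count()  # keeps heap comparisons away from the symbol lists
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # two least probable nodes
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:             # one more bit for every member
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

print(huffman_lengths([0.5, 0.25, 0.125, 0.125]))  # [1, 2, 3, 3], as in the toy example
```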
-
Golomb Codes for Geometric Distribution
Optimal VLC for a geometric source: $P(X = k) = (1/2)^k, \; k = 1, 2, \ldots$

k | codeword
1 | 0
2 | 10
3 | 110
4 | 1110
5 | 11110
6 | 111110
7 | 1111110
8 | 11111110

[Figure: the corresponding codeword tree, a chain in which each internal node has a leaf child (0) and a child that continues the chain (1).]
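The pattern in the table is the unary code: k-1 ones followed by a terminating zero. A tiny sketch (my addition):

```python
def golomb_unary(k):
    # Codeword for the geometric source P(X=k) = (1/2)**k.
    return "1" * (k - 1) + "0"

for k in range(1, 5):
    print(k, golomb_unary(k))  # 0, 10, 110, 1110, matching the table

# Average length: sum over k of k * (1/2)**k = 2 bits, which equals H(X),
# so this code achieves the source entropy.
```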
-
Summary of Data Compression Basics
Shannon's source entropy formula (theory)
Entropy (uncertainty) is quantified by weighted self-information:
$H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i$ bps
VLC thumb rule (practice)
Long codeword ↔ small-probability event; short codeword ↔ large-probability event:
$l(x) \approx -\log_2 p(x)$