Data Compression Intro
TRANSCRIPT
DATA COMPRESSION
Lecture by Kiran Kumar KV, PESSE
Block Diagram of Data Compression
INTRODUCTION TO LOSSLESS COMPRESSION
Unit 1
Chapter 1
Preface
Introduction
Data
Need for Compression
Compression Techniques
Lossless and Lossy Compression
Performance Measure
Modeling and Coding
Problems
Introduction
The word data comes from the Latin for "to give", hence "something given".
In geometry, mathematics, engineering, and so on, the terms given and data are used interchangeably.
Data is also a representation of facts, figures, and ideas.
In computer science, data are numbers, words, images, etc., accepted as they stand.
Data (Analog)
The first images were sent over the Atlantic using a submarine (telegraph) cable in the 1920s.
1964: Lunar probe imagery.
Data (Analog)
To
The Principal, College
Respected Sir,
Subject: Need a Heater in Class
Students, please fill in the other things.
Yours Sincerely,
Faculty
Data (Digital World)
Raw Data => Digital Data (011.0110101...)
Difference between Data, Information and Knowledge?
Data is the lowest level of abstraction, information is the next level, and knowledge is the highest level of the three.
Data on its own carries no meaning. In order for data to become information, it must be interpreted and take on a meaning.
For example, the height of Mt. Everest is generally considered "data", a book on Mt. Everest's geological characteristics may be considered "information", and a report containing practical advice on the best way to reach Mt. Everest's peak may be considered "knowledge".
Compression
What is the need for compression?
What are the different kinds of compression?
Which is the better one?
Which technique is used more often?
What is the use of combining both techniques?
What is the need for compression?
Need for Compression
Weather Forecasting
Internet data
Broadband
Need for Compression
Planning Cities
COMPRESSION TECHNIQUES
Introduction to Lossless Compression
Different kinds of compression
Loss-less compression
Compressed data can be reconstructed back to the exact original data.
Lossy Compression
Compressed data cannot be reconstructed back to the exact original data.
Loss-less Compression
Involves no loss of information.
Area: Text Compression.
The reconstructed text is identical to the original.
Example: "Do not send money" versus "Do now send money": a single-character error completely changes the meaning, so text must be reconstructed exactly.
Other Areas: Radiology, Satellite Imagery.
Main Advantage? Zero distortion.
Main Disadvantage? The amount of compression is less when compared to lossy compression.
Lossy Compression
Disadvantage:
Data that have been compressed using lossy techniques generally cannot be recovered or reconstructed exactly (involves some loss of information).
Advantage:
Much higher compression ratios.
Areas:
Audio and Video Compression (MP3, MPEG, JPEG).
MEASURE OF PERFORMANCE
Introduction to Lossless Compression
How do we measure or quantify compression performance?
1. The relative complexity of the algorithm.
2. The memory required to implement the algorithm.
3. How fast the algorithm performs on a given machine. (Secondary)
or
1. The amount of compression.
2. How closely the reconstruction resembles the original. (Primary)
Compression Ratio
The most widely used measure of data compression is the compression ratio:
the ratio of the number of bits required to represent the data before compression to the number of bits required to represent the data after compression.
Example: Suppose storing an image made up of a square array of 256 x 256 pixels requires 65,536 bytes. The image is compressed and the compressed version requires 16,384 bytes.
Compression Ratio = 65,536 : 16,384 = 4:1
Another Measure
Rate: the average number of bits required to represent a single sample.
Consider the last example: the 256 x 256 original image contains 65,536 bytes, so each pixel requires 1 byte, or 8 bits per pixel (sample).
The compressed image contains 16,384 bytes. How many bits does each pixel require now?
The rate is 2 bits/pixel.
Are the above two measures sufficient for lossy compression?
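A minimal sketch of these two measures applied to the 256 x 256 image example above (the byte counts are from the slides; the function names are illustrative, not part of the original lecture):

```python
def compression_ratio(original_bits, compressed_bits):
    # Ratio of bits needed before compression to bits needed after compression.
    return original_bits / compressed_bits

def rate_bits_per_sample(compressed_bits, num_samples):
    # Average number of bits per sample (pixel) after compression.
    return compressed_bits / num_samples

original_bytes = 65_536     # 256 x 256 pixels at 1 byte per pixel
compressed_bytes = 16_384
pixels = 256 * 256

print(compression_ratio(original_bytes * 8, compressed_bytes * 8))   # 4.0, i.e. 4:1
print(rate_bits_per_sample(compressed_bytes * 8, pixels))            # 2.0 bits/pixel
```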
Distortion
In lossy compression, the reconstruction differs from the original data.
In order to determine the efficiency of a compression algorithm, we have to quantify/measure this difference.
The difference between the original and the reconstruction is called the distortion.
Lossy techniques are generally used for the compression of data that originate as analog signals, such as speech and video.
For speech and video, the final arbiter/judge of quality is the human observer (behavioral analysis).
Because human responses are difficult to model mathematically, many approximate measures of distortion are used to determine the fidelity/quality of the reconstructed waveforms.
MODELING AND CODING
Introduction to Lossless Compression
Modeling and Coding
1) A compression scheme can be either loss-less or lossy, based on the requirements of the application.
2) The exact compression scheme depends on several factors, but the main one is the characteristics of the data.
3) E.g., a technique that works well for compressing text may not work well for compressing images.
The best approach for a given application depends largely on the redundancies inherent in the data.
Modeling and Coding
Redundancies
Redundant means not needed: something that can be omitted without any loss of significance.
Example: In a portrait image, most of the background is the same, and all of it need not be encoded.
This approach may work for one kind of data but not for another (a landscape or a group photo).
Modeling and Coding
The development of data compression algorithms for a variety of data can be divided into two phases.
Modeling: extract information about any redundancy present in the data and describe it in the form of a model.
Coding: a description of the model, and a "description" of how the data differ from the model, are encoded (using a binary alphabet).
The difference between the data and the model is often referred to as the residual.
DATA MODELING EXAMPLES
Introduction to Lossless Compression
Example 1
Q. Consider the following sequence of numbers X = {x1, x2, x3, ...}: 9 11 11 11 14 13 15 17 16 17 20 21. How many bits are required to store or transmit each sample?
Ans. 5 bits/sample, or fewer by exploiting the structure of the data.
1) Model the data
The data has the structure of a straight line:
Y = mX' + c; here Y = X' + 8, with X' = {1, 2, ...}
2) Residue
The difference between the data and the model:
e = X - (X' + 8):
0 1 0 -1 1 -1 0 1 -1 -1 1 1
Example 1
The residual sequence consists of only three numbers {-1, 0, 1}. Assigning the codes 00 to -1, 01 to 0, and 10 to 1, we need only 2 bits to represent each element of the residual sequence.
Therefore, we can obtain compression by transmitting the parameters of the model and the residual sequence.
The scheme is lossy if only the model is transmitted, and lossless if both the residual/difference and the model parameters are transmitted.
Q. Model the given data for compression.
{ 6 8 10 10 12 11 12 15 16 }
{ 5 6 9 10 11 13 17 19 20 }
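A small sketch of the straight-line model plus residual from Example 1 (the sequence and the model Y = X' + 8 are from the slides; the variable names and the 2-bit codebook layout are illustrative):

```python
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

model = [n + 8 for n in range(1, len(data) + 1)]   # 9, 10, 11, ..., 20
residual = [x - m for x, m in zip(data, model)]    # 0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1

# Only three residual values occur, so 2 bits each suffice instead of 5 bits/sample.
codebook = {-1: "00", 0: "01", 1: "10"}
encoded = "".join(codebook[e] for e in residual)

# Lossless: model parameters plus residual recover the original exactly.
reconstructed = [m + e for m, e in zip(model, residual)]
assert reconstructed == data
```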
Example 2
Q. Find the structure present in this data sequence:
27 28 29 28 26 27 29 28 30 32 34 36 38
I. Ans. No obvious functional structure is found, hence
II. check for the closeness of consecutive values.
III. Differences: 27 1 1 -1 -2 1 2 -1 2 2 2 2 2
IV. Send the first value, then send the rest of the residue.
V. Are the bits/sample reduced?
VI. The decoder adds the current value to the previously decoded value to reconstruct the original sequence.
Note
Techniques that use the past values of a sequence to predict the current value, and then encode the error in prediction (the residual), are called predictive coding schemes.
Note: Even assuming both encoder and decoder know the model being used, we would still have to send the value of the first element of the sequence.
Example 3
Suppose we have the following sequence:
aba ray a ranba rrayb ranbfa rbfaa rbfaaa rbaway
To represent the above 8 symbols, 3 bits/symbol are required.
Suppose we instead assign a 1-bit codeword to the symbol that occurs most often, and longer codewords to the rarer symbols. As there are 41 symbols in the sequence, this works out to approximately 2.58 bits per symbol, i.e., a compression ratio of 1.16:1. (Huffman coding)
Alternatively, a dictionary compression scheme can exploit the fact that letters/words repeat.
Note
There will be situations in which it is easier to take advantage of the structure if we decompose the data into a number of components. We can then study each component separately and use a model appropriate to that component.
There are a number of different ways to characterize data, and different characterizations will lead to different compression schemes.
We can compress something with products from one vendor and reconstruct it using the products of a different vendor, because international standards organizations have published standards for various compression applications.
MATHEMATICAL PRELIMINARIES FOR LOSS-LESS COMPRESSION
Unit-1
Chapter 2
Loss-Less Compression
Overview
This chapter deals with the mathematical framework for lossless schemes.
Starting with Information Theory.
Basic probability concepts.
Based on the above mathematical concepts, modeling of the data.
Introduction to Information Theory
A quantitative measure of information.
Father of Information Theory: Claude Elwood Shannon, an electrical engineer at Bell Labs.
He defined a quantity called self-information. Example: given a random experiment, if A is an event in the set of outcomes, the self-information associated with A is
i(A) = log_b [1 / P(A)] = -log_b P(A)
where i(A) is the self-information and P(A) is the probability of the event A. (With b = 2, the unit is bits.)
Introduction to Information Theory
log(1) = 0, and -log(x) increases as x decreases.
So when the probability of an event is low, the information associated with it is high, and vice versa.
Another property: the information obtained from the occurrence of two independent events is the sum of the information obtained from the occurrence of the individual events.
Suppose A and B are two independent events. The self-information associated with the occurrence of both A and B is
i(AB) = -log_b [P(A) P(B)] = i(A) + i(B)
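A small sketch of self-information in code (base-2 logarithms, so the result is in bits; the probabilities used here are illustrative):

```python
import math

def self_information(p, base=2):
    # i(A) = -log_b P(A): low-probability events carry more information.
    return -math.log(p, base)

p_a, p_b = 0.5, 0.125
print(self_information(p_a))         # 1.0 bit
print(self_information(p_b))         # 3.0 bits
print(self_information(p_a * p_b))   # 4.0 bits: information adds for independent events
```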
Entropy
Suppose we have independent events Ai, which are the outcomes of some experiment S.
Then the average self-information associated with the random experiment is
H(S) = sum over i of P(Ai) i(Ai) = - sum over i of P(Ai) log_b P(Ai)
This quantity is called the entropy associated with the experiment (Shannon).
Note: Entropy is also a measure of the average number of binary symbols needed to code the output of the source.
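A minimal sketch of this entropy formula (base 2, so the answer is in bits/symbol; the probability list is illustrative):

```python
import math

def entropy(probs, base=2):
    # H = -sum(p * log_b p) over outcomes with non-zero probability.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits/symbol
```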
Note
Most of the sources considered in this subject are independent and identically distributed (iid).
The entropy equation above holds only if the experiment is iid.
Theorem: Shannon showed that the best a lossless compression scheme can do is to encode the output of a source with an average number of bits equal to the entropy of the source.
The estimate of the entropy depends on our assumptions about the structure of the source sequence.
Example 4
Q. Consider the sequence
1 2 3 2 3 4 5 4 5 6 7 8 9 8 9 10
The probability of occurrence of each element is
P(1) = P(6) = P(7) = P(10) = 1/16
P(2) = P(3) = P(4) = P(5) = P(8) = P(9) = 2/16
Assuming the sequence is iid, the first-order entropy for this sequence is
H = - sum of P(i) log2 P(i) = 4 x (1/16) x 4 + 6 x (2/16) x 3 = 3.25 bits.
Hence, by Shannon's result, the minimum average number of bits required to code each sample (under the iid assumption) is 3.25 bits/sample.
Example 4, Step 2: Model the given data to remove redundancy.
Solution: there is a sample-to-sample correlation between the samples, and we can remove this correlation by taking differences of neighboring sample values:
1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 1
This sequence is constructed using only two values, 1 and -1, with P(1) = 13/16 and P(-1) = 3/16.
The entropy in this case is 0.70 bits per symbol.
Knowing only this residual sequence is not enough to reconstruct the original sequence; we must also know the process by which it was generated from the original. That process depends on our assumption about the structure of the input data.
Assumption = Model
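A sketch of the entropy drop in Example 4, reusing an entropy helper like the one above (the differencing and counting code is illustrative):

```python
import math

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def first_order_entropy(values):
    # Estimate symbol probabilities by counting, then apply the entropy formula.
    n = len(values)
    return entropy([values.count(v) / n for v in set(values)])

seq = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
residual = [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]

print(round(first_order_entropy(seq), 2))        # 3.25 bits/sample
print(round(first_order_entropy(residual), 2))   # 0.7 bits/sample after modeling
```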
Note
If the parameters of the model do not change with n, it is called a static model.
A model whose parameters change or adapt with n to the changing characteristics of the data is called an adaptive model.
Basically, we see that knowing something about the structure of the data can help to "reduce the entropy."
Structure
Consider the following sequence: 1 2 1 2 3 3 3 3 1 2 3 3 3 3 1 2 3 3 1 2
Obviously, there is some structure to this data. However, if we look at it one symbol at a time, the structure is difficult to extract. Consider the probabilities: P(1) = P(2) = 1/4 and P(3) = 1/2. The entropy is 1.5 bits/symbol. This particular sequence consists of 20 symbols, so the total number of bits required to represent the sequence is 30.
Now let's take the same sequence and look at it in blocks of two. Obviously, there are only two block symbols, "1 2" and "3 3". The probabilities are P(1 2) = 1/2 and P(3 3) = 1/2, and the entropy is 1 bit per block.
As there are 10 such blocks in the sequence, we need a total of 10 bits to represent the entire sequence, a reduction by a factor of three.
Derivation of Average Information
Not in Syllabus
Models
Good models for sources lead to more efficient compression algorithms.
In general, in order to develop techniques that manipulate data using mathematical operations, we need a mathematical model for the data.
There are several approaches to building a mathematical model:
Physical Model
Probability Model
Markov Model
Composite Source Model
Physical Model
If we know something about the physics of the data generation process, we can use that information to construct a model. For example:
In speech-related applications, knowledge about the physics of speech production can be used to construct a mathematical model for the sampled speech process. Sampled speech can then be encoded using this model.
If residential electrical meter readings at hourly intervals were to be coded, knowledge about the living habits of the populace could be used to determine when electricity usage would be high and when it would be low. Then, instead of the actual readings, the difference (residual) between the actual readings and those predicted by the model could be coded.
Physical Model
Disadvantages
In general, however, the physics of data generation is simply too complicated to understand, let alone use to develop a model.
Since the physics of the problem is too complicated, in practice we build a model based on empirical observation of the statistics of the data.
Probability Model
The simplest mathematical model for the source is to assume that all the events are independent and identically distributed (iid); hence the name ignorance model. It is used when we don't know anything about the source.
Next, let's assume the events are independent but not equally likely. Using the entropy equation we can find the entropy.
A source that generates letters from an alphabet A = {a1, a2, ..., aM} can then be represented by a probability model P = {P(a1), P(a2), ..., P(aM)}.
Probability Model
Next, if we also discard the assumption of independence, we can come up with better data compression schemes, but we then have to describe how the elements of the data sequence depend on each other.
One of the most popular ways of representing dependence in the data is through the use of Markov models, named after the Russian mathematician Andrei Andreyevich Markov (1856-1922).
Markov Models
For models used in loss-less compression, we use a specific type of Markov process called a discrete time Markov chain.
Let {Xn} be a sequence. It is said to follow a kth-order Markov model if
P(Xn | Xn-1, ..., Xn-k) = P(Xn | Xn-1, ..., Xn-k, ...)
That is, knowledge of the past k symbols is equivalent to knowledge of the entire past history of the process.
The values taken on by the set {Xn-1, ..., Xn-k} are called the states of the process.
Markov Models
The most commonly used Markov model is the first-order Markov model, for which
P(Xn | Xn-1) = P(Xn | Xn-1, Xn-2, Xn-3, ...)
Markov chain property: the probability of each subsequent state depends only on the previous state.
The above equations indicate the existence of dependence between samples; however, they do not describe the form of the dependence.
We can develop different first-order Markov models depending on our assumption about the form of the dependence between samples.
To define a Markov model, the following probabilities have to be specified: the transition probabilities P(X2 | X1) and the initial probabilities P(X1).
Markov Models
If we assume that the dependence is introduced in a linear manner, we can view the data sequence as the output of a linear filter driven by white noise.
The output of such a filter can be given by the difference equation
xn = a1 xn-1 + a2 xn-2 + ... + aN xn-N + en
where en is the white noise.
This model is often used when developing coding algorithms for speech and images.
The Markov model does not, however, require the assumption of linearity.
Markov Model Example
For example, consider a binary image. The image has only two types of pixels: white pixels and black pixels.
Q. Based on the current pixel, can we predict the appearance of the next one?
Ans. Yes, we can model the pixel process as a discrete time Markov chain.
Define two states Sw and Sb: Sw corresponds to the case where the current pixel is a white pixel, and Sb to the case where the current pixel is a black pixel.
We define the transition probabilities P(w|b) and P(b|w), and the probability of being in each state, P(Sw) and P(Sb).
The Markov model can then be represented by the state diagram shown in the figure.
Markov Model
The entropy of a finite state process with states Si is simply the average value of the entropy at each state:
H = sum over i of P(Si) H(Si)
Example of Markov Model
Two states: Rain and Dry.
Transition probabilities: P(Rain|Rain) = 0.3, P(Dry|Rain) = 0.7, P(Rain|Dry) = 0.2, P(Dry|Dry) = 0.8.
Initial probabilities: say P(Rain) = 0.4, P(Dry) = 0.6.
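A small sketch of the state-entropy formula applied to the Rain/Dry example. As an assumption for illustration, the slide's initial probabilities are used as the state weights; strictly, the stationary state probabilities of the chain would be used:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Transition probabilities from the Rain/Dry slide.
transitions = {"Rain": {"Rain": 0.3, "Dry": 0.7},
               "Dry":  {"Rain": 0.2, "Dry": 0.8}}
state_probs = {"Rain": 0.4, "Dry": 0.6}   # the slide's initial probabilities

# H = sum over states of P(Si) * H(Si)
H = sum(state_probs[s] * entropy(transitions[s].values()) for s in transitions)
print(round(H, 3))   # about 0.786 bits/symbol
```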
Markov Models in Text Compression
In English text, the probability of the next letter is heavily influenced by the preceding letters.
In the current text compression literature, kth-order Markov models are more widely known as finite context models, with the word context used for what we earlier called the state.
Example: Consider the word "preceding". Suppose we have already processed "precedin" and are going to encode the next letter.
If no account is taken of the context and we treat each letter as a surprise, the probability of the letter g occurring is relatively low.
Example
If we use a first-order Markov model (i.e., we look at the probability model given the context n), we can see that the probability of g increases substantially.
As we increase the context size (go from n to in to din and so on), the probabilities become more skewed and the entropy decreases.
Shannon used a second-order model for English text, consisting of the 26 letters and one space, to obtain an entropy of 3.1 bits/letter. Using a model where the output symbols were words rather than letters brought the entropy down to 2.4 bits/letter.
Note: The longer the context, the better its predictive value.
Markov Models in Text Compression
Disadvantage: to store the probability model with respect to all contexts of a given length, the number of contexts grows exponentially with the length of the context.
Since the source imposes some structure on its output, many of these contexts may correspond to strings that would never occur in practice.
Different sources may also have different repeating patterns.
Solution: the PPM (Prediction with Partial Match) algorithm. When encoding, the longest context in which the symbol has a non-zero probability is used first; if the symbol has zero probability in the current context, an escape symbol is sent and a shorter context is used.
Composite Source Model
In many applications, it is not easy to use a single model to describe the source.
In such cases, we can define a composite source, which can be viewed as a combination or composition of several sources, with only one source being active at any given time.
Each source Si has its own model Mi and is selected with probability Pi.
Coding
Coding: the assignment of binary sequences (of 0s and 1s) to elements or symbols.
The set of binary sequences is called a code, and the individual members of the set are called codewords.
Code: (100101100110010101); codewords: (a -> 001, b -> 010).
An alphabet is a collection of symbols called letters. For example, the alphabet used in writing most books consists of the 26 lowercase letters, 26 uppercase letters, and a variety of punctuation marks. In the terminology used in this book, a comma is a letter.
The ASCII code for the letter a is 1000011, the letter A is coded as 1000001, and the letter "," is coded as 0011010.
Notice that the ASCII code uses the same number of bits to represent each symbol. Such a code is called a fixed-length code.
Coding
To reduce the number of bits required to represent different messages, we need to use a different number of bits to represent different symbols.
If we use fewer bits to represent symbols that occur more often, then on the average we would use fewer bits per symbol. The average number of bits per symbol is often called the rate of the code.
Examples: Morse code, Huffman code.
In Morse code, codewords for letters that occur more frequently are shorter than those for letters that occur less frequently: the codeword for E is 1 bit, while the codeword for Z is 7 bits.
Uniquely Decodable Codes
The average length of the code is not the only criterion for a good code.
Example: Suppose our source alphabet consists of four letters a1, a2, a3 and a4 with probabilities P(a1) = 1/2, P(a2) = 1/4, P(a3) = P(a4) = 1/8. The entropy for this source is 1.75 bits/symbol.
The average length of a code for this source is
l_avg = sum over i of P(ai) n(ai)
where n(ai) is the number of bits in the codeword for letter ai, and the average length is given in bits/symbol.
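A small sketch of the average-length computation for this alphabet (the particular codeword assignment below is illustrative, not necessarily one of the slides' numbered codes):

```python
probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}

def average_length(code):
    # l_avg = sum of P(ai) * n(ai), in bits/symbol.
    return sum(probs[sym] * len(word) for sym, word in code.items())

code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
print(average_length(code))   # 1.75 bits/symbol, equal to the entropy of this source
```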
Uniquely Decodable Codes
From the table, with respect to average length, Code 1 appears to be the best code.
However, a code should also have the ability to transfer information in an unambiguous way.
Uniquely Decodable Codes
Code 1: Both a1 and a2 have been assigned the codeword 0. When a 0 is received, there is no way to know whether an a1 was transmitted or an a2. Hence we would like each symbol to be assigned a unique codeword.
Uniquely Decodable Code
Code 2 seems to have no problem with ambiguity. However, suppose we encode {a2 a1 a1}: the binary string would be 100.
But 100 can also be decoded as {a2 a3}, meaning the original sequence cannot be recovered with certainty.
The code is not uniquely decodable. (Not desirable.)
Uniquely Decodable Code
How about Code 3? Its first three codewords end with 0, and a 0 denotes the termination of a codeword.
The codeword for a4 is three 1s, which is easily decodable.
Code 3
Code 3: Notice that the first three codewords all end in a 0; in fact, a 0 always denotes the termination of a codeword.
The final codeword contains no 0s and is 3 bits long. Because all other codewords have fewer than three 1s and terminate in a 0, the only way we can get three 1s in a row is as the code for a4.
The decoding rule is simple: accumulate bits until you get a 0 or until you have three 1s. There is no ambiguity in this rule, and it is reasonably easy to see that this code is uniquely decodable.
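A sketch of this decoding rule in code. The original codeword table is not in the transcript, so the mapping below (a1 -> 0, a2 -> 10, a3 -> 110, a4 -> 111) is an assumption that is consistent with the description of Code 3 above:

```python
codewords = {"0": "a1", "10": "a2", "110": "a3", "111": "a4"}   # assumed Code 3 table

def decode_code3(bits):
    symbols, current = [], ""
    for b in bits:
        current += b
        if b == "0" or current == "111":   # a codeword is complete
            symbols.append(codewords[current])
            current = ""
    return symbols

print(decode_code3("0101101110"))   # ['a1', 'a2', 'a3', 'a4', 'a1']
```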
Uniquely Decodable Code
Code 4: Each codeword starts with a 0, and the only time we see a 0 is at the beginning of a codeword.
The decoding rule is to accumulate bits until you see a 0; the bit before that 0 is the last bit of the previous codeword.
Code 4
The difference between Code 3 and Code 4 is that with Code 3 the decoder knows the moment a codeword is complete, whereas with Code 4 we have to wait until the beginning of the next codeword before we know that the current codeword is complete.
Because of this property, Code 3 is called an instantaneous code and Code 4 a near-instantaneous code.
Q) Is Code 4 uniquely decodable?
Ans. Decode the string 011111111111111111.
Uniquely Decodable Code
Instantaneous and near-instantaneous codes
Decode 011111111111111111 from Code 5.
In this string, the first codeword can be either 0 (a1) or 01 (a2).
If we assume the first codeword is a1, then after decoding the next eight codewords as a3s we are left with a single (dangling) 1, so this parse fails.
If we assume the first codeword is a2, we can decode the next 16 bits as eight a3s.
So the string can be uniquely decoded. In fact, Code 5, while certainly not instantaneous, is uniquely decodable: only one of the two parses works out.
Uniquely Decodable Code
Decode the string 01010101010101010 from Code 6.
Parse 1: decode it as a1 followed by eight a3s.
Parse 2: decode it as eight a2s followed by one a1.
Both parses are valid, so Code 6 is not uniquely decodable.
Even with these small codes, it is not immediately evident whether a code is uniquely decodable or not; what about larger codes?
Hence a systematic procedure should be followed to test for unique decodability.
The general test for unique decodability is not in the syllabus; only the following examples are covered.
Test for Unique Decodability: Example 1
Consider Code 5. First list the codewords: {0, 01, 11}.
The codeword 0 is a prefix of the codeword 01. Hence the dangling suffix is 1.
There are no other pairs for which one element of the pair is a prefix of the other.
Example 1
Let us augment the codeword list with the dangling suffix: {0, 01, 11, 1}.
Comparing the elements of this list, we find that 0 is a prefix of 01 with a dangling suffix of 1, but we have already included 1 in our list.
Also, 1 is a prefix of 11. This gives a dangling suffix of 1, which is already in the list.
Example 1
There are no other pairs that would generate a dangling suffix, so we cannot augment the list any further.
Since no dangling suffix is itself a codeword, Code 5 is uniquely decodable.
Test for Unique Decodability: Example 2
Consider Code 6. First list the codewords: {0, 01, 10}.
The codeword 0 is a prefix of the codeword 01; the dangling suffix is 1.
There are no other pairs for which one element of the pair is a prefix of the other.
Augmenting the codeword list with 1, we obtain the list {0, 01, 10, 1}.
Example 2
In this list, 1 is a prefix of 10. The dangling suffix for this pair is 0, which is the codeword for a1.
Therefore, Code 6 is not uniquely decodable.
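A sketch of the dangling-suffix procedure illustrated in Examples 1 and 2 (essentially the Sardinas-Patterson test); the function names are illustrative:

```python
def dangling_suffixes(words):
    # Repeatedly compare codewords (and previously found suffixes) against codewords;
    # whenever one is a prefix of the other, record the dangling suffix.
    suffixes = set()
    changed = True
    while changed:
        changed = False
        for a in set(words) | suffixes:
            for b in words:
                for x, y in ((a, b), (b, a)):
                    if x != y and y.startswith(x):
                        s = y[len(x):]
                        if s not in suffixes:
                            suffixes.add(s)
                            changed = True
    return suffixes

def uniquely_decodable(words):
    # Not uniquely decodable if some dangling suffix is itself a codeword.
    return not (dangling_suffixes(words) & set(words))

print(uniquely_decodable(["0", "01", "11"]))   # Code 5 -> True
print(uniquely_decodable(["0", "01", "10"]))   # Code 6 -> False
```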
Prefix Codes
The test for unique decodability requires examining the dangling suffixes. If a dangling suffix is itself a codeword, then the code is not uniquely decodable.
One type of code in which we will never face the possibility of a dangling suffix being a codeword is a code in which no codeword is a prefix of another.
A code in which no codeword is a prefix of another codeword is called a prefix code.
A simple way to check whether a code is a prefix code is to draw the rooted binary tree corresponding to the code.
Prefix Codes
Draw a tree that starts from a single node (the root node) and has up to two branches at each node.
One branch corresponds to a 1 and the other to a 0. The convention followed here is that the root node is at the top, the left branch corresponds to 0, and the right branch corresponds to 1.
Using this convention, draw the binary trees for Codes 2, 3 and 4.
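Besides drawing the tree, the prefix property can also be checked directly in code. A minimal sketch (the two codeword sets below are illustrative; the first is Code 5 from the earlier slides, the second is a prefix code consistent with the description of Code 3):

```python
def is_prefix_code(words):
    # In sorted order, any codeword that is a prefix of another sits right before
    # one of its extensions, so checking adjacent pairs is enough.
    words = sorted(words)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code(["0", "01", "11"]))          # False: 0 is a prefix of 01
print(is_prefix_code(["0", "10", "110", "111"]))  # True: no codeword is a prefix of another
```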
Prefix Codes
The Kraft-McMillan Inequality
Not in Syllabus
Algorithmic information Theory
Information theory deals with data and the information that can be obtained from it, while algorithmic information theory deals with the program you would write to generate (and hence compress) the data.
At the heart of algorithmic information theory is a measure called Kolmogorov complexity.
The Kolmogorov complexity K(x) of a sequence x is the size of the program needed to generate x.
Here "size" includes all the inputs needed by the program.
If x were a sequence of all ones, a highly compressible sequence, the program would simply be a print statement in a loop.
At the other extreme, if x were a random sequence with no structure, then the only program that could generate it would contain the sequence itself; the size of the program would be slightly larger than the sequence.
Thus, there is a clear correspondence between the size of the smallest program that can generate a sequence and the amount of compression that can be obtained.
This lower bound, however, is uncertain and is not practically used.
Huffman Coding
Huffman Coding Overview
The Huffman Coding Algorithm
Developer: David Huffman, as a class assignment in an information theory course taught by Robert Fano at MIT.
These codes are prefix codes and are optimum for a given model (set of probabilities).
The Huffman procedure is based on two observations regarding optimum prefix codes:
1. In an optimum code, symbols that occur more frequently (have a higher probability of occurrence) will have shorter codewords than symbols that occur less frequently.
2. In an optimum code, the two symbols that occur least frequently will have the same length.
Design of Huffman Code
Let us design a Huffman code for a source that puts out letters from an alphabet A = {a1, a2, a3, a4, a5} with P(a1) = P(a3) = 0.2, P(a2) = 0.4, and P(a4) = P(a5) = 0.1.
First, find the first-order entropy.
Step 1: Sort the letters in descending order of probability.
Huffman Coding Algorithm Example
Step 3: Find the average length.
L = 0.4 x 1 + 0.2 x 2 + 0.2 x 3 + 0.1 x 4 + 0.1 x 4 = 2.2 bits/symbol.
Step 4: Calculate the redundancy (the average length minus the entropy).
Step 5: Binary Huffman Tree
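A sketch of the Huffman construction for this alphabet using Python's heapq (the helper is illustrative). Note that ties can be merged in different orders: this version happens to produce codeword lengths 1, 3, 3, 3, 3, while the slides' two trees give lengths 1, 2, 3, 4, 4 and 2, 2, 2, 3, 3; all of them average 2.2 bits/symbol.

```python
import heapq, math

probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}

def huffman(probs):
    # Repeatedly merge the two least probable groups, prepending a bit to each member.
    heap = [(p, [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    codes = {sym: "" for sym in probs}
    while len(heap) > 1:
        p1, group1 = heapq.heappop(heap)
        p2, group2 = heapq.heappop(heap)
        for s in group1:
            codes[s] = "0" + codes[s]
        for s in group2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (p1 + p2, group1 + group2))
    return codes

codes = huffman(probs)
avg_len = sum(probs[s] * len(c) for s, c in codes.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
print(codes)
print(round(avg_len, 2), round(entropy, 3), round(avg_len - entropy, 3))
# average 2.2 bits/symbol, entropy about 2.122, redundancy about 0.078 bits/symbol
```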
Example 2
Transmit 28 data samples using a Huffman code:
1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5 6 6 7
Minimum Variance Huffman Coding
L = 0.4 x 2 + 0.2 x 2 + 0.2 x 2 + 0.1 x 3 + 0.1 x 3 = 2.2 bits/symbol.
The two codes are identical in terms of their redundancy. However, the variance of the codeword lengths is significantly different.
Remember that in many applications, although you might be using a variable-length code, the available transmission rate is generally fixed.
For example, if we were going to transmit symbols from the alphabet we have been using at 10,000 symbols per second, we might ask for a transmission capacity of 22,000 bits per second. This means that during each second the channel expects to receive 22,000 bits, no more and no less. As the bit generation rate will vary around 22,000 bits per second, the output of the source coder is generally fed into a buffer, whose purpose is to smooth out the variations in the bit generation rate.
However, the buffer has to be of finite size, and the greater the variance of the codeword lengths, the more difficult the buffer design becomes.
Suppose that the source generates a string of a4s and a5s for several seconds. If we are using the first code, we will be generating bits at a rate of 40,000 bits per second, so for each such second the buffer has to store 18,000 bits.
On the other hand, if we use the second (minimum variance) code, we would be generating 30,000 bits per second, and the buffer would have to store 8,000 bits for every second.
If instead we have a string of a2s, the first code would result in the generation of 10,000 bits per second, a deficit of 12,000 bits per second; the second code would lead to a deficit of only 2,000 bits per second.
So which do we select?
Application of Huffman coding
Huffman coding is often used in conjunction with other coding techniques in:
Loss-less Image Compression
Text Compression
Audio Compression
Loss-Less Image Compression
Monochrome image: pixel values in the range 0-255.
Compression of Test Images Using Huffman Coding
The original (uncompressed) test images are represented using 8 bits/pixel.
Each image consists of 256 rows of 256 pixels, so the uncompressed representation uses 65,536 bytes.
Image Compression
From a visual inspection of the test images, we can clearly see that the pixels in an image are heavily correlated with their neighbors.
We could represent this structure with the crude model Xn = Xn-1. The residual would then be the difference between neighboring pixels.
Huffman Coding: Text Compression
We encoded an earlier version of this chapter using Huffman codes whose probabilities of occurrence were obtained from the chapter itself. The file size dropped from about 70,000 bytes to about 43,000 bytes with Huffman coding.
Audio Compression
The End of Unit 1
Any thoughts, doubts or ideas?