Data Compression Intro


DATA COMPRESSION
Lecture by Kiran Kumar KV, PESSE

Block Diagram of Data Compression

INTRODUCTION TO LOSSLESS COMPRESSION
Unit 1, Chapter 1

Preface

Introduction
Data
Need for Compression
Compression Techniques
Lossless and Lossy Compression
Performance Measures
Modeling and Coding
Problems

Introduction

The word "data" derives from the Latin for "to give", hence "something given".
In geometry, mathematics, engineering, and so on, the terms "given" and "data" are used interchangeably.
Data is also a representation of a fact, figure, or idea.
In computer science, data are numbers, words, images, etc., accepted as they stand.

Data (Analog)

The first images were sent over the Atlantic using a submarine (telegraph) cable in the 1920s.
1964: lunar probe images.

Data (Analog)

To
The Principal,
College

Respected Sir,
Subject: Need for a heater in class
(Students, please fill in the rest.)

Yours sincerely,
Faculty

Data (Digital World)

Raw data => digital data (011.0110101...)

Difference between Data, Information and Knowledge

Data is the lowest level of abstraction, information is the next level, and knowledge is the highest level of the three.
Data on its own carries no meaning. For data to become information, it must be interpreted and take on a meaning.
For example, the height of Mt. Everest is generally considered "data"; a book on Mt. Everest's geological characteristics may be considered "information"; and a report containing practical advice on the best way to reach Mt. Everest's peak may be considered "knowledge".

Compression

What is the need for compression?
What are the different kinds of compression?
Which one is better?
Which technique is used more often?
What is the use of combining both techniques?

Need for Compression

Weather forecasting
Internet data
Broadband
Planning cities

COMPRESSION TECHNIQUES (Introduction to Lossless Compression)

Different kinds of compression:

Lossless compression: the compressed data can be reconstructed back into the exact original data.
Lossy compression: the compressed data cannot be reconstructed back into the exact original data.

Lossless Compression

Involves no loss of information.
Main area: text compression, where the reconstructed text must be identical to the original. For example, reconstructing "Do not send money" as "Do now send money" changes the meaning entirely, so even a one-character error is unacceptable.
Other areas: radiology, satellite imagery.
Main advantage: zero distortion.
Main disadvantage: the amount of compression is smaller than with lossy compression.

Lossy Compression

Disadvantage: data compressed with lossy techniques generally cannot be recovered or reconstructed exactly (some information is lost).
Advantage: much higher compression ratios.
Areas: audio and video compression (MP3, MPEG, JPEG).

MEASURES OF PERFORMANCE (Introduction to Lossless Compression)

How do we measure or quantify compression performance?

Secondary measures:
1. The relative complexity of the algorithm.
2. The memory required to implement the algorithm.
3. How fast the algorithm runs on a given machine.

Primary measures:
1. The amount of compression.
2. How closely the reconstruction resembles the original.

Compression Ratio

The most widely used measure of how much the data has been compressed is the compression ratio: the ratio of the number of bits required to represent the data before compression to the number of bits required to represent the data after compression.

Example: suppose storing an image made up of a square array of 256 x 256 pixels requires 65,536 bytes, and the compressed version requires 16,384 bytes. The compression ratio is 4:1.

Another Measure: Rate

Rate: the average number of bits required to represent a single sample.

Consider the last example. The 256 x 256 original image occupies 65,536 bytes, so each pixel takes 1 byte, i.e. 8 bits per pixel (sample). The compressed image occupies 16,384 bytes. How many bits does each pixel now take? The rate is 2 bits/pixel.

Are these two measures sufficient for lossy compression?
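Both measures are simple ratios; the short Python sketch below (the helper names are mine, not from the slides) reproduces the 256 x 256 image example.

```python
def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """Ratio of bits before compression to bits after compression."""
    return original_bits / compressed_bits

def rate(compressed_bits: int, num_samples: int) -> float:
    """Average number of bits per sample after compression."""
    return compressed_bits / num_samples

# The 256 x 256 image example from the slides.
original_bytes, compressed_bytes = 65_536, 16_384
pixels = 256 * 256

print(compression_ratio(original_bytes * 8, compressed_bytes * 8))  # 4.0 -> 4:1
print(rate(compressed_bytes * 8, pixels))                           # 2.0 bits/pixel
```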

Distortion

In lossy compression, the reconstruction differs from the original data. To determine the efficiency of a compression algorithm, we have to quantify that difference. The difference between the original and the reconstruction is called the distortion.

Lossy techniques are generally used for the compression of data that originate as analog signals, such as speech and video. For speech and video, the final arbiter of quality is the human observer (a behavioural judgement). Because human responses are difficult to model mathematically, many approximate measures of distortion are used to determine the fidelity (quality) of the reconstructed waveforms.

MODELING AND CODING (Introduction to Lossless Compression)

Modeling and Coding

1) A compression scheme can be either lossless or lossy, depending on the requirements of the application.
2) The exact compression scheme depends on several factors, but the main one is the characteristics of the data.
3) For example, a technique that works well for compressing text may not work well for compressing images. The best approach for a given application largely depends on the redundancies inherent in the data.

Modeling and Coding: Redundancies

Redundant means not needed, i.e. something that can be omitted without any loss of significance.
Example: in a portrait image, most of the background is the same, so it need not all be encoded explicitly.
An approach that works for one kind of data may not work for another kind (e.g. a landscape or a group photo).

Modeling and Coding

The development of data compression algorithms for a variety of data can be divided into two phases.
Modeling: extract information about any redundancy present in the data and describe it in the form of a model.
Coding: a description of the model and a description of how the data differ from the model are encoded, usually with a binary alphabet. The difference between the data and the model is often referred to as the residual.

DATA MODELING EXAMPLES (Introduction to Lossless Compression)

Example 1

Q. Consider the following sequence of numbers X = {x1, x2, x3, ...}: 9 11 11 11 14 13 15 17 16 17 20 21. How many bits are required to store or transmit each sample?

Ans. 5 bits/sample if the numbers are sent as they are; fewer if we exploit the structure of the data.

1) Model the data. The values roughly follow a straight line, x'n = n + 8 (of the form y = mx + c), for n = 1, 2, ...
2) Residue: the difference between the data and the model, en = xn - x'n:
0 1 0 -1 1 -1 0 1 -1 -1 1 1

Example 1 (continued)

The residual sequence consists of only three values {-1, 0, 1}. Assigning the codes 00 to -1, 01 to 0 and 10 to 1, we need only 2 bits to represent each element of the residual sequence. Therefore, we can obtain compression by transmitting the parameters of the model and the residual sequence.

The scheme is lossy if only the model is transmitted, and lossless if both the model parameters and the residual are transmitted.

Q. Model the following data for compression:
{6 8 10 10 12 11 12 15 16}
{5 6 9 10 11 13 17 19 20}
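Example 1 expressed as a short Python sketch (the 2-bit codebook matches the slide; the rest is illustrative):

```python
# Model the data as the line x'_n = n + 8 and encode only the residual.
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

model = [n + 8 for n in range(1, len(data) + 1)]        # x'_n = n + 8
residual = [x - m for x, m in zip(data, model)]          # e_n = x_n - x'_n
print(residual)  # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

codebook = {-1: "00", 0: "01", 1: "10"}                  # 2 bits per residual value
bitstream = "".join(codebook[e] for e in residual)

# 12 samples * 5 bits = 60 bits raw, versus 24 bits for the residual
# (plus the cost of describing the model parameters m = 1, c = 8).
print(len(data) * 5, "bits raw ->", len(bitstream), "bits of residual")
```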

Example 2

Q. Find the structure present in this data sequence: 27 28 29 28 26 27 29 28 30 32 34 36 38

I. No obvious functional structure is found, so
II. check for closeness of consecutive values.
III. Send the first value, then the rest of the residue (the differences between neighbouring values): 27, followed by 1 1 -1 -2 1 2 -1 2 2 2 2 2.
IV. Are the bits/sample reduced?
V. The decoder adds each received difference to the previously decoded value to reconstruct the original sequence.

Note

Techniques that use the past values of a sequence to predict the current value, and then encode the error in prediction (the residual), are called predictive coding schemes.

Note: even assuming both the encoder and the decoder know the model being used, we still have to send the value of the first element of the sequence.

Example 3

Suppose we have the following sequence:
aba ray a ranba rrayb ranbfa rbfaa rbfaaa rbaway

To represent the 8 distinct symbols appearing above, 3 bits/symbol are required with a fixed-length code. Suppose instead we assign a 1-bit codeword to the symbol that occurs most often and longer codewords to the rarer symbols. As there are 41 symbols in the sequence, this works out to approximately 2.58 bits per symbol, i.e. a compression ratio of 1.16:1 (Huffman coding).

Dictionary compression schemes exploit the fact that letters and words repeat.

Note

There will be situations in which it is easier to take advantage of the structure if we decompose the data into a number of components. We can then study each component separately and use a model appropriate to that component.

There are a number of different ways to characterize data, and different characterizations lead to different compression schemes.

We can compress something with products from one vendor and reconstruct it using the products of a different vendor; international standards organizations maintain standards for various compression applications.

MATHEMATICAL PRELIMINARIES FOR LOSSLESS COMPRESSION
Unit 1, Chapter 2

Overview

This chapter builds the mathematical framework for lossless schemes:
starting with information theory,
then basic probability concepts,
and, based on these, the modeling of data.

Introduction to Information Theory

Information theory gives a quantitative measure of information. Its father is Claude Elwood Shannon, an electrical engineer at Bell Labs, who defined a quantity called self-information.

Given a random experiment, if A is an event in the set of outcomes, the self-information associated with A is

    i(A) = log_b (1 / P(A)) = -log_b P(A)

where i(A) is the self-information and P(A) is the probability of the event A (with base b = 2, the unit is bits).

Introduction to Information Theory (continued)

Since log(1) = 0 and -log(x) increases as x decreases, if the probability of an event is low, the information associated with it is high, and vice versa.

Another property: the information obtained from the occurrence of two independent events is the sum of the information obtained from the occurrence of the individual events. If A and B are two independent events, the self-information associated with the occurrence of both A and B is

    i(AB) = -log_b P(A)P(B) = i(A) + i(B)
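As a quick numerical check of these two properties, here is a small illustrative Python sketch that evaluates self-information in bits:

```python
from math import log2

def self_information(p: float) -> float:
    """i(A) = -log2 P(A), measured in bits."""
    return -log2(p)

print(self_information(0.5))            # 1.0 bit: a fair coin toss
print(self_information(0.125))          # 3.0 bits: a rarer event carries more information
# Independent events: i(AB) = i(A) + i(B)
print(self_information(0.5 * 0.125))    # 4.0 bits = 1.0 + 3.0
```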


Entropy

Suppose we have a set of independent events Ai, which are the outcomes of some experiment S. The average self-information associated with the random experiment is

    H(S) = Σ P(Ai) i(Ai) = -Σ P(Ai) log_b P(Ai)

This quantity is called the entropy of the experiment (Shannon).
Note: entropy is also the measure of the average number of binary symbols needed to code the output of the source.

Note

Most of the sources considered in this subject are independent and identically distributed (iid); the entropy equation above holds only when the experiment is iid.

Theorem: Shannon showed that the best a lossless compression scheme can do is to encode the output of a source with an average number of bits equal to the entropy of the source.

The estimate of the entropy depends on our assumptions about the structure of the source sequence.

Example 4

Q. Consider the sequence: 1 2 3 2 3 4 5 4 5 6 7 8 9 8 9 10

The probability of occurrence of each element is
P(1) = P(6) = P(7) = P(10) = 1/16
P(2) = P(3) = P(4) = P(5) = P(8) = P(9) = 2/16

Assuming the sequence is iid, the first-order entropy of this source is 3.25 bits. Hence, by Shannon's result, the best we can do is about 3.25 bits/sample.
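The 3.25 bits/sample figure is easy to verify numerically; the sketch below (my own helper, not from the slides) also previews the differenced sequence analysed in Step 2 on the next slide.

```python
from collections import Counter
from math import log2

def first_order_entropy(seq):
    """H = -sum p * log2(p), treating the samples as iid."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

seq = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
print(first_order_entropy(seq))   # 3.25 bits/sample

# Entropy of the differenced sequence (P(1) = 13/16, P(-1) = 3/16):
diff = [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]
print(first_order_entropy(diff))  # about 0.70 bits/sample
```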

Example 4, Step 2: model the data to remove redundancy.

Solution: there is sample-to-sample correlation between the samples, which we remove by taking differences of neighbouring sample values:
1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 1

This residual sequence uses only two values, 1 and -1, with P(1) = 13/16 and P(-1) = 3/16, so its entropy is about 0.70 bits per symbol.

Knowing only this residual sequence is not enough to reconstruct the original sequence; we must also know the process by which it was generated from the original. That process depends on our assumption about the structure of the input data: assumption = model.

Note

If the parameters of the model do not change with n, the model is called a static model. A model whose parameters change, or adapt, with n to the changing characteristics of the data is called an adaptive model.

Basically, we see that knowing something about the structure of the data can help to "reduce the entropy" of what we have to encode.

Structure

Consider the following sequence: 1 2 1 2 3 3 3 3 1 2 3 3 3 3 1 2 3 3 1 2

Obviously there is some structure to this data, but if we look at it one symbol at a time the structure is difficult to extract. The probabilities are P(1) = P(2) = 1/4 and P(3) = 1/2, so the entropy is 1.5 bits/symbol. The sequence consists of 20 symbols, so the total number of bits required to represent it is 30.

Now take the same sequence and look at it in blocks of two. There are only two block symbols, (1 2) and (3 3), with probabilities P(1 2) = P(3 3) = 1/2, so the entropy is 1 bit per block. As there are 10 such blocks in the sequence, we need a total of 10 bits to represent the entire sequence, a reduction by a factor of three.
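A short sketch confirming the 30-bit versus 10-bit comparison (assuming non-overlapping blocks of two and treating each view as iid):

```python
from collections import Counter
from math import log2

def entropy(symbols):
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

seq = [1, 2, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 1, 2]

h1 = entropy(seq)                                   # 1.5 bits/symbol
pairs = list(zip(seq[0::2], seq[1::2]))             # blocks of two symbols
h2 = entropy(pairs)                                 # 1.0 bit/block

print(len(seq) * h1, "bits vs", len(pairs) * h2, "bits")  # 30.0 vs 10.0
```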

Derivation of Average Information: not in the syllabus.

Models

Good models for sources lead to more efficient compression algorithms. In general, in order to develop techniques that manipulate data using mathematical operations, we need a mathematical model for the data. There are several approaches to building such a model:

Physical model
Probability model
Markov model
Composite source model

Physical Model

If we know something about the physics of the data generation process, we can use that information to construct a model. For example:

In speech-related applications, knowledge about the physics of speech production can be used to construct a mathematical model for the sampled speech process; sampled speech can then be encoded using this model.

If residential electrical meter readings at hourly intervals were to be coded, knowledge about the living habits of the populace could be used to predict when electricity usage would be high and when it would be low. Then, instead of the actual readings, the difference (residual) between the actual readings and those predicted by the model could be coded.

Physical Model: Disadvantages

In general, however, the physics of data generation is simply too complicated to understand, let alone use to develop a model. Since the physics of the problem is too complicated, in practice we build models based on empirical observation of the statistics of the data.

Probability Model

The simplest mathematical model for a source is to assume that the source emits its letters independently, each with the same probability; hence the name ignorance model. It is used when we don't know anything about the source.

Next, let us assume the events are independent but not equally likely. For a source that generates letters from an alphabet A = {a1, a2, ..., aM}, this is represented by a probability model P = {P(a1), P(a2), ..., P(aM)}, and the entropy equation then gives us the entropy of the source.

Probability Model (continued)

If we also discard the assumption of independence, we can build better data compression schemes, but we then have to describe how the elements of the data sequence depend on each other. One of the most popular ways of representing dependence in the data is through the use of Markov models, named after the Russian mathematician Andrei Andreyevich Markov (1856-1922).

Markov Models

For models used in lossless compression, we use a specific type of Markov process called a discrete-time Markov chain. A sequence {Xn} is said to follow a kth-order Markov model if

    P(Xn | Xn-1, ..., Xn-k) = P(Xn | Xn-1, ..., Xn-k, ...)

That is, knowledge of the past k symbols is equivalent to knowledge of the entire past history of the process. The values taken on by the set {Xn-1, ..., Xn-k} are called the states of the process.

Markov Models (continued)

The most commonly used Markov model is the first-order Markov model, for which

    P(Xn | Xn-1) = P(Xn | Xn-1, Xn-2, Xn-3, ...)

Markov chain property: the probability of each subsequent state depends only on the previous state. These equations indicate the existence of dependence between samples, but they do not describe the form of the dependence; we can develop different first-order Markov models depending on our assumption about that form.

To define a Markov model, the following probabilities have to be specified: the transition probabilities P(X2 | X1) and the initial probabilities P(X1).

Markov Models: Linear Dependence

If we assume that the dependence is introduced in a linear manner, we can view the data sequence as the output of a linear filter driven by white noise. The output of such a filter can be given by a difference equation of the form

    Xn = Σ ai Xn-i + En

where En is the white noise. This model is often used when developing coding algorithms for speech and images. The Markov model itself, however, does not require the assumption of linearity.

Markov Model Example

Consider a binary image. The image has only two types of pixels: white pixels and black pixels.

Q. Based on the current pixel, can we predict the appearance of the next one?

Ans. Yes, we can model the pixel process as a discrete-time Markov chain. Define two states, Sw and Sb, where Sw corresponds to the case where the current pixel is a white pixel and Sb to the case where the current pixel is a black pixel. We define the transition probabilities P(w|b) and P(b|w), and the probabilities of being in each state, P(Sw) and P(Sb). The Markov model can then be represented by the state diagram shown in the figure on the slide.

Markov Model: Entropy

The entropy of a finite-state process with states Si is simply the average value of the entropy at each state:

    H = Σ P(Si) H(Si)

Example of a Markov Model

Two states: Rain and Dry.
Transition probabilities: P(Rain|Rain) = 0.3, P(Dry|Rain) = 0.7, P(Rain|Dry) = 0.2, P(Dry|Dry) = 0.8.
Initial probabilities: say P(Rain) = 0.4, P(Dry) = 0.6.
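Plugging the Rain/Dry numbers into the finite-state entropy formula from the previous slide gives roughly 0.79 bits per symbol. The sketch below weights the per-state entropies by the state probabilities given above; in steady state one would use the stationary distribution instead.

```python
from math import log2

def state_entropy(probs):
    """Entropy of the next-symbol distribution in one state."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Rain/Dry chain from the slide.
transitions = {"Rain": {"Rain": 0.3, "Dry": 0.7},
               "Dry":  {"Rain": 0.2, "Dry": 0.8}}
state_probs = {"Rain": 0.4, "Dry": 0.6}   # the slide's (initial) state probabilities

# Average of the per-state entropies, weighted by the state probabilities,
# as in H = sum_i P(Si) H(Si).
H = sum(state_probs[s] * state_entropy(transitions[s].values())
        for s in transitions)
print(round(H, 3))  # about 0.786 bits per symbol
```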


Markov Models in Text Compression

In English, the probability of the next letter is heavily influenced by the preceding letters. In the current text compression literature, kth-order Markov models are more widely known as finite context models, with the word "context" being used for what we have called the state.

Example: consider the word "preceding". Suppose we have already processed "precedin" and are about to encode the next letter. If we take no account of the context and treat the letter as a surprise, the probability of the letter "g" occurring is relatively low.

If we use a first-order Markov model, i.e. the probability model conditioned on the single preceding letter "n", we can see that the probability of "g" increases substantially. As we increase the context size (going from "n" to "in" to "din" and so on), the probability distribution of the next letter becomes more skewed and the entropy decreases.

Shannon used a second-order model for English text consisting of the 26 letters and one space to obtain an entropy of 3.1 bits/letter. Using a model in which the output symbols were words rather than letters brought the entropy down to 2.4 bits/letter.

Note: the longer the context, the better its predictive value.

Markov Models in Text Compression (continued)

Disadvantage: to store the probability model with respect to all contexts of a given length, the number of contexts grows exponentially with the length of the context. Moreover, since the source imposes some structure on its output, many of these contexts correspond to strings that would never occur in practice, and different sources have different repeating patterns.

Solution: the PPM (Prediction with Partial Match) algorithm. While encoding, the longest context in which the symbol has a non-zero probability is used; a symbol with zero probability in the current context is handled by sending an escape symbol and dropping to a shorter context.

Composite Source Model

In many applications it is not easy to describe the source with a single model. In such cases we can define a composite source, which can be viewed as a combination (composition) of several sources, with only one source being active at any given time. Each source Si has its own model Mi and is selected with probability Pi.

Coding

Coding is the assignment of binary sequences (of 0s and 1s) to elements or symbols. The set of binary sequences is called a code, and the individual members of the set are called codewords. Example: codewords a -> 001, b -> 010; an encoded sequence might look like 100101100110010101.

An alphabet is a collection of symbols called letters. For example, the alphabet used in writing most books consists of the 26 lowercase letters, 26 uppercase letters, and a variety of punctuation marks. In the terminology used here, a comma is a letter.

The ASCII code for the letter a is 1000011, the letter A is coded as 1000001, and the letter "," is coded as 0011010. Notice that the ASCII code uses the same number of bits to represent each symbol; such a code is called a fixed-length code.

Coding (continued)

To reduce the number of bits required to represent different messages, we use a different number of bits for different symbols. If we use fewer bits for the symbols that occur more often, then on average we use fewer bits per symbol. The average number of bits per symbol is often called the rate of the code.

Examples: Morse code and Huffman codes, where the codewords for letters that occur more frequently are shorter than those for letters that occur less frequently. The codeword for E is 1 bit, while the codeword for Z is 7 bits.

Uniquely Decodable Codes

The average length of the code is not the only criterion for a good code.

Example: suppose our source alphabet consists of four letters a1, a2, a3, a4 with probabilities P(a1) = 1/2, P(a2) = 1/4, P(a3) = P(a4) = 1/8. The entropy of this source is 1.75 bits/symbol. The average length of a code for it is

    l = Σ P(ai) n(ai)

where n(ai) is the number of bits in the codeword for letter ai, and the average length is given in bits/symbol.

Uniquely Decodable Codes (continued)

Comparing the average lengths of the candidate codes, Code 1 appears to be the best. However, a code must also be able to transfer information in an unambiguous way.

Code 1: both a1 and a2 have been assigned the codeword 0. When a 0 is received, there is no way to know whether an a1 or an a2 was transmitted. Hence we would like each symbol to be assigned a unique codeword.

Code 2 seems to have no problem with ambiguity at the codeword level. However, if we encode {a2 a1 a1}, the binary string is 100, and 100 can be decoded either as {a2 a1 a1} or as {a2 a3}. The original sequence cannot be recovered with certainty: there is no unique decodability, which is not desirable.

What about Code 3? Its first three codewords all end in 0, and a 0 denotes the termination of a codeword. The codeword for a4 is three 1s, which is easily recognized.

Code 3: notice that the first three codewords all end in a 0; in fact, a 0 always denotes the termination of a codeword. The final codeword contains no 0s and is 3 bits long. Because all other codewords have fewer than three 1s and terminate in a 0, the only way we can get three 1s in a row is as the code for a4.

The decoding rule is simple: accumulate bits until you get a 0 or until you have three 1s. There is no ambiguity in this rule, and it is reasonably easy to see that this code is uniquely decodable.

Code 4: each codeword starts with a 0, and the only time we see a 0 is at the beginning of a codeword. The decoding rule is to accumulate bits until you see a 0; the bit before that 0 is the last bit of the previous codeword.

The difference between Code 3 and Code 4 is that with Code 3 the decoder knows the moment a codeword is complete, whereas with Code 4 we have to wait until the beginning of the next codeword before we know that the current codeword is complete. Because of this property, Code 3 is called an instantaneous code and Code 4 a near-instantaneous code.

Q. Is such a code still uniquely decodable? Try decoding the string 011111111111111111 using Code 5.

Instantaneous and Near-Instantaneous Codes

Decode 011111111111111111 using Code 5. The first codeword is either 0 (a1) or 01 (a2). If we assume the first codeword is a1, then after decoding the next eight codewords as a3s we are left with a single dangling 1. If we instead assume the first codeword is a2, we can decode the remaining 16 bits as eight a3s.

So the string can be decoded, and in only one valid way. In fact Code 5, while certainly not instantaneous, is uniquely decodable; the decoder may simply have to wait until the end of the string to decide.

Decode 01010101010101010 using Code 6.
Decoding 1: a1 followed by eight a3s.
Decoding 2: eight a2s followed by one a1.
Two valid decodings exist, so Code 6 is not uniquely decodable.

Even for these small codes it is not immediately evident whether a code is uniquely decodable or not, and the situation is worse for larger codes. Hence a systematic procedure is needed to test for unique decodability.

A Test for Unique Decodability

The formal statement of the test is not in the syllabus; the following examples illustrate it.

Test for Unique Decodability: Example 1

Consider Code 5. First list the codewords: {0, 01, 11}.

The codeword 0 is a prefix of the codeword 01; the dangling suffix is 1. There are no other pairs for which one element is a prefix of the other.

Augment the codeword list with the dangling suffix: {0, 01, 11, 1}. Comparing the elements of this list: 0 is a prefix of 01 with dangling suffix 1, which is already in the list; 1 is a prefix of 11, which again gives the dangling suffix 1, already in the list.

There are no other pairs that would generate a dangling suffix, so we cannot augment the list any further, and no dangling suffix is itself a codeword. Therefore, Code 5 is uniquely decodable.

Test for Unique Decodability: Example 2

Consider Code 6. First list the codewords: {0, 01, 10}.

The codeword 0 is a prefix of the codeword 01; the dangling suffix is 1. There are no other pairs for which one element is a prefix of the other. Augmenting the codeword list with 1, we obtain {0, 01, 10, 1}.

In this list, 1 is a prefix of 10, and the dangling suffix for this pair is 0, which is itself the codeword for a1. Therefore, Code 6 is not uniquely decodable.
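The dangling-suffix procedure used in Examples 1 and 2 can be automated. The sketch below is one straightforward implementation (a Sardinas-Patterson style test, not code from the slides), checked against Codes 5 and 6 as listed above.

```python
def is_uniquely_decodable(codewords):
    """The code is uniquely decodable iff no dangling suffix generated from
    the codewords is itself a codeword."""
    C = set(codewords)

    def suffixes(A, B):
        # Dangling suffixes obtained when a word of A is a proper prefix of a word of B.
        return {b[len(a):] for a in A for b in B if a != b and b.startswith(a)}

    S = suffixes(C, C)                      # first round: compare codeword pairs
    seen = set()
    while S:
        if S & C:                           # a dangling suffix equals a codeword
            return False
        seen |= S
        S = (suffixes(C, S) | suffixes(S, C)) - seen
    return True

print(is_uniquely_decodable(["0", "01", "11"]))  # Code 5 -> True
print(is_uniquely_decodable(["0", "01", "10"]))  # Code 6 -> False
```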

Prefix Codes

The test for unique decodability requires examining the dangling suffixes: if a dangling suffix is itself a codeword, the code is not uniquely decodable. One type of code in which we never face the possibility of a dangling suffix being a codeword is a code in which no codeword is a prefix of another. A code in which no codeword is a prefix of another codeword is called a prefix code.

A simple way to check whether a code is a prefix code is to draw the rooted binary tree corresponding to the code.

Prefix Codes (continued)

Draw a tree that starts from a single node (the root) and has up to two branches at each node, one branch corresponding to 0 and the other to 1. The convention followed here is that the root is at the top, the left branch is 0 and the right branch is 1. In a prefix code, codewords are attached only to the leaves of the tree.

Exercise: using this convention, draw the binary trees for Codes 2, 3 and 4; see also the sketch below.
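Instead of drawing the tree, the prefix property can also be checked directly. In the sketch below the codeword sets for Codes 3 and 4 are inferred from the properties described on the earlier slides (not copied from a table), so treat them as illustrative.

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (equivalently, in the code tree
    every codeword sits at a leaf)."""
    words = sorted(codewords)              # a prefix, if any, ends up adjacent
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code(["0", "10", "110", "111"]))   # Code 3 -> True
print(is_prefix_code(["0", "01", "011", "0111"]))  # Code 4 -> False (0 prefixes 01)
```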


The Kraft-McMillan Inequality: not in the syllabus.

Algorithmic Information Theory

Information theory, as above, deals with the data produced by a source; algorithmic information theory deals with the program you would write to generate (and hence compress) the data.

At the heart of algorithmic information theory is a measure called Kolmogorov complexity. The Kolmogorov complexity K(x) of a sequence x is the size of the program needed to generate x, where "size" includes all the inputs the program needs.

If x were a sequence of all ones, a highly compressible sequence, the program would simply be a print statement in a loop. At the other extreme, if x were a random sequence with no structure, the only program that could generate it would have to contain the sequence itself, and the size of the program would be slightly larger than the sequence.

Thus there is a clear correspondence between the size of the smallest program that can generate a sequence and the amount of compression that can be obtained. However, this lower bound cannot be determined with certainty, so it is not used in practice.

HUFFMAN CODING

Overview: the Huffman coding algorithm.

The Huffman Coding Algorithm

Developed by David Huffman as a class assignment in an information theory course taught by Robert Fano at MIT. Huffman codes are prefix codes and are optimum for a given model (set of probabilities).

The Huffman procedure is based on two observations regarding optimum prefix codes:
1. In an optimum code, symbols that occur more frequently (have a higher probability of occurrence) have shorter codewords than symbols that occur less frequently.
2. In an optimum code, the two symbols that occur least frequently have codewords of the same length.

Design of a Huffman Code

Let us design a Huffman code for a source that puts out letters from an alphabet A = {a1, a2, a3, a4, a5} with P(a1) = P(a3) = 0.2, P(a2) = 0.4 and P(a4) = P(a5) = 0.1.

First, find the first-order entropy of the source.
Step 1: sort the letters in descending order of probability.
Step 2: repeatedly combine the two least probable letters/nodes into a single node, assigning 0 and 1 to the two branches, until only one node remains; read the codewords off the tree.

Step 3: find the average length.
l = 0.4 x 1 + 0.2 x 2 + 0.2 x 3 + 0.1 x 4 + 0.1 x 4 = 2.2 bits/symbol.

Step 4: calculate the redundancy (the difference between the average length and the entropy).

Step 5: draw the binary Huffman tree.
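As a check on Steps 3-5, here is a small Python sketch of the Huffman construction for this alphabet (the function and symbol names are mine, not from the slides). Any optimal code it returns has average length 2.2 bits/symbol, although the individual codeword lengths depend on how ties are broken, which is exactly the freedom the minimum-variance discussion below exploits.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code (one possible optimal code) for {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(round(avg_len, 2))   # 2.2 bits/symbol
```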

Example 2

Transmit the following 28 data samples using a Huffman code:
1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5 6 6 7

Minimum Variance Huffman Coding

For the same alphabet, an alternative Huffman code assigns the codeword lengths {2, 2, 2, 3, 3}:
l = 0.4 x 2 + 0.2 x 2 + 0.2 x 2 + 0.1 x 3 + 0.1 x 3 = 2.2 bits/symbol.

The two codes are identical in terms of average length (and hence redundancy), but the variance of the codeword lengths is significantly different.

Remember that in many applications, although you might be using a variable-length code, the available transmission rate is generally fixed. For example, if we were transmitting symbols from the alphabet above at 10,000 symbols per second, we might ask for a transmission capacity of 22,000 bits per second. This means that during each second the channel expects to receive 22,000 bits, no more and no less. As the bit generation rate varies around 22,000 bits per second, the output of the source coder is generally fed into a buffer whose purpose is to smooth out the variations in the bit generation rate.

However, the buffer has to be of finite size, and the greater the variance of the codeword lengths, the more difficult the buffer design problem becomes.

Suppose the source generates a string of a4s and a5s for several seconds. With the first code we generate bits at 40,000 bits per second, so each second the buffer has to store 18,000 bits. With the second code we generate 30,000 bits per second, and the buffer has to store only 8,000 bits per second.

If instead we have a string of a2s, the first code produces 10,000 bits per second, a deficit of 12,000 bits per second, while the second code leads to a deficit of only 2,000 bits per second.

So which code do we select? The one with the smaller variance of codeword lengths makes the buffering problem easier.

Applications of Huffman Coding

Huffman coding is often used in conjunction with other coding techniques in:
Lossless image compression
Text compression
Audio compression

Lossless Image Compression

Monochrome images: pixel values in the range 0-255.

Compression of test images using Huffman coding.

The original (uncompressed) test images are represented using 8 bits/pixel. Each image consists of 256 rows of 256 pixels, so the uncompressed representation uses 65,536 bytes.

From a visual inspection of the test images, we can clearly see that the pixels in an image are heavily correlated with their neighbours. We could represent this structure with the crude model Xn = Xn-1; the residual would then be the difference between neighbouring pixels, and it is this residual that is Huffman coded.
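A toy sketch of the crude model Xn = Xn-1 applied to one row of pixel values (the numbers are made up for illustration); only the first pixel and the neighbour differences would be handed to the Huffman coder.

```python
row = [120, 121, 121, 123, 126, 126, 125, 124]   # hypothetical pixel values

residual = [row[0]] + [b - a for a, b in zip(row, row[1:])]
print(residual)                     # [120, 1, 0, 2, 3, 0, -1, -1]

# Reconstruction at the decoder: running sum of the residual.
reconstructed = []
for r in residual:
    reconstructed.append(r if not reconstructed else reconstructed[-1] + r)
assert reconstructed == row
```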

Huffman Coding in Text Compression

An earlier version of this chapter was encoded using Huffman codes built from the probabilities of occurrence obtained from the chapter itself. The file size dropped from about 70,000 bytes to about 43,000 bytes with Huffman coding.

Audio Compression

The End of Unit 1. Any thoughts, doubts or ideas?