
    Chapter 3

    HUFFMAN CODING

Yeuan-Kuen Lee [ MCU, CSIE ]

Outline

3.1 Overview
3.2 The Huffman Coding Algorithm
3.2.1 Minimum Variance Huffman Codes
3.2.2 Optimality of Huffman Codes (*)
3.2.3 Length of Huffman Codes (*)
3.2.4 Extended Huffman Codes (*)
3.3 Nonbinary Huffman Codes (*)
3.4 Adaptive Huffman Coding
3.4.1 Update Procedure
3.4.2 Encoding Procedure
3.4.3 Decoding Procedure
3.5 Golomb Codes
3.6 Rice Codes
3.6.1 CCSDS Recommendation for Lossless Compression
3.7 Tunstall Codes
3.8 Applications of Huffman Coding
3.8.1 Lossless Image Compression
3.8.2 Text Compression
3.8.3 Audio Compression
3.9 Summary
3.10 Projects and Problems

3.1 Overview

In this chapter, we describe a very popular coding algorithm called the Huffman coding algorithm. We will:

• present a procedure for building Huffman codes when the probability model for the source is known,
• present a procedure for building codes when the source statistics are unknown,
• describe a new technique for code design that is in some sense similar to the Huffman coding approach, and
• give some applications.


3.2 The Huffman Coding Algorithm

• This technique was developed by David Huffman as part of a class assignment; the class was the first ever in the area of information theory and was taught by Robert Fano at MIT.
• The codes generated using this technique are called Huffman codes.
• These codes are
  - prefix codes
  - optimum for a given model ( set of probabilities )
• They are based on two observations regarding optimum prefix codes:
  1. In an optimum code, symbols that occur more frequently ( have a higher probability of occurrence ) will have shorter codewords than symbols that occur less frequently.
  2. In an optimum code, the two symbols that occur least frequently will have codewords of the same length.

In an optimum code, the two symbols that occur least frequently will have codewords of the same length:

• Suppose an optimum code C exists in which the two codewords corresponding to the two least probable symbols do not have the same length.
• Suppose the longer codeword is k bits longer than the shorter codeword.
• As these codewords correspond to the least probable symbols in the alphabet, no other codeword can be longer than these codewords; therefore, there is no danger that the shortened codeword would become the prefix of some other codeword.

• Furthermore, by dropping these k bits we obtain a new code that has a shorter average length than C.
• But this violates our initial contention that C is an optimal code.
• Therefore, for an optimal code the second observation also holds true.

A simple requirement

The codewords corresponding to the two lowest probability symbols differ only in the last bit. That is, if γ and δ are the two least probable symbols in an alphabet, and the codeword for γ is m * 0, then the codeword for δ is m * 1. Here, m is a string of 1s and 0s, and * denotes concatenation.

3.2 The Huffman Coding Algorithm

Example 3.2.1 Design of a Huffman Code

An alphabet A = { a1, a2, a3, a4, a5 } with
P( a1 ) = P( a3 ) = 0.2
P( a2 ) = 0.4
P( a4 ) = P( a5 ) = 0.1

The entropy = -2 * 0.2 log2(0.2) - 0.4 log2(0.4) - 2 * 0.1 log2(0.1) = 2.122 bits/symbol.

Table 3.1 The initial five-letter alphabet

Letter   Probability   Codeword
a2       0.4           c(a2)
a1       0.2           c(a1)
a3       0.2           c(a3)
a4       0.1           c(a4)
a5       0.1           c(a5)

The two symbols with the lowest probability are a4 and a5. We assign
c(a4) = α1 * 0
c(a5) = α1 * 1
where α1 is a binary string.


Define a new alphabet A' = { a1, a2, a3, a4' } where a4' is composed of a4 and a5:
P( a4' ) = P( a4 ) + P( a5 ) = 0.2

Table 3.2 The reduced four-letter alphabet

Letter   Probability   Codeword
a2       0.4           c(a2)
a1       0.2           c(a1)
a3       0.2           c(a3)
a4'      0.2           α1

In this alphabet A', a3 and a4' are the two letters at the bottom of the sorted list. We assign their codewords as
c(a3) = α2 * 0
c(a4') = α2 * 1
but c(a4') = α1. Therefore α1 = α2 * 1, which means that
c(a4) = α1 * 0 = α2 * 10
c(a5) = α1 * 1 = α2 * 11

We again define a new alphabet A'' = { a1, a2, a3'' } where a3'' is composed of a3 and a4':
P( a3'' ) = P( a3 ) + P( a4' ) = 0.4

Table 3.3 The reduced three-letter alphabet

Letter   Probability   Codeword
a2       0.4           c(a2)
a3''     0.4           α2
a1       0.2           c(a1)

In this case, the least probable symbols are a3'' and a1. Therefore,
c(a3'') = α3 * 0
c(a1) = α3 * 1
but c(a3'') = α2. Therefore α2 = α3 * 0, which means that
c(a3) = α2 * 0 = α3 * 00
c(a4) = α2 * 10 = α3 * 010
c(a5) = α2 * 11 = α3 * 011

We again define a new alphabet A''' = { a3''', a2 } where a3''' is composed of a3'' and a1:
P( a3''' ) = P( a3'' ) + P( a1 ) = 0.6

Table 3.4 The reduced two-letter alphabet

Letter   Probability   Codeword
a3'''    0.6           α3
a2       0.4           c(a2)

We have only two letters, so the codeword assignment is straightforward:
c(a3''') = 0
c(a2) = 1
but c(a3''') = α3. Therefore α3 = 0, which means that
c(a1) = α3 * 1 = 01
c(a3) = α3 * 00 = 000
c(a4) = α3 * 010 = 0010
c(a5) = α3 * 011 = 0011

Table 3.5 Huffman code for the original five-letter alphabet

Letter   Probability   Codeword
a2       0.4           1
a1       0.2           01
a3       0.2           000
a4       0.1           0010
a5       0.1           0011

The average length for this code is
l = 0.4*1 + 0.2*2 + 0.2*3 + 0.1*4 + 0.1*4 = 2.2 bits/symbol.

A measure of the efficiency of this code is its redundancy, the difference between the average length and the entropy. In this case, the redundancy = 2.2 - 2.122 = 0.078 bits/symbol.
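The construction just carried out is easy to express in code. Below is a minimal sketch ( our own illustration, not from the text ) that builds a Huffman code with Python's heapq by repeatedly merging the two least probable groups. Tie-breaking among equal probabilities may assign codewords different from Table 3.5, but any such code is optimal, and the average length comes out to 2.2 bits/symbol either way.

    import heapq

    def huffman_code(probs):
        """Build a Huffman code for a {symbol: probability} dict by
        repeatedly merging the two least probable groups (Example 3.2.1)."""
        # heap entries: (probability, tie-breaker, {symbol: partial codeword})
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)    # least probable group
            p1, _, c1 = heapq.heappop(heap)    # second least probable group
            # prepend 0 to one group's codewords and 1 to the other's
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            tie += 1
            heapq.heappush(heap, (p0 + p1, tie, merged))
        return heap[0][2]

    probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
    code = huffman_code(probs)
    avg = sum(probs[s] * len(w) for s, w in code.items())
    print(code, avg)    # average length 2.2 bits/symbol, as computed above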


Figure 3.1 The Huffman encoding procedure. The symbol probabilities are listed in parentheses, and at each stage the list is sorted by probability; the two entries at the bottom of the list are combined, with the joining branches labeled 0 and 1:

a2 (0.4), a1 (0.2), a3 (0.2), a4 (0.1), a5 (0.1)
-> a2 (0.4), a1 (0.2), a3 (0.2), a4' (0.2)
-> a2 (0.4), a3'' (0.4), a1 (0.2)
-> a3''' (0.6), a2 (0.4)

Figure 3.2 Building the binary Huffman tree. We build the binary tree starting at the leaf nodes: a4 (0.1) and a5 (0.1) are joined under a node of weight (0.2); that node and a3 (0.2) are joined under a node of weight (0.4); that node and a1 (0.2) are joined under a node of weight (0.6); finally, that node and a2 (0.4) are joined under the root (1.0). At each join, the two branches are labeled 0 and 1.

Notice the similarity between Figures 3.1 and 3.2. This is not surprising, as they are a result of viewing the same procedure in two different ways.

3.2.1 Minimum Variance Huffman Codes

Table 3.2 Reduced four-letter alphabet

Letter   Probability   Codeword
a2       0.4           c(a2)
a1       0.2           c(a1)
a3       0.2           c(a3)
a4'      0.2           α1

Table 3.6 Reduced four-letter alphabet

Letter   Probability   Codeword
a2       0.4           c(a2)
a4'      0.2           α1
a1       0.2           c(a1)
a3       0.2           c(a3)

In Table 3.6 the combined letter a4' is placed as high in the sorted list as its probability allows; carrying the reduction through with this convention yields the minimum variance code.

Table 3.7 Reduced three-letter alphabet
( a1' is composed of a1 and a3 )

Letter   Probability   Codeword
a1'      0.4           α2
a2       0.4           c(a2)
a4'      0.2           α1

Table 3.8 Reduced two-letter alphabet
( a2' is composed of a2 and a4' )

Letter   Probability   Codeword
a2'      0.6           α3
a1'      0.4           α2


Table 3.9 Minimum variance Huffman code

Letter   Probability   Codeword
a1       0.2           10
a2       0.4           00
a3       0.2           11
a4       0.1           010
a5       0.1           011

The average length for this code is
l = 0.4*2 + 0.2*2 + 0.2*2 + 0.1*3 + 0.1*3 = 2.2 bits/symbol.

These two codes are identical in terms of their redundancy. However, the variance of the codeword lengths is significantly different.
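The difference is easy to quantify; a quick check ( our own calculation, using the codeword lengths of Tables 3.5 and 3.9 ):

    probs      = [0.4, 0.2, 0.2, 0.1, 0.1]   # a2, a1, a3, a4, a5
    len_first  = [1, 2, 3, 4, 4]             # Table 3.5 code
    len_minvar = [2, 2, 2, 3, 3]             # Table 3.9 code

    def stats(lengths):
        avg = sum(p * l for p, l in zip(probs, lengths))
        var = sum(p * (l - avg) ** 2 for p, l in zip(probs, lengths))
        return avg, var

    print(stats(len_first))     # approximately (2.2, 1.36)
    print(stats(len_minvar))    # approximately (2.2, 0.16)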

Figure 3.3 The minimum variance Huffman encoding procedure. At each stage the list is sorted by probability with the combined letter placed as high in the list as possible, and the two entries at the bottom are combined, with branches labeled 0 and 1:

a2 (0.4), a1 (0.2), a3 (0.2), a4 (0.1), a5 (0.1)
-> a2 (0.4), a4' (0.2), a1 (0.2), a3 (0.2)
-> a1' (0.4), a2 (0.4), a4' (0.2)
-> a2' (0.6), a1' (0.4)

Figure 3.4 Two Huffman trees corresponding to the same probabilities: the tree of Figure 3.2, and the minimum variance tree, in which the leaves lie on only two levels ( a2 = 00, a1 = 10, a3 = 11, a4 = 010, a5 = 011 ).

3.4 Adaptive Huffman Coding

The Huffman tree contains external nodes ( leaves ) and internal nodes. Two parameters are added to the binary tree:

1. Weight: for an external node, the number of times the symbol has been encountered; for an internal node, the sum of the weights of its offspring.
2. Node number: a unique number assigned to each node.

For an alphabet of size n there are 2n-1 nodes ( internal + external ), with node numbers y1, y2, y3, ..., y(2n-1) and weights x1 ≤ x2 ≤ x3 ≤ ... ≤ x(2n-1).

Sibling property:
• nodes y(2j-1) and y(2j) are siblings for 1 ≤ j < n, and
• the node number of their parent is greater than y(2j-1) and y(2j).
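As an illustration, the sibling property can be checked mechanically. A small sketch ( our own helper; the (number, weight, parent_number) tuple format and the assumption that the live nodes are consecutively numbered are ours ):

    def has_sibling_property(nodes):
        """nodes: list of (number, weight, parent_number) tuples;
        the root's parent_number is None."""
        info = {num: (w, p) for num, w, p in nodes}
        nums = sorted(info)
        # 1) listed by increasing node number, the weights are nondecreasing
        weights = [info[n][0] for n in nums]
        if any(a > b for a, b in zip(weights, weights[1:])):
            return False
        # 2) nodes y(2j-1) and y(2j) are siblings whose parent has a larger number
        for a, b in zip(nums[0::2], nums[1::2]):
            pa, pb = info[a][1], info[b][1]
            if pa != pb or pa is None or pa <= b:
                return False
        return True

    # the ( aar ) tree of Example 3.4.1 below:
    # NYT(47), r(48), internal(49), a(50), root(51)
    tree = [(47, 0, 49), (48, 1, 49), (49, 1, 51), (50, 2, 51), (51, 3, None)]
    print(has_sibling_property(tree))    # True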


Both the transmitter and the receiver start with the same initial tree, consisting of a single NYT ( not yet transmitted ) node with weight 0. As transmission progresses, nodes corresponding to symbols transmitted will be added to the tree, and the tree is reconfigured using an update procedure.

Before the beginning of transmission, a fixed code for each symbol is agreed upon between transmitter and receiver. If the source has an alphabet ( a1, a2, ..., am ) of size m, then pick e and r such that

m = 2^e + r and 0 ≤ r < 2^e.

ex: m = 26, 26 = 2^4 + 10, so e = 4 and r = 10.

The letter ak is encoded as
• the (e+1)-bit binary representation of k-1, if 1 ≤ k ≤ 2r;
• the e-bit binary representation of k-r-1, otherwise.

ex: a1 [ 1 ≤ 2*10 ]: 1-1 = 0 -> 00000 (5 bits)
    a2 [ 2 ≤ 2*10 ]: 2-1 = 1 -> 00001 (5 bits)
    a22 [ 22 > 2*10 ]: 22-10-1 = 11 -> 1011 (4 bits)
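A minimal sketch of this fixed code ( the function name is our own ):

    def fixed_code(k, e, r):
        """Fixed code for letter a_k (1-based index), where m = 2**e + r."""
        if k <= 2 * r:
            return format(k - 1, "0%db" % (e + 1))   # (e+1)-bit code of k-1
        return format(k - r - 1, "0%db" % e)         # e-bit code of k-r-1

    # m = 26 = 2**4 + 10, so e = 4 and r = 10
    print(fixed_code(1, 4, 10))     # '00000'  (a1)
    print(fixed_code(2, 4, 10))     # '00001'  (a2)
    print(fixed_code(22, 4, 10))    # '1011'   (a22 = v)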

When a symbol is encountered for the first time:
1. the code for the NYT node is transmitted,
2. followed by the fixed code for the symbol;
3. a node for the symbol is created, and
4. the symbol is taken out of the NYT list.

Both transmitter and receiver
• start with the same tree structure, and
• use an identical update procedure.
Therefore, the encoding and decoding processes remain synchronized.

3.4.1 Update Procedure

• The update procedure requires that the nodes be in a fixed order.
• This ordering is preserved by numbering the nodes.
• The largest node number is given to the root of the tree, and the smallest number is assigned to the NYT node.
• The numbers from the NYT node to the root are assigned in increasing order from left to right, and from lower to upper levels.
• The set of nodes with the same weight makes up a block.
• The function of the update procedure is to preserve the sibling property.


Figure 3.6 Update procedure ( parts (a) and (b) of the flowchart ):

START -> first appearance for this symbol?
• Yes: the NYT node gives birth to a new NYT node and an external node for the symbol; increment the weight of the external node and of the old NYT node; go to the old NYT node.
• No: go to the symbol's external node; if its node number is not the maximum in its block, switch it with the highest-numbered node in the block; then increment the node weight.
In either case, if the current node is the root node, STOP; otherwise go to the parent node and repeat the block check, switch, and weight increment.
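The flowchart translates fairly directly into code. Below is a minimal, unoptimized sketch of the update procedure ( class and method names are our own; it assumes the numbering convention of Example 3.4.1, where a splitting NYT node gives its number minus 2 to the new NYT node and its number minus 1 to the new external node ). Encoding and decoding are not included.

    class Node:
        def __init__(self, number, weight=0, symbol=None, parent=None):
            self.number = number    # unique node number (sibling-property order)
            self.weight = weight    # count for leaves, sum of children for internal
            self.symbol = symbol    # set only for external (leaf) nodes
            self.parent = parent
            self.left = self.right = None

    class AdaptiveHuffmanTree:
        def __init__(self, alphabet_size):
            root_number = 2 * alphabet_size - 1        # 2n-1 nodes in total
            self.root = self.nyt = Node(root_number)   # initially just the NYT node
            self.nodes = [self.root]                   # flat list, for block searches
            self.leaf = {}                             # symbol -> external node

        def _block_leader(self, node):
            # highest-numbered node of the same weight; never the node's parent
            best = node
            for other in self.nodes:
                if (other.weight == node.weight and other.number > best.number
                        and other is not node.parent):
                    best = other
            return best

        def _swap(self, a, b):
            # exchange the positions of two subtrees; numbers stay with positions
            a.number, b.number = b.number, a.number
            pa, pb = a.parent, b.parent
            if pa is pb:                # siblings: just exchange left and right
                pa.left, pa.right = pa.right, pa.left
                return
            if pa.left is a:
                pa.left = b
            else:
                pa.right = b
            if pb.left is b:
                pb.left = a
            else:
                pb.right = a
            a.parent, b.parent = pb, pa

        def update(self, symbol):
            if symbol not in self.leaf:
                # NYT gives birth to a new NYT node and an external node
                old = self.nyt
                old.left = self.nyt = Node(old.number - 2, parent=old)
                old.right = Node(old.number - 1, weight=1, symbol=symbol, parent=old)
                self.leaf[symbol] = old.right
                self.nodes += [old.left, old.right]
                old.weight = 1          # increment the old NYT node's weight
                node = old.parent       # then continue from its parent
            else:
                node = self.leaf[symbol]
            while node is not None:
                leader = self._block_leader(node)
                if leader is not node and leader is not self.root:
                    self._swap(node, leader)   # not max in its block: switch
                node.weight += 1
                node = node.parent             # go to the parent, up to the root

    tree = AdaptiveHuffmanTree(26)
    for ch in "aardv":
        tree.update(ch)
    print(tree.root.weight)    # 5, matching the ( aardv ) tree of Example 3.4.1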

Example 3.4.1 Update Procedure

Message [ a a r d v a r k ], where the alphabet consists of the 26 lowercase letters of the English alphabet. Total number of nodes = 2 * 26 - 1 = 51.

The initial tree is a single NYT node, numbered 51 with weight 0.

For the first a, send the fixed binary code 00000, since the index of a is 1. The NYT node gives birth to a new NYT node ( number 49 ) and an external node for a ( number 50, weight 1 ); the root ( 51 ) now has weight 1, giving the tree for ( a ).

For the second a, send 1, the path from the root to the a node. After the update, the a node and the root both have weight 2, giving the tree for ( aa ).

For r, send 0, the code of the NYT node, followed by the fixed code for r. Since the index of r is 18, the fixed code is 10001, the 5-bit representation of 17. Updating the tree for r, the old NYT node ( 49 ) gives birth to a new NYT node ( 47 ) and an external node for r ( 48, weight 1 ); node 49 gets weight 1 and the root weight 3, giving the tree for ( aar ).


For d, send 00, the code of the NYT node, followed by the fixed code for d. Since the index of d is 4, the fixed code is 00011, the 5-bit representation of 3. Updating the tree for d, the old NYT node ( 47 ) gives birth to a new NYT node ( 45 ) and an external node for d ( 46, weight 1 ); node 47 gets weight 1, node 49 weight 2, and the root weight 4, giving the tree for ( aard ).

For v, send 000, the code of the NYT node, followed by the fixed code for v. Since the index of v is 22 > 2*10, the fixed code is 1011, the 4-bit representation of 22-10-1 = 11. Updating the tree for v, the old NYT node ( 45 ) gives birth to a new NYT node ( 43 ) and an external node for v ( 44, weight 1 ). Moving up the tree, the internal node numbered 47 is not the highest-numbered node in its block ( the r node, number 48, has the same weight ), so the two nodes are swapped before the weight is incremented.

Continuing up the tree for ( aardv ), the internal node numbered 49, of weight 2, is again not the highest-numbered node in its block: the a node ( number 50 ) has the same weight, so these two nodes are also swapped. Incrementing the weights up to the root gives the final tree for ( aardv ): the root ( 51 ) has weight 5, with the a node ( weight 2 ) and an internal node ( weight 3 ) as its children.

3.4.2 Encoding Procedure

Figure 3.8 (a) Flowchart of the encoding procedure:

START -> read in a symbol -> first appearance for this symbol?
• Yes: send the code for the NYT node, followed by the symbol's index in the NYT list.
• No: the code is the path from the root node to the corresponding external node.
Then call the update procedure.


Figure 3.8 (b) Flowchart of the encoding procedure ( continued ): after the update procedure, if this was the last symbol, stop; otherwise read in the next symbol and repeat.

Example 3.4.2 Encoding Procedure

Message [ a a r d v a r k ]:

0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0
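Reading these bits off against Example 3.4.1, the sequence groups as 00000 ( a ), 1 ( a ), 0 10001 ( NYT code + fixed code for r ), 00 00011 ( NYT + d ), 000 1011 ( NYT + v ), and a final 0 for the next a; the NYT labels in the original slide mark the three NYT prefixes.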

3.4.3 Decoding Procedure

Figure 3.9 (a) Flowchart of the decoding procedure:

START -> go to the root of the tree -> is the node an external node?
• No: read a bit and go to the corresponding child node; repeat until an external node is reached.
• Yes: continue with part (b).

Figure 3.9 (b) Flowchart of the decoding procedure ( continued ): is the external node the NYT node?
• No: decode the element corresponding to the node.
• Yes: read e bits as the number p; if p is less than r, read one more bit; otherwise add r to p. Then continue with part (c).


Figure 3.9 (c) Flowchart of the decoding procedure ( continued ): decode the (p+1)th element in the NYT list, call the update procedure, and, if this was not the last bit, go back to the root of the tree and continue.
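The NYT branch of this flowchart undoes the fixed code of the earlier slides. A minimal sketch ( our own, the counterpart of the fixed_code sketch above ):

    def decode_fixed(bits, e, r):
        """Decode one fixed code from the front of a bit string; returns the
        1-based letter index k and the number of bits consumed."""
        p = int(bits[:e], 2)            # read e bits as the number p
        if p < r:                       # p < r: read one more bit
            p = 2 * p + int(bits[e])
            return p + 1, e + 1
        return p + r + 1, e             # otherwise add r to p

    # e = 4, r = 10 as before
    print(decode_fixed("00000", 4, 10))    # (1, 5)  -> a1
    print(decode_fixed("1011", 4, 10))     # (22, 4) -> a22 = v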

Example 3.4.3 Decoding Procedure

Message [ a a r d v a r k ]:

0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0

( the NYT prefixes are as in Example 3.4.2 )


    3.8 Applications of Huffman Coding

    3.8.1 Lossless Image Compression

    Figure 3.10 Test Images.

    Sena Sensin Earth Omaha

256 * 256 grayscale raw images.

    ftp://ftp.mkp.com/pub/Sayood/uncompressed_software/datasets/images/

Table 3.23 Compression using Huffman codes on pixel values.

Image Name   Bits/Pixel   Total Size (bytes)   Compression Ratio
Sena         7.01         57,504               1.14
Sensin       7.49         61,430               1.07
Earth        4.94         40,534               1.62
Omaha        7.12         58,374               1.12

Table 3.24 Compression using Huffman codes on pixel difference values.

Image Name   Bits/Pixel   Total Size (bytes)   Compression Ratio
Sena         4.02         32,968               1.99
Sensin       4.70         38,541               1.70
Earth        4.13         33,880               1.93
Omaha        6.42         52,643               1.24
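For reference, each uncompressed image is 256 * 256 pixels at 8 bits/pixel, i.e. 65,536 bytes, so the compression ratio is 65,536 divided by the compressed size; e.g. for Sena in Table 3.23, 65,536 / 57,504 ≈ 1.14.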


Table 3.25 Compression using adaptive Huffman codes on pixel difference values.

Image Name   Bits/Pixel   Total Size (bytes)   Compression Ratio
Sena         3.93         32,261               2.03
Sensin       4.63         37,896               1.73
Earth        4.82         39,504               1.66
Omaha        6.39         52,321               1.25

Adaptive Huffman coder
• Advantage: can be used as an on-line or real-time coder.
• Disadvantages: more vulnerable to errors; more difficult to implement.

3.8.2 Text Compression

Table 3.26 Probabilities of occurrence of the letters in the English alphabet in the U.S. Constitution.

Letter  Probability   Letter  Probability   Letter  Probability
A       0.057305      J       0.002031      S       0.060289
B       0.014876      K       0.001016      T       0.078085
C       0.025775      L       0.031403      U       0.018474
D       0.026811      M       0.015892      V       0.009882
E       0.112578      N       0.056035      W       0.007576
F       0.022875      O       0.058215      X       0.002264
G       0.009523      P       0.021034      Y       0.011702
H       0.042915      Q       0.000973      Z       0.001502
I       0.053475      R       0.048819

Table 3.27 Probabilities of occurrence of the letters in the English alphabet in this chapter.

    Letter Probability Letter Probability Letter Probability

    A 0.049885 J 0.000394 S 0.042657

    B 0.016110 K 0.002450 T 0.061142

    C 0.025835 L 0.025835 U 0.015794

    D 0.030232 M 0.016494 V 0.004988

    E 0.097434 N 0.048039 W 0.012207

    F 0.019745 O 0.050642 X 0.003413

    G 0.012053 P 0.015007 Y 0.008466

    H 0.035723 Q 0.001509 Z 0.001050

    I 0.048783 R 0.040492

[ Bar charts of the probabilities of occurrence of the letters A-Z ( vertical scale 0 to 0.12 ), comparing the U.S. Constitution with the chapter text. ]


3.8.3 Audio Compression

CD-quality audio data:
• Each stereo channel is sampled at 44.1 kHz.
• Each sample is represented by 16 bits.
( the amount of data stored on one CD is enormous )

16 bits give 65,536 distinct values, so a Huffman coder would require 65,536 distinct ( variable-length ) codewords. In most applications, a code of this size would not be practical.

Dealing with a large alphabet:
• Recursive indexing ( Chapter 8 )
• Other approaches [ reference: #180 ]

Table 3.28 Huffman codes of 16-bit CD-quality audio.

File Name   Original File Size (bytes)   Entropy (bits)   Estimated Compressed File Size (bytes)   Compression Ratio
Mozart      939,862                      12.8             725,420                                  1.30
Cohn        402,442                      13.8             349,300                                  1.15
Mir         884,020                      13.7             759,540                                  1.16

Table 3.29 Huffman codes of differences of 16-bit CD-quality audio.

File Name   Original File Size (bytes)   Entropy (bits)   Estimated Compressed File Size (bytes)   Compression Ratio
Mozart      939,862                      9.7              569,792                                  1.65
Cohn        402,442                      10.4             261,590                                  1.54
Mir         884,020                      10.9             602,240                                  1.47