aads - w14 huffmancode_knapsack.pdf

Upload: gustavomcpra

Post on 14-Apr-2018


  • 7/27/2019 AaDS - W14 HuffmanCode_Knapsack.pdf

    1/16

    AaDS 2010/2011

    Huffman code & knapsack problem

    Algorithms and Data Structures


    Problems

    Huffman codes

    knapsack problem:
      0-1 knapsack problem
      fractional knapsack problem


    Huffman codes

    Suppose we have a 100,000-character data file that we wish to store compactly. We observe that the characters in the file occur with the frequencies given by the table below. That is, only six different characters appear, and the character a occurs 45,000 times.

                                  a     b     c     d     e     f
    Frequency (in thousands)     45    13    12    16     9     5
    Fixed-length codeword       000   001   010   011   100   101
    Variable-length codeword      0   101   100   111  1101  1100

    Fixed-length code: 3 · 100,000 = 300,000 bits

    Variable-length code: (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1,000 = 224,000 bits


    Prefix codes

    We consider here only codes in which no codeword is also a prefix of some other codeword. Such codes are called prefix codes.

    Encoding is always simple for any binary character code; we just concatenate the codewords representing each character of the file. For example, with the variable-length prefix code of the table, we code the 3-character file abc as 0·101·100 = 0101100, where we use "·" to denote concatenation.

    Prefix codes are desirable because they simplify decoding. Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous. We can simply identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file. In our example, the string 001011101 parses uniquely as 0·0·101·1101, which decodes to aabe.

    Prefix code:

                 a     b     c     d     e     f
    Codeword     0   101   100   111  1101  1100
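The encoding and decoding steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CODE`, `encode` and `decode` are made up for this sketch and do not come from the slides; the codewords are the variable-length ones from the table.

```python
# Variable-length prefix code from the table (characters a..f).
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encode(text, code=CODE):
    """Concatenate the codewords of the characters in text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code=CODE):
    """Read bits left to right and emit a character as soon as a codeword matches.

    Because no codeword is a prefix of another, the first match is the only
    possible one, so no backtracking is needed.
    """
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(encode("abc"))        # 0101100
print(decode("001011101"))  # aabe
```

The two printed results reproduce the examples from the slide text.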


    Trees

    We interpret the binary codeword for a character as the path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child".

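    [Figure: two code trees for the six characters. Each leaf is labeled with a character and its frequency, each internal node with the sum of the frequencies of the leaves below it, and each edge with 0 (left) or 1 (right).]

    Tree for the fixed-length code: root 100 with children 86 and 14; 86 has children 58 (leaves a:45, b:13) and 28 (leaves c:12, d:16); 14 has a single child 14 (leaves e:9, f:5).

    Tree for the optimal prefix code: root 100 with children a:45 and 55; 55 has children 25 (leaves c:12, b:13) and 30; 30 has children 14 (leaves f:5, e:9) and d:16.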

    Given a tree T corresponding to a prefix code, it is a simple matter to compute the number of bits required to encode a file. For each character c in the alphabet C, let f(c) denote the frequency of c in the file and let d_T(c) denote the depth of c's leaf in the tree. Note that d_T(c) is also the length of the codeword for character c. The number of bits required to encode a file is thus B(T), which we define as the cost of the tree T:

    $$B(T) = \sum_{c \in C} f(c)\, d_T(c)$$
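As a small check of the cost definition, the sketch below computes B(T) for the two code trees of this example. The tree encoding is an assumption made for this illustration: a leaf is a character, an internal node is a `(left, right)` pair, and `None` marks a missing child (needed for the fixed-length tree, where one node has a single child). The names `FREQ`, `FIXED_TREE`, `OPTIMAL_TREE`, `leaf_depths` and `cost` are hypothetical.

```python
# Frequencies in thousands, as in the table.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

# Leaf = character, internal node = (left, right), None = missing child.
FIXED_TREE = ((("a", "b"), ("c", "d")), (("e", "f"), None))
OPTIMAL_TREE = ("a", (("c", "b"), (("f", "e"), "d")))


def leaf_depths(tree, depth=0):
    """Yield (character, depth) for every leaf; depth equals codeword length."""
    if tree is None:
        return
    if isinstance(tree, str):
        yield tree, depth
    else:
        left, right = tree
        yield from leaf_depths(left, depth + 1)
        yield from leaf_depths(right, depth + 1)


def cost(tree, freq):
    """B(T): sum over all characters of frequency times leaf depth."""
    return sum(freq[c] * d for c, d in leaf_depths(tree))


print(cost(FIXED_TREE, FREQ))    # 300 (thousand bits)
print(cost(OPTIMAL_TREE, FREQ))  # 224 (thousand bits)
```

The results match the 300,000 and 224,000 bits computed earlier for the fixed-length and variable-length codes.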


    Huffman codes - Code

    We assume that C is a set of n characters and that each character c ∈ C is an object with a defined frequency f[c]. The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of |C| leaves and performs a sequence of |C| - 1 "merging" operations to create the final tree. A min-priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge together. The result of the merger of two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.

    Huffman(C)
    { 1}  n := |C|
    { 2}  Q := C
    { 3}  for i := 1 to n-1 do
    { 4}      z := Allocate_Node()
    { 5}      x := left[z] := Extract_Min(Q)
    { 6}      y := right[z] := Extract_Min(Q)
    { 7}      f[z] := f[x] + f[y]
    { 8}      Insert(Q, z)
    { 9}  return Extract_Min(Q)
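One possible runnable version of the pseudocode is sketched below in Python, using the standard `heapq` module as the min-priority queue. The function names `huffman` and `codewords`, the nested-pair tree representation and the tie-breaking counter are choices made for this sketch, not part of the slides.

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman tree for {character: frequency}.

    Returns the root as nested (left, right) pairs with characters as leaves.
    """
    tiebreak = count()  # unique second key, so equal frequencies never compare trees
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                    # Q := C
    for _ in range(len(freq) - 1):         # n - 1 merging operations
        fx, _ix, x = heapq.heappop(heap)   # x := Extract_Min(Q), becomes the left child
        fy, _iy, y = heapq.heappop(heap)   # y := Extract_Min(Q), becomes the right child
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # Insert(Q, z)
    return heap[0][2]                      # the single remaining node is the root


def codewords(tree, prefix=""):
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a one-character alphabet still needs one bit
    left, right = tree
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
print(codewords(huffman(freq)))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```

For the six-character example this reproduces the variable-length codewords from the table. With other inputs, ties between equal frequencies may be broken differently and give a different but equally cheap code.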


    Huffman codes - complexity

    Q is implemented as a binary min-heap. For a set C of n characters, the initialization of Q in line 2 can be performed in O(n) time using the BUILD-MIN-HEAP procedure. The for loop in lines 3-8 is executed exactly n-1 times, and since each heap operation requires time O(lg n), the loop contributes O(n lg n) to the running time. Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n).

    Letters and frequencies: a:1 b:1 c:2 d:3 e:5 f:8 g:13 h:21
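For the letter frequencies listed above, the resulting codeword lengths can be worked out with a short sketch. It tracks only the depth of each letter rather than building the tree; the function name `code_lengths` and this bookkeeping are assumptions of the illustration. Concatenating the symbol lists makes it slower than O(n lg n), which is acceptable for eight letters.

```python
import heapq


def code_lengths(freq):
    """Return {symbol: Huffman codeword length} for {symbol: frequency}.

    Each heap entry carries the symbols in its subtree; every merge pushes
    those symbols one level deeper in the final tree.
    """
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    length = dict.fromkeys(freq, 0)
    next_id = len(heap)  # unique tie-breaker for merged nodes
    while len(heap) > 1:
        f1, _i1, s1 = heapq.heappop(heap)
        f2, _i2, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            length[s] += 1
        heapq.heappush(heap, (f1 + f2, next_id, s1 + s2))
        next_id += 1
    return length


FIB = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 5, "f": 8, "g": 13, "h": 21}
lengths = code_lengths(FIB)
print(lengths)
# {'a': 7, 'b': 7, 'c': 6, 'd': 5, 'e': 4, 'f': 3, 'g': 2, 'h': 1}
print(sum(FIB[s] * lengths[s] for s in FIB))  # 132
```

Every merge here combines the previous merged node with the next letter, so the tree degenerates into a chain and the rarest letters get 7-bit codewords.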


    Knapsack problems

    The 0-1 knapsack problem is posed as follows. A thief robbing a store finds n items; the i-th item is worth v_i dollars and weighs w_i pounds, where v_i and w_i are integers. He wants to take as valuable a load as possible, but he can carry at most W pounds in his knapsack for some integer W. Which items should he take? (This is called the 0-1 knapsack problem because each item must either be taken or left behind; the thief cannot take a fractional amount of an item or take an item more than once.)

    In the fractional knapsack problem, the setup is the same, but the thief can take fractions of items, rather than having to make a binary (0-1) choice for each item.


    Solution

    Fractional knapsack problem solution (greedy algorithm):

    compute the value per pound v_i/w_i for each item

    take as much as possible of the item with the greatest value per pound. If the supply of that item is exhausted and you can still carry more, take as much as possible of the item with the next greatest value per pound, and so forth until you can't carry any more

    because the items must be sorted by value per pound, the greedy algorithm runs in O(n lg n) time.

    For the 0-1 knapsack problem this greedy strategy does not work!
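The greedy strategy for the fractional problem can be sketched as follows. The function name `fractional_knapsack` and the `(value, weight)` input format are assumptions of this illustration; the sample data (items worth $60, $100 and $120 weighing 10, 20 and 30 pounds, capacity 50 pounds) is hard-coded so the block runs on its own.

```python
def fractional_knapsack(items, capacity):
    """Greedy solution of the fractional knapsack problem.

    items: list of (value, weight) pairs with weight > 0.
    Returns (total value, list of (value, weight, fraction taken)).
    """
    # Sorting by value per pound, greatest first, is the O(n lg n) step.
    order = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    total, taken, room = 0.0, [], capacity
    for value, weight in order:
        if room <= 0:
            break
        fraction = min(1.0, room / weight)  # the whole item, or whatever still fits
        total += value * fraction
        room -= weight * fraction
        taken.append((value, weight, fraction))
    return total, taken


total, taken = fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50)
print(round(total, 6))  # 240.0
```

The first two items are taken whole and the third only in part, for a total of $240.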


    Example

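    Knapsack capacity: 50 pounds

                        item 1   item 2   item 3
    Weight (pounds)         10       20       30
    Value                  $60     $100     $120
    Value per pound         $6       $5       $4

    0/1 knapsack problem:
      items 2 and 3 (20 + 30 pounds): $100 + $120 = $220 (optimal solution)
      items 1 and 2 (10 + 20 pounds): $60 + $100 = $160 (non-optimal solution)
      items 1 and 3 (10 + 30 pounds): $60 + $120 = $180 (non-optimal solution)

    Fractional knapsack problem (optimal solution):
      item 1 (10 pounds), item 2 (20 pounds) and 20 of the 30 pounds of item 3: $60 + $100 + $80 = $240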
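The numbers in this example can be checked by brute force. The sketch below tries every subset of the three items and compares the best one with what the value-per-pound greedy rule picks when items cannot be split. The names `ITEMS`, `W`, `best_subset` and `greedy_01` are hypothetical, and enumerating all subsets is only practical for very small inputs.

```python
from itertools import combinations

# item number: (value in dollars, weight in pounds)
ITEMS = {1: (60, 10), 2: (100, 20), 3: (120, 30)}
W = 50  # knapsack capacity in pounds


def best_subset(items, capacity):
    """Try every subset of items (exponential time) and keep the best feasible one."""
    best_value, best_choice = 0, ()
    for r in range(len(items) + 1):
        for choice in combinations(items, r):
            weight = sum(items[i][1] for i in choice)
            value = sum(items[i][0] for i in choice)
            if weight <= capacity and value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice


def greedy_01(items, capacity):
    """Take whole items in order of value per pound, as long as they still fit."""
    order = sorted(items, key=lambda i: items[i][0] / items[i][1], reverse=True)
    value, room, choice = 0, capacity, []
    for i in order:
        if items[i][1] <= room:
            value += items[i][0]
            room -= items[i][1]
            choice.append(i)
    return value, tuple(choice)


print(best_subset(ITEMS, W))  # (220, (2, 3))
print(greedy_01(ITEMS, W))    # (160, (1, 2))
```

The greedy rule commits to item 1 first and ends up with $160, while the best 0/1 load leaves item 1 behind and is worth $220.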


    0/1 knapsack problem - solution

    0/1 knapsack problem solution (dynamic programming): knowing the best solutions for items from the set {v_1, v_2, …, v_i} for a knapsack with every capacity from 1 to W, we can find a formula to compute the best solutions for items from the set {v_1, v_2, …, v_i, v_{i+1}}.

    $$\sum_{i=1}^{n} w_i t_i \le W, \qquad t_i \in \{0, 1\}$$

    $$\max \sum_{i=1}^{n} v_i t_i$$

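    $$F(i, x) = \begin{cases} F(i-1, x) & \text{if } w_i > x \\ \max\{F(i-1, x),\ F(i-1, x - w_i) + v_i\} & \text{if } w_i \le x \end{cases} \qquad 1 \le i \le n$$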
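A minimal bottom-up sketch of this dynamic-programming idea follows. It assumes a table `F` in which `F[i][x]` holds the best value achievable with the first i items and capacity x, with row 0 (no items) filled with zeros. The function name `knapsack_01` and the walk back through the table to recover the chosen items are additions made for this illustration. Filling the table takes O(nW) time.

```python
def knapsack_01(values, weights, capacity):
    """0/1 knapsack by dynamic programming over items and capacities.

    Returns (best value, sorted list of 1-based indices of the chosen items).
    """
    n = len(values)
    F = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for x in range(capacity + 1):
            F[i][x] = F[i - 1][x]  # option 1: leave item i behind
            if w <= x:             # option 2: take item i, if it fits
                F[i][x] = max(F[i][x], F[i - 1][x - w] + v)
    # Walk back through the table: item i was taken iff it changed the value.
    chosen, x = [], capacity
    for i in range(n, 0, -1):
        if F[i][x] != F[i - 1][x]:
            chosen.append(i)
            x -= weights[i - 1]
    return F[n][capacity], sorted(chosen)


print(knapsack_01([60, 100, 120], [10, 20, 30], 50))  # (220, [2, 3])
```

On the three-item example with capacity 50 this returns items 2 and 3 with a total value of $220.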