Module 4 Arithmetic Coding
Module 4, Data Compression
LISA, NTPU
Module 4: Arithmetic Coding
Prof. Hung-Ta Pai
Reals in Binary
Any real number x in the interval [0, 1) can be represented in binary as .b1b2..., where each bi is a bit
First Conversion
L := 0; R := 1; i := 1;
while x > L do *
    if x < (L+R)/2 then bi := 0; R := (L+R)/2;
    else bi := 1; L := (L+R)/2;
    i := i + 1;
end{while}
bj := 0 for all j ≥ i;
* Invariant: x is always in the interval [L, R)
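The halving loop above can be sketched in Python. This is a minimal sketch, not the course's own code: the function name `binary_expansion` and the `max_bits` cutoff (needed because numbers such as 1/3 never terminate) are assumptions, and exact `Fraction` arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def binary_expansion(x, max_bits=32):
    """Return the leading bits of x in [0, 1) by repeated interval halving."""
    if not 0 <= x < 1:
        raise ValueError("x must lie in [0, 1)")
    L, R = Fraction(0), Fraction(1)
    bits = []
    # Invariant: L <= x < R. Once x == L, every remaining bit is 0.
    while x > L and len(bits) < max_bits:
        mid = (L + R) / 2
        if x < mid:
            bits.append("0")   # x is in the lower half
            R = mid
        else:
            bits.append("1")   # x is in the upper half
            L = mid
    return "".join(bits)

print(binary_expansion(Fraction(5, 32)))    # 00101
print(binary_expansion(Fraction(1, 3), 8))  # 01010101
```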
Basic Ideas
Represent each string x of length n by a unique interval [L, R) in [0, 1)
The width of the interval [L, R) represents the probability of x occurring
The interval [L, R) can itself be represented by any number, called a tag, within the half-open interval
The first k significant bits of the tag .t1t2t3... are the code of x
    That is, .t1t2t3...tk000... is in the interval [L, R)
Example
1. Tag must be in the half-open interval
2. Tag can be chosen to be (L+R)/2
3. Code is the significant bits of the tag
Better Tag
Example of Codes
P(a) = 1/3, P(b) = 2/3
Code Generation from Tag
If the binary tag is .t1t2t3... = (L+R)/2 in [L, R), then we want to choose k to form the code t1t2...tk
Short code: choose k to be as small as possible so that L ≤ .t1t2...tk000... < R
Guaranteed code:
    Choose k = ⌈log2(1/(R-L))⌉ + 1
    L ≤ .t1t2...tkb1b2b3... < R for any bits b1b2b3...
    For fixed-length strings, this provides a good prefix code
Example: [.000000000..., .000010010...), tag = .000001001...
    Short code: 0
    Guaranteed code: 000001
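Both choices of k can be checked with a few lines of Python. This is a sketch under assumptions: the helper names `tag_bits`, `short_code` and `guaranteed_code` are hypothetical, the short code is taken to need at least one bit, and `Fraction` keeps the arithmetic exact. The usage lines reproduce the example interval above, [0, 9/256) with tag 9/512.

```python
from fractions import Fraction

def tag_bits(t, k):
    """First k bits of the binary expansion of t in [0, 1)."""
    bits = []
    for _ in range(k):
        t *= 2
        bit = int(t >= 1)
        bits.append(str(bit))
        t -= bit
    return "".join(bits)

def short_code(L, R):
    """Smallest k >= 1 with L <= .t1...tk000... < R, where tag = (L+R)/2."""
    tag = (L + R) / 2
    k = 1
    while True:
        code = tag_bits(tag, k)
        value = Fraction(int(code, 2), 2 ** k)  # the code padded with zeros
        if L <= value < R:
            return code
        k += 1

def guaranteed_code(L, R):
    """First ceil(log2(1/(R-L))) + 1 bits of the tag (L+R)/2."""
    W = R - L
    k = 0
    while W * 2 ** k < 1:   # smallest k with 2^k >= 1/W, i.e. ceil(log2(1/W))
        k += 1
    return tag_bits((L + R) / 2, k + 1)

L, R = Fraction(0), Fraction(9, 256)   # [.000000000, .000010010) in binary
print(short_code(L, R))        # 0
print(guaranteed_code(L, R))   # 000001
```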
Guaranteed Code Example
P(a) = 1/3, P(b) = 2/3
Guaranteed code -> Prefix code
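The prefix property can be checked numerically for this model. The sketch below assumes strings of length 2 (the string length used on the slide is not recoverable from the transcript), and the names `interval`, `guaranteed_code` and `is_prefix_free` are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
C = {"a": Fraction(0), "b": Fraction(1, 3)}   # cumulative probabilities

def interval(s):
    """Interval [L, R) assigned to string s by successive subdivision."""
    L, R = Fraction(0), Fraction(1)
    for ch in s:
        W = R - L
        L = L + W * C[ch]
        R = L + W * P[ch]
    return L, R

def guaranteed_code(L, R):
    """First ceil(log2(1/(R-L))) + 1 bits of the tag (L+R)/2."""
    W, k = R - L, 0
    while W * 2 ** k < 1:
        k += 1
    k += 1
    tag = (L + R) / 2
    # floor(tag * 2^k) written as a k-bit binary string
    return format(int(tag * 2 ** k), "0{}b".format(k))

codes = {s: guaranteed_code(*interval(s))
         for s in map("".join, product("ab", repeat=2))}
is_prefix_free = not any(x != y and y.startswith(x)
                         for x in codes.values() for y in codes.values())
print(codes)           # {'aa': '00001', 'ab': '0011', 'ba': '0111', 'bb': '110'}
print(is_prefix_free)  # True
```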
Coding Algorithm
P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(ai-1)
Encode x1x2...xn
Initialize L := 0 and R := 1;
for i = 1 to n do
    W := R - L;
    L := L + W * C(xi);
    R := L + W * P(xi);
end;
t := (L+R)/2; choose code for the tag
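A direct Python transcription of the loop above, as a sketch: the names `cumulative` and `encode_interval` are assumptions, the model is passed as a dict whose insertion order defines a1, ..., am, and the usage line reuses the two-symbol model P(a) = 1/3, P(b) = 2/3 from the earlier slides.

```python
from fractions import Fraction

def cumulative(P):
    """C(ai) = P(a1) + ... + P(a(i-1)), following the dict's insertion order."""
    C, total = {}, Fraction(0)
    for sym, p in P.items():
        C[sym] = total
        total += p
    return C

def encode_interval(msg, P):
    """Return the final interval [L, R) for msg under model P."""
    C = cumulative(P)
    L, R = Fraction(0), Fraction(1)
    for x in msg:
        W = R - L
        L = L + W * C[x]
        R = L + W * P[x]   # uses the already-updated L, as in the pseudocode
    return L, R

P2 = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
L, R = encode_interval("ba", P2)
print(L, R, (L + R) / 2)   # 1/3 5/9 4/9
```

The tag (L+R)/2 would then be turned into a short or guaranteed code as described earlier.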
Coding Example
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
abca
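The slide's figure is not in the transcript. Working the coding algorithm by hand on abca gives the following trace, where each row shows W before the update and L, R after it. The final line assumes the tag is chosen as (L+R)/2.

```latex
\begin{array}{c|c|c|c}
\text{symbol} & W & L & R \\ \hline
\text{(start)} & - & 0 & 1 \\
a & 1 & 0 & 1/4 \\
b & 1/4 & 1/16 & 3/16 \\
c & 1/8 & 5/32 & 3/16 \\
a & 1/32 & 5/32 & 21/128
\end{array}
\qquad
t = \frac{L+R}{2} = \frac{41}{256} = .00101001_2
```

With this tag, the short code is 00101, since .00101 = 5/32 is the first truncation that falls in [5/32, 21/128).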
Coding Exercise
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
bbbb
Decoding (1/3)
Assume the length is known to be 3
0001, which converts to the tag .0001000...
Decoding (2/3)
Assume the length is known to be 3
0001, which converts to the tag .0001000...
Decoding (3/3)
Assume the length is known to be 3
0001, which converts to the tag .0001000...
Decoding Algorithm
P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(ai-1)
Decode b1b2...bk, number of symbols is n
Initialize L := 0 and R := 1;
t := .b1b2...bk000...
for i = 1 to n do
    W := R - L;
    find j such that L + W * C(aj) ≤ t < L + W * (C(aj) + P(aj));
    output aj;
    L := L + W * C(aj);
    R := L + W * P(aj);
end;
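The decoding loop can be sketched in Python as follows. The function name `decode` and the linear search over symbols are assumptions, not the course's implementation. The usage line takes the three-symbol model from the coding example and assumes n = 4 symbols.

```python
from fractions import Fraction

def decode(bits, n, P):
    """Decode n symbols from the code bits under model P (dict in symbol order)."""
    # Cumulative probabilities C(aj), following the dict's insertion order.
    C, total = {}, Fraction(0)
    for sym, p in P.items():
        C[sym] = total
        total += p
    # Tag t = .b1b2...bk000...
    t = Fraction(int(bits, 2), 2 ** len(bits))
    L, R = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        W = R - L
        for sym in P:
            lo = L + W * C[sym]
            hi = lo + W * P[sym]
            if lo <= t < hi:        # t falls in this symbol's subinterval
                out.append(sym)
                L, R = lo, hi
                break
    return "".join(out)

P3 = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
print(decode("00101", 4, P3))   # abca
```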
Decoding Example
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
00101
Decoding Issues
There are two ways for the decoder to know when to stop decoding:
    Transmit the length of the string
    Transmit a unique end-of-string symbol
Practical Arithmetic Coding
Scaling:
    By scaling we can keep L and R in a reasonable range of values so that W = R - L does not underflow
    The code can be produced progressively, not at the end
    Complicates decoding somewhat
Integer arithmetic coding avoids floating point altogether
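One simple form of scaling can be sketched as below. This is a simplified illustration, not the full scheme: whenever the interval lies entirely in [0, 1/2) or [1/2, 1), one bit is emitted and the interval is doubled. It does not handle intervals that stay straddling 1/2, does not emit final tag bits, and still uses exact fractions rather than integers. The name `encode_with_scaling` is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def encode_with_scaling(msg, P, C):
    """Emit bits progressively; return (bits so far, remaining L, remaining R)."""
    L, R = Fraction(0), Fraction(1)
    out = []
    for x in msg:
        W = R - L
        L = L + W * C[x]
        R = L + W * P[x]
        while True:
            if R <= HALF:            # whole interval in lower half: next bit is 0
                out.append("0")
                L, R = 2 * L, 2 * R
            elif L >= HALF:          # whole interval in upper half: next bit is 1
                out.append("1")
                L, R = 2 * L - 1, 2 * R - 1
            else:
                break                # interval straddles 1/2: no bit decided yet
    return "".join(out), L, R

P = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
C = {"a": Fraction(0), "b": Fraction(1, 4), "c": Fraction(3, 4)}
print(encode_with_scaling("abca", P, C))
# ('0010100', Fraction(0, 1), Fraction(1, 1))
```

For abca the unscaled interval is [5/32, 21/128) = [.0010100, .0010101), so the seven emitted bits are exactly the prefix shared by every number in it.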
Adaptation
Simple solution: Equally Probable Model
    Initially all symbols have frequency 1
    After symbol x is coded, increment its frequency by 1
    Use the new model for coding the next symbol
Example in alphabet a, b, c, d
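The adaptive rule can be sketched by recomputing P and C from frequency counts before each symbol. The function name `adaptive_interval` and the sample message "ab" are assumptions; the slide's own example figure is not in the transcript.

```python
from fractions import Fraction

def adaptive_interval(msg, alphabet):
    """Interval for msg when counts start at 1 and grow after each coded symbol."""
    counts = {s: 1 for s in alphabet}
    L, R = Fraction(0), Fraction(1)
    for x in msg:
        total = sum(counts.values())
        cum = 0                       # sum of counts of symbols before x
        for s in alphabet:
            if s == x:
                break
            cum += counts[s]
        W = R - L
        L = L + W * Fraction(cum, total)        # C(x) = cum / total
        R = L + W * Fraction(counts[x], total)  # P(x) = count / total
        counts[x] += 1                # update the model only after coding x
    return L, R, counts

L, R, counts = adaptive_interval("ab", "abcd")
print(L, R, counts)   # 1/10 3/20 {'a': 2, 'b': 2, 'c': 1, 'd': 1}
```

A decoder can mirror the same updates because each count changes only after its symbol has been coded.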
Zero Frequency Problem
How do we weight symbols that have not occurred yet?
    Equal weight? Not so good with many symbols
    Escape symbol, but what should its weight be?
    When a new symbol is encountered, send <esc> followed by the symbol in the equally probable model (both encoded arithmetically)
End of File Problem
Similar to the Zero Frequency Problem
Reasonable solution:
    Add EOF to the post-ESC equally probable model
    When done compressing:
        First send ESC
        Then send EOF
What’s the cost of this approach?
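One way to reason about this question is to assume an ideal arithmetic coder that spends about log2(1/p) bits on an event of probability p. The symbols p_esc (the escape probability at the end of the file) and N (the number of entries in the post-ESC equally probable model, EOF included) are hypothetical names, not from the slides.

```latex
\text{cost of ESC followed by EOF} \;\approx\; \log_2 \frac{1}{p_{\mathrm{esc}}} + \log_2 N \ \text{bits}
```

Under these assumptions the cost is paid once per file, in addition to the slight reduction in every other symbol's probability caused by reserving weight for ESC.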
Arithmetic vs. Huffman
Both compress very well
For m-symbol grouping:
    Huffman is within 1/m bits per symbol of entropy
    Arithmetic is within 2/m bits per symbol of entropy
Context:
    Huffman needs a tree for every context
    Arithmetic needs a small table of frequencies for every context
Adaptation:
    Huffman has an elaborate adaptive algorithm
    Arithmetic has a simple adaptive mechanism