Unusual coding possibilities: information theory in practice
TRANSCRIPT
[Page 1]
Unusual coding possibilities –
information theory in practice
(statistical physics on the hypercube of $2^n$ sequences)
1) Entropy coding: remove statistical redundancy in data compression:
Huffman, arithmetic coding, ANS
2) (Forward) Error correction: add deterministic redundancy to repair a
damaged message (erasure, BSC, deletion channels):
LDPC, sequential decoding, fountain codes and expansions
3) Rate distortion: shorter approximate description (lossy
compression); Kuznetsov-Tsybakov: (statistical) constraints known
only to the sender (steganography, QR codes) and their "duality"
4) Distributed source coding – for multiple correlated sources that
don't communicate with each other.
Jarosław Duda, Kraków, 23.XI.2015
[Page 2]
1. ENTROPY CODING β DATA COMPRESSION
We need $n$ bits of information to choose one of $2^n$ possibilities.
For length-$n$ 0/1 sequences with $pn$ ones, how many bits do we need to choose one? Using Stirling's $n! \approx (n/e)^n$:

$$\binom{n}{pn} = \frac{n!}{(pn)!\,((1-p)n)!} \approx \frac{(n/e)^n}{(pn/e)^{pn}\,((1-p)n/e)^{(1-p)n}} = 2^{n\lg(n)-pn\lg(pn)-(1-p)n\lg((1-p)n)} = 2^{n(-p\lg(p)-(1-p)\lg(1-p))} = 2^{n\,h(p)}$$
Entropy coder: encodes a sequence with probability distribution $(p_s)_{s=0..m-1}$
using asymptotically at least $H = \sum_s p_s \lg(1/p_s)$ bits/symbol ($H \le \lg(m)$).
Seen as a weighted average: a symbol of probability $p$ contains $\lg(1/p)$ bits.
[Page 3]
A symbol of probability $p$ contains $\lg(1/p)$ bits.
symbol ↔ length-$r$ bit sequence
assumes: $\Pr(\text{symbol}) \approx 2^{-r}$
Huffman coding:
- Sort symbols by frequencies
- Replace the two least frequent symbols
with a tree of them, and so on
Generally: prefix codes,
defined by a tree.
Unique decoding:
no codeword is a prefix of another.
http://en.wikipedia.org/wiki/Huffman_coding
A → 0, B → 100, C → 101, D → 110, E → 111
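To make the merge procedure concrete, here is a minimal Python sketch of the construction above; a heap-based priority queue replaces explicit sorting, and the names (`huffman_code`, `walk`) are illustrative, not from the slides.

```python
# Minimal Huffman sketch: repeatedly merge the two least frequent subtrees.
import heapq

def huffman_code(freq):
    # heap items: (frequency, tiebreak, tree); a tree is a symbol or a pair
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees...
        f2, i, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (t1, t2)))  # ...merged into one
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: branch on 0/1
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            code[tree] = prefix or '0'
    walk(heap[0][2], '')
    return code

print(huffman_code({'A': 0.5, 'B': 0.125, 'C': 0.125, 'D': 0.125, 'E': 0.125}))
# e.g. {'A': '0', 'B': '100', 'C': '101', 'D': '110', 'E': '111'} (up to relabeling)
```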
[Page 4]
A symbol of probability $p$ contains $\lg(1/p)$ bits.
Encoding a $(p_s)$ distribution with an entropy coder optimal for a $(q_s)$ distribution costs

$$\Delta H = \sum_s p_s \lg(1/q_s) - \sum_s p_s \lg(1/p_s) = \sum_s p_s \lg\!\left(\frac{p_s}{q_s}\right) \approx \frac{1}{\ln(4)} \sum_s \frac{(p_s - q_s)^2}{p_s}$$

more bits/symbol: the so-called Kullback-Leibler "distance" (not symmetric).
Huffman/prefix codes encode symbols directly as bit sequences,
approximating probabilities by powers of 1/2,
e.g. p(A) = 1/2, p(B) = 1/4, p(C) = 1/4.
Example of the Huffman penalty
for a $\rho(1-\rho)^k$ distribution on a 256-symbol alphabet:
Huffman maps 8 bits to at least 1 bit,
so the compression ratio is ≤ 8 here;
it is terrible at encoding very frequent events.
| ρ | Ratio ≈ 8/H | Huffman | ANS |
|---|---|---|---|
| 0.5 | 4.001 | 3.99 | 4.00 |
| 0.6 | 4.935 | 4.78 | 4.93 |
| 0.7 | 6.344 | 5.58 | 6.33 |
| 0.8 | 8.851 | 6.38 | 8.84 |
| 0.9 | 15.31 | 7.18 | 15.29 |
| 0.95 | 26.41 | 7.58 | 26.38 |
| 0.99 | 96.95 | 7.90 | 96.43 |
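The "Ratio ≈ 8/H" column can be sanity-checked numerically. A small sketch, assuming the truncated, renormalized geometric distribution $p_k \propto \rho(1-\rho)^k$ over 256 symbols (the function name `ratio` is illustrative; small differences from the slide's exact setup are expected):

```python
# Shannon entropy of a truncated geometric distribution, vs 8 bits per raw byte.
from math import log2

def ratio(rho, m=256):
    p = [rho * (1 - rho) ** k for k in range(m)]
    s = sum(p)
    p = [x / s for x in p]               # renormalize after truncation
    H = -sum(x * log2(x) for x in p)     # bits/symbol
    return 8 / H

for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(rho, round(ratio(rho), 3))     # roughly matches the Ratio column above
```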
(Huffman tree diagram: A → 0, B → 10, C → 11)
[Page 5]
Huffman coding (HC), prefix codes, needs sorting:
most of everyday compression,
e.g. zip, gzip, cab, jpeg, gif, png, tiff, pdf, mp3… (can't use less than 1 bit/symbol).
zlibh (the fastest generally available implementation):
Encoding ~320 MB/s (/core, 3 GHz), Decoding ~300-350 MB/s

Range coding (RC): large-alphabet arithmetic coding,
needs multiplication, e.g. 7-Zip, Google VP video codecs
(e.g. YouTube, Hangouts).
Encoding ~100-150 MB/s, Decoding ~80 MB/s

(binary) Arithmetic coding (AC):
H.264, H.265 video, ultracompressors e.g. PAQ.
Encoding/decoding ~20-30 MB/s

Asymmetric Numeral Systems (ANS), tabled variant (tANS): no multiplication or sorting.
FSE implementation of tANS: Encoding ~350 MB/s, Decoding ~500 MB/s

RC → ANS: ~7x decoding speedup, no multiplication (switched to e.g. in the LZA compressor)
HC → ANS: better compression and ~1.5x decoding speedup (e.g. zhuff, lzturbo)

Huffman penalty for a truncated $\rho(1-\rho)^k$ distribution (1 byte → at least 1 bit, so ratio ≤ 8):

| ρ | Ratio ≈ 8/H | zlibh | FSE |
|---|---|---|---|
| 0.5 | 4.001 | 3.99 | 4.00 |
| 0.6 | 4.935 | 4.78 | 4.93 |
| 0.7 | 6.344 | 5.58 | 6.33 |
| 0.8 | 8.851 | 6.38 | 8.84 |
| 0.9 | 15.31 | 7.18 | 15.29 |
| 0.95 | 26.41 | 7.58 | 26.38 |
| 0.99 | 96.95 | 7.90 | 96.43 |
[Page 6]
Huffman vs ANS in compressors (LZ + entropy coder), from Matt Mahoney's benchmarks http://mattmahoney.net/dc/text.html (enwik8, 100,000,000 B):

| Compressor | Size [B] | encode time [ns/byte] | decode time [ns/byte] |
|---|---|---|---|
| LZA 0.82b -mx9 -b7 -h7 | 26,396,613 | 449 | 9.7 |
| lzturbo 1.2 -39 -b24 | 26,915,461 | 582 | 2.8 |
| WinRAR 5.00 -ma5 -m5 | 27,835,431 | 1004 | 31 |
| WinRAR 5.00 -ma5 -m2 | 29,758,785 | 228 | 30 |
| lzturbo 1.2 -32 | 30,979,376 | 19 | 2.7 |
| zhuff 0.97 -c2 | 34,907,478 | 24 | 3.5 |
| gzip 1.3.5 -9 | 36,445,248 | 101 | 17 |
| pkzip 2.0.4 -ex | 36,556,552 | 171 | 50 |
| ZSTD 0.0.1 | 40,024,854 | 7.7 | 3.8 |
| WinRAR 5.00 -ma5 -m1 | 40,565,268 | 54 | 31 |
zhuff, ZSTD (Yann Collet): LZ4 + tANS (switched from Huffman)
lzturbo (Hamid Bouzidi): LZ + tANS (switched from Huffman)
LZA (Nania Francesco): LZ + rANS (switched from range coding)
e.g. lzturbo vs gzip: better compression, 5x faster encoding, 6x faster decoding
saving time and energy in an extremely frequent task
[Page 7]
Apple LZFSE = Lempel-Ziv + Finite State Entropy
Default in iOS 9 and OS X 10.11: "matching the compression ratio of
ZLIB level 5, but with much higher
energy efficiency and speed
(between 2x and 3x) for both
encode and decode operation"
Finite State Entropy is Yann Collet's (known e.g. from LZ4)
implementation of tANS
CRAM 3.0 of European Bioinformatics Institute for DNA compression
| Format | Size [B] | Encoding (s) | Decoding (s) |
|---|---|---|---|
| BAM | 8,164,228,924 | 1216 | 111 |
| CRAM v2 (LZ77+Huff) | 5,716,980,996 | 1247 | 182 |
| CRAM v3 (order-1 rANS) | 4,922,879,082 | 574 | 190 |
[Page 8]
gzip is used everywhere… but it's ancient (1993)… what next?
Google: first Zopfli (2013, gzip compatible), now Brotli (incompatible, Huffman), with Firefox support and all over the news:
"Google's new Brotli compression algorithm is 26 percent better than
existing solutions" (22.09.2015)
First tests of new ZSTD_HC (Yann Collet, open source, tANS, 28.10.2015):
zstd v0.2 427 ms (239 MB/s), 51367682, 189 ms (541 MB/s)
zstd_HC v0.2 -1 996 ms (102 MB/s), 48918586, 244 ms (419 MB/s)
zstd_HC v0.2 -2 1152 ms (88 MB/s), 47193677, 247 ms (414 MB/s)
zstd_HC v0.2 -3 2810 ms (36 MB/s), 45401096, 249 ms (411 MB/s)
zstd_HC v0.2 -4 4009 ms (25 MB/s), 44631716, 243 ms (421 MB/s)
zstd_HC v0.2 -5 4904 ms (20 MB/s), 44529596, 244 ms (419 MB/s)
zstd_HC v0.2 -6 5785 ms (17 MB/s), 44281722, 244 ms (419 MB/s)
zstd_HC v0.2 -7 7352 ms (13 MB/s), 44108456, 241 ms (424 MB/s)
zstd_HC v0.2 -8 9378 ms (10 MB/s), 43978979, 237 ms (432 MB/s)
zstd_HC v0.2 -9 12306 ms (8 MB/s), 43901117, 237 ms (432 MB/s)
brotli 2015-10-11 level 0 1202 ms (85 MB/s), 47882059, 489 ms (209 MB/s)
brotli 2015-10-11 level 3 1739 ms (58 MB/s), 47451223, 475 ms (215 MB/s)
brotli 2015-10-11 level 4 2379 ms (43 MB/s), 46943273, 477 ms (214 MB/s)
brotli 2015-10-11 level 5 6175 ms (16 MB/s), 43363897, 528 ms (193 MB/s)
brotli 2015-10-11 level 6 9474 ms (10 MB/s), 42877293, 463 ms (221 MB/s)
lzturbo (tANS) seems even better, but is closed-source.
[Page 9]
Operating on a fractional number of bits:
restricting a range of size $B$ to a length-$L$ subrange contains $\lg(B/L)$ bits;
a number $x$ contains $\lg(x)$ bits.
Adding a symbol of probability $p$, containing $\lg(1/p)$ bits:
$\lg(B/L) + \lg(1/p) \approx \lg(B/L')$ for $L' \approx p \cdot L$ (arithmetic coding)
$\lg(x) + \lg(1/p) = \lg(x')$ for $x' \approx x/p$ (ANS)
[Page 10]
RENORMALIZATION to prevent $x \to \infty$:
enforce $x \in I = \{L, \ldots, 2L-1\}$ by
transmitting the lowest bits to the bitstream.
The "buffer" $x$ contains $\lg(x)$ bits of information
→ it produces bits as they accumulate.
A symbol of $p_s = p$ contains $\lg(1/p)$ bits:
$\lg(x) \to \lg(x) + \lg(1/p)$ modulo 1
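A minimal streaming sketch of these rules in the rANS (arithmetic) variant; tANS tabulates the same steps. Assumptions not on the slide: frequencies `f[s]` sum to $M$, the state is kept in $[M, 2M)$ (i.e. $L = M$), and renormalization emits single bits; since decoding pops bits in reverse order, the message is encoded back-to-front and decoded front-to-back.

```python
# Minimal rANS sketch: state x in [M, 2M), bit-by-bit renormalization.
def build_cdf(freq):
    cdf, acc = {}, 0
    for s in sorted(freq):
        cdf[s], acc = acc, acc + freq[s]
    return cdf, acc                      # acc == M == sum of frequencies

def rans_encode(message, freq):
    cdf, M = build_cdf(freq)
    x, bits = M, []
    for s in reversed(message):          # LIFO: encode back-to-front
        while x >= 2 * freq[s]:          # renormalize down into [f[s], 2f[s])
            bits.append(x & 1)
            x >>= 1
        x = M + cdf[s] + (x - freq[s])   # coding step: back into [M, 2M)
    return x, bits

def rans_decode(x, bits, freq, n):
    cdf, M = build_cdf(freq)
    out = []
    for _ in range(n):
        r = x - M                        # x mod M, since x is in [M, 2M)
        s = next(t for t in freq if cdf[t] <= r < cdf[t] + freq[t])
        out.append(s)
        x = freq[s] + r - cdf[s]         # inverse step: back into [f[s], 2f[s])
        while x < M:                     # renormalize up: pop bits back
            x = (x << 1) | bits.pop()
    return out

freq = {'A': 2, 'B': 1, 'C': 1}          # M = 4, i.e. p = (1/2, 1/4, 1/4)
msg = list('ABACABAA')
x, bits = rans_encode(msg, freq)
assert rans_decode(x, bits, freq, len(msg)) == msg
```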
[Page 11]
[Page 12]
What symbol spread should we choose?
(using a PRNG seeded with a cryptographic key to add encryption)
For example: rate loss for $p = (0.04, 0.16, 0.16, 0.64)$,
$L = 16$ states, $q = (1, 3, 2, 10)$, $q/L = (0.0625, 0.1875, 0.125, 0.625)$
| method | symbol spread | dH/H rate loss | notes |
|---|---|---|---|
| – | – | ~0.011 | penalty of the quantizer itself |
| Huffman | 0011222233333333 | ~0.080 | would give a Huffman decoder |
| spread_range_i() | 0111223333333333 | ~0.059 | increasing order |
| spread_range_d() | 3333333333221110 | ~0.022 | decreasing order |
| spread_fast() | 0233233133133133 | ~0.020 | fast |
| spread_prec() | 3313233103332133 | ~0.015 | close to quantization dH/H |
| spread_tuned() | 3233321333313310 | ~0.0046 | better than quantization dH/H thanks to also using p |
| spread_tuned_s() | 2333312333313310 | ~0.0040 | L log L complexity (sort) |
| spread_tuned_p() | 2331233330133331 | ~0.0058 | testing the 1/(p ln(1+1/i)) ~ i/p approximation |
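For the spread_fast() row, a short sketch assuming the FSE-style step $L/2 + L/8 + 3$ (coprime with a power-of-two $L$); it reproduces the spread listed in the table:

```python
# Fast symbol spread: scatter each symbol's q[s] states with a fixed stride.
def spread_fast(q):
    L = sum(q)                      # q[s]: number of states given to symbol s
    step = (L >> 1) + (L >> 3) + 3
    spread, pos = [0] * L, 0
    for s, count in enumerate(q):
        for _ in range(count):
            spread[pos] = s
            pos = (pos + step) % L
    return spread

print(''.join(map(str, spread_fast([1, 3, 2, 10]))))  # 0233233133133133
```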
[Page 13]
2. (forward) ERROR CORRECTION
Add deterministic redundancy to repair possible errors.
Binary Symmetric Channel (BSC) (e.g. 0101 → 0111):
every bit has an independent probability $\epsilon$ of being flipped (0 ↔ 1).
Simplest error correction, Triple Modular Redundancy:
use 3 (or more) identical copies, decode by majority voting:
0 → 000, 1 → 111 … 001 → 0, 110 → 1
Codewords: the coding sequences actually used (000, 111).
The decoder chooses the closest codeword (in its ball).
Hamming distance: the number of differing bits (e.g. $d(001, 000) = 1$).
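Triple Modular Redundancy is small enough to show whole; a toy sketch:

```python
# TMR: repeat each bit three times, decode by majority vote
# (corrects any single flip per triple).
def tmr_encode(bits):
    return [b for b in bits for _ in range(3)]

def tmr_decode(coded):
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

assert tmr_decode([0, 0, 1, 1, 1, 0]) == [0, 1]   # 001 -> 0, 110 -> 1
```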
[Page 14]
Triple Modular Redundancy:
2 radius-1 balls (around 000 and 111) cover all 3-bit sequences:
2 codewords × (1 + 3) ball size = $2^3$ codes.
(7,4) Hamming codes: $d(c_1, c_2) \ge 3$:
7 bits → a ball of radius 1 contains $1 + 7 = 8$ sequences,
$2^4$ codewords, the union of their balls: $2^4 \cdot 8 = 2^7$.
Finally: the $2^7$ codes are divided among the $2^4$ radius-1 balls around codewords.
Hamming-code parity coverage (each parity bit $p_k$, placed at position $k$, covers exactly the positions whose binary expansion contains $k$):

| Bit position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | … |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Encoded data bits | p1 | p2 | d1 | p4 | d2 | d3 | d4 | p8 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | p16 | d12 | d13 | d14 | d15 | … |
| p1 covers | X |  | X |  | X |  | X |  | X |  | X |  | X |  | X |  | X |  | X |  | … |
| p2 covers |  | X | X |  |  | X | X |  |  | X | X |  |  | X | X |  |  | X | X |  | … |
| p4 covers |  |  |  | X | X | X | X |  |  |  |  | X | X | X | X |  |  |  |  | X | … |
| p8 covers |  |  |  |  |  |  |  | X | X | X | X | X | X | X | X |  |  |  |  |  | … |
| p16 covers |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | X | X | X | X | X | … |
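A compact Hamming(7,4) sketch following the layout above (parity bits at positions 1, 2, 4); it corrects any single flipped bit via the syndrome, computed here as the XOR of the positions holding a 1 (function names and list-of-bits I/O are illustrative):

```python
# Hamming(7,4): positions 1..7, parity at 1, 2, 4; corrects one bit flip.
def hamming74_encode(d):            # d: 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]         # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]         # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]         # covers positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]

def hamming74_decode(c):            # c: 7 received bits
    s = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            s ^= pos                # syndrome: XOR of positions of the 1s
    if s:                           # nonzero syndrome points at the flipped bit
        c[s - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

cw = hamming74_encode([1, 0, 1, 1])
cw[4] ^= 1                          # channel flips one bit
assert hamming74_decode(cw) == [1, 0, 1, 1]
```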
[Page 15]
Triple Modular Redundancy: rate 1/3 (we need to transmit 3 times more).
Assume $\epsilon = 0.01$: the probability that 2 of the 3 bits get flipped is ≈ 0.0003 > 0.
Theory: rate $1 - h(0.01) \approx 0.919$ can fully repair the message.
The size of a radius-$\epsilon n$ ball in the space of length-$n$ sequences (hypercube):

$$\binom{n}{\epsilon n} \approx 2^{n h(\epsilon)}$$

The number of such balls: $2^n / 2^{n h(\epsilon)} = 2^{n(1 - h(\epsilon))}$.
We can send $n(1 - h(\epsilon))$ bits
by choosing one of the balls (codewords).
Alternative explanation of the rate $R(\epsilon) = 1 - h(\epsilon)$ bits/symbol:
additionally having the bitflip positions, worth $h(\epsilon)$ bits/symbol,
we would get 1 bit/symbol.
The theoretical rate is asymptotic → it requires long bit blocks.
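The ball-size estimate $\binom{n}{\epsilon n} \approx 2^{n h(\epsilon)}$ is easy to check numerically; a small sketch with exact binomials:

```python
# Numeric check: lg C(n, eps*n) / n approaches h(eps) as n grows.
from math import comb, log2

def h(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

eps = 0.01
for n in (100, 1000, 10000):
    k = round(eps * n)
    print(n, log2(comb(n, k)) / n)   # 0.066..., 0.077..., 0.080... -> h(0.01) ~ 0.0808
```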
[Page 16]
Directly long blocks: Low Density Parity Check (LDPC),
randomly distributed parity bits.
Correction uses belief propagation, operating on $\Pr(c_i = 1)$:
$C_i = \Pr(c_i = 1)$ for the input message (initialization of the process),
$q_{ij}(1)$ (initially $= C_i$): the "belief" sent from $c_i$ to $f_j$; $r_{ji}(b)$: $\Pr(c_i = b)$ sent from $f_j$ to $c_i$.
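Full belief propagation is more than a few lines, so here is a toy decoder in the same message-passing spirit using the simpler bit-flipping rule instead (plainly a swapped-in technique; the parity-check matrix H below is a made-up small example, not from the slides):

```python
# Toy LDPC-style bit-flipping decoder: flip the bit violating the most checks.
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],      # each row is one parity check
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])

def bit_flip_decode(y, H, iters=20):
    y = y.copy()
    for _ in range(iters):
        syndrome = H @ y % 2
        if not syndrome.any():
            return y                   # all parity checks satisfied
        votes = H.T @ syndrome         # violated checks touching each bit
        y[np.argmax(votes)] ^= 1       # flip the worst offender
    return y

r = np.array([0, 0, 1, 0, 0, 0])       # all-zero codeword with one channel flip
print(bit_flip_decode(r, H))           # recovers [0 0 0 0 0 0]
```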
[Page 17]
Sequential decoding: search the tree of possible corrections.
`state = f(input_block, state)`,
e.g. `output_block = input_block | (state & 1)`.
The state depends on the history: it connects redundancies,
guards the proper correction, uniformly spreading checksums.
Encoding uses only a subset of the states (e.g. last bit = 0).
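A toy rendering of this state machine; `f` below is a hypothetical mixing function standing in for the slide's transition, and the checksum bit appended to each block is `state & 1`:

```python
# Illustrative sketch: a history-dependent checksum bit per input block.
# A decoder (not shown) would search the tree of possible corrections,
# keeping only branches whose checksum bits match the received ones.
def f(block, state):
    return (state * 31 + block) & 0xFFFF   # hypothetical state transition

def encode(blocks):
    state, out = 0, []
    for b in blocks:
        state = f(b, state)
        out.append((b, state & 1))         # output_block = input_block | (state & 1)
    return out

print(encode([3, 1, 4, 1, 5]))
```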
[Page 18]
Channels
Binary Symmetric Channel (difficult to repair, well understood):
bits have an independent probability $\epsilon$ of being flipped (e.g. 0101 → 0111),
rate = $1 - h(\epsilon)$.
Erasure Channel (simple to repair, well understood):
bits have an independent probability $\epsilon$ of being erased (e.g. 0101 → 01?1),
rate = $1 - \epsilon$.
Synchronization errors, for example the Deletion Channel:
bits have an independent probability $\epsilon$ of being deleted (e.g. 0101 → 011),
extremely difficult to repair, rate = ???
[Page 19]
Rateless erasure codes: produce some large number of packets;
any large enough subset (≥ $K$) is sufficient for reconstruction.
For example Reed-Solomon codes (slow):
a degree-$(K-1)$ polynomial is defined by its values at any $K$ points.
Fountain codes (from 2002, Michael Luby):
the packets are XORs of (random) subsets of the source blocks, $(x_1, x_2, \ldots) \to (y_1, y_2, \ldots)$,
e.g. if $y_1 = x_1$, $y_2 = x_1\,\mathrm{XOR}\,x_2$, then $x_1 = y_1$, $x_2 = y_2\,\mathrm{XOR}\,y_1$.
What densities of the XORed subsets should we choose? The (ideal soliton) distribution:

$$\Pr(1) = \frac{1}{K}, \qquad \Pr(i) = \frac{1}{i(i-1)} \ \text{for } i = 2, \ldots, K$$

Quick, but needs $(1+\epsilon)K$ packets to decode reliably (is the matrix invertible?). A toy sketch follows below.
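A toy LT-style fountain sketch, assuming the degree distribution above and a peeling decoder (solve any packet with exactly one unknown block, repeat); names and block contents are illustrative:

```python
# Toy fountain code: ideal-soliton degrees + a peeling decoder.
# Blocks are small ints here; in practice they would be byte buffers.
import random

def soliton(K):
    # Pr(1) = 1/K, Pr(i) = 1/(i(i-1)) for i = 2..K
    r, acc, i = random.random(), 1.0 / K, 1
    while i < K and r >= acc:
        i += 1
        acc += 1.0 / (i * (i - 1))
    return i

def encode_packet(blocks):
    idx = set(random.sample(range(len(blocks)), soliton(len(blocks))))
    val = 0
    for i in idx:
        val ^= blocks[i]                 # packet = XOR of a random subset
    return idx, val

def peel_decode(packets, K):
    known, progress = {}, True
    while progress and len(known) < K:
        progress = False
        for idx, val in packets:
            unknown = idx - known.keys()
            if len(unknown) == 1:        # exactly one unresolved block: solve it
                v = val
                for i in idx & known.keys():
                    v ^= known[i]
                known[unknown.pop()] = v
                progress = True
    return known if len(known) == K else None

blocks = [0b1010, 0b0111, 0b0001, 0b1100]          # K = 4 toy source blocks
packets, decoded = [], None
while decoded is None:                             # catch packets until decodable
    packets.append(encode_packet(blocks))
    decoded = peel_decode(packets, len(blocks))
print([decoded[i] for i in range(len(blocks))])    # recovers the blocks
```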
[Page 20]
If a packet wasn't lost, it is probably damaged.
Protecting with error correction requires a priori knowledge of the damage level,
unavailable in: data storage, watermarking, unknown packet routes.
Joint Reconstruction Codes (JRC): the same using a posteriori damage-level knowledge.
[Page 21]
Some packets are lost, some are damaged.
Adding redundancy (sender)
requires knowing the damage level.
Not available in many situations (JRC…):
- damage of a storage medium depends
on its history,
- a broadcaster doesn't know individual
damage levels,
- damage of watermarking
depends on capturing conditions,
- damage of a packet in a network
depends on its route,
- rapidly varying conditions
can prevent adaptation.
[Page 22]
We could add some statistical/deterministic knowledge to the decoder,
getting compression/entropy coding without the encoder's knowledge.
[Page 23]
JRC: undamaged case, when $K + 1$ packets were received.
Large $K$: Gaussian distribution, on average 1.5 more nodes to test.
Damaged case ($n = 3$: two packets with $\epsilon = 0$, one with $\epsilon = 0.2$): → Pareto distribution.
[Page 24]
Sequential decoding: choose and expand the most promising candidate (largest weight).
The rate is $R_c(\epsilon) = 1 - h_{1/(1+c)}(\epsilon)$ for Pareto coefficient $c$,
where $h_u(\epsilon) = \frac{\lg(\epsilon^u + (1-\epsilon)^u)}{1-u}$ is the Rényi entropy; $h_1$ is the Shannon entropy.

$$R_0(\epsilon) = 1 - h(\epsilon), \qquad R_1(\epsilon) = 1 - \lg\!\left(1 + 2\sqrt{\epsilon(1-\epsilon)}\right)$$

$$\Pr(\#\text{steps} > s) \propto s^{-c}$$
[Page 25]
3. KUZNETSOV-TSYBAKOV PROBLEM, rate distortion
Imagine we can send $n$ bits, but $k = \epsilon n$ of them are damaged (fixed).
For example, QR-like codes with some fraction of the pixels fixed (e.g. as the contour of a plane):
if the receiver knew the positions of these strong constraints,
we could just use the remaining $n - k = (1 - \epsilon)n$ bits.
Kuznetsov-Tsybakov problem: what if only the sender knows the constraints?
Surprisingly, we can still approach the same capacity,
but the cost (paid only by the sender!) grows to infinity there.
[Page 26]
Weak/statistical constraints can be applied to all positions,
e.g. the standard steganography problem:
to avoid detection, we would like to modify as little as possible.
Entropy coder: the difference map carries $h(p) \approx 0.5$ bits/symbol … the picture is needed for the XOR.
Kuznetsov-Tsybakov … does the receiver have to know the constraints?
[Page 27]
Homogeneous contrast case: use $(p, 1-p)$ or $(1-p, p)$ distributions.
[Page 28]
General constraints: the grayness of a pixel is the probability of using a 1 there.
[Page 29]
Approaching the optimum: use the freedom of choosing the exact encoding sequence!
Find the most suitable encoding sequence for our constraints.
The cost (search) is paid while encoding; decoding is straightforward.
The decoder cannot directly deduce the constraints.
Uniform grayness: the probability that a random length-$n$ bit sequence has $pn$ ones is

$$\frac{1}{2^n}\binom{n}{pn} \approx 2^{-n(1-h(p))}$$

so if we have essentially more than $2^{n(1-h(p))}$ different (random) ways to encode a
single message, with high probability some of them will use $pn$ ones.
The cost: the number of different messages reduces to at most
$2^n / 2^{n(1-h(p))} = 2^{n h(p)}$, giving exactly the rate limit as if the receiver knew $p$.
Kuznetsov-Tsybakov: the probability of accidentally fulfilling all $\epsilon n$ constraints is $2^{-\epsilon n}$,
so we need more than $2^{\epsilon n}$ different (pseudorandom) ways of encoding →
we could encode $2^n / 2^{\epsilon n} = 2^{n(1-\epsilon)}$ different messages there →
the rate limit is $n - \epsilon n$ bits, as if the receiver knew the positions of the fixed bits.
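The sender-side search in this counting argument can be illustrated by brute force; a toy sketch with assumed sizes ($n = 16$, four fixed positions) and a hypothetical PRNG-based family of encodings. It shows only the sender's search; the real scheme additionally keeps the encodings of different messages disjoint so the receiver can decode:

```python
# Toy Kuznetsov-Tsybakov sender: each random encoding fulfills the fixed
# bits with probability 2^-(eps*n), so ~2^(eps*n) pseudorandom tries suffice.
import random

n = 16
fixed = {1: 0, 5: 1, 9: 1, 14: 0}        # position -> value forced by the channel

def send(message, trials=4096):
    rng = random.Random(message)         # reproducible encodings of this message
    for _ in range(trials):
        word = rng.getrandbits(n)
        if all((word >> i) & 1 == v for i, v in fixed.items()):
            return word                  # encoding consistent with all fixed bits
    return None                          # astronomically unlikely with this margin

word = send(12345)
print(None if word is None else format(word, '016b'))
```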
[Page 30]
[Page 31]
Constrained Coding: find the closest $C = T(\text{payload\_bits}, \text{freedom\_bits})$, send $C$.
Rate Distortion: find the closest $T(X, \text{freedom\_bits})$, send the freedom_bits.
Because $T$ is a bijection, for the same constraints/message to fulfill/resemble:
density of freedom bits + density of payload bits = 1,
rate of RD + rate of CC = 1,
so we can translate the previous calculations by just using 1 − rate.
B&W picture: 1 bit per pixel = visual aspect (RD) + hidden message (CC).
[Page 32]
4) Distributed Source Coding: separate compression and
joint decompression of non-communicating correlated sources,
e.g. correlated sensors, stereo recording (sound, video).
[Page 33]
Statistical physics on the $2^n$ hypercube of sequences:

$$\binom{n}{pn} \approx 2^{n h(p)}, \qquad h(p) = -p \lg(p) - (1-p)\lg(1-p)$$

Entropy coding: a symbol of probability $p$ contains $\lg(1/p)$ bits of information.
Error correction: a bit with bitflip probability $\epsilon$ carries $1 - h(\epsilon)$ bits of information;
encode as one of the codewords: the centers of size-$2^{n h(\epsilon)}$ balls.
JRC: reconstruct if the sum of $1 - h(\epsilon)$ over the received packets is sufficient.
Rate distortion: approximate a sequence by the closest codeword.
Constrained coding: for each message there is a (disjoint) set of codewords;
for given constraints, we find the closest codeword representing our message.
Distributed source coding: e.g. codewords include the correlations.