
Distributed Coding of Compressively Sensed Sources

by

Maxim Goukhshtein

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

in the

Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright 2017 by Maxim Goukhshtein


Abstract

Distributed Coding of Compressively Sensed Sources

Maxim Goukhshtein

Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

2017

In this work we propose a new method for compressing multiple correlated sources with a very low-complexity encoder in the presence of side information. Our approach uses ideas from compressed sensing and distributed source coding. At the encoder, syndromes of the quantized compressively sensed sources are generated and transmitted. The decoder uses side information to predict the compressed sources. The predictions are then used to recover the quantized measurements via a two-stage decoding process consisting of bitplane prediction and syndrome decoding. Finally, guided by the structure of the sources and the side information, the sources are reconstructed from the recovered measurements. As a motivating example, we consider the compression of multispectral images acquired on board satellites, where resources, such as computational power and memory, are scarce. Our experimental results exhibit a significant improvement in the rate-distortion trade-off when compared against approaches with similar encoder complexity.


Acknowledgements

I wish to express my most sincere gratitude to my advisor Prof. Stark Draper, for his continuous support and guidance and sharing his wealth of knowledge and ideas. Having had the opportunity to work for him as a teaching assistant, I am truly inspired not only by his passion for learning and professionalism, but also his deep care for pedagogy and commitment to teaching. It has been an absolute pleasure to have conversations about anything from information theory, the “real” reason for the significance of the Fourier transform, to politics and biking.

I would like to extend a great amount of appreciation to Dr. Petros Boufounos, who served as my host during my summer internship at the Mitsubishi Electric Research Laboratories (MERL). I’m deeply grateful for his mentorship during and after my time at MERL, his invaluable contributions to this thesis and for introducing me to the wonderful world of compressed sensing. I’m looking forward to future collaboration.

I thank all the wonderful researchers, staff and interns at MERL for making the summer of 2016 such an enriching experience on both an academic and a personal level. A special shoutout goes to Ulugbek Kamilov, Hassan Mansour and Dehong Liu from the Multimedia team and collaborators Toshiaki Koike-Akino and Ye Wang for sharing their expertise and enthusiasm for research.

Thanks to all my past and present office mates for sharing their knowledge, opinions and world views around campus and over lunches in Chinatown. A big thank you goes to Pao-Sheng (Joshua) Chou, with whom I spent countless hours exchanging ideas, struggling through course material and playing soccer.

I want to thank my friends all over the world for making life so much more interesting. A huge thank you to Artur Plis, whose close friendship has been a great source of joy, understanding and growth. Thank you to Harsh Aurora, whose eternal optimism and genuine positivity is at times baffling and at times awe-inspiring, for being an awesome summer adventure companion.

Finally, I am deeply indebted to my parents, Rita and Michael, who worked hard to raise me and provide me with a comfortable life, for their unconditional love and support and for instilling in me the love of learning.


Contents

Acknowledgements

List of Figures

List of Tables

1 Introduction

2 Background
   2.1 Compressed Sensing
      2.1.1 Adaptive Compression via Sparsity
      2.1.2 Non-Adaptive Compression via Sparsity
      2.1.3 Some Fundamental Notions in Compressed Sensing
      2.1.4 Optimization Problems for Sparse Recovery
      2.1.5 Quantized Compressed Sensing
   2.2 Distributed Source Coding
      2.2.1 Source Coding
         Lossy and Lossless Compression
         Entropy
      2.2.2 Channel Coding
         Linear Codes
         Code Rates, Capacity and the Channel Coding Theorem
         Syndrome Decoding
      2.2.3 Distributed Source Coding via Channel Coding

3 Methodology
   3.1 Problem Setting
   3.2 Encoding
      3.2.1 Quantized Compressively Sensed Measurements
      3.2.2 Prediction Information
      3.2.3 Syndrome Generation
      3.2.4 Encoding Complexity
   3.3 Decoding
      3.3.1 Source Prediction, Measurement and Quantization
      3.3.2 Recovery of Quantized Measurements
         Calculating LDPC Likelihoods to Improve Syndrome Decoding
      3.3.3 Source Reconstruction

4 Results
   4.1 Encoding
   4.2 Decoding
      4.2.1 Prediction Methods
         Linear Prediction
         Successive Prediction
      4.2.2 WTV optimization
   4.3 Simulation Results

5 Conclusion
   5.1 Summary
   5.2 Practical Considerations
   5.3 Future work


List of Figures

2.1 A binary symmetric channel with crossover probability p.

3.1 Compression system diagram.
3.2 (A) A 512 × 512 image. (B) A 512 × 512 image consisting of 64 randomly chosen 64 × 64 image blocks.
3.3 Compression rate in bits-per-pixel of the image in Fig. 3.2(A) for different values of ∆.
3.4 Combinations of m and ∆ resulting in a 2.00 bpp compression of the two images in Fig. 3.2.
3.5 Compression – compressed sensing followed by quantization (in red).
3.6 Compression – calculation of prediction statistics (in red).
3.7 Calculating the error probability p3 by finding the area corresponding to the inconsistent intervals (shown in blue).
3.8 Empirical and theoretical bit error probabilities for quantized measurements, acquired using a Gaussian sensing matrix, as a function of the normalized prediction error εσ/∆ for K = 3.
3.9 Histogram of measurements acquired using the randomized subsampled WHT operator, a Gaussian fit using the empirical mean and variance of the measurements and the equivalent Gaussian sensing matrix statistics.
3.10 Empirical and theoretical bit error probabilities for quantized measurements, acquired using a randomized subsampled WHT operator, as a function of the normalized prediction error εσ/∆ for K = 3.
3.11 Bit error probability pK as a function of the normalized prediction error εσ/∆ for different values of K (i.e., assuming that K − 1 bitplanes have already been correctly recovered).
3.12 The number of bitplanes that need to be coded as syndromes as a function of the normalized prediction error εσ/∆.
3.13 Compression – syndrome generation (in red).
3.14 Decompression system diagram.
3.15 Decompression – prediction followed by compressed sensing and quantization (in red).
3.16 Recovering quantized measurements of bitplane k from the quantized predicted measurements q = Q(y) and the previously recovered k − 1 bits q(1:k−1).
3.17 Bitplane prediction of a quantization point using the prediction measurement.
3.18 Decompression – recovery of the quantized measurements (in red).
3.19 Calculating L3(y) by determining the areas A1 (blue) and A2 (red).
3.20 The bit error likelihood LK as a function of the distance parameter c for K = 3.
3.21 Decompression – reconstruction of the compressed source (in red).

4.1 A 512 × 512 × 4 multispectral image used for testing of the compression method (acquired by the AVNIR-2 instrument on board of the ALOS satellite [1]).
4.2 System diagram for the proposed compression-decompression method using linear prediction.
4.3 System diagram for the proposed compression-decompression method using successive prediction.


List of Tables

4.1 Decoding PSNR at 2 bpp (512 × 512 image crop)
4.2 Improvement over benchmark (512 × 512 image crop compressed at rate 2 bpp)
4.3 Decoding PSNR at 1.68 bpp (full 7040 × 7936 image)
4.4 Improvement over benchmark (7040 × 7936 image compressed at rate 1.68 bpp)


Chapter 1

Introduction

Compression of data enables its efficient transmission and storage. Compression and decompression are sometimes performed under different settings and subject to different constraints. For example, remote sensing done on board satellites is often subject to limited computational power and memory. Under such conditions, prior to being transmitted, the remotely sensed data must be compressed using a low-complexity encoder. The encoding complexity of standard compression approaches, such as JPEG-2000, may be prohibitively large in such scenarios. In contrast, decompression of the satellite-captured data is typically done on earth, where resources abound. Motivated by this example, in this thesis we developed a rate-efficient compression method admitting a low-complexity encoder and shifting the complexity to the decoder.

The focus of this thesis is on the lightweight compression of multiple correlated signals under the assumption that one of the signals and a small amount of statistical information are available at the decoder as side information. Combining techniques from compressed sensing (CS) and distributed source coding (DSC), the proposed method admits a low-complexity encoder while exhibiting a favourable rate-distortion trade-off.

The use of DSC to establish methods for lossy compression with low-complexity encoding was studied in [2, 3]. The lossless compression counterpart was presented in [4]. In [5], an approach based on predictive coding for performing near-lossless compression of remotely sensed images is put forward. Although the aforementioned methods display good rate-distortion performance, in this work we wish to consider approaches with an even lower-complexity encoder. One such approach was presented in [6], sharing the same problem setting as considered in this work and proposing an encoder with a similar complexity. As such, we use [6] as a benchmark for our proposed approach. Similarly to our method, [6] uses CS principles to acquire signal measurements and perform their reconstruction, as well as side information for signal prediction at the decoder. However, in contrast to our approach, which


relies on DSC to correct mismatches between the original and predicted signals, [6] encodes their locations and transmits them as part of the side information.

At the encoder, randomized linear measurements of each signal are acquired. The number of measurements is smaller than the signal dimension, providing effectively the first level of compression. These measurements are then quantized using a simple scalar quantizer. A small number of bitplanes of the quantized measurements are encoded. In particular, a bitplane is encoded as a syndrome of a linear channel code. The syndrome size is smaller than the size of the bitplanes, providing the second level of compression. At the end of the compression process, the signals are represented in the form of syndromes. The correlation between the signals allows the encoder to determine which bitplanes to encode and at what code rates. This insight is one of the key elements of the proposed approach. Determination of the code rates exhibits a strong theoretical connection to universal scalar quantization, which was presented in [7] as a rate-distortion efficient alternative to uniform scalar quantization of CS measurements.

At the decoder, the side information is used to predict the signals. These predictions are then measured and quantized in the same fashion as was done at the encoder. Recovery of the original quantized measurements is done via DSC. In particular, the decoder estimates a bitplane of the quantized predicted measurements using the signal prediction and the previously decoded bitplanes. The estimated bitplane is then encoded as a syndrome. The encoder- and decoder-generated syndromes are then used by the decoder to retrieve the original quantized measurement via syndrome decoding. This process continues until all bitplanes are decoded, at which point all the quantized measurements are recovered exactly. In this work, we used capacity-approaching low-density parity-check (LDPC) codes in order to maximize the syndrome-based compression. Furthermore, efficient decoding of these codes can be performed using belief propagation. Finally, the signals are reconstructed from the quantized measurements by solving a sparse optimization problem.

The nature, or structure, of the signals, as well as the side information, determine the particular form of the optimization problem. Exploiting signal structure in compression is common in image and video coding. For example, spatial correlations of image pixels have been used to compress encrypted images [8, 9], while temporal correlations between video frames were used in a similar manner for compression of encrypted video streams [9, 10].

The thesis is organized as follows: Chapter 2 provides background information about compressed sensing, distributed source coding and other related topics such as transform coding and various fundamental aspects of information theory. The problem formulation, as well as the encoding and decoding procedures, are presented in Chapter 3. An example of an application of the developed approach to the compression of multispectral images is discussed


in Chapter 4. Finally, Chapter 5 concludes the thesis with a summary and a discussion of a number of practical considerations and open questions.


Chapter 2

Background

2.1 Compressed Sensing

2.1.1 Adaptive Compression via Sparsity

A signal x is said to be sparse if it has few non-zero components. More specifically, we say that x ∈ Rn is S-sparse if it belongs to the set ΣS = {x ∈ Rn : ‖x‖0 ≤ S}, where ‖x‖0 returns the number of non-zero components of x. A signal is said to be compressible if it can be well approximated by a sparse signal (i.e., it has few large non-zero components, while the rest are very small, but not necessarily equal to zero).

One approach to compression, called transform coding, relies on the empirical observation that many naturally occurring signals are sparse or compressible in some domain. This implies that the coefficients needed to represent an observed signal can sometimes be redundant; we can potentially represent the same signal in a different domain with far fewer coefficients. This idea is used, for example, in image compression standards such as JPEG-2000 [11]. Images are known to be highly compressible in the wavelet domain. It is therefore possible to store images by keeping a very small fraction of their wavelet coefficients, which can then be used to reconstruct the image without introducing much of a noticeable perceptual loss.

To compress a signal z ∈ Rn, the simple idea behind such approaches is the following:

1. Transform the signal z into its sparsity domain, x = ΨT z ∈ Rn, where Ψ ∈ Rn×n is an orthonormal basis in which z can be represented using the sparse (or compressible) vector x.

2. Form the set ΛS = {P1, P2, . . . , PS} ⊂ {1, 2, . . . , n} consisting of the indices of the S largest components of x.

3. Calculate y = PΛS x, where PΛS ∈ RS×n is a matrix selecting the components of x that belong to ΛS (i.e., (PΛS)i,j = 1{j = Pi}, where 1{·} is an indicator function). For example,


if n = 4 and ΛS = {P1, P2} = {1, 3}, then

$$P_{\Lambda_S} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

The vector y ∈ RS consists therefore of the S largest coefficients of x, which is the sparse representation of z in the domain Ψ. Therefore, following this approach the signal z can be represented using only 2S (rather than n) numbers (i.e., the S coefficients of y and the set ΛS), thereby achieving compression (provided that 2S < n). The signal can be reconstructed as

$$\hat{z} = \Psi P_{\Lambda_S}^{T} y = \Psi P_{\Lambda_S}^{T} P_{\Lambda_S} x = \Psi P_{\Lambda_S}^{T} P_{\Lambda_S} \Psi^{T} z. \quad (2.1)$$

Note that PΛS^T PΛS x ∈ Rn is equal to x in the locations of the S largest coefficients and zero elsewhere. Therefore, if x is S-sparse, that is if x ∈ ΣS, the compression is lossless (since then PΛS^T PΛS x = x, and so ẑ = Ψx = z). Otherwise, if x ∉ ΣS, compression is lossy since ẑ ≠ z.

It is important to note that we do not know ahead of time the positions of a signal’s largest coefficients in the sparsity domain (i.e., ΛS). Therefore, to determine ΛS (i.e., step 2), we must first compute all n coefficients of x (i.e., step 1), before throwing most of them away in step 3. Such an approach is said to be adaptive, since it must adapt to the particular signal of interest (i.e., PΛS depends on the set ΛS which in turn depends on z).
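To make the three steps concrete, the following is a minimal numpy sketch of this adaptive compression and of the reconstruction (2.1). The names and the randomly generated orthonormal basis Ψ are illustrative assumptions (in practice Ψ would be, e.g., a wavelet basis); the test signal is built to be exactly S-sparse in Ψ, so the reconstruction is lossless.

```python
import numpy as np

def adaptive_compress(z, Psi, S):
    """Steps 1-3 above: transform z to its sparsity domain and keep the S largest coefficients."""
    x = Psi.T @ z                             # step 1: coefficients in the basis Psi
    Lambda_S = np.argsort(np.abs(x))[-S:]     # step 2: indices of the S largest coefficients
    return x[Lambda_S], Lambda_S              # step 3: the compressed representation (y, Lambda_S)

def adaptive_reconstruct(y, Lambda_S, Psi):
    """Reconstruction as in (2.1): place the kept coefficients back and invert the transform."""
    x_hat = np.zeros(Psi.shape[0])
    x_hat[Lambda_S] = y
    return Psi @ x_hat

# Usage: a signal that is exactly S-sparse in Psi is recovered losslessly.
rng = np.random.default_rng(0)
n, S = 64, 5
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthonormal basis (illustrative)
x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
z = Psi @ x_true
z_hat = adaptive_reconstruct(*adaptive_compress(z, Psi, S), Psi)
assert np.allclose(z, z_hat)
```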

2.1.2 Non-Adaptive Compression via Sparsity

One might wonder: is it not wasteful to calculate all of these transform coefficients when eventually only a small fraction of them is to be kept? Or, put differently, is it possible to somehow directly obtain a compressed representation of z in the form y = Az ∈ Rm with m < n? This idea presents two challenges, namely:

• Such an approach must be non-adaptive, since the acquisition system A ∈ Rm×n can no longer rely on knowledge of the particular signal z (or its representation in some sparsity domain).

• Reconstructing z from y involves finding a solution z to the underdetermined system of linear equations y = Az, which has infinitely many solutions (or no solution at all).

Perhaps surprisingly, the theory of compressed sensing (CS) provides an affirmative answer to the question above: direct, non-adaptive acquisition and efficient reconstruction of sparse or compressible signals is possible!


2.1.3 Some Fundamental Notions in Compressed Sensing

In a more abstract mathematical sense, CS is concerned with studying the theory, and developing efficient methods, for solving underdetermined systems of linear equations of the form

y = Ax, (2.2)

where y ∈ Rm, x ∈ Rn and A ∈ Rm×n with m < n. The matrix A is referred to as the sensing matrix. In (2.2), the vector x may already be sparse, in which case we can think of the sensing matrix as A = ΦΨ, where Φ ∈ Rm×n is a measurement matrix and Ψ ∈ Rn×n is the sparsity basis of x. From a practical point of view, (2.2) represents the problem of inferring the data x from a small number of linear measurements y. Aside from the aforementioned adaptive compression example, many practical problems in science and engineering naturally fit this template.

The ability to find a solution to this inference problem depends on two important properties. One is the sparsity of the signal x in some domain. In fact, CS theory shows that the acquisition system (i.e., sensing matrix) does not need knowledge of the sparsity domain of x (this knowledge is, however, needed in reconstruction). The second property, incoherence, is concerned with the structure of the sensing matrix. Incoherence demands that the basis Φ, in which the measurements are taken, be dense (i.e., the opposite of sparse) with respect to the sparsity basis Ψ of x (the intuition is that each measurement is designed to capture as much information as possible from the sparse signal).

At the heart of CS lies the idea of the restricted isometry property (RIP), which provides guarantees for the performance of various CS sparse reconstruction algorithms. A matrix A is said to satisfy the RIP of order S if there exists a constant δS ∈ (0, 1) such that

$$(1 - \delta_S)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_S)\|x\|_2^2, \quad (2.3)$$

for all x ∈ ΣS. In other words, a matrix A satisfying the RIP of order S behaves roughly like an orthogonal matrix for all S-sparse vectors. A matrix A ∈ Rm×n whose entries are drawn in an i.i.d. manner from a Gaussian distribution N(0, 1/m) will satisfy the RIP with high probability provided that m ∼ O(S log(n/S)) [12]. Similar random constructions using other distributions are possible [13].


2.1.4 Optimization Problems for Sparse Recovery

A natural way to approach sparse recovery is to solve the following optimization problem:

$$\min_{x} \; \|x\|_0 \quad \text{subject to} \quad y = Ax. \quad (2.4)$$

Unfortunately, this problem is computationally intractable. Instead, one may try solving the basis pursuit problem:

$$\min_{x} \; \|x\|_1 \quad \text{subject to} \quad y = Ax. \quad (2.5)$$

The use of the ℓ1 norm for finding a sparse solution is well-established in the literature and dates back to its use in seismology in the early 70’s [14]. Provided that A satisfies the RIP of order 2S, the solution is recovered exactly [12]. More generally, if one considers recovering the solution x from noisy measurements

y = Ax + w, (2.6)

where w ∈ Rm models the noise, one can attempt to recover a solution by solving the basis pursuit denoising (BPDN) problem (or the closely related Lasso problem):

$$\min_{x} \; \|x\|_1 \quad \text{subject to} \quad \|y - Ax\|_2 \le \epsilon, \quad (2.7)$$

where ε is chosen such that ‖w‖2 ≤ ε with high probability [13, 15]. Once again, the RIP provides reconstruction guarantees. In particular, it has been shown that for any x ∈ ΣS, the solution x∗ to (2.7) obeys

‖x∗ − x‖2 ≤ Cε, (2.8)

where the constant C depends only on the restricted isometry constant δ4S [15]. In other words, the quality of sparse recovery from noisy measurements is bounded by the noise.
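As an illustration of sparse recovery from noisy measurements, the following sketch solves the BPDN program (2.7) for a Gaussian sensing matrix with i.i.d. N(0, 1/m) entries. It assumes the numpy and cvxpy packages are available; the problem sizes and the choice of ε are illustrative.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, S = 256, 80, 5
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))    # i.i.d. N(0, 1/m) sensing matrix

x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)   # S-sparse signal

sigma = 0.01
y = A @ x_true + sigma * rng.standard_normal(m)        # noisy measurements as in (2.6)
eps = 1.5 * sigma * np.sqrt(m)                         # chosen so that ||w||_2 <= eps w.h.p.

# Basis pursuit denoising (2.7): minimize ||x||_1 subject to ||y - Ax||_2 <= eps
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm(y - A @ x, 2) <= eps])
prob.solve()
print("recovery error:", np.linalg.norm(x.value - x_true))
```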

2.1.5 Quantized Compressed Sensing

In most practical systems, including the one studied in this work, signals are quantized following their acquisition. The effects of quantization in CS reconstruction have been studied extensively (e.g., [16–22]). A survey of quantized CS (QCS) quantization approaches and reconstruction methods, as well as theoretical results, can be found in [23]. In [24], the optimal quantization of


compressively sensed measurements corrupted by white Gaussian noise has been studied using the replica method. Assuming reconstruction using an MMSE estimator, [24] proved the existence of an optimal quantization scheme which achieves the minimal distortion (which is a function of the number of measurements, sparsity rate, noise level and quantizer rate). Further, [24] presents a converse which considers source sub-blocks in the large system limit setting. In QCS, the rate (i.e., the bit budget for the quantized source measurements) is the product of the number of source measurements and the number of bits used to quantize each measurement. Given a constant rate, the effects of the interplay between these two quantities on the quality of reconstruction were studied in [25]. This paper considered measurements corrupted by noise and showed the existence of two regimes of operation which are a function of the input SNR. In the low SNR regime, called quantization compression, it is better to acquire more measurements and allocate fewer bits per measurement. On the other hand, in the high SNR measurement compression regime, it is more favorable to acquire fewer measurements and increase the number of bits allocated to each measurement. This behaviour can be attributed to the effects of noise folding, which increases the noise of the signal as fewer measurements are considered, resulting in measurements of lower quality.

In [16], uniform scalar quantization has been shown to be suboptimal in terms of the rate-distortion trade-off. The design of alternative quantization approaches was studied in [19], which considered the design of a high-resolution quantizer, as well as in [21], which considers the use of a belief propagation based approach to design an optimal quantizer. These and other approaches to quantizer design increase the encoder’s complexity. Instead, in our work, we use a simple uniform scalar quantizer and focus on approaches for improving CS reconstruction.

In the method proposed in our work we introduce uniform dither to the acquired signal measurements, making the statistics of the quantization noise (i.e., the mismatch between the measurements and their quantized values) known. Hence, a possible approach is to model the effects of the quantizer as noise and perform recovery by solving the BPDN program (2.7) or a similar optimization problem. This idea of modeling quantization as noise has been explored in various papers, such as [20], in which two optimization-based methods for improved reconstruction of quantized measurements corrupted by Gaussian noise were explored. Sparse recovery using the Basis Pursuit DeQuantizer (BPDQ) [26], which extends BPDN to norms other than the Euclidean norm, has been shown to outperform BPDN in terms of the rate-distortion trade-off. In [17], the effects of quantization on reconstruction using CS methods adapted to account for quantization noise were studied. A method based on Generalized Approximate Message Passing (GAMP) to perform CS reconstruction was presented in [27]. In that work,


the reconstruction procedure is quite general and can be applied to a variety of settings which include different types of scalar quantizers, sampling rates and signal priors.

2.2 Distributed Source Coding

2.2.1 Source Coding

Source coding, or compression, is at the heart of much of today’s digital world. Transmission and storage of digital data rely heavily on the ability to represent it compactly while simultaneously preserving its inherent information content. Although compression algorithms can differ widely in their modes of operation, their underlying principle is the same: strip the data of its redundancies to retain only its essential information content.

Lossy and Lossless Compression

Compression can be either lossless or lossy. In the former case, it is possible to recover the original information exactly from its compressed representation. In the latter case, the recovered information will differ from the original information. The difference between the original and recovered information is called distortion and can be measured in various ways (e.g., mean-squared error, Hamming distance, etc.). When compression is lossy, one often talks about the trade-off between the rate of compression R and the incurred distortion D. This trade-off is often characterized by the rate-distortion R(D) or distortion-rate D(R) functions.

Entropy

Shannon’s groundbreaking work [28], which has led to the birth of the field of information theory and the development of many of its fundamental results, adopts a probabilistic view of communication systems. In particular, information sources are modeled as random variables or processes.

Consider a discrete random variable X with a probability mass function p(x) taking values from an alphabet X. The entropy of X is defined as

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x). \quad (2.9)$$

When log of base 2 and e are used, entropy is measured in bits and nats, respectively. Entropy is one of the central quantities in information theory. It measures the amount of uncertainty in X, thereby quantifying the inherent information content in X; observing a realization of a


highly random X reveals a lot of information (e.g., the outcome of a fair coin flip), whereas, at the other extreme, a realization of a deterministic X does not produce any new information (e.g., the outcome of a flip of a two-headed coin). Entropy also quantifies the number of bits needed on average to represent a realization of X, providing a lower bound for the rate needed to losslessly compress X. In the special case that X is a binary discrete random variable distributed according to a Bernoulli-(p) distribution, we often denote its binary entropy as HB(p), given by

HB(p) = −p log p− (1− p) log(1− p). (2.10)

Similarly, a pair of discrete random variables (X,Y ) ∈ (X ,Y), with a joint probability mass function p(x, y), have a joint entropy defined as

$$H(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y). \quad (2.11)$$

The conditional entropy of Y given X is defined as

$$H(Y|X) = \sum_{x \in \mathcal{X}} p(x)\, H(Y|X=x) = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x), \quad (2.12)$$

where p(y|x) is the conditional probability mass function of Y given X. Finally, the chain rule reveals the relationship between the marginal, joint and conditional entropies:

H(X,Y ) = H(X) +H(Y |X) = H(Y ) +H(X|Y ). (2.13)
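These definitions translate directly into code. The following is a small numpy sketch (illustrative only, not part of the proposed method) that computes the entropy, the binary entropy and, via the chain rule (2.13), the conditional entropy of discrete distributions given as arrays.

```python
import numpy as np

def entropy(p):
    """H(X) in bits for a pmf given as an array of probabilities (eq. (2.9), log base 2)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # the convention 0 log 0 = 0
    return float(-np.sum(p * np.log2(p)))

def binary_entropy(p):
    """H_B(p) of a Bernoulli-(p) variable (eq. (2.10))."""
    return entropy([p, 1.0 - p])

def joint_and_conditional(pxy):
    """Given a joint pmf matrix pxy[i, j] = p(x_i, y_j), return H(X,Y), H(X) and H(Y|X),
    where the conditional entropy follows from the chain rule (2.13)."""
    H_xy = entropy(pxy.ravel())
    H_x = entropy(pxy.sum(axis=1))
    return H_xy, H_x, H_xy - H_x

# e.g. a fair bit has 1 bit of entropy; two independent fair bits have H(X,Y) = 2 bits, H(Y|X) = 1 bit
print(binary_entropy(0.5), joint_and_conditional(np.full((2, 2), 0.25)))
```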

2.2.2 Channel Coding

A data signal transmitted over a channel (e.g., air, wire) is often corrupted by various types of noise (e.g., thermal, electromagnetic). In order to recover the original information from the received signal, the information must be encoded prior to transmission using a channel (or error-correcting) code. In contrast to source coding, channel coding introduces redundancies to the data.

Linear Codes

The simplest form of a channel code is the repetition code. Repetition codes simply repeat the same bit of information a fixed number of times prior to transmission. At the receiver, the more frequent value is considered to be the one sent by the transmitter. For example, a repetition code of length 3 repeats a bit of information 3 times; therefore, if one wants to send the bit 1, 111


will be transmitted. If at the receiver the sequence 101 is observed and the probability of the channel flipping a bit is less than 0.5, the receiver deduces that the transmitter likely sent the bit 1. One can also use a higher rate of repetition in order to ensure that the original data can be decoded at the receiver with higher probability. However, since higher repetition means more bits sent over the channel, and since transmission comes at a price (i.e., energy, time, money), it may be advisable to use more sophisticated approaches which enable correction of errors while also minimizing the length of the sent data.

Repetition codes are an example of codes belonging to a family of channel codes known as linear block codes. An (n, k) binary linear block code C is a k-dimensional subspace of F2^n. The basis of C is defined by the generator matrix G ∈ F2^{n×k}. A k-tuple of information bits u ∈ F2^k is used to generate the length-n codeword x = Gu ∈ C. The length of the codewords is called the block length. The parity-check matrix H ∈ F2^{(n−k)×n} is a matrix such that Hx = 0 for all x ∈ C (hence, HG = 0, so C defines the null-space of H). More generally, for any v ∈ F2^n, the vector s = Hv is called a syndrome.

The aforementioned repetition code of length 3 is an example of a (3, 1) code C = {000, 111} with generator matrix G = [1, 1, 1]T.
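As a concrete toy example, the sketch below builds this (3, 1) repetition code in numpy and computes a syndrome; the particular parity-check matrix H is one valid choice (any H with HG = 0 over F2 would do).

```python
import numpy as np

G = np.array([[1],
              [1],
              [1]])                 # generator matrix of the (3, 1) repetition code
H = np.array([[1, 1, 0],
              [0, 1, 1]])           # one valid parity-check matrix: H G = 0 (mod 2)

u = np.array([1])                   # information bit
x = (G @ u) % 2                     # codeword 111
assert np.all((H @ x) % 2 == 0)     # valid codewords have a zero syndrome

v = x.copy()
v[1] ^= 1                           # the channel flips the middle bit
s = (H @ v) % 2                     # nonzero syndrome [1, 1] reveals that an error occurred
print(s)
```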

Code Rates, Capacity and the Channel Coding Theorem

The rate of an (n, k) linear code C is the ratio between the number of information bits and the block length, R = k/n. The rate therefore captures the level of a code’s redundancy: a very low rate means that the number of transmitted bits n is much larger than the number of information bits k (and vice-versa).

A discrete channel is a communication system characterized by the triplet (X , p(y|x),Y). The channel law p(y|x) is the probability of observing the output y ∈ Y given that the input to the channel was x ∈ X . A channel is said to be a discrete memoryless channel (DMC) if, for all x ∈ X^n and y ∈ Y^n, it holds that

$$p(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} p(y_i|x_i). \quad (2.14)$$

In other words, the output of a DMC at any given time depends only on the input at that time and not on any past or future inputs. A simple example of a DMC is the binary symmetric channel (BSC), shown in Fig. 2.1. In a BSC, the binary input X is flipped with a probability p (called the crossover probability). In other words, p(Y = 0|X = 1) = p(Y = 1|X = 0) = p and p(Y = 0|X = 0) = p(Y = 1|X = 1) = 1 − p. We use BSC-(p) to denote a BSC with crossover probability p.


FIGURE 2.1: A binary symmetric channel with crossover probability p.

The capacity of a DMC (X , p(y|x),Y) is defined as

$$C = \max_{p(x)} I(X;Y), \quad (2.15)$$

where

$$I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}, \quad (2.16)$$

is called mutual information. Mutual information is also equal to

I(X;Y ) = H(X)−H(X|Y ) (2.17)

= H(Y )−H(Y |X), (2.18)

suggesting that I(X;Y) quantifies the reduction in the uncertainty of a random variable X (or Y) given the knowledge of Y (or X). In the case of a BSC-(p)

I(X;Y ) ≤ 1−HB(p), (2.19)

where HB(p) is the binary entropy defined in (2.10). When the distribution of X is uniform (which induces a uniform distribution on Y), (2.19) holds with equality. Therefore, the capacity of a BSC-(p) is CBSC-(p) = 1 − HB(p).

In [28], Shannon addressed the following fundamental question: what is the optimal rate at which information can be sent reliably over a noisy channel? Put differently, what is the highest possible code rate R of a code capable of correcting all errors introduced by a noisy channel? Shannon’s remarkable result, the channel coding theorem, says that reliable communication over a noisy DMC with capacity C is possible at any rate R < C and impossible otherwise. More formally, in the context of binary linear codes, it states that there


exists an (n, k) code with rate R = k/n < C for which the probability that error-correction fails can be made arbitrarily small. Furthermore, it states that no such code exists for rates R > C.

A channel code whose rate matches (or approaches) the channel capacity is said to be a capacity-achieving (or capacity-approaching) code. Shannon’s proof of the channel coding theorem relies on the use of random codes with asymptotically large block lengths. The explicit construction of finite length codes proved to be a very difficult problem. In particular, the challenge lay in designing codes which admit a computationally tractable decoder. Indeed, as the renowned mathematician and engineer Elwyn Berlekamp noted in 1968 [29]:

From a practical standpoint, the essential limitation of all coding and decoding schemes proposed to date has not been Shannon’s capacity but the complexity (and cost) of the decoder.

One such costly approach was proposed in 1963 by Robert Gallager, who introduced the low-density parity-check (LDPC) codes in his doctoral thesis [30]. LDPC codes are capacity-approaching; however, due to insufficient computational power at the time, Gallager’s results were largely ignored. It was only about 40 years later, thanks to significant technological advancements in computing hardware, that his ideas were finally put into practice. LDPC codes remain to this day among the most popular and widely used error-correcting codes. Aside from being capacity-approaching, their popularity can be attributed to a number of reasons: they can be efficiently decoded using a message-passing approach called belief propagation (BP) and they lend themselves to mathematical analysis via techniques such as density evolution [31] and EXIT charts [32]. A more recent example of capacity-achieving channel codes, polar codes, was introduced by Erdal Arikan in 2009 [33].

Syndrome Decoding

Suppose that a transmitted codeword x ∈ C is corrupted by some noise vector e ∈ F2^n to produce the vector v = x + e (addition in F2 is equivalent to modulo-2 addition in Z). Upon observing v, the receiver attempts to correct the error by identifying the most likely error pattern ê (which is then subtracted from v to hopefully recover the sent codeword x). The set of possible error patterns is the coset E = {v + x : x ∈ C} = v + C, which can be uniquely identified by the syndrome s = Hv = H(x + e) = 0 + He = He. Therefore, given a syndrome calculated from the received vector v, the receiver can search through the coset E to find the most probable error pattern ê. The receiver can then produce the decoding x̂ = v − ê. Provided that ê = e, x̂ = x and the original codeword is recovered. This method for correcting errors using syndromes is called syndrome decoding.


A possible practical challenge with this approach is that the size of the coset |E| = |C| may be prohibitively large for a direct search. For example, the size of a (400, 200) code is 2^200 (more than the number of electrons in the solar system [34]!). In practice, methods such as BP, for approximating the search in an efficient manner, are often used.
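For very small codes, syndrome decoding can be carried out by the direct search described above. The following sketch does exactly that by exhaustive enumeration (illustrative only; as noted, practical systems replace this search with BP):

```python
import itertools
import numpy as np

def syndrome_decode(H, s):
    """Return the minimum-weight error pattern e with H e = s (mod 2), i.e. the most likely
    pattern over a BSC-(p) with p < 0.5. Enumerates all 2^n patterns, so only viable for
    tiny codes; practical systems approximate this search with belief propagation."""
    n = H.shape[1]
    best = None
    for bits in itertools.product([0, 1], repeat=n):
        e = np.array(bits)
        if np.array_equal((H @ e) % 2, s % 2) and (best is None or e.sum() < best.sum()):
            best = e
    return best

# Usage with the (3, 1) repetition code: the received word 101 is decoded back to 111.
H = np.array([[1, 1, 0],
              [0, 1, 1]])
v = np.array([1, 0, 1])
e_hat = syndrome_decode(H, (H @ v) % 2)
print((v + e_hat) % 2)              # -> [1 1 1]
```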

2.2.3 Distributed Source Coding via Channel Coding

Distributed source coding (DSC) is concerned with the compression of correlated information sources. The original work on DSC dates back to the work of Slepian and Wolf [35], in which they considered lossless compression of two correlated sequences in a variety of settings. Most notably, they showed that if one of the sources is available at the decoder as side information, the other source can be encoded without access to the side information at the same optimal rate that one would use if side information was available at the encoder! More generally, the Slepian-Wolf theorem [35] states that lossless compression, with separate encoders and a joint decoder, of two sources X and Y drawn from p(x, y) can be achieved with any rates satisfying

RX ≥ H(X|Y ), (2.20)

RY ≥ H(Y |X), (2.21)

RX +RY ≥ H(X,Y ). (2.22)

An extension of the results to the lossy compression case was later presented by Wyner and Ziv in [36].

The connection between distributed source coding and channel coding also has roots in the same period of time [37, 38]. Nonetheless, it was only about 30 years later that a practical approach to perform DSC was proposed by Pradhan and Ramchandran [39]. Their approach, known as DISCUS (short for Distributed Source Coding Using Syndromes), uses, as the name suggests, syndrome decoding of channel codes to perform DSC.

The general idea of DSC using syndrome decoding is the following. Consider the two binary sources x, y ∈ F2^n. Suppose that y = x + e, where the entries of e are distributed in an i.i.d. manner according to a Bernoulli-(p) distribution (here p is assumed to be known, but e is not). The goal is to compress x (at the highest possible rate), knowing that the decoder has access to the side information y. In a channel coding context, one may think of y as the observed n-tuple output of a BSC-(p) with input codeword x. To recover x from y one must then try to infer the noise e. Suppose H ∈ F2^{(n−k)×n} is a parity-check matrix of a code with code rate R. Let sx = Hx and sy = Hy be the syndromes of x and y, respectively. Consider now the sum of these syndromes

sx+y = sx + sy = H(x + y) = H(x + x + e) = He = se, (2.23)

which is just the syndrome of the “error pattern” e. We can therefore perform syndrome decoding as described above in order to determine the highest probability error pattern and recover the original source x. The compression is achieved by representing x ∈ F2^n via its syndrome sx ∈ F2^{n−k}. The compression is then by a factor of

$$\rho = \frac{\#\text{ bits to represent } x}{\#\text{ bits to represent } s_x} = \frac{n}{n-k} = \frac{1}{1 - \frac{k}{n}} = \frac{1}{1 - R}. \quad (2.24)$$

To maximize ρ, we need to maximize R. By the channel coding theorem, the maximum possible channel code rate for a BSC with crossover probability p is Rmax = CBSC-(p) = 1 − HB(p). If the elements of x are distributed uniformly, we can design the channel code to have a rate R = Rmax (i.e., a code designed to correct errors over a BSC-(p)). In that case, we have HB(p) = H(Y|X) = H(X|Y). Therefore, the highest possible compression ratio is

$$\rho_{\max} = \frac{1}{H_B(p)} = \frac{1}{H(X|Y)}, \quad (2.25)$$

which implies that x can be compressed at a rate RX = 1/ρmax = H(X|Y), achieving the Slepian-Wolf bound (2.20).
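The following toy sketch walks through this syndrome-based DSC idea end to end, substituting the (7, 4) Hamming code for the LDPC codes used elsewhere in this thesis (a simplifying assumption so that syndrome decoding reduces to reading off the error position). Here x is the source, y is side information differing from x in exactly one bit, and only the 3-bit syndrome of x is transmitted.

```python
import numpy as np

# Parity-check matrix of the (7, 4) Hamming code: column j is the binary expansion of j,
# so the syndrome of a single-bit error is the binary representation of the error position.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=7)            # source known only to the encoder
e = np.zeros(7, dtype=int)
e[rng.integers(7)] = 1                    # correlation model: y differs from x in one bit
y = (x + e) % 2                           # side information available at the decoder

s_x = (H @ x) % 2                         # the encoder transmits only these 3 bits
s_e = (s_x + (H @ y)) % 2                 # decoder forms the error syndrome, as in (2.23)
pos = int(s_e @ np.array([4, 2, 1]))      # error position (0 would mean y already equals x)
e_hat = np.zeros(7, dtype=int)
if pos:
    e_hat[pos - 1] = 1
x_hat = (y + e_hat) % 2                   # recovered source
assert np.array_equal(x_hat, x)           # 7 source bits conveyed with a 3-bit syndrome
```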


Chapter 3

Methodology

3.1 Problem Setting

Let x0, x1, . . . , xN ∈ Rn be correlated information sources. The nature of the correlation between the sources is assumed to be known (or can be determined) by both the encoder and decoder. In this work we wish to develop an approach to compress x1, x2, . . . , xN with an extremely low-complexity encoder, under the assumption that x0 is available at the decoder as side information, alongside sufficient prediction information to obtain good predictions of the sources. The size of the prediction information is expected to be extremely small, hence in practice we may consider bundling the prediction information with the compressed sources, in which case the side information would consist only of x0.

To measure the performance of our method we consider the trade-off between the compression rate and the quality of reconstruction. In particular, we use the peak signal-to-noise ratio (PSNR) to quantify the quality of reconstruction. Measured in decibels (dB), PSNR is defined as

$$\mathrm{PSNR}(x_i, \hat{x}_i) = 10 \log_{10}\left(\frac{\max(x_i)^2}{\mathrm{MSE}(x_i, \hat{x}_i)}\right), \quad (3.1)$$

where MSE(xi, x̂i) is the mean squared error between the source xi and its reconstruction x̂i, and max(xi) returns the value of the largest element of xi.

In the remainder of this work, we will use the notation (xi)j, j = 1, 2, . . . , n to denote the jth element of the ith source. When the index i of the source is not important, we will simply write x instead of xi and (x)j instead of (xi)j.
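For reference, (3.1) can be computed directly; the function below is an illustrative numpy implementation.

```python
import numpy as np

def psnr(x, x_hat):
    """PSNR in dB between a source x and its reconstruction x_hat, as in (3.1)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    mse = np.mean((x - x_hat) ** 2)
    return 10.0 * np.log10(np.max(x) ** 2 / mse)
```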

3.2 Encoding

Encoding, or compression, of the sources consists of three stages:

• Acquisition of compressive measurements followed by quantization.


• Calculation of statistical information required for prediction of the sources (at the decoder), as well as the prediction error.

• Generation of syndromes from the quantized measurements.

A system diagram of the compression process is shown in Fig. 3.1.


FIGURE 3.1: Compression system diagram.

3.2.1 Quantized Compressively Sensed Measurements

In the first stage, the source i to be encoded, xi, is measured, scaled, and dithered according to

$$y_i = \frac{1}{\Delta_i} A x_i + w_i, \quad (3.2)$$

where yi ∈ Rm are the measurements, A ∈ Rm×n is a measurement operator, ∆i ∈ R+ is a scaling parameter, and wi ∈ Rm is a dither vector with i.i.d. elements drawn uniformly in [0, 1).

The operator A is a compressive sensing matrix (i.e., it reduces the dimensionality at the output to m < n). A popular sensing matrix choice in CS is a random matrix whose entries are drawn in an i.i.d. manner from a Gaussian distribution. However, when implementing the encoder, there are several practical considerations. In particular, a truly Gaussian matrix A is not easy to implement because of storage and computation requirements, especially since the encoder is typically implemented in finite-precision arithmetic. Instead, similar to [6, 40, 41], as a good practical alternative to the Gaussian matrix, we use a binary ±1 matrix, implemented by randomly permuting the signal xi, taking the Walsh-Hadamard transform (WHT) and randomly subsampling the output (we refer to this sensing method as the randomized subsampled WHT). The WHT, Hk ∈ {−1, 1}n×n, applied on a signal of size n = 2^k, is recursively defined


as

$$H_k = \frac{1}{\sqrt{2}} \begin{bmatrix} H_{k-1} & H_{k-1} \\ H_{k-1} & -H_{k-1} \end{bmatrix}, \quad (3.3)$$

where H0 ≜ 1. The WHT can be calculated efficiently using the fast Walsh–Hadamard transform (FWHT), which bears a great similarity to the fast Fourier transform used for computing the discrete Fourier transform. Thus, we have an O(n log n) complexity operator satisfying the RIP [42], instead of O(nm), which further requires only O(n) storage, instead of O(nm). In our experiments, the behavior of this ensemble is similar to the behaviour of a Gaussian ensemble (with respect to the method used for determining the DSC code rates, which assumes the use of a Gaussian sensing matrix; see Section 3.2.3 and Thm. 1 for further details).
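A sketch of the randomized subsampled WHT operator described above is given below, assuming numpy and a signal length that is a power of two. In practice the random permutation and the retained output indices would be generated from a seed shared between encoder and decoder.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform of a length-2^k vector, normalized by 1/sqrt(2)
    at every stage so that the overall transform is orthonormal, as in (3.3)."""
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = (a + b) / np.sqrt(2)
            x[i + h:i + 2 * h] = (a - b) / np.sqrt(2)
        h *= 2
    return x

def randomized_subsampled_wht(x, m, rng):
    """Randomly permute the signal, apply the WHT, and keep m randomly chosen outputs."""
    coeffs = fwht(x[rng.permutation(len(x))])              # O(n log n) instead of O(nm)
    return coeffs[rng.choice(len(x), size=m, replace=False)]

# Usage (the permutation/subsampling pattern must be reproducible at the decoder):
rng = np.random.default_rng(42)
y = randomized_subsampled_wht(np.arange(64, dtype=float), m=24, rng=rng)
```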

In addition to choosing the kind of sensing operator A to use, the encoder must determine the number of measurements m to acquire. The chosen value of m is one of two parameters, along with ∆i, which determine the compression rate. Reconstructing a signal from a large number of measurements m is expected to result in a better quality of reconstruction. On the other hand, as will be discussed in Section 3.2.3, the size of the syndromes used to represent the sources in their compressed form grows linearly in m. Therefore, the trade-off between the compression rate and the resulting distortion is governed in part by the value of m.

The choice of the scaling parameter ∆i also affects the reconstruction quality and the bit budget needed to encode xi. In particular, reducing ∆i results in finer measurement quantization which translates to improved reconstruction PSNR at the cost of a higher encoding rate. The scaling parameter can be chosen in two ways. One approach is to choose the same ∆ for all the sources (i.e., ∆i = ∆ for all i = 1, 2, . . . , N), ensuring similar reconstruction quality but at a variable rate for each source. Alternatively, we can set ∆i such that all sources use the same rate but have different reconstruction quality. It is expected that the easier-to-predict sources will consume lower rate in the first scenario or exhibit better reconstruction quality in the second.

The resulting compression rates in bits-per-pixel (bpp) of the image in Fig. 3.2(A) for different values of ∆ are shown in Fig. 3.3. The image was compressed using the randomized subsampled WHT in non-overlapping blocks of size 64 × 64. Two different values of m, 2000 and 4000, were used. As evident from Fig. 3.3, the compression rate in the former case is double that of the latter case. This is expected in light of the fact that, as mentioned above, the syndrome sizes (and hence the size of the compressed sources) grow linearly in m.

FIGURE 3.2: (A) A 512 × 512 image. (B) A 512 × 512 image consisting of 64 randomly chosen 64 × 64 image blocks.

The same target compression rate can be attained by distinct combinations of m and ∆, as shown in Fig. 3.4. In this example, both images in Fig. 3.2 were compressed as above (i.e., in non-overlapping 64 × 64 blocks using the randomized subsampled WHT). The image in Fig. 3.2(B) was produced by randomly selecting 64 image blocks of size 64 × 64 from 4 different satellite-acquired multispectral images (16 blocks from each image). It should be noted, however, that different parameter combinations producing the same compression rate are likely to result in different reconstruction PSNR. Therefore, a judicious selection of these parameters is essential for obtaining the best possible results (i.e., achieving the lowest distortion for a given compression rate). At this time, this important aspect of the encoding process remains an open question and will be further discussed in Section 5.3 of the closing chapter. For the purpose of simulations, good parameter values were obtained experimentally.

The role of the random dither wi is to ensure that the measurements yi are distributed uniformly within the quantization intervals in which they fall, as well as to make the quantization error, Q(yi) − yi, statistically independent of yi [43]. These properties play a central role in our approach for determining the required code rates for encoding the sources (see further details in the next section).

FIGURE 3.3: Compression rate in bits-per-pixel of the image in Fig. 3.2(A) for different values of ∆.

The acquired measurements are quantized element-wise, using a scalar uniform integer quantizer Q(·). The quantizer rounds the input and represents it using B bits, producing the quantized measurements qi = Q(yi) ∈ Zm. It should be noted that changing the scaling parameter ∆i in (3.2) is equivalent to using unscaled measurements and setting the quantization interval to ∆i. We assume that B is selected sufficiently large, such that the quantizer does not saturate. This can be done in different ways on the basis of what is known about the source (e.g., the source has a known distribution with finite support) and the used sensing operator. For example, one may use some kind of a concentration inequality to establish a probabilistic guarantee that the quantizer will not saturate. Alternatively, since (as will be discussed in the following sections) eventually only a small number of the least significant bits will need to be compressed, while the higher significant bits can be determined from the prediction, one may simply choose a large value for B without incurring any penalty.
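Putting this first encoding stage together, the following is a minimal sketch of the scaled, dithered measurement (3.2) followed by the uniform integer quantizer. A dense matrix A is passed in purely for brevity; the randomized subsampled WHT operator described earlier would be used in practice.

```python
import numpy as np

def measure_and_quantize(x, A, delta, rng):
    """First encoding stage: scaled, dithered measurements (3.2) followed by the
    uniform integer quantizer Q(.), i.e. q = Q(y). B is assumed large enough
    that the quantizer never saturates, so no clipping is applied here."""
    w = rng.uniform(0.0, 1.0, size=A.shape[0])   # i.i.d. dither, uniform on [0, 1)
    y = (A @ x) / delta + w                      # measurements y_i of (3.2)
    q = np.round(y).astype(int)                  # quantized measurements q_i
    return y, q
```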

To make notation lighter, when the index i of a source is not important when considering its quantized measurements qi, we drop the subscript i and write q. Instead, we use q^(k) to denote the bitplanes of the quantized measurements, indexed by k = 1, . . . , B. In other words, in its binary representation, q can be thought of as an m × B binary matrix, whose kth column, q^(k) ∈ F_2^m, is a binary vector containing the kth significant bit of all m quantized measurements of x. We will use the convention that k = 1 and k = B represent the least significant bit (LSB) and most significant bit (MSB), respectively. The following is an example of the above described bitplane representation with B = 4:

q = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}
\quad \overset{\mathbb{F}_2}{\Longrightarrow} \quad
\begin{array}{cccc}
q^{(4)} & q^{(3)} & q^{(2)} & q^{(1)} \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{array}
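A short sketch of this bitplane view is given below. It is our own illustration and assumes the measurements have already been shifted to non-negative integers, as discussed later in this section; the function name is hypothetical.

    import numpy as np

    def bitplanes(q, B):
        # Returns an m x B binary matrix whose column k-1 holds the k-th significant bit
        # (k = 1 is the LSB, k = B the MSB) of each quantized measurement.
        q = np.asarray(q, dtype=np.int64)
        return np.stack([(q >> (k - 1)) & 1 for k in range(1, B + 1)], axis=1)

    print(bitplanes(np.array([3, 0, 1]), 4))
    # [[1 1 0 0]
    #  [0 0 0 0]
    #  [1 0 0 0]]   (columns ordered q^(1), ..., q^(4), i.e., LSB first)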


FIGURE 3.4: Combinations of m and ∆ resulting in a 2.00 bpp compression of the two images in Fig. 3.2 (the 512 × 512 image and the image composed of 64 random image blocks).

The binary representation used for the quantized measurements can be chosen in various ways (e.g., two's complement or signed magnitude). In this work all values were made positive (by appropriately shifting them prior to quantization and shifting back when reconstructing), in order to use a simple binary representation and avoid potential issues related to the signs of values (e.g., a large number of bit errors when a small negative value is predicted as a small positive value when using a signed magnitude representation).

The first stage of the compression is highlighted in red in the system diagram in Fig. 3.5.


FIGURE 3.5: Compression – compressed sensing followed by quantization (in red).


3.2.2 Prediction Information

Given correlated signals, one can attempt to predict a signal (or its measurements) from the other ones. In general, the prediction will not be perfect since the relationship between the signals is probabilistic. Furthermore, in practice, the assumed model relating the signals may often be highly simplified and thus inaccurate (e.g., we might assume that the signals are jointly Gaussian (when in fact they are not) in order to predict a signal using an easy-to-calculate linear minimum mean square error (LMMSE) estimator). Nonetheless, the imperfect prediction of a signal serves as the starting point in the decoding process.

Suppose we wish to generate a prediction x̂... Concretely, suppose we wish to generate a prediction û of the signal u. We use f_u(Λ; Θ_Λ) to denote the prediction function that maps the signals in the set Λ, using the information in the set Θ_Λ, to the prediction û of u:

\hat{u} = f_u(\Lambda; \Theta_\Lambda). \qquad (3.4)

The exact nature of f_u, Λ and Θ_Λ is problem-specific, as it depends on the assumed correlation model of u and the signals in Λ. For example, if we use an LMMSE estimator to predict the values of x_i from the side information x_0, then Λ = {x_0}, Θ_Λ = {µ_{x_0}, µ_{x_i}, σ²_{x_0}, σ_{x_0 x_i}} and

\hat{x}_i = f_{x_i}(\Lambda; \Theta_\Lambda) = \frac{\sigma_{x_0 x_i}}{\sigma^2_{x_0}}\,(x_0 - \mu_{x_0}\mathbf{1}) + \mu_{x_i}\mathbf{1}, \qquad (3.5)

where 1 is the all-ones vector. Other correlation model choices will be discussed in Chapter 4. In the problem setup that we consider, it is assumed that:

• fu is known to the decoder.

• Λ is either known or can be calculated at the decoder.

• ΘΛ is available at the decoder as part of the side information.

Since x_0 is always used in the prediction process and since, by our problem setting, it is always available at the decoder, it follows that x_0 ∈ Λ (for all sources). Depending on the particular problem and prediction model, prediction might require more than just x_0. In such cases Λ will contain other signals, to be calculated at the decoder (an example of such a scenario will be discussed in Subsection 4.2.1).

In practice, Θ_Λ must often be calculated at the encoder in order for it to be available at the decoder as side information. This of course increases the computational complexity of the encoder. Thus, in such cases it is beneficial to have a relatively simple correlation model which relies on a small number of easy-to-compute statistics.


An essential piece of information that must always be determined at the encoder is the prediction error (i.e., the Euclidean distance between a source and its prediction), ε_i = ‖x_i − x̂_i‖. The prediction error will be needed for determining the code rates required for encoding the bitplanes of the quantized measurements. In this work we wish to avoid the need to directly calculate the prediction x̂_i at the encoder, in order to limit the encoder's complexity. Instead, we consider correlation models/estimators that allow calculation of the prediction error in a more simplified way via the information in Θ_Λ. For instance, in the aforementioned LMMSE estimator example, the error can be easily calculated as

\varepsilon_i^2 = \|x_i - \hat{x}_i\|^2 = n\,\mathbb{E}[(x_i - \hat{x}_i)^2] = n\left(\sigma^2_{x_0} - \frac{\sigma^2_{x_0 x_i}}{\sigma^2_{x_0}}\right). \qquad (3.6)

The second stage of the compression is highlighted in red in the system diagram in Fig. 3.6.


FIGURE 3.6: Compression – calculation of prediction statistics (in red).

3.2.3 Syndrome Generation

Aside from determining the prediction information set Θ_Λ, the central goal of the encoder is to generate syndromes from the bitplanes q^(k) of the quantized measurements of x. These syndromes are used by the decoder to reconstruct x via the DSC procedure described in Subsection 2.2.3.

Consider the information source x (we ignore the index i in this subsection) with quantized measurements given by

q = Q(y) = Q\!\left(\frac{1}{\Delta}Ax + w\right). \qquad (3.7)

Consider also x’s prediction x = fx(Λ; ΘΛ) and its estimated quantized measurements givenby q. Let pk denote the probability that a bit in q(k) will differ from the bit (at the same position)


in q̂^(k) (i.e., p_k is the probability that a predicted bit will be flipped when compared to the original quantized measurement bit). Furthermore, consider the syndromes s^(k) = H^(k) q^(k) and ŝ^(k) = H^(k) q̂^(k) for k = 1, . . . , B. The syndromes s^(k) are generated at the encoder and communicated to the decoder. The syndromes ŝ^(k) are similarly generated at the decoder based on q̂. Following the syndrome decoding procedure outlined in Subsection 2.2.2, the successful recovery of q^(k) from the syndromes s^(k) and ŝ^(k) requires the construction of parity-check matrices H^(k) whose associated code rates allow the correction of the mismatches between q^(k) and q̂^(k). In other words, for each k = 1, . . . , B, the code rate of the error-correcting code with parity-check matrix H^(k) must be selected such that the code can correct the errors introduced by a BSC-(p_k). As was mentioned in Subsection 2.2.3, for a BSC-(p_k), the code rate associated with H^(k) is given by

C_{\mathrm{BSC}\text{-}(p_k)} = 1 - H_B(p_k). \qquad (3.8)

Therefore, to successfully recover q^(k) via syndrome decoding, we must be able to determine the probabilities p_k at the encoder (and decoder) in order to construct the proper parity-check matrix H^(k).

The probabilities pk can be calculated according to the following theorem.

Theorem 1. Consider a signal x measured using a random matrix A with i.i.d. N(0, σ²) entries according to (3.2). Also consider its prediction x̂ with prediction error ε = ‖x − x̂‖ and assume that bitplanes k = 1, . . . , K − 1 have been correctly decoded. Then q^(K) can be estimated with probability of bit error equal to

p_K = \frac{1}{2} - \sum_{l=1}^{+\infty} e^{-\frac{1}{2}\left(\frac{\pi\sigma\varepsilon l}{2^{K-1}\Delta}\right)^2} \operatorname{sinc}\!\left(\frac{l}{2^K}\right)\operatorname{sinc}\!\left(\frac{l}{2}\right). \qquad (3.9)

Proof. Consider a single measurement of the signal x ∈ R^n,

y = \frac{1}{\Delta}\langle a, x\rangle + w, \qquad (3.10)

where a ∈ R^n is the sampling vector, ∆ ∈ R_+ is a scaling parameter and w is dither. The entries of a are distributed in an i.i.d. manner according to N(0, σ²). The dither w is drawn uniformly from [0, 1). Let s ∈ [−1/2, 1/2] denote the position of the measurement y within the quantization interval in which it falls, with 0 denoting the center. Due to the added dither, s will be distributed uniformly in [−1/2, 1/2] [43] with probability density function

f_s(t) = \operatorname{rect}(t), \qquad (3.11)


where rect(t) is the rectangular function. Let ŷ denote a measurement of x̂ (the prediction of x), measured as in (3.10). Consider the random variable d = ŷ − y. Given this setup, d will be distributed as N(0, (σε/∆)²) with probability density function f_d.

We assume that y is quantized with a uniform scalar quantizer, producing q^(k), k = 1, . . . , B, and that the decoder knows exactly the K − 1 least significant bits, q^(k), k = 1, . . . , K − 1. The estimate of q^(K) is obtained using the bitplane prediction procedure described in Subsection 3.3.2. Therefore, given a measurement y, in order for the estimated Kth bit to be incorrect, the predicted measurement ŷ must lie somewhere in the union of intervals inconsistent with the measurement (i.e., intervals in which the Kth bit is resolved to the complement of y's Kth bit), namely in

\bigcup_{k\in\mathbb{Z}} \left[\tfrac{1}{2}2^{K-1} + 2k\,2^{K-1} + s,\; \tfrac{1}{2}2^{K-1} + (2k+1)2^{K-1} + s\right] \qquad (3.12)
= \bigcup_{k\in\mathbb{Z}} \left[2^{K-2} + k\,2^{K} + s,\; 2^{K-2} + 2^{K-1} + k\,2^{K} + s\right], \qquad (3.13)

where we again consider 0 to be the center of the quantization interval in which y falls. Determination of the error probability of the third bit, p_3, is illustrated in Fig. 3.7. In this example, the first 2 bits are determined to be 10, lying in one of the red quantization intervals. For a given measurement position s, the third bit will be estimated incorrectly if the predicted measurement ŷ lies closer to one of the red quantization intervals in the blue regions than to one in the white regions. Therefore, to calculate the probability of bit error we must consider the area associated with the inconsistent intervals, while considering all the possible positions of s.

We define

g(t) = \operatorname{rect}\!\left(\frac{t}{2^{K-1}}\right) * \sum_{l=-\infty}^{+\infty} \delta\!\left(t - l\,2^{K}\right), \qquad (3.14)

a rectangular function of width 2^{K−1}, repeated at intervals of 2^K.


FIGURE 3.7: Calculating the error probability p_3 by finding the area corresponding to the inconsistent intervals (shown in blue).

Given the above setup, the probability that the Kth bit of ŷ will be the same as that of y is

\Pr(\text{no flip}) = \int_{-1/2}^{1/2} \Pr(\text{no flip} \mid s = t)\, f_s(t)\, dt \qquad (3.15)
= \int_{-\infty}^{+\infty} \left[\int_{-\infty}^{+\infty} f_d(u)\, g(u - t)\, du\right] f_s(t)\, dt \qquad (3.16)
= \int_{-\infty}^{+\infty} f_d(t)\, \left[g(t) * f_s(t)\right] dt \qquad (3.17)
= \int_{-\infty}^{+\infty} \mathcal{F}\{f_d(t)\}\, \left[\mathcal{F}\{g(t)\}\,\mathcal{F}\{f_s(t)\}\right] d\xi, \qquad (3.18)

where F{·} denotes the Fourier transform and the last line follows from Parseval's and the convolution theorems. We have

\mathcal{F}\{f_d(t)\} = \mathcal{F}\left\{\frac{1}{\sqrt{2\pi(\sigma\varepsilon/\Delta)^2}}\, e^{-\frac{t^2}{2(\sigma\varepsilon/\Delta)^2}}\right\} = e^{-2\left(\frac{\pi\sigma\varepsilon\xi}{\Delta}\right)^2}, \qquad (3.19)

\mathcal{F}\{f_s(t)\} = \mathcal{F}\{\operatorname{rect}(t)\} = \operatorname{sinc}(\xi), \qquad (3.20)


and

\mathcal{F}\{g(t)\} = \mathcal{F}\left\{\operatorname{rect}\!\left(\frac{t}{2^{K-1}}\right) * \sum_{l=-\infty}^{+\infty}\delta(t - l\,2^K)\right\} \qquad (3.21)
= \mathcal{F}\left\{\operatorname{rect}\!\left(\frac{t}{2^{K-1}}\right)\right\} \mathcal{F}\left\{\sum_{l=-\infty}^{+\infty}\delta(t - l\,2^K)\right\} \qquad (3.22)
= 2^{K-1}\operatorname{sinc}(2^{K-1}\xi)\,\frac{1}{2^K}\sum_{l=-\infty}^{+\infty}\delta\!\left(\xi - \frac{l}{2^K}\right) \qquad (3.23)
= \frac{1}{2}\operatorname{sinc}(2^{K-1}\xi)\sum_{l=-\infty}^{+\infty}\delta\!\left(\xi - \frac{l}{2^K}\right), \qquad (3.24)

where we use the normalized sinc function, sinc(t) = sin(πt)/(πt). Therefore, substituting (3.19), (3.20) and (3.24) into (3.18), we get

\Pr(\text{no flip}) = \frac{1}{2}\int_{-\infty}^{+\infty} e^{-2\left(\frac{\pi\sigma\varepsilon\xi}{\Delta}\right)^2} \operatorname{sinc}(\xi)\operatorname{sinc}(2^{K-1}\xi)\sum_{l=-\infty}^{+\infty}\delta\!\left(\xi - \frac{l}{2^K}\right) d\xi \qquad (3.25)
= \frac{1}{2}\left(1 + 2\sum_{l=1}^{+\infty} e^{-2\left(\frac{\pi\sigma\varepsilon l}{\Delta 2^K}\right)^2} \operatorname{sinc}\!\left(\frac{l}{2^K}\right)\operatorname{sinc}\!\left(2^{K-1}\frac{l}{2^K}\right)\right) \qquad (3.26)
= \frac{1}{2} + \sum_{l=1}^{+\infty} e^{-\frac{1}{2}\left(\frac{\pi\sigma\varepsilon l}{\Delta 2^{K-1}}\right)^2} \operatorname{sinc}\!\left(\frac{l}{2^K}\right)\operatorname{sinc}\!\left(\frac{l}{2}\right). \qquad (3.27)

Finally, the probability of a bit flip is

p_K = 1 - \Pr(\text{no flip}) \qquad (3.28)
= \frac{1}{2} - \sum_{l=1}^{+\infty} e^{-\frac{1}{2}\left(\frac{\pi\sigma\varepsilon l}{\Delta 2^{K-1}}\right)^2} \operatorname{sinc}\!\left(\frac{l}{2^K}\right)\operatorname{sinc}\!\left(\frac{l}{2}\right). \qquad (3.29)

Fig. 3.8 shows empirical and theoretical bit error probabilities as a function of the prediction error. In this example, varying the normalized prediction error εσ/∆, measurements and their predictions were generated such that their distributions match those expected by Thm. 1 (i.e., assuming a Gaussian sensing matrix). The empirical values are calculated on the basis of the mismatch between the quantized measurements and their predicted counterparts, following the bitplane prediction of the 3rd bit (assuming the first 2 have already been recovered).


As evident in the figure, the empirical probabilities match the ones predicted by Thm. 1 quite closely.
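The following sketch shows one way to evaluate (3.9) and to check it with a simple Monte Carlo experiment drawn directly from the model of Thm. 1. It is an illustrative sketch only, not the code used to produce Fig. 3.8; the series is truncated after a few hundred terms and all names are hypothetical.

    import numpy as np

    def p_flip_theory(K, eps_sigma_over_delta, n_terms=500):
        # Bit-flip probability of bitplane K per (3.9), with the series truncated.
        l = np.arange(1, n_terms + 1)
        terms = (np.exp(-0.5 * (np.pi * eps_sigma_over_delta * l / 2 ** (K - 1)) ** 2)
                 * np.sinc(l / 2 ** K) * np.sinc(l / 2))
        return 0.5 - terms.sum()

    def p_flip_monte_carlo(K, eps_sigma_over_delta, trials=200_000, seed=0):
        # Draw (measurement, prediction) pairs from the model of Thm. 1: the dither makes y
        # uniform within its quantization bin, and y_hat = y + d with d ~ N(0, (sigma*eps/Delta)^2).
        rng = np.random.default_rng(seed)
        y = rng.uniform(1000.0, 5000.0, trials)
        q = np.round(y).astype(np.int64)
        y_hat = y + eps_sigma_over_delta * rng.standard_normal(trials)
        low = q % 2 ** (K - 1)                                   # correctly decoded K-1 LSBs
        q_hat = np.round((y_hat - low) / 2 ** (K - 1)).astype(np.int64) * 2 ** (K - 1) + low
        return np.mean(((q >> (K - 1)) & 1) != ((q_hat >> (K - 1)) & 1))

    for r in (0.5, 2.0, 5.0):
        print(r, p_flip_theory(3, r), p_flip_monte_carlo(3, r))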

FIGURE 3.8: Empirical and theoretical bit error probabilities for quantized measurements, acquired using a Gaussian sensing matrix, as a function of the normalized prediction error εσ/∆ for K = 3.

As previously mentioned, more structured sensing matrices, rather than the Gaussian matrix, can be better suited to resource constrained environments. In particular, in this work we use the randomized subsampled WHT described in Section 3.2.1. Acquiring measurements using this method appears to be similar to using a Gaussian matrix with i.i.d. entries distributed as N(0, σ²/n), where n is the dimension of the acquired signal. More precisely, if H : R^n → R^m is the randomized subsampled WHT operator and A ∈ R^{m×n} is a matrix with i.i.d. entries distributed as N(0, σ²/n), then

\mathcal{H}(x) \sim Ax \sim \mathcal{N}\!\left(0, \frac{\sigma^2\|x\|^2}{n}\right), \qquad (3.30)

where ∼ is meant to indicate that the distributions of the random variables are approximately the same. This assertion is validated in Fig. 3.9.
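One possible implementation of such an operator is sketched below. This is our own illustration (a random sign flip of the input, a fast Walsh-Hadamard transform, and a random selection of m of the n coefficients); the exact construction and normalization used in Section 3.2.1 may differ in details.

    import numpy as np

    def fwht(a):
        # Iterative fast Walsh-Hadamard transform (unnormalized); len(a) must be a power of two.
        a = a.copy()
        h = 1
        while h < len(a):
            for i in range(0, len(a), 2 * h):
                x, y = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
                a[i:i + h], a[i + h:i + 2 * h] = x + y, x - y
            h *= 2
        return a

    def randomized_subsampled_wht(x, signs, rows, sigma=1.0):
        # signs: length-n vector of +/-1 (random pre-modulation); rows: m indices chosen without replacement.
        n = len(x)
        coeffs = fwht(signs * x) / np.sqrt(n)    # orthonormal scaling
        return sigma * coeffs[rows]              # entries behave approximately like N(0, sigma^2 ||x||^2 / n)

    rng = np.random.default_rng(0)
    n, m = 4096, 4000
    x = rng.random(n)
    signs = rng.choice([-1.0, 1.0], size=n)
    rows = rng.choice(n, size=m, replace=False)
    y = randomized_subsampled_wht(x, signs, rows)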

Therefore, when using the randomized subsampled WHT, the error probabilities can be calculated using Thm. 1 by replacing σ with σ/√n. Empirical and theoretical bit error probabilities as a function of the prediction error are shown in Fig. 3.10. As evident in the figure, the empirical and theoretical probabilities are, once again, well matched.


FIGURE 3.9: Histogram of measurements acquired using the randomized subsampled WHT operator, a Gaussian fit using the empirical mean and variance of the measurements, and the equivalent Gaussian sensing matrix statistics.

The behavior of the bit error probability as a function of the normalized prediction error εσ/∆ for different values of K − 1 correctly decoded bitplanes is shown in Fig. 3.11. As demonstrated by the figure, the error probability decreases sharply as the number of correctly recovered bitplanes increases. Therefore, at a certain point, once enough bitplanes are recovered, subsequent bitplanes can be reliably recovered through prediction only and need not be further corrected using syndrome decoding. Consequently, not all B bitplanes q^(k) need to be encoded as syndromes.

The cutoff probability p^cutoff below which a bitplane will not be encoded can be chosen heuristically in various ways. For example, we may choose a fixed probability (e.g., p^cutoff = 0.001) or one that is reciprocal to the code block length, p^cutoff = 1/(cm), where c ∈ R is chosen to "control" the expected frequency of bit errors in non-encoded bitplanes. Alternatively, since bitplanes are decoded sequentially under the assumption that the previous bitplanes were correctly recovered, we may consider assigning different cutoffs to each bitplane. In particular, in order to prevent propagation of errors across the bitplanes, cutoffs can be assigned in a progressive manner, going from lower to higher values as we move from the LSB to the MSB (e.g., p^cutoff_k = 0.001k or p^cutoff_k = k/(cm)), so that lower-significance bits are treated more conservatively than higher-significance bits.

The number of bitplanes that need to be coded as a function of the normalized prediction


FIGURE 3.10: Empirical and theoretical bit error probabilities for quantized measurements, acquired using a randomized subsampled WHT operator, as a function of the normalized prediction error εσ/∆ for K = 3.

error is shown in Fig. 3.12 for three different cutoff probabilities. Note that the number of bits to encode decreases as ∆ increases, hence improving the compression rate. On the other hand, however, increasing ∆ is equivalent to using a coarser scalar quantizer, which results in higher distortion (i.e., lower reconstruction PSNR). Hence, the ∆ parameter, which is one of the algorithm's design parameters, plays a key role in controlling the rate-distortion trade-off.

To conclude, during the syndrome generation stage, the bit flip probabilities p_k of the B quantized bitplanes q^(k) are calculated according to (3.9). Bitplanes whose error probability is greater than p^cutoff (or p^cutoff_k) are encoded in the form of syndromes s^(k) = H^(k) q^(k), with parity-check matrices H^(k) of codes with rates (less than or equal to) 1 − H_B(p_k).
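This per-bitplane rate selection can be summarized in a few lines. The sketch below is illustrative only: p_flip_theory stands for a routine evaluating (3.9), such as the one sketched earlier in this section, and the fixed cutoff is one of the heuristics discussed above.

    import numpy as np

    def binary_entropy(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

    def plan_bitplane_codes(eps_sigma_over_delta, B, p_cutoff=0.001):
        # For each bitplane k = 1..B, compute the flip probability p_k via (3.9); bitplanes with
        # p_k above the cutoff are syndrome-encoded with a code of rate at most 1 - H_B(p_k).
        plan = []
        for k in range(1, B + 1):
            p_k = p_flip_theory(k, eps_sigma_over_delta)   # assumes the earlier sketch is in scope
            if p_k > p_cutoff:
                plan.append((k, p_k, 1.0 - binary_entropy(p_k)))
        return plan

    for k, p_k, rate in plan_bitplane_codes(3.0, 11):
        print(f"bitplane {k}: p_k = {p_k:.4f}, code rate <= {rate:.2f}")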

The syndrome generation stage of the compression is highlighted in red in the system diagram in Fig. 3.13.

3.2.4 Encoding Complexity

Acquisition of the compressive measurements in the first compression stage can be efficiently performed using compressive operators such as random partial Fourier matrices or the aforementioned randomized subsampled WHT. For example, as previously mentioned, the randomized subsampled WHT can be applied using the fast WHT, which has computational complexity of O(n log n). The measurements are quantized using an O(n) scalar quantizer.


FIGURE 3.11: Bit error probability p_K as a function of the normalized prediction error εσ/∆ for different values of K (i.e., assuming that K − 1 bitplanes have already been correctly recovered).

The complexity associated with calculating the prediction information in the second stage depends on the chosen prediction model. Ideally, we wish to use a low-complexity prediction approach which, for example, requires only first and second order source statistics, in which case the complexity has order O(n).

The generation of syndromes in the final stage involves multiplication of the parity-check matrices H^(k), of size m(1 − R_k) × m (where R_k ∈ [0, 1] is the code rate used to encode the kth bitplane), by the vectors of quantized measurements q^(k) of size m. This implies that in the worst case (i.e., when R_k ≈ 0), the complexity has order O(m²). However, in practice, structured parity-check matrices can be used to generate syndromes more efficiently. For instance, one may choose to use extremely sparse low-density parity-check matrices to produce syndromes with complexity of ∼ O(m). Obtaining the required code rates R_k involves calculating the bit error probabilities p_k according to Thm. 1, which can be done in constant time.

Following the examination of the computational complexities of each of the stages, we conclude that compressive measurement acquisition is the principal source of complexity during encoding. Therefore, the proposed method establishes an encoder that has computational complexity as low as O(n log n).


FIGURE 3.12: The number of bitplanes that need to be coded as syndromes as a function of the normalized prediction error εσ/∆, for cutoff choices p^cutoff = 0.001, p^cutoff = 0.005 and p^cutoff_k = 0.001k².


FIGURE 3.13: Compression – syndrome generation (in red).



FIGURE 3.14: Decompression system diagram.

3.3 Decoding

The decoder has available the syndromes (generated from the bitplanes of the quantized measurements) and side information (i.e., the source x_0 and the prediction information set Θ_Λ). Decoding, or decompression, consists of the following three stages:

• Generation of source predictions from the side information, followed by their measurement and quantization.

• Recovery of the quantized measurements via bitplane prediction and syndrome decoding.

• Reconstruction of the sources from the quantized measurements via sparse optimization.

A system diagram of the decompression process is shown in Fig. 3.14.

3.3.1 Source Prediction, Measurement and Quantization

Depending on the prediction method used, for any given source x_i, the decoder generates a prediction x̂_i of the source, or of its measurements y_i. In the first case, a prediction of a source x_i is obtained, as discussed in Section 3.2.2, according to

\hat{x}_i = f_{x_i}(\Lambda_i; \Theta_{\Lambda_i}). \qquad (3.31)

Measurements of the predictions are then acquired as

\hat{y}_i = \frac{1}{\Delta_i} A\hat{x}_i + w_i, \qquad (3.32)


where A, ∆_i and w_i are the same as used by the encoder to obtain measurements y_i of x_i. In the second case, we obtain the predictions of the measurements directly as

\hat{y}_i = f_{y_i}(\Lambda_i; \Theta_{\Lambda_i}). \qquad (3.33)

Finally, obtained from either (3.32) or (3.33), the predicted measurements are quantized using the same quantizer as at the encoder to produce the predicted quantized measurements q̂_i = Q(ŷ_i).

As previously mentioned, the decoder is assumed to know Λ, or can calculate it. When prediction relies only on x_0, then Λ = {x_0}. In other cases, when prediction requires some other signals, the decoder will calculate these as needed. For example, a prediction of a source x_L might require measurements of x_0 and quantized measurements of the sources x_i, i = 1, . . . , L − 1, in which case Λ_L = {x_0, q_1, q_2, . . . , q_{L−1}}. An example of such a prediction method will be discussed in Subsection 4.2.1.

The first stage of decompression is shown in red in Fig. 3.15.


qi

FIGURE 3.15: Decompression – prediction followed by compressed sensing andquantization (in red).

3.3.2 Recovery of Quantized Measurements

The decoder reconstructs the original quantized measurements q_i from the predicted quantized measurements q̂_i. To do so, the decoder alternates between bitplane prediction and syndrome decoding until all the quantized measurement bitplanes (for each source) are recovered exactly. In the following, we once again drop the subscript i and consider the recovery of the quantized measurements q of some source x.

The quantized measurements are recovered iteratively, starting with the least significant bitplane k = 1. At iteration k, a new estimate q̂ of the quantized measurements is computed, incorporating all the new information from the previous iterations. From that estimate, the


kth bitplane, q̂^(k), is extracted and corrected using the syndrome s^(k) to recover the bitplane q^(k). If the syndrome has been properly designed at the correct rate, decoding is successful with high probability and the recovered bitplane equals q^(k). The stages for recovering the quantized measurements are illustrated in Fig. 3.16.


FIGURE 3.16: Recovering quantized measurements of bitplane k from the quantized predicted measurements q̂ = Q(ŷ) and the previously recovered k − 1 bitplanes q^(1:k−1).

In particular, for k = 1, the corrected least significant bitplane, q^(1), is obtained by correcting the mismatch between q̂^(1) and q^(1) using the syndrome s^(1). For k > 1, assuming k − 1 bitplanes have been successfully decoded, q̂ is estimated by selecting the uniform quantization interval consistent with the decoded k − 1 bitplanes and closest to the prediction ŷ. Having correctly decoded the first k − 1 bitplanes is equivalent to the signal being encoded with a (k − 1)-bit universal quantizer [7]. Thus, recovering q̂ is the same as the decoding performed in [6].

An example with k − 1 = 2 is shown in Fig. 3.17. The left hand side of the figure plots a 2-bit universal quantizer, equivalent to a uniform scalar quantizer with all but the 2 least significant bits dropped. The right hand side shows the corresponding 3-bit uniform quantizer used to produce q. In this example, the two least significant bits decode to the universal quantization value of 1, which could correspond to q = 1 or −3 in the uniform quantizer. However, the prediction of the measurement ŷ is closer to the interval corresponding to q = −3, and therefore q̂ = −3 is recovered.

Formally, the estimate of q is

\hat{q} = 2^{k-1}\left(\hat{q}^{(k:B)} + c\right) + q^{(1:k-1)}, \qquad (3.34)

where the elements of c, c_j ∈ {−1, 0, 1}, j = 1, . . . , m, are chosen to minimize the distance of (q̂)_j to (ŷ)_j.
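The following sketch illustrates this estimation step. It is our own reading of (3.34): keep the recovered low bitplanes and snap the remaining bits to the consistent value closest to the prediction; the function and variable names are hypothetical.

    import numpy as np

    def bitplane_predict(y_hat, q_low, k):
        # y_hat : predicted (unquantized) measurements
        # q_low : integer formed by the correctly decoded bitplanes 1..k-1 of each measurement
        # Returns the estimate q_hat: the value consistent with q_low that is closest to y_hat.
        step = 2 ** (k - 1)
        upper = np.round((y_hat - q_low) / step).astype(np.int64)   # plays the role of q^(k:B) + c
        return step * upper + q_low

    # Example in the spirit of Fig. 3.17: the two recovered LSBs give q_low = 1,
    # and the prediction lies near -2.7, so the consistent value -3 is selected.
    print(bitplane_predict(np.array([-2.7]), np.array([1]), k=3))   # -> [-3]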

Finally, the kth bitplane q̂^(k) is corrected by syndrome decoding to produce the corrected estimate q^(k). As long as the syndrome satisfies the rate conditions of Thm. 1, the decoding



FIGURE 3.17: Bitplane prediction of a quantization point using the prediction measurement.

is reliable. Decoding continues iteratively until all B bitplanes have been decoded. After decoding each bitplane, the next bitplanes become increasingly reliable. At some point in the decoding process the remaining bitplanes are sufficiently reliable that q̂ will stop changing from iteration to iteration, and decoding can stop early. At this point no additional syndromes are transmitted. In our experiments, with the parameters used, typically 1 or 2 least significant bitplanes were transmitted as is (i.e., with a rate 0 code). A maximum of 3 additional bitplanes were transmitted for which syndromes were required (i.e., for which the rate was greater than 0 but less than 1).

Conceptually, the described syndrome coding procedure bears a similarity to multilevel coding schemes, where information is encoded using a number of channel codes of different rates and decoding proceeds in a multistage fashion in which each bitplane, starting from the LSB, is decoded by incorporating information from preceding stages [44].

The recovery of the quantized measurements via bitplane prediction and syndrome decoding is shown in red in Fig. 3.18.

Calculating LDPC Likelihoods to Improve Syndrome Decoding

When LDPC codes are used to perform error correction using belief propagation, the decoding is initialized with likelihoods associated with each of the bits to be decoded. The likelihood is the prior probability of a certain bit taking the value 0 versus 1. In our case, the likelihood corresponds to the probability that the predicted bit in question will be in error (when compared to the same bit of the measured signal). One option is to set the likelihood for all the bits of a given bitplane k to its associated bit flip probability p_k. Another approach is to use the



FIGURE 3.18: Decompression – recovery of the quantized measurements (in red).

measurements of the predictions, ŷ, to estimate the likelihood for each bit individually. This approach has been shown experimentally to improve the LDPC decoding process. In particular, it allows the use of higher code rates (when compared to the first approach) while still successfully performing belief propagation decoding. In turn, the higher code rates translate to smaller syndrome sizes and hence improved compression. The following theorem is used to calculate the likelihoods for the second approach.

Theorem 2. Consider a signal x and its prediction x̂ with prediction error ε = ‖x − x̂‖. Consider the single measurements y and ŷ of x and x̂, respectively, measured using a random matrix A with i.i.d. N(0, σ²) entries according to (3.2). Assume that bitplanes k = 1, . . . , K − 1 have been correctly decoded. Furthermore, let c(ŷ, K − 1) = c denote the smallest distance from ŷ to the center of the quantization interval consistent with the correct K − 1 LSBs. Then the likelihood of error of the Kth bit of y can be estimated as

L_K(\hat{y}) = \Pr(K\text{th bit flipped} \mid \hat{y},\ \text{correct } K-1 \text{ LSBs}) \qquad (3.35)
= \frac{A_2(K-1, c)}{A_1(K-1, c) + A_2(K-1, c)}, \qquad (3.36)

where

A_1(K-1, c) = \frac{1}{2^K}\left(1 + 2\sum_{k=1}^{+\infty} e^{-\frac{1}{2}\left(\frac{\pi\sigma\varepsilon k}{2^{K-1}\Delta}\right)^2}\cos\!\left(\frac{\pi c k}{2^{K-1}}\right)\operatorname{sinc}\!\left(\frac{k}{2^K}\right)\right) \qquad (3.37)

and A_2(K − 1, c) = A_1(K − 1, 2^{K−1} − c).

Proof. The proof is similar to the proof of Thm. 1. Consider the random variable d = ŷ − y. Given the measurement procedure in (3.2), d will be distributed as N(0, (σε/∆)²) with probability density function f_d.


Let A_1(K − 1, c) = A_1 denote the area under f_d associated with the consistent quantization intervals that result in the prediction of the Kth bit (e.g., if we end up predicting that the Kth bit is 1, then we consider all the consistent intervals that would result in this prediction). Similarly, let A_2(K − 1, c) = A_2 denote the area under f_d associated with the consistent quantization intervals that result in the complement of the prediction of the Kth bit. For A_1, these consistent intervals repeat every 2^K (the same holds for A_2). For example, Fig. 3.19 illustrates the case when K − 1 = 2. In this instance, the first two bits are assumed to be 00. The quantity c extends from the predicted measurement ŷ to the center of the closest 00 quantization interval (shown in blue).

FIGURE 3.19: Calculating L_3(ŷ) by determining the areas A_1 (blue) and A_2 (red).

We can estimate the likelihood of the Kth bit as

L_K(\hat{y}) = \Pr(K\text{th bit flipped} \mid \hat{y},\ \text{correct } K-1 \text{ LSBs}) = \frac{A_2}{A_1 + A_2}. \qquad (3.38)

We define

g(t) = \operatorname{rect}(t) * \sum_{k=-\infty}^{+\infty}\delta\!\left(t - k\,2^K\right), \qquad (3.39)


a rectangular function repeated at intervals of 2^K. We then have

A_1(K-1, c) = \int_{-\infty}^{+\infty} f_d(t)\, g(t - c)\, dt \qquad (3.40)
= \int_{-\infty}^{+\infty} \mathcal{F}\{f_d(t)\}\,\mathcal{F}\{g(t - c)\}\, d\xi, \qquad (3.41)

where F{·} denotes the Fourier transform and the second line follows from Parseval's theorem. We have

\mathcal{F}\{f_d(t)\} = \mathcal{F}\left\{\frac{1}{\sqrt{2\pi(\sigma\varepsilon/\Delta)^2}}\, e^{-\frac{t^2}{2(\sigma\varepsilon/\Delta)^2}}\right\} = e^{-2\left(\frac{\pi\sigma\varepsilon\xi}{\Delta}\right)^2} \qquad (3.42)

and

\mathcal{F}\{g(t - c)\} = \mathcal{F}\{g(t)\}\, e^{-i2\pi\xi c} \qquad (3.43)
= \mathcal{F}\left\{\operatorname{rect}(t) * \sum_{k=-\infty}^{+\infty}\delta(t - k\,2^K)\right\} e^{-i2\pi\xi c} \qquad (3.44)
= \mathcal{F}\{\operatorname{rect}(t)\}\,\mathcal{F}\left\{\sum_{k=-\infty}^{+\infty}\delta(t - k\,2^K)\right\} e^{-i2\pi\xi c} \qquad (3.45)
= e^{-i2\pi\xi c}\operatorname{sinc}(\xi)\,\frac{1}{2^K}\sum_{k=-\infty}^{+\infty}\delta\!\left(\xi - \frac{k}{2^K}\right). \qquad (3.46)

Substituting (3.42) and (3.46) into (3.41) we get

A_1(K-1, c) = \int_{-\infty}^{+\infty} e^{-2\left(\frac{\pi\sigma\varepsilon\xi}{\Delta}\right)^2} e^{-i2\pi\xi c}\operatorname{sinc}(\xi)\,\frac{1}{2^K}\sum_{k=-\infty}^{+\infty}\delta\!\left(\xi - \frac{k}{2^K}\right) d\xi \qquad (3.47)
= \frac{1}{2^K}\int_{-\infty}^{+\infty} e^{-2\left(\frac{\pi\sigma\varepsilon\xi}{\Delta}\right)^2}\left[\cos(2\pi\xi c) - i\sin(2\pi\xi c)\right]\operatorname{sinc}(\xi)\sum_{k=-\infty}^{+\infty}\delta\!\left(\xi - \frac{k}{2^K}\right) d\xi \qquad (3.48)
= \frac{1}{2^K}\int_{-\infty}^{+\infty} e^{-2\left(\frac{\pi\sigma\varepsilon\xi}{\Delta}\right)^2}\cos(2\pi\xi c)\operatorname{sinc}(\xi)\sum_{k=-\infty}^{+\infty}\delta\!\left(\xi - \frac{k}{2^K}\right) d\xi \qquad (3.49)
= \frac{1}{2^K}\left(1 + 2\sum_{k=1}^{+\infty} e^{-2\left(\frac{\pi\sigma\varepsilon}{\Delta}\frac{k}{2^K}\right)^2}\cos\!\left(2\pi c\,\frac{k}{2^K}\right)\operatorname{sinc}\!\left(\frac{k}{2^K}\right)\right) \qquad (3.50)
= \frac{1}{2^K}\left(1 + 2\sum_{k=1}^{+\infty} e^{-\frac{1}{2}\left(\frac{\pi\sigma\varepsilon k}{2^{K-1}\Delta}\right)^2}\cos\!\left(\frac{\pi c k}{2^{K-1}}\right)\operatorname{sinc}\!\left(\frac{k}{2^K}\right)\right). \qquad (3.51)


Similarly,

A_2(K-1, c) = A_1(K-1, 2^{K-1} - c) \qquad (3.52)
= \frac{1}{2^K}\left(1 + 2\sum_{k=1}^{+\infty} e^{-\frac{1}{2}\left(\frac{\pi\sigma\varepsilon k}{2^{K-1}\Delta}\right)^2}\cos\!\left(\frac{\pi(2^{K-1} - c)k}{2^{K-1}}\right)\operatorname{sinc}\!\left(\frac{k}{2^K}\right)\right) \qquad (3.53)
= \frac{1}{2^K}\left(1 + 2\sum_{k=1}^{+\infty} e^{-\frac{1}{2}\left(\frac{\pi\sigma\varepsilon k}{2^{K-1}\Delta}\right)^2}\cos\!\left(\pi k\left(1 - \frac{c}{2^{K-1}}\right)\right)\operatorname{sinc}\!\left(\frac{k}{2^K}\right)\right). \qquad (3.54)

Fig. 3.20 demonstrates the behavior of the error likelihood L_K as a function of the distance parameter c for K = 3 and different values of the normalized prediction error εσ/∆. As evident in the figure, the error likelihood is maximized and equals 0.5 at c = 2³/2 = 4. This corresponds to the scenario when the prediction ŷ falls exactly in the middle between the two possible consistent quantization intervals (i.e., exactly in between the closest A_1 and A_2 regions, in which case c = 2^K/2 = 2^{K−1}). In that case, there will be maximum uncertainty in the Kth bit's predicted value, resulting in the maximum likelihood of error.

FIGURE 3.20: The bit error likelihood L_K as a function of the distance parameter c for K = 3, for (σε)/∆ ∈ {0.5, 2.0, 3.5, 5.0, 6.5, 8.0}.


3.3.3 Source Reconstruction

Once all bitplanes have been successfully decoded, the recovered quantized measurements (which are equal to q, provided that the previous stage was successful) are used to reconstruct the sources. At a high level, we are dealing with the standard quantized compressed sensing scenario discussed in Section 2.1.5, namely, the reconstruction of a signal x from its quantized compressive measurements q.

If x is sparse, one can perform recovery by solving the Lasso problem

\min_{x} \left\|q - \frac{1}{\Delta}Ax - w\right\|_2^2 + \lambda\|x\|_1. \qquad (3.55)

More generally, the reconstruction problem can be written as

\min_{x} \left\|q - \frac{1}{\Delta}Ax - w\right\|_2^2 + \lambda\,\mathcal{R}(x), \qquad (3.56)

where R(·) is a regularizer function. The role of the regularizer is to promote solutions that exhibit properties consistent with the signal model of x.

When x is known to be sparse, the regularizer can be chosen as the sparsity-inducing l_1 norm, R(x) = ‖x‖_1, resulting in the Lasso problem (3.55). At other times, the regularizer may be chosen to promote sparsity in another domain. For example, images tend to vary relatively slowly (i.e., neighboring pixels tend to have similar values). Hence, images are considered to be sparse (or compressible) in terms of their gradients. A popular regularizer for image reconstruction, which promotes solutions with sparse gradients, is the weighted total variation (WTV) regularizer, defined as

WTV(X) = \sum_{s,t}\sqrt{W^x_{s,t}(X_{s,t} - X_{s-1,t})^2 + W^y_{s,t}(X_{s,t} - X_{s,t-1})^2}, \qquad (3.57)

where X is a 2D image, W^x and W^y are 2D sets of weights, and (s, t) are image coordinates. Larger weights penalize edges more in the respective location and direction, while smaller weights reduce their significance. When all the weights are set to 1, the regularizer is known as total variation (TV).

The role of the parameter λ ∈ R is to adjust the trade-off between the fidelity of the solution, captured by the term ‖q − (1/∆)Ax − w‖²_2, and the degree to which the solution exhibits the property promoted by the regularizer.


The reconstruction problem (3.56) can be efficiently solved using proximal gradient methods [45]. These iterative methods are used to solve problems of the form

\min_{x} F(x) + \mathcal{R}(x), \qquad (3.58)

where F(·) and R(·) are convex and F(·) is differentiable, through the iterations

x^{t+1} := \operatorname{prox}_{\delta_t \mathcal{R}}\!\left(x^{t} - \delta_t\nabla F(x^{t})\right), \qquad (3.59)

where we used a superscript to indicate the iteration number and where

\operatorname{prox}_{\delta\mathcal{R}}(y) = \arg\min_{x}\left(\mathcal{R}(x) + \frac{1}{2\delta}\left\|x - y\right\|_2^2\right) \qquad (3.60)

is known as a proximal operator. When solving the Lasso problem (3.55), the algorithm is commonly known as the iterative shrinkage-thresholding algorithm (ISTA), or in its accelerated form as the fast ISTA (FISTA) [46].

Another possible approach to solving (3.56) is through generalized approximate message passing (GAMP) [47]. Generalizing an earlier work on approximate message passing (AMP) [48] to a larger family of problems, GAMP is a message passing-inspired method for solving inverse problems. The use of GAMP for solving reconstruction problems such as (3.56) was considered, for example, in [49].

The process of source reconstruction from the recovered quantized measurements is highlighted in red in the decompression system diagram in Fig. 3.21.


FIGURE 3.21: Decompression – reconstruction of the compressed source (in red).


Chapter 4

Results

In this chapter we demonstrate an application of the developed method to the compression of multispectral images. Multispectral images are comprised of four to eight spectral bands, which are typically highly correlated. Such images are often acquired on board satellites for the purpose of various remote sensing applications such as mineral exploration, surveillance and cartography, among many others. Advances in modern acquisition technology have resulted in an increase in the amount and size of such imaging data. Efficient transmission of the data requires that it first be compressed. The scarcity of resources, such as computational power and memory, on satellites makes existing transform coding-based approaches, such as JPEG-2000, unsuitable for the task. Hence, rate-efficient compression algorithms with low-complexity encoders are of interest. Methods for multispectral image compression with a low-complexity encoder were explored in [2–6, 40]. More recently, the use of the method developed in this thesis, but without the use of the likelihoods described in Section 3.3.2, was presented in [41].

As a benchmark we use the results in [6], where a similar-complexity encoder was used under the same settings to compress 4-band images acquired by the ALOS satellite [1]. Similarly to [6], we chose the blue band to be available at the decoder as side information. Therefore, in accordance with the notation used in the previous chapter, x_0 represents the blue band, while x_i, i = 1, 2, 3, are the green, red and infrared bands, respectively. We performed tests on the entire 7040 × 7936 image, as well as on a 512 × 512 section from the same image, shown in Fig. 4.1, which was deemed to be challenging to compress. In each case, the rate (in bpp) was chosen, through the choice of the ∆ parameter, to match the rates used by the method in [6]. The images were compressed/decompressed in non-overlapping blocks of size n = 64 × 64. We present two variations of the developed algorithm, using two different models for predicting the image bands at the decoder. The first model, which we call linear prediction, is the same as in [41]. We call the second model successive prediction.


FIGURE 4.1: A 512 × 512 × 4 multispectral image used for testing of the compression method (acquired by the AVNIR-2 instrument on board the ALOS satellite [1]).

4.1 Encoding

A total of m = 4000 measurements of each image band, x_i, i = 1, 2, 3, are obtained according to (3.2), using the randomized subsampled WHT described in Section 3.2.1. Despite providing very minimal dimensionality reduction of the n = 4096 dimensional signal, the choice to use m = 4000 measurements was observed experimentally to yield the best rate-distortion trade-off. Although in this particular case using a relatively large number of measurements appears advisable, such behaviour may not be universal and can be (at least to some degree) attributed to the relatively small signal dimensions (which, in turn, result in short channel code syndromes). As mentioned above, the scaling parameter ∆_i is chosen in each case to obtain a certain target compression rate (further details will be discussed later, when the results are presented). The measurements are then quantized to B = 11 bits. The prediction errors, ε_i, are determined and syndromes are generated as described in Section 3.2.3. Due to the complexity of generating arbitrary code rates on the fly, we maintain a database of LDPC codes corresponding to the rates 0.05, 0.1, . . . , 0.95. Furthermore, due to the effects of finite block lengths (in this case the length is m = 4000), we select a code whose rate is 0.05 lower than the one that is closest to capacity (e.g., if for some bitplane k the error probability is p_k = 0.11, the resulting channel capacity is C = 0.5, in which case we choose the rate 0.45). The choice to back off by one code rate is heuristic. More sophisticated approaches can result in better code rate selection (and hence better compression rates); however, from our observations, the added


complexity would be significant while providing relatively limited benefit.

As mentioned above, we consider two approaches to generate predictions at the decoder: linear and successive. Each approach requires different sets Λ and Θ_Λ to be generated (or partially generated) by the encoder. The details are discussed in the following section.

4.2 Decoding

At the beginning of the decoding, the decoder has available:

• The syndromes of the encoded bitplanes of each of the image bands xi, i = 1, 2, 3.

• The side information: the reference blue band x0, as well as Λ and ΘΛ.

Depending on the prediction method used, the sets Λ and Θ_Λ may be only partially available initially and will need to be determined progressively as decoding unfolds.

The decoding process begins with the prediction of the image bands or their measurements. The measurements of the predicted bands are then used to recover the quantized measurements by alternating between bitplane prediction and syndrome decoding, as described in Section 3.3.2. Finally, the images are reconstructed through WTV optimization.

4.2.1 Prediction Methods

The prediction of the image bands (or their measurements) was done using two different approaches: linear and successive.

Linear Prediction

In the case of linear prediction, each of the three bands x_i, i = 1, 2, 3, is predicted from the blue band x_0 using a simple linear minimum mean squared error (LMMSE) estimator

\hat{x}_i = \frac{\sigma_{x_0 x_i}}{\sigma^2_{x_0}}\,(x_0 - \mu_{x_0}\mathbf{1}) + \mu_{x_i}\mathbf{1}, \qquad (4.1)


where 1 is the all-ones vector. Therefore, in this case Λ = {x_0} and Θ_Λ = {µ_{x_0}, µ_{x_i}, σ²_{x_0}, σ_{x_0 x_i}}. The statistical parameters in Θ_Λ are calculated according to

\mu_{x_i} = \frac{1}{n}\sum_{j=1}^{n}(x_i)_j, \qquad (4.2)
\sigma^2_{x_0} = \frac{1}{n}\sum_{j=1}^{n}\left((x_0)_j - \mu_{x_0}\right)^2, \qquad (4.3)
\sigma_{x_0 x_i} = \frac{1}{n}\sum_{j=1}^{n}\left((x_0)_j - \mu_{x_0}\right)\left((x_i)_j - \mu_{x_i}\right). \qquad (4.4)

The parameters µ_{x_0} and σ²_{x_0} can be calculated at the decoder from the side information. The parameters µ_{x_i} and σ_{x_0 x_i} for i = 1, 2, 3 are calculated at the encoder and transmitted to the decoder. The overhead due to transmission of these parameters is very small. For example, if we use 8 bits per parameter, we need to transmit a total of 6 parameters (i.e., 3 covariances and 3 means) per 64 × 64 block, resulting in an overhead of 48 bits (or 0.0039 bpp).

The prediction error for each band can be calculated as

\varepsilon_i^2 = \|x_i - \hat{x}_i\|^2 = n\,\mathbb{E}[(x_i - \hat{x}_i)^2] = n\left(\sigma^2_{x_0} - \frac{\sigma^2_{x_0 x_i}}{\sigma^2_{x_0}}\right). \qquad (4.5)
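The encoder-side computation behind (4.1)–(4.5) amounts to a handful of averages per 64 × 64 block, as the following sketch illustrates. This is an illustrative sketch only and all variable names are hypothetical.

    import numpy as np

    def linear_prediction_info(x0, xi):
        # Per-block statistics (4.2)-(4.4) and the squared prediction error (4.5).
        n = x0.size
        mu0, mui = x0.mean(), xi.mean()
        var0 = np.mean((x0 - mu0) ** 2)
        cov0i = np.mean((x0 - mu0) * (xi - mui))
        eps_sq = n * (var0 - cov0i ** 2 / var0)
        return mu0, mui, var0, cov0i, eps_sq

    def lmmse_predict(x0, mu0, mui, var0, cov0i):
        # Decoder-side LMMSE prediction (4.1) of band i from the reference band x0.
        return (cov0i / var0) * (x0 - mu0) + mui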

A diagram of the whole system using the linear prediction method is shown in Fig. 4.2.


qi xi

FIGURE 4.2: System diagram for the proposed compression-decompressionmethod using linear prediction.


Successive Prediction

In contrast to linear prediction, which uses a single band x_0 to predict each of the other bands, successive prediction leverages the results of the decoding of all the preceding bands to make better predictions.

Consider the measurements of band i acquired according to (3.2),

y_i = \frac{1}{\Delta_i}Ax_i + w_i. \qquad (4.6)

Consider also the pre-dithered measurements

\tilde{y}_i = y_i - w_i, \qquad (4.7)

and the quantized measurements with dither removed after quantization,

\tilde{y}^q_i = Q(y_i) - w_i, \qquad (4.8)

where we used a tilde to indicate that the dither was removed. Since the entries of the dither were drawn uniformly in [0, 1), the entries of the quantization error

e_i = Q(y_i) - y_i = \tilde{y}^q_i - \tilde{y}_i, \qquad (4.9)

will be independent of the entries of ỹ_i and uniformly distributed in [−1/2, 1/2) [43]. The statistics of the i.i.d. quantization error entries, denoted by the random variable E_i, are therefore µ_{E_i} = 0 and σ²_{E_i} = 1/12.

Following the successful decoding of all the bitplanes of band i, the quantized measurements Q(y_i) are recovered (and hence also ỹ^q_i, since the dither w_i is assumed to be known by the decoder). The idea of successive prediction is to predict the measurements of the Kth band using the recovered quantized measurements of all the bands Q(y_i), i = 1, . . . , K − 1, as well as the side information x_0. Once again we use an LMMSE estimator to predict the pre-dithered measurements of the Kth band as

(\hat{\tilde{y}}_K)_j = C_{K,K-1}^{T} C_{K-1}^{-1}
\begin{bmatrix}
(\tilde{y}_0)_j - \mu_{\tilde{y}_0} \\
(\tilde{y}^q_1)_j - \mu_{\tilde{y}^q_1} \\
\vdots \\
(\tilde{y}^q_{K-1})_j - \mu_{\tilde{y}^q_{K-1}}
\end{bmatrix}
+ \mu_{\tilde{y}_K}, \qquad j = 1, \ldots, m, \qquad (4.10)


where

C_{K,K-1} =
\begin{bmatrix}
\sigma_{\tilde{y}_K \tilde{y}_0} \\
\sigma_{\tilde{y}_K \tilde{y}^q_1} \\
\vdots \\
\sigma_{\tilde{y}_K \tilde{y}^q_{K-1}}
\end{bmatrix}, \qquad (4.11)

and

C_{K-1} =
\begin{bmatrix}
\sigma^2_{\tilde{y}_0} & \sigma_{\tilde{y}_0 \tilde{y}^q_1} & \cdots & \sigma_{\tilde{y}_0 \tilde{y}^q_{K-1}} \\
\sigma_{\tilde{y}^q_1 \tilde{y}_0} & \sigma^2_{\tilde{y}^q_1} & \cdots & \sigma_{\tilde{y}^q_1 \tilde{y}^q_{K-1}} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{\tilde{y}^q_{K-1} \tilde{y}_0} & \sigma_{\tilde{y}^q_{K-1} \tilde{y}^q_1} & \cdots & \sigma^2_{\tilde{y}^q_{K-1}}
\end{bmatrix}. \qquad (4.12)

The mean squared error (MSE) is then

\mathrm{MSE}_K = \mathbb{E}[(\tilde{y}_K - \hat{\tilde{y}}_K)^2] = \frac{1}{m}\|\tilde{y}_K - \hat{\tilde{y}}_K\|^2 = \sigma^2_{\tilde{y}_K} - C_{K,K-1}^{T} C_{K-1}^{-1} C_{K,K-1}. \qquad (4.13)

The dithered prediction is obtained by adding the dither to the pre-dithered prediction,

\hat{y}_K = \hat{\tilde{y}}_K + w_K. \qquad (4.14)

The prediction error is then

\varepsilon^2_{y_K} = \|y_K - \hat{y}_K\|^2 = \|\tilde{y}_K - \hat{\tilde{y}}_K\|^2. \qquad (4.15)

In order to calculate the code rates for the syndromes as per Thm. 1, we need to determine the distance ε_{x_K} = ‖x_K − x̂_K‖ between a source x_K and its prediction x̂_K. We use the RIP (2.3) to relate the distance between the measurements and their predictions, ε_{y_K}, to ε_{x_K} as follows:

\varepsilon^2_{x_K} \approx \frac{n\Delta^2}{m}\,\varepsilon^2_{y_K}. \qquad (4.16)

Note that (4.10) implies that the empirical statistical parameters need to be calculated from non-quantized measurements (e.g., µ_{ỹ_K}), quantized measurements (e.g., µ_{ỹ^q_{K−1}}) and a mix of both (e.g., σ_{ỹ_0 ỹ^q_1}). However, the addition of the uniform dither allows us to relate the statistics of quantized measurements to those of the non-quantized ones. For any j, let Y_j and Y^q_j denote the random variables whose distributions match the distribution of the i.i.d.


entries of ỹ_j and ỹ^q_j, respectively. Then

\mu_{Y^q_j} = \mathbb{E}[Y^q_j] = \mathbb{E}[Y_j + E_j] = \mathbb{E}[Y_j] + \mathbb{E}[E_j] = \mathbb{E}[Y_j] = \mu_{Y_j}, \qquad (4.17)

and

\sigma^2_{Y^q_j} = \mathrm{Var}[Y^q_j] = \mathrm{Var}[Y_j + E_j] = \mathrm{Var}[Y_j] + \mathrm{Var}[E_j] + 2\,\mathrm{Cov}[Y_j, E_j] = \sigma^2_{Y_j} + \frac{1}{12}, \qquad (4.18)

where Cov[Y_j, E_j] = 0 since E_j and Y_j are independent. Furthermore,

\sigma_{Y^q_i Y_j} = \mathrm{Cov}[Y^q_i, Y_j] \qquad (4.19)
= \mathbb{E}[Y^q_i Y_j] - \mathbb{E}[Y^q_i]\,\mathbb{E}[Y_j] \qquad (4.20)
= \mathbb{E}[(Y_i + E_i)Y_j] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.21)
= \mathbb{E}[Y_i Y_j] + \mathbb{E}[E_i Y_j] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.22)
= \mathbb{E}[Y_i Y_j] + \mathbb{E}[E_i]\,\mathbb{E}[Y_j] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.23)
= \mathbb{E}[Y_i Y_j] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.24)
= \sigma_{Y_i Y_j}, \qquad (4.25)

and similarly

\sigma_{Y^q_i Y^q_j} = \mathrm{Cov}[Y^q_i, Y^q_j] \qquad (4.26)
= \mathbb{E}[Y^q_i Y^q_j] - \mathbb{E}[Y^q_i]\,\mathbb{E}[Y^q_j] \qquad (4.27)
= \mathbb{E}[(Y_i + E_i)(Y_j + E_j)] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.28)
= \mathbb{E}[Y_i Y_j] + \mathbb{E}[E_i Y_j] + \mathbb{E}[E_j Y_i] + \mathbb{E}[E_i E_j] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.29)
= \mathbb{E}[Y_i Y_j] + \mathbb{E}[E_i]\,\mathbb{E}[Y_j] + \mathbb{E}[E_j]\,\mathbb{E}[Y_i] + \mathbb{E}[E_i]\,\mathbb{E}[E_j] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.30)
= \mathbb{E}[Y_i Y_j] - \mathbb{E}[Y_i]\,\mathbb{E}[Y_j] \qquad (4.31)
= \sigma_{Y_i Y_j}. \qquad (4.32)


We can therefore approximate the empirical quantities involving quantized measurements using non-quantized quantities, as follows:

\mu_{\tilde{y}^q} \approx \mu_{\tilde{y}}, \qquad (4.33)
\sigma^2_{\tilde{y}^q} \approx \sigma^2_{\tilde{y}} + \frac{1}{12}, \qquad (4.34)
\sigma_{\tilde{y}^q_i \tilde{y}_j} \approx \sigma_{\tilde{y}_i \tilde{y}_j}, \qquad (4.35)
\sigma_{\tilde{y}^q_i \tilde{y}^q_j} \approx \sigma_{\tilde{y}_i \tilde{y}_j}. \qquad (4.36)

Consequently, in the case of successive prediction, to predict the measurements of the Kth band we have Λ_K = {ỹ_0, ỹ^q_1, . . . , ỹ^q_{K−1}}. The measurements ỹ_0 (or, equivalently, x_0) must be made available to the decoder as side information, whereas ỹ^q_1, . . . , ỹ^q_i are recovered by the decoder prior to predicting ỹ_{i+1}. The minimal set of required statistics (assuming that we use the approximations (4.33)–(4.36)) is Θ_{Λ_K} = {µ_{ỹ_i}, σ²_{ỹ_i}, σ_{ỹ_i ỹ_j} : i, j = 0, 1, . . . , K, i ≠ j} \ {σ²_{ỹ_K}}. In the case of the considered 4-band multispectral image, this set consists of 13 statistical parameters (i.e., 4 means, 3 variances and 6 covariances). Note, however, that since the reference blue band (i.e., ỹ_0 or x_0) is available at the decoder, its mean µ_{ỹ_0} and variance σ²_{ỹ_0} may be calculated directly at the decoder and hence need not be transmitted by the encoder. A diagram of the whole system using the successive prediction method is shown in Fig. 4.3.


xi

FIGURE 4.3: System diagram for the proposed compression-decompressionmethod using successive prediction.


4.2.2 WTV Optimization

Once all bitplanes have been successfully decoded, the recovered quantized measurements ỹ^q_i, i = 1, 2, 3, are used to reconstruct the image by solving

\hat{x}_i = \arg\min_{x}\left\|\tilde{y}^q_i - \frac{1}{\Delta_i}Ax\right\|_2^2 + \lambda\,WTV(x), \qquad (4.37)

where λ = 0.1 was tuned experimentally using a small part of the data. The weight for the WTV regularizer (see (3.57)) at the image pixel at coordinate (x, y) = (s, t) is chosen according to

W^x_{s,t} = W^y_{s,t} =
\begin{cases}
0.2, & \text{if } \Phi(X0_{s,t}) > T, \\
1, & \text{otherwise},
\end{cases} \qquad (4.38)

where X0_{s,t} is the value of the image pixel of the reference band x_0 at coordinate (x, y) = (s, t), Φ(X0_{s,t}) is the norm of the 2D gradient at (x, y) = (s, t),

\Phi(X0_{s,t}) = \sqrt{(X0_{s,t} - X0_{s-1,t})^2 + (X0_{s,t} - X0_{s,t-1})^2}, \qquad (4.39)

and T = 0.3 is a threshold qualifying which gradient norms are considered to be significant and whose value was tuned experimentally.

Several approaches exist to solve (4.37); we use the FISTA-based approach in [50].
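A sketch of how the weights (4.38) can be derived from the reference band is shown below. It is our own illustration: boundary pixels are simply assigned weight 1, any intensity normalization is assumed to have been applied beforehand, and the names are hypothetical.

    import numpy as np

    def wtv_weights(X0, T=0.3, low_weight=0.2):
        # Weights per (4.38): locations where the reference band has a significant gradient
        # norm (4.39) are penalized less, so edges present in x0 are preserved in the solution.
        gx = np.zeros_like(X0)
        gy = np.zeros_like(X0)
        gx[1:, :] = X0[1:, :] - X0[:-1, :]     # X0[s,t] - X0[s-1,t]
        gy[:, 1:] = X0[:, 1:] - X0[:, :-1]     # X0[s,t] - X0[s,t-1]
        grad_norm = np.sqrt(gx ** 2 + gy ** 2)
        W = np.where(grad_norm > T, low_weight, 1.0)
        return W, W                             # the same weights are used for both directions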

4.3 Simulation Results

The results for the compression of the image crop in Fig. 4.1 are reported in Table 4.1. The first row under linear prediction lists the performance of the linear prediction using the side information and prediction parameters only (i.e., without syndrome decoding and reconstruction), quantifying the quality of predicting each band from the reference blue band. As expected, the quality of prediction matches the spectral distances between the predicted bands and the blue band: the closest (green) band is easiest to predict, followed by the red band and the furthest (infrared) band.

For each of the linear and successive predictions, we used two different methods to set the ∆ parameter. In the first method we chose the same ∆ for all 3 bands, achieving the overall target compression rate of 2 bpp. As expected, the reconstruction quality is fairly similar among the bands, while the bit budget is spread unevenly, with the easier-to-predict bands consuming lower rates. In the second approach, ∆ was set for each band individually so as to achieve a 2 bpp compression of each band. In this case, as expected,


TABLE 4.1: Decoding PSNR at 2 bpp (512 × 512 image crop)

                                                     |        PSNR (dB)        |              BPP
                                                     | green   red   infrared | green  red  infrared  overhead  overall
Benchmark [6]                                        | 37.79  32.76   34.24   | 2.00  2.00    2.00       —        2.00
Linear prediction
  Prediction only                                    | 33.46  28.53   27.52   |   —     —       —        —         —
  ∆_green = ∆_red = ∆_infrared = 10.395              | 39.44  38.93   39.46   | 1.50  2.19    2.28    0.00781     2.00
  ∆_green = 6.995; ∆_red = 12.275; ∆_infrared = 13   | 42.17  38.08   38.13   | 1.99  1.99    2.00    0.00781     2.00
Successive prediction
  ∆_green = ∆_red = ∆_infrared = 9.925               | 39.76  39.04   39.57   | 1.55  2.22    2.17    0.01823     2.00
  ∆_green = 7.055; ∆_red = 12.05; ∆_infrared = 11.675| 42.09  38.20   38.98   | 1.98  1.98    1.99    0.01823     2.00

the reconstruction quality of the easier-to-predict bands improves (at the expense of the harder-to-predict bands, whose reconstruction quality deteriorates). Note that the overhead associated with transmitting the prediction statistical parameters is included in a separate column, and is therefore not counted in the calculation of the reported per-band bpp values.

As is apparent from Table 4.1, the proposed approach significantly outperforms the benchmark in terms of reconstruction quality. The ranges of gains in reconstruction PSNR are summarized in Table 4.2. The performance of the linear and successive predictions is quite similar (with successive prediction doing slightly better than linear prediction in most cases).

TABLE 4.2: Improvement over benchmark (512 × 512 image crop compressed at rate 2 bpp)

Prediction     same ∆          variable ∆s
Linear         1.7 - 5.2 dB    3.9 - 5.3 dB
Successive     2.0 - 6.3 dB    4.3 - 5.4 dB

A similar set of experiments was conducted for the entire 7040 × 7936 image, considering linear and successive prediction and setting the ∆s in the two different ways. The targeted compression rate this time was 1.68 bpp. These large-scale simulations were performed using NSERC's Strategic Network for Smart Applications on Virtual Infrastructure (SAVI) testbed [51]. The results are shown in Table 4.3.

As expected, the behavior is similar to that of the smaller cropped image. A common ∆ for all bands leads to similar reconstruction quality at different compression rates, whereas varying ∆ for each band, such that the rate is the same, leads to variations in reconstruction quality. The performance of the linear and successive prediction is quite similar. As with the smaller crop, the proposed approach outperforms [6]. The improvements in reconstruction quality are summarized in Table 4.4.


TABLE 4.3: Decoding PSNR at 1.68 bpp (full 7040 × 7936 image)

                                                      PSNR (dB)                 BPP
                                                   green  red    infrared  green  red   infrared  overhead  overall
Benchmark [6]                                      39.06  37.60  35.80     1.68   1.68  1.68      —         1.68
Linear prediction
  Prediction only                                  37.05  31.67  27.32     —      —     —         —         —
  ∆green = ∆red = ∆infrared = 9.095                41.83  41.05  40.04     1.17   1.70  2.15      0.00781   1.68
  ∆green = 5.225; ∆red = 9.4; ∆infrared = 14.385   44.93  40.86  37.83     1.67   1.67  1.68      0.00781   1.68
Successive prediction
  ∆green = ∆red = ∆infrared = 8.7325               42.07  41.26  40.26     1.20   1.70  2.09      0.01823   1.68
  ∆green = 5.275; ∆red = 8.95; ∆infrared = 13.15   44.90  41.12  38.30     1.66   1.66  1.66      0.01823   1.68

TABLE 4.4: Improvement over benchmark (7040 × 7936 image compressed at rate 1.68 bpp)

Prediction     same ∆          variable ∆s
Linear         2.8 - 4.2 dB    2.0 - 5.9 dB
Successive     3.0 - 4.4 dB    2.5 - 5.8 dB


Chapter 5

Conclusion

5.1 Summary

In this work we considered the problem of compressing correlated sources in resource-scarce environments, focusing on the setting in which the decoder has access to side information. Our proposed approach establishes a very low-complexity and rate-efficient source encoder by combining ideas from compressed sensing and distributed source coding.

Compression consists of the acquisition of random compressive measurements, the calculation of prediction information, and the generation of linear channel code syndromes. By carefully implementing each stage of the compression process, the computational complexity of compressing a source of size n can be made as low as O(n log n). The complexity is largely shifted to the decoding process, which consists of source prediction, syndrome decoding, and source reconstruction through sparse optimization. Therefore, the proposed approach is particularly well-suited for applications requiring highly efficient encoding (in terms of computational complexity, memory, and rate-distortion trade-off), and where the decoding is assumed to be performed in resource-rich environments.
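To make the complexity claim concrete, the sketch below assembles the three encoding stages for a single block in NumPy. It is a simplified illustration rather than the exact implementation: a single parity-check matrix H is applied to every bitplane (whereas the method selects a separate code rate per bitplane), the number of retained bitplanes B and the dither model are illustrative, and fwht is a textbook in-place fast Walsh-Hadamard transform, which is what yields the O(n log n) measurement cost.

    import numpy as np

    def fwht(x):
        # Iterative (butterfly) fast Walsh-Hadamard transform; len(x) must be a power of two.
        x = x.astype(float).copy()
        h = 1
        while h < len(x):
            for i in range(0, len(x), 2 * h):
                a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
                x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
            h *= 2
        return x

    def encode_block(x, m, delta, seed, H, B=4):
        rng = np.random.default_rng(seed)               # seed shared with the decoder
        n = len(x)
        signs = rng.choice([-1.0, 1.0], size=n)         # randomization of the WHT
        rows = rng.choice(n, size=m, replace=False)     # random subsampling of WHT rows
        y = fwht(signs * x)[rows] / np.sqrt(n)          # m measurements in O(n log n)
        w = rng.uniform(0.0, 1.0, size=m)               # dither (illustrative model)
        q = np.floor(y / delta + w).astype(int)         # dithered scalar quantization
        bits = (np.mod(q, 2 ** B)[:, None] >> np.arange(B)) & 1  # B least-significant bitplanes
        return [(H @ bits[:, b]) % 2 for b in range(B)]           # one syndrome per bitplane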

The decoder leverages the structure of the sources and the nature of their correlations for the source prediction, as well as to guide their optimization-based reconstruction. As such, the proposed method affords a certain level of flexibility; it allows for the use of different prediction models and reconstruction algorithms.

The relation between the source prediction error and the probability of bit error between the quantized measurements and their predicted counterparts is established in Thm. 1. It plays a central role in determining the required code rates for recovering the quantized measurements at the decoder via distributed source coding. The incorporation of priors in the LDPC-based syndrome decoding is made possible by Thm. 2, allowing for the selection of higher code rates and hence improved compression.


This work was originally motivated by remote sensing applications in which signals must be acquired and compressed on board computationally and memory-constrained satellites. We demonstrated the application of the proposed approach to the problem of multispectral image compression. Our results exhibit significant improvements in terms of the rate-distortion trade-off when compared to other low-complexity methods, such as [6].

5.2 Practical Considerations

In the description of the proposed method in Chapter 3, as well as in detailing its application to image compression in Chapter 4, we sidestepped several issues of practical importance. Some of these relate specifically to the proposed method, while others are more general problems which can often be present in various communications systems. In either case, designing a real-world system which incorporates the developed method requires that the following practical matters be addressed:

• Availability of side information: Throughout this work it was assumed that the decoder has access to side information in the form of a source correlated with the sources to be compressed. Depending on the application, the side information may be naturally available at the decoder. At other times, it may be necessary to compress it losslessly by a suitable low-complexity encoder.

• Choice of side information: The quality of the prediction of the sources (or their measurements) on the basis of the side information plays a central role in the resulting rate-distortion trade-off. It is therefore important to carefully choose the source to be used as the side information (assuming that the application permits such a choice).

• Encoder-decoder synchronization: The proposed approach assumes that the encoder and decoder can make use of randomness in a synchronized manner. In particular, the randomly generated sensing operator and dither vector used by the decoder to decompress a source must match those used by the encoder to compress the same source. Assuming that the encoder and decoder use a seed to generate (pseudo)random values, it is essential that there exists a way to match the used seeds. A simple approach would be to append the seed's value to the compressed sources, which can then be read and used by the decoder to generate the randomness (a minimal sketch of this approach is given after this list).

• Encoder-decoder communication: In the problem setting considered in this work, it is assumed that the compressed sources and side information are available to the decoder unperturbed by noise. In practice, the communication between the encoder and decoder can be subject to a (possibly considerable) amount of noise (as is indeed the case in the multispectral image compression example).

• Compression rate: Choosing the compression rate requires proper selection of the values of m and ∆. In the multispectral image compression example, these parameters were tuned experimentally through trial-and-error (i.e., for a fixed target rate, numerous combinations of m and ∆ values were tested until the best combination, the one minimizing the distortion, was found). Clearly, such an approach is not practical, as it involves repeated encoding/decoding of the same block. Considering that the problem of dynamic rate control is not exclusive to our method, it may be possible to use already established techniques and ideas to guide the selection of these parameters. Ideally, however, it would be best if the encoder could set the rate based upon a rate-distortion function relating m and ∆ (and the prediction error ε) to the reconstruction quality (see further discussion below).
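The sketch below illustrates the seeding approach mentioned under encoder-decoder synchronization. All names and sizes are illustrative; the only point being made is that transmitting the seed as a small amount of overhead lets the decoder regenerate exactly the same sensing randomization and dither that the encoder used.

    import numpy as np

    def sensing_randomness(seed, n, m):
        rng = np.random.default_rng(seed)
        signs = rng.choice([-1.0, 1.0], size=n)        # random sign flips for the subsampled WHT
        rows = rng.choice(n, size=m, replace=False)    # indices of the retained WHT rows
        dither = rng.uniform(0.0, 1.0, size=m)         # dither vector
        return signs, rows, dither

    # Encoder side: draw a seed, generate the randomness, append the seed to the bitstream.
    seed = 12345
    enc = sensing_randomness(seed, n=2 ** 16, m=2 ** 14)

    # Decoder side: read the seed back from the bitstream and regenerate identical values.
    dec = sensing_randomness(seed, n=2 ** 16, m=2 ** 14)
    assert all(np.array_equal(a, b) for a, b in zip(enc, dec))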

5.3 Future Work

The probabilities given in Thm. 1 assume the use of a Gaussian sensing matrix. However, as previously mentioned, in resource-constrained environments, other choices, such as the randomized subsampled WHT, may be better suited. Although the results given by Thm. 1 appeared to perform well in our experiments (which used the WHT), an extension of the theorem to the WHT and other non-Gaussian matrices is of practical interest. As a first step, it could be useful to investigate the relationship between the distributions of signal measurements acquired using the WHT and Gaussian sensing operators (see (3.30)).
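One possible form of such a first step is the small numerical comparison sketched below: measure the same sparse test signal with an i.i.d. Gaussian matrix and with a randomized subsampled WHT, and compare the empirical statistics of the two measurement vectors. The problem sizes, the sparsity level, and the use of scipy.linalg.hadamard (practical only at this small scale) are illustrative choices, not part of the proposed method.

    import numpy as np
    from scipy.linalg import hadamard

    rng = np.random.default_rng(0)
    n, m, k = 1024, 256, 20
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)  # k-sparse test signal

    # Gaussian sensing, rows scaled to unit expected norm.
    y_gauss = (rng.standard_normal((m, n)) / np.sqrt(n)) @ x

    # Randomized subsampled WHT sensing: random sign flips, then m random Hadamard rows.
    signs = rng.choice([-1.0, 1.0], size=n)
    rows = rng.choice(n, size=m, replace=False)
    y_wht = (hadamard(n) / np.sqrt(n))[rows] @ (signs * x)

    # Compare empirical first and second moments (histograms could be compared similarly).
    print("Gaussian:", y_gauss.mean(), y_gauss.std())
    print("WHT:     ", y_wht.mean(), y_wht.std())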

The use of priors in the LDPC decoding as described in Section 3.3.2 allows for the use of higher code rates, relying on the empirical observation that setting bit priors using Thm. 2 improves LDPC decoding. It may, however, be possible to use these calculations in a more direct way. In particular, instead of designing code rates based on the BSC model with crossover probabilities determined by Thm. 1 and then choosing rates in a less conservative manner, it could be beneficial to model the channel as a non-BSC (e.g., as an equivalent Gaussian channel) in the hope of further improving the compression rate.

Determining the rate-distortion function for our approach is a question of theoretical and practical importance. The trade-off between the rate and distortion is governed by the choice of the number of measurements m and the resolution parameter ∆. The choice of the former parameter affects the rate by influencing the sizes of the syndromes (i.e., the syndromes grow linearly in the block length m) and the distortion through the optimization process. A larger resolution parameter ∆ results in a coarser quantization of the measurements, leading to a lower encoding rate at the cost of a reduced reconstruction PSNR. As we experimentally observed, different combinations of m and ∆ can result in the same overall rate, but different distortions. Furthermore, even if one of these parameters is fixed, it may not be immediately obvious how to determine the other one, since the resulting rate also depends on the prediction error ε (i.e., the channel code rates, which dictate the syndrome sizes, are determined based on Thm. 1). Aside from its dependence on the source statistics and the prediction model used, the prediction error may also depend on m and ∆. Moreover, as discussed in Section 2.1.5, the effects of quantization on compressed sensing reconstruction are also expected to play a part in this matter. Hence, it is likely that the interplay between these two parameters affects the resulting rate-distortion trade-off in a non-trivial way. Therefore, guided by information, coding, and quantized compressed sensing theory, it would be of great theoretical and practical interest to derive an expression relating m, ∆, and ε to the resulting distortion.


Bibliography

[1] Japan Aerospace Exploration Agency Earth Observation Research Center. “About ALOS- AVNIR-2". [Online]. Available: http://www.eorc.jaxa.jp/ALOS/en/about/avnir2.htm

[2] S. Rane, Y. Wang, P. Boufounos, and A. Vetro, “Wyner-Ziv coding of multispectral images for space and airborne platforms,” in Proc. Picture Coding Symposium (PCS). Nagoya, Japan: IEEE, December 7-10, 2010.

[3] Y. Wang, S. Rane, P. T. Boufounos, and A. Vetro, “Distributed compression of zerotrees of wavelet coefficients,” in Proc. IEEE Int. Conf. Image Processing (ICIP), Brussels, Belgium, Sept. 11-14, 2011.

[4] A. Abrardo, M. Barni, E. Magli, and F. Nencini, “Error-resilient and low-complexity on-board lossless compression of hyperspectral images by means of distributed source coding,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 4, pp. 1892–1904, April 2010.

[5] D. Valsesia and E. Magli, “A novel rate control algorithm for onboard predictive coding of multispectral and hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6341–6355, Oct 2014.

[6] D. Valsesia and P. T. Boufounos, “Universal encoding of multispectral images,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, China, March 20-25, 2016.

[7] P. Boufounos, “Universal rate-efficient scalar quantization,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1861–1872, March 2012.

[8] D. Schonberg, S. Draper, and K. Ramchandran, “On compression of encrypted images,”in 2006 International Conference on Image Processing, Oct 2006, pp. 269–272.

[9] D. Schonberg, S. C. Draper, C. Yeo, and K. Ramchandran, “Toward compression of encrypted images and video sequences,” IEEE Trans. Inf. Forensics Security, vol. 3, no. 4, pp. 749–762, Dec 2008.


[10] D. Schonberg, S. Draper, and K. Ramchandran, “On compression of encrypted images,”in 2006 International Conference on Image Processing, Oct 2006, pp. 269–272.

[11] D. S. Taubman and M. W. Marcellin, “JPEG2000: standard for interactive imaging,” Proc.IEEE, vol. 90, no. 8, pp. 1336–1357, Aug 2002.

[12] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory,vol. 51, no. 12, pp. 4203–4215, Dec 2005.

[13] E. J. Candes and M. B. Wakin, “An introduction to compressive sampling,” IEEE SignalProcess. Mag., vol. 25, no. 2, pp. 21–30, March 2008.

[14] J. F. Claerbout and F. Muir, “Robust modeling with erratic data,” GEOPHYSICS, vol. 38,no. 5, pp. 826–844, 1973.

[15] E. J. Candes, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete andinaccurate measurements,” Comm. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2006.

[16] P. Boufounos and R. Baraniuk, “Quantization of sparse representations,” in Data Compression Conference, 2007. DCC ’07, March 2007, pp. 378–378.

[17] W. Dai, H. V. Pham, and O. Milenkovic, “Distortion-rate functions for quantized compressive sensing,” in 2009 IEEE Information Theory Workshop on Networking and Information Theory, June 2009, pp. 171–175.

[18] ——, “A comparative study of quantized compressive sensing schemes,” in 2009 IEEE International Symposium on Information Theory, June 2009, pp. 11–15.

[19] J. Z. Sun and V. K. Goyal, “Optimal quantization of random measurements in compressedsensing,” in 2009 IEEE International Symposium on Information Theory, June 2009, pp. 6–10.

[20] A. Zymnis, S. Boyd, and E. Candes, “Compressed sensing with quantized measurements,” IEEE Signal Process. Lett., vol. 17, no. 2, pp. 149–152, Feb 2010.

[21] U. Kamilov, “Optimal quantization for sparse reconstruction with relaxed belief propagation,” Master’s thesis, EPFL/MIT, 2001.

[22] L. Jacques, K. Degraux, and C. De Vleeschouwer, “Quantized iterative hard thresholding: Bridging 1-bit and high-resolution quantized compressed sensing,” in 10th International Conference on Sampling Theory and Applications (SampTA2013), Bremen, Germany, Jul. 2013, pp. 105–108.


[23] P. T. Boufounos, L. Jacques, F. Krahmer, and R. Saab, Quantization and Compressive Sensing. Cham: Springer International Publishing, 2015, pp. 193–237.

[24] A. Kipnis, G. Reeves, Y. C. Eldar, and A. J. Goldsmith, “Compressed sensing under optimal quantization,” in 2017 IEEE International Symposium on Information Theory (ISIT), June 2017, pp. 2148–2152.

[25] J. N. Laska and R. G. Baraniuk, “Regime change: Bit-depth versus measurement-rate in compressive sensing,” IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3496–3505, July 2012.

[26] L. Jacques, D. K. Hammond, and J. M. Fadili, “Dequantizing compressed sensing: When oversampling and non-Gaussian constraints combine,” IEEE Trans. Inf. Theory, vol. 57, no. 1, pp. 559–571, Jan 2011.

[27] U. S. Kamilov, V. K. Goyal, and S. Rangan, “Message-passing de-quantization with applications to compressed sensing,” IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6270–6281, Dec 2012.

[28] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal,vol. 27, no. 3, pp. 379–423, 1948.

[29] E. Berlekamp, Algebraic coding theory, ser. McGraw-Hill series in systems science. McGraw-Hill, 1968.

[30] R. Gallager, Low-Density Parity-Check Codes, ser. M.I.T. Press research monographs. M.I.T. Press, 1963.

[31] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb 2001.

[32] S. Ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codesfor modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670–678, 2004.

[33] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.

[34] D. J. C. MacKay, Information Theory, Inference & Learning Algorithms. New York, NY, USA: Cambridge University Press, 2002.


[35] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans.Inf. Theory, vol. 19, no. 4, pp. 471–480, Sep. 1973.

[36] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1–10, Jan 1976.

[37] A. Wyner, “Recent results in the Shannon theory,” IEEE Trans. Inf. Theory, vol. 20, no. 1, pp. 2–10, Jan 1974.

[38] T. Ancheta, “Syndrome-source-coding and its universal generalization,” IEEE Trans. Inf.Theory, vol. 22, no. 4, pp. 432–436, Jul 1976.

[39] S. S. Pradhan and K. Ramchandran, “Distributed source coding using syndromes (DISCUS): design and construction,” IEEE Trans. Inf. Theory, vol. 49, no. 3, pp. 626–643, Mar 2003.

[40] D. Valsesia and P. T. Boufounos, “Multispectral image compression using universal vector quantization,” in Proc. IEEE Info. Theory Workshop, Cambridge, UK, Sept 11-14, 2016.

[41] M. Goukhshtein, P. T. Boufounos, T. Koike-Akino, and S. Draper, “Distributed coding of multispectral images,” in 2017 IEEE International Symposium on Information Theory, Aachen, Germany, Jun. 2017, pp. 3240–3244.

[42] N. Ailon and H. Rauhut, “Fast and RIP-optimal transforms,” Discrete Comput. Geom.,vol. 52, no. 4, pp. 780–798, Dec. 2014.

[43] L. Schuchman, “Dither signals and their effect on quantization noise,” IEEE Trans. Commun. Technol., vol. 12, no. 4, pp. 162–165, December 1964.

[44] H. Imai and S. Hirakawa, “A new multilevel coding method using error-correcting codes,” IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 371–377, May 1977.

[45] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim., vol. 1, no. 3, pp. 127–239, Jan. 2014.

[46] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linearinverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.

[47] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in 2011 IEEE International Symposium on Information Theory Proceedings, July 2011, pp. 2168–2172.


[48] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 45, pp. 18914–18919, 2009.

[49] U. S. Kamilov, V. K. Goyal, and S. Rangan, “Message-passing de-quantization with applications to compressed sensing,” IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6270–6281, Dec 2012.

[50] U. S. Kamilov, “A parallel proximal algorithm for anisotropic total variation minimization,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 539–548, 2017.

[51] T. Lin, B. Park, H. Bannazadeh, and A. Leon-Garcia, SAVI Testbed Architecture and Federation. Cham: Springer International Publishing, 2015, pp. 3–10.