
Compressed Sensing: Basic Results, Applications and New Extensions

Duzhe Wang
Department of Statistics
University of Wisconsin-Madison
[email protected]

Abstract

Compressed sensing is a popular signal sampling and recovery technique which has been applied successfully in medical imaging. It builds upon the fundamental fact that many signals are sparse and can therefore be represented using only a small number of non-zero coefficients in a suitable basis. The theory shows that if the sensing matrix satisfies the restricted isometry property (RIP), then basis pursuit optimization can successfully recover the sparse signal x. RIP, a useful sufficient condition for exact or approximate recovery, has a close relationship with the Johnson-Lindenstrauss lemma. The sensing matrix is usually chosen as a Gaussian or Rademacher ensemble because such ensembles are universally incoherent with most sparsifying bases, but these sensing matrices need a large amount of storage, which is inefficient in large-scale settings. At the end of this report, we introduce a fast and efficient compressed sensing system using structurally random matrices.

1 Introduction

Compressed sensing is a fast-growing field which has attracted considerable attention in signal processing, statistics, computer science, and broader scientific communities such as medical imaging. The Nyquist-Shannon sampling theorem demonstrates that signals can be exactly recovered from a set of uniformly spaced samples taken at the so-called Nyquist rate of twice the highest frequency present in the signal of interest. However, in many important and emerging applications, the resulting Nyquist rate is so high that we end up with far too many samples. To address the logistical and computational challenges involved in dealing with such high-dimensional data, we often depend on compression, which aims at finding the most concise representation of a signal that achieves a target level of acceptable distortion. One of the most popular techniques for signal compression is known as transform coding, and it typically relies on finding a basis that provides sparse or compressible representations for signals in a class of interest.

Leveraging the concept of transform coding, compressed sensing has emerged as a new framework for signal acquisition. Compressed sensing enables a potentially large reduction in the sampling and computation costs for sensing signals that have a sparse or compressible representation. While the Nyquist-Shannon sampling theorem states that a certain minimum number of samples is required in order to perfectly capture an arbitrary bandlimited signal, when the signal is sparse in a known basis we can vastly reduce the number of measurements that need to be stored. Consequently, when sensing sparse signals we might be able to do better than suggested by classical results. The fundamental idea behind compressed sensing is that, rather than first sampling at a high rate and then compressing the sampled data, we would like to find ways to directly sense the data in a compressed form (i.e., at a lower sampling rate).

Section 2 introduces some background on sparsity; Sections 3 and 4 present the basic results of compressed sensing, the restricted isometry property, and the Johnson-Lindenstrauss lemma. We give a


powerful application of compressed sensing to magnetic resonance imaging in Section 5. In Section 6, we explain a new extension of sensing matrices for the large-scale setting.

2 Background

2.1 Sparse signals

Definition 1. A signal x is k-sparse when it has at most k nonzeros, i.e., ||x||_0 ≤ k. Let

Σ_k = {x : ||x||_0 ≤ k}

denote the set of all k-sparse signals.

In general signals are not themselves sparse, but admit a sparse representation in some basis Φ. In this case, we will still refer to x as being k-sparse, with the understanding that we can express x as x = Φc where ||c||_0 ≤ k. The multi-scale wavelet transform provides nearly sparse representations for natural images. An example is shown in Figure 1 below.

(a) Original image (b) Wavelet representation

Figure 1: Sparse representation of an image via multiscale transform.
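As a small, self-contained illustration (added here; it uses a DCT basis for simplicity rather than the wavelet transform shown in Figure 1), the sketch below constructs a signal that is dense in the sample domain but exactly k-sparse in an orthonormal basis Φ, i.e. x = Φc with ||c||_0 ≤ k:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
N, k = 256, 5

# Build a k-sparse coefficient vector c and synthesize x = Phi c, where Phi is
# the orthonormal inverse-DCT basis.
c = np.zeros(N)
c[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
x = idct(c, norm="ortho")          # dense in the sample domain

# Analyzing x in the same basis recovers exactly k nonzero coefficients.
c_hat = dct(x, norm="ortho")
print(np.count_nonzero(np.abs(c_hat) > 1e-10))   # -> 5
```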

2.2 Geometry of sparse signals

Sparsity is a highly nonlinear model since the choice of which dictionary elements are used can change from signal to signal. This can be seen by observing that given a pair of k-sparse signals, a linear combination of the two signals will in general no longer be k-sparse, since their supports may not coincide. That is, for any x, z ∈ Σ_k, we do not necessarily have that x + z ∈ Σ_k. This is illustrated in Figure 2, which shows Σ_2 embedded in R^3, i.e., the set of all 2-sparse signals in R^3.

Figure 2: Union of subspaces defined by Σ2 ⊂ R3


2.3 Compressible signals

Another important point in practice is that few real-world signals are truly sparse; rather, they are compressible, meaning that they can be well approximated by a sparse signal. See Figure 3 for an example.

Figure 3: Compressible signal

In fact, we can quantify the compressibility by calculating the error incurred by approximating a signal x by some x̂ ∈ Σ_k:

σ_k(x)_p = min ||x − x̂||_p, s.t. x̂ ∈ Σ_k    (1)
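For ℓ_p norms the minimum in (1) is attained by keeping the k largest-magnitude entries of x, so σ_k(x)_p is simply the ℓ_p norm of the remaining entries. A short sketch (added here as an illustration) computes it directly:

```python
import numpy as np

def sigma_k(x, k, p=1):
    """Best k-term approximation error sigma_k(x)_p = min_{v in Sigma_k} ||x - v||_p.

    The minimizer keeps the k largest-magnitude entries of x, so the error is
    the l_p norm of the entries that are discarded.
    """
    idx = np.argsort(np.abs(x))              # indices sorted by increasing magnitude
    tail = x[idx[:-k]] if k > 0 else x       # everything except the k largest entries
    return np.linalg.norm(tail, ord=p)

# A compressible (but not sparse) signal: coefficients decaying like n^{-2}.
x = 1.0 / np.arange(1, 101) ** 2
print([round(sigma_k(x, k), 4) for k in (1, 5, 10, 20)])   # errors shrink quickly with k
```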

3 An overview of compressed sensing

Compressed sensing builds upon the fundamental fact that many signals x ∈ C^N are K-sparse, which means they can be represented using only K (K ≪ N) non-zero coefficients in a suitable basis. In mathematical terms, the observed data y ∈ C^m is connected to the signal x ∈ C^N of interest via

y = Ax    (2)

where m < N and the matrix A ∈ C^{m×N} models the linear measurement process; A is called the sensing matrix. The compressed sensing literature has shown that if A satisfies the restricted isometry property (RIP), then the following convex relaxation problem (the basis pursuit problem)

x̂ = argmin ||x||_1, such that Ax = y    (3)

can successfully recover the K-sparse signal x.
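As a quick numerical sanity check (added here, not part of the report), the following sketch generates a K-sparse signal, senses it with a Gaussian matrix, and solves the basis pursuit program (3); it assumes the cvxpy package is available and the problem sizes are purely illustrative.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
N, m, K = 200, 80, 5

# K-sparse ground truth and a Gaussian sensing matrix A.
x_true = np.zeros(N)
x_true[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
A = rng.standard_normal((m, N)) / np.sqrt(m)
y = A @ x_true

# Basis pursuit (3): minimize ||x||_1 subject to Ax = y.
x = cp.Variable(N)
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
problem.solve()
print(np.linalg.norm(x.value - x_true))   # close to 0, i.e. (near-)exact recovery
```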

Definition 2. A matrix A ∈ R^{m×N} is (ε, s)-RIP if for all x ≠ 0 s.t. ||x||_0 ≤ s, we have

(1 − ε)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + ε)||x||_2^2    (4)
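For very small matrices, Definition 2 can be checked by brute force: the RIP constant is the largest spectral deviation of A_S^T A_S from the identity over all supports S of size s. The sketch below (added here as an illustration) computes it directly.

```python
import numpy as np
from itertools import combinations

def rip_constant(A, s):
    """Smallest eps such that A satisfies (4) for all s-sparse x (Definition 2).

    For each support S of size s, the worst distortion over vectors supported on S
    is max_i |lambda_i(A_S^T A_S) - 1|; take the maximum over all supports.
    The enumeration is exponential in N, so this is only feasible for tiny examples.
    """
    _, N = A.shape
    eps = 0.0
    for S in combinations(range(N), s):
        cols = list(S)
        lam = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])
        eps = max(eps, abs(lam[0] - 1.0), abs(lam[-1] - 1.0))
    return eps

A = np.random.default_rng(2).standard_normal((30, 12)) / np.sqrt(30)
print(rip_constant(A, 2))   # only C(12, 2) = 66 supports to enumerate here
```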

3.1 Main results

There are three surprising results:

1. It is possible to fully reconstruct any sparse signal if it was compressed by x → Φx, where Φ is a matrix which satisfies a condition called the restricted isometry property. A matrix that has this property is guaranteed to have low distortion of the norm of any sparsely representable vector.

2. The reconstruction can be calculated in polynomial time by solving a linear program.

3. A random n × d matrix is likely to satisfy the RIP condition provided that n is greater than the order of s log(d).


Theorem 1. Let ε < 1 and let W be an (ε, 2s)-RIP matrix. Let x be a vector s.t. ||x||_0 ≤ s, let y = Wx be the compression of x, and let

x̃ ∈ argmin_{v : Wv = y} ||v||_0

be a reconstructed vector. Then x̃ = x.

Proof. Suppose x̃ ≠ x. Since x satisfies the constraints in the optimization problem that defines x̃, we clearly have ||x̃||_0 ≤ ||x||_0 ≤ s. Therefore, ||x̃ − x||_0 ≤ 2s and we can apply the RIP inequality to the vector x̃ − x. But since W(x̃ − x) = 0, the RIP inequality forces |0 − 1| ≤ ε, i.e., ε ≥ 1, which contradicts ε < 1.

Note that the reconstruction in Theorem 1 is not efficient because we need to minimize a combinatorial objective. In fact, we have the following stronger result, which holds even if x is not a sparse vector.

Theorem 2. Let ε < 1/(1 + √2) and let Φ be an (ε, 2s)-RIP matrix. Let x be an arbitrary vector and denote

x_s ∈ argmin_{v : ||v||_0 ≤ s} ||x − v||_1.

Let y = Φx be the compression of x and let

x̃ ∈ argmin_{v : Φv = y} ||v||_1

be the reconstructed vector. Then

||x̃ − x||_2 ≤ 2(1 − ρ)^{-1} s^{-1/2} ||x − x_s||_1

where ρ = √2 ε/(1 − ε).

Proof. The proof is given in Candès [2].

The last theorem in this section tells us that random matrices with n ≥ Ω(s log d) rows are likely to be RIP. In fact, the theorem shows that multiplying a random matrix by an orthonormal matrix also provides an RIP matrix. This is important for compressing signals of the form x = Uα where x is not sparse but α is sparse. In that case, if W is a random matrix and we compress using y = Wx, then this is the same as compressing α by y = (WU)α, and since WU is also RIP we can reconstruct α from y.

Theorem 3. Let Ψ be an arbitrary fixed N × N orthonormal matrix, let ε, δ be scalars in (0, 1), let s be an integer in [N], and let M be an integer that satisfies

M ≥ 100 s ln(40N/(δε)) / ε^2.

Let Φ ∈ R^{M×N} be a matrix such that each element of Φ is distributed normally with zero mean and variance 1/M. Then, with probability at least 1 − δ over the choice of Φ, the matrix ΦΨ is (ε, s)-RIP.

Proof. The proof is due to Baraniuk, Davenport, DeVore and Wakin [1]. The idea is to combine the Johnson-Lindenstrauss lemma with a simple covering argument.
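Theorem 3 is easy to probe numerically. The sketch below (added here; the sizes are illustrative) draws Φ with N(0, 1/M) entries, takes Ψ to be an orthonormal DCT matrix, and measures the distortion ||ΦΨx||_2^2 / ||x||_2^2 on random s-sparse vectors; note that the constant 100 makes the theorem's bound on M far larger than what suffices in practice.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(3)
N, s, eps, delta = 512, 10, 0.3, 0.1

# The bound on M from Theorem 3 (very conservative), versus a practical choice.
M_bound = int(np.ceil(100 * s * np.log(40 * N / (delta * eps)) / eps**2))
M = 200

Psi = dct(np.eye(N), norm="ortho", axis=0)         # a fixed N x N orthonormal matrix
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), (M, N))    # entries ~ N(0, 1/M)
A = Phi @ Psi

# Empirically check the RIP-type inequality (4) on random s-sparse vectors.
worst = 0.0
for _ in range(1000):
    x = np.zeros(N)
    x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
    worst = max(worst, abs(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2 - 1.0))
print(M_bound, worst)   # the observed distortion stays moderate even for this small M
```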

4 More on RIP and JL lemma

In this section, we will discuss the RIP condition and the Johnson-Lindenstrauss lemma in more detail. First, we give the distributional Johnson-Lindenstrauss property.

Definition 3. A random matrix Φ_{M×N} ∼ D satisfies the (ε, δ)-distributional JL property if for any fixed x ∈ R^N, with probability greater than 1 − δ,

(1 − ε)||x||_2^2 ≤ ||Φx||_2^2 ≤ (1 + ε)||x||_2^2    (5)


4.1 Distributional JL and RIP

The following two theorems show that the RIP condition and the distributional Johnson-Lindenstrauss property are equivalent in some sense.

Theorem 4. Suppose ε < 1 and M ≥ C_1(ε) K log(N/K). If Φ satisfies the (ε/2, δ)-distributional JL property with δ = e^{−Mε}, then with probability at least 1 − e^{−Mε/2}, Φ satisfies (ε, K)-RIP.

Theorem 5. Suppose Φ_{M×N} satisfies (ε, 2K)-RIP and let D_ε = diag(ε_1, ..., ε_N) be a diagonal matrix of i.i.d. Rademacher random variables. Then ΦD_ε satisfies the (3ε, 3e^{−cK})-distributional JL property.
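To see Theorem 5 in action (a sketch added here, with a Gaussian matrix standing in for a matrix that is known to be RIP), one can apply a random sign diagonal D_ε before Φ and check that the norm of a fixed vector is nearly preserved, as the distributional JL property (5) predicts.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 64, 512

# A Gaussian matrix plays the role of the RIP matrix Phi; D_eps is a diagonal
# matrix of i.i.d. random signs, as in Theorem 5.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
D_eps = np.diag(rng.choice([-1.0, 1.0], size=N))

x = rng.standard_normal(N)                 # an arbitrary fixed vector
ratio = np.linalg.norm(Phi @ D_eps @ x) ** 2 / np.linalg.norm(x) ** 2
print(ratio)   # close to 1, as in the distributional JL property (5)
```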

4.2 Johnson-Lindenstrauss lemma is tight

The Johnson-Lindenstrauss lemma says: suppose 0 < ε < 1/2, let x_1, ..., x_N ∈ R^n, and let m = (20 log N)/ε^2. Then there exists a mapping f : R^n → R^m such that for any pair (i, j),

(1 − ε)||x_i − x_j||_2^2 ≤ ||f(x_i) − f(x_j)||_2^2 ≤ (1 + ε)||x_i − x_j||_2^2    (6)

It is a natural and often-asked question whether the Johnson-Lindenstrauss lemma is tight: does there exist some set of points of size M such that the map f in the Johnson-Lindenstrauss lemma must have m = Ω(ε^{-2} ln M)? Larsen and Nelson [5] show that for any n > 1, 0 < ε < 1/2, and N > n^C for some constant C > 0, the JL lemma is optimal in the case where f is linear.

5 Compressed sensing MRI

In this section, we’ll introduce a successful application of compressed sensing in medical imaging.

Magnetic resonance imaging (MRI) is a common technology in medical imaging used for various tasks such as brain imaging and dynamic heart imaging. The signal measured by the MRI system is the Fourier transform of the magnitude of the magnetization, and the traditional MRI reconstruction method is the inverse Fourier transform. See Figure 4.

Figure 4: Traditional MRI image reconstruction

In traditional approaches, the measurement time needed to produce high-resolution images can be excessive in clinical situations. For instance, heart patients cannot be expected to hold their breath for too long, and children are too impatient to sit still for more than about two minutes. In such situations, the use of compressed sensing to achieve high-resolution images based on few samples appears promising.

5.1 Sparsity of medical imaging

In general, medical images are not sparse, but they have a sparse representation under a wavelet transform. See Figure 5.

5.2 Mathematical framework of CSMRI

θ̂ ∈ argmin_θ (1/2)||F_u(Ψθ) − y||_2^2 + λ||θ||_1    (7)

where Ψ is a sparsifying basis, θ is the transform coefficient vector, F_u is the undersampled Fourier transform, and y contains the samples we have acquired in K-space. Hence, Ψθ̂ is the estimated image.
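Problem (7) is an l1-regularized least-squares problem and is commonly solved with proximal-gradient methods. The sketch below (added here; it is not the reconstruction algorithm of any particular CSMRI system) implements plain iterative soft-thresholding (ISTA), with a dense matrix A standing in for the composition F_uΨ, which in practice would be applied through fast FFT and wavelet operators.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """ISTA for (7): minimize 0.5 * ||A theta - y||_2^2 + lam * ||theta||_1."""
    theta = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y)             # gradient of the quadratic data-fit term
        theta = soft_threshold(theta - step * grad, lam * step)
    return theta
```

Applying Ψ to the returned coefficient vector then gives the estimated image.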


Figure 5: Sparse representation of brain image

6 Compressed sensing with structurally random matrices

6.1 Structurally random ensemble system

Currently used sensing matrices such as Gaussian or Bernoulli ensembles are well known because they are universally incoherent with all other sparsifying bases. However, it is quite costly to realize these random matrices in practical sensing applications, as they require very high computational complexity and huge memory buffering due to their completely unstructured nature. For example, to process a 512 × 512 image with 64K measurements, a Bernoulli random matrix requires a gigabyte of storage, which is very expensive and in many cases unrealistic.
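To make the storage figure concrete, a back-of-the-envelope count (added here) of the entries of such a dense matrix:

```python
# Storage of a dense Bernoulli sensing matrix for a 512 x 512 image with 64K measurements.
N = 512 * 512                # 262,144 signal samples
M = 64 * 1024                # 65,536 measurements
entries = M * N              # ~1.7e10 matrix entries
print(entries / 8 / 2**30)   # on the order of gigabytes even at one bit per entry
```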

Therefore, one of the remaining challenges for CS in practice is to design a sensing framework that has the following features:

1. Optimal or near-optimal sensing performance: the number of measurements for exact recovery approaches the minimal bound, on the order of O(K log N);

2. Low complexity, fast computation and block-based processing support: these features of the sensing matrix are desired for large-scale, real-time sensing applications;

3. Hardware/optics implementation friendliness: entries of the sensing matrix only take values in the set {−1, 1, 0}.

To satisfy the aforementioned wish list, the authors of [3] propose a framework called the structurally random matrix (SRM), defined as a product of three matrices

Φ = √(N/M) · DFR    (8)

where

1. R ∈ R^{N×N} is either a uniform random permutation matrix or a diagonal random matrix whose diagonal entries R_ii are i.i.d. Bernoulli random variables with P(R_ii = 1) = P(R_ii = −1) = 1/2. We refer to the permutation matrix as the global randomizer and to the diagonal random matrix as the local randomizer.

2. F ∈ R^{N×N} is an orthonormal matrix; in practice, it could be the FFT. The purpose of the matrix F is to spread the information of the signal's samples over all measurements.

3. D ∈ R^{M×N} is a subsampling matrix. D selects a random subset of rows of the matrix FR. If the probability of selecting a row is M/N, the number of rows selected would be M on average. In matrix representation, D is simply a random subset of M rows of the identity matrix of size N × N.

Equivalently, the proposed sensing algorithm SRM contains three steps:

1. Step 1 (Pre-randomization): randomize the target signal by either flipping its sample signs or uniformly permuting its sample locations. This step corresponds to multiplying the signal by the matrix R;


2. Step 2 (Transformation): apply a fast transform F to the randomized signal;

3. Step 3 (Subsampling): randomly pick M measurements out of the N transform coefficients. This step corresponds to multiplying the transform coefficients by the matrix D.

Figure 6 illustrates the steps described above; a minimal code sketch of this pipeline is given after the figure.

Figure 6: Structurally random sensing matrix
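A minimal sketch of the three-step SRM pipeline (added here; it uses an orthonormal DCT in place of the FFT for F so that everything stays real-valued, and the sizes are illustrative):

```python
import numpy as np
from scipy.fft import dct

def srm_measure(x, M, rng, use_global_randomizer=False):
    """Apply a structurally random matrix Phi = sqrt(N/M) * D F R to x, as in (8).

    R: random permutation (global randomizer) or random sign flip (local randomizer);
    F: a fast orthonormal transform (an orthonormal DCT here) that spreads information;
    D: random selection of M out of the N transform coefficients.
    """
    N = x.size
    if use_global_randomizer:
        x = x[rng.permutation(N)]                  # R as a uniform random permutation
    else:
        x = x * rng.choice([-1.0, 1.0], size=N)    # R as a diagonal of random signs
    coeffs = dct(x, norm="ortho")                  # F
    keep = rng.choice(N, size=M, replace=False)    # D
    return np.sqrt(N / M) * coeffs[keep]

rng = np.random.default_rng(5)
x = rng.standard_normal(1024)
y = srm_measure(x, M=256, rng=rng)
print(y.shape)   # (256,)
```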

6.2 Theoretical analysis

A conventional compressed sensing reconstruction algorithm is employed to recover the transform coefficient vector α by solving the l1 minimization. The following theorem bounds the coherence of the structurally random matrix and the sparsifying matrix. Note that the optimal coherence (achieved by Gaussian/Rademacher random matrices) is O(√(log N / N)).

Theorem 6. Assume that the maximum absolute entry of a structurally random matrix Φ_{M×N} and of an orthogonal matrix Ψ_{N×N} is not larger than 1/√(log N). Then, with high probability, the coherence of Φ_{M×N} and Ψ_{N×N} is not larger than O(√(log N / s)), where s is the average number of nonzero entries per row of Φ_{M×N}.

The last theorem below gives the number of measurements required for exact recovery. Note that the optimal number of measurements required by dense Gaussian/Rademacher random matrices is O(K log N), which is smaller than the number of measurements required by the structurally random matrix. This corresponds to the larger coherence stated in Theorem 6.

Theorem 7. Under the previous assumption, sampling a signal using a structurally random matrix guarantees exact reconstruction (by basis pursuit) with high probability, provided M ∼ (KN/s) log^2 N.

7 Conclusions and discussion

Compressed sensing is an exciting field nowadays. It not only has theoretical mathematical guarantees for its performance, but also has wide applications in different areas such as medical imaging. One future direction I am very interested in is how to propose new signal recovery algorithms instead of L1 minimization, for example, using non-convex regularizers.

References

[1] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, Dec 2008.

[2] Emmanuel J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathématique, 346(9):589–592, 2008.

[3] T. T. Do, L. Gan, N. H. Nguyen, and T. D. Tran. Fast and efficient compressive sensing using structurally random matrices. IEEE Transactions on Signal Processing, 60(1):139–154, Jan 2012.


[4] Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. CoRR, abs/1009.0744, 2010.

[5] Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 82:1–82:11, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.

[6] Shai Shalev-Shwartz. Compressed sensing: Basic results and self-contained proofs.
