
Page 1:

Using Deep Learning to Accelerate Sparse Recovery

Wotao Yin†

Joint work with Xiaohan Chen‡, Jialin Liu†, Zhangyang Wang‡

†UCLA Math   ‡Texas A&M CSE

Texas A&M U — February 20, 2019

1 / 32

Page 2:

This talk is based on the following papers:

• X. Chen, J. Liu, Z. Wang, and W. Yin, Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds, Advances in Neural Information Processing Systems (NeurIPS), 2018.

• J. Liu, X. Chen, Z. Wang, W. Yin, ALISTA: analytic weights are as good as learned weights in LISTA, International Conference on Learning Representations (ICLR), 2019.

X. Chen and J. Liu are equal first authors in both papers.

2 / 32

Page 3:

Overview

Recover a sparse x∗

b := Ax∗ + white noise

where A ∈ Rm×n and b ∈ Rm are given.

Known as compressed sensing, feature selection, or LASSO. A fundamental problem with numerous applications in signal processing, inverse problems, and statistical/machine learning.

3 / 32

Page 4:

Application: Examples

MRI Reconstruction

Radar Sensing

4 / 32

Page 5:

Our methods improve upon classical analytical sparse recovery algorithms by

• recovering a signal closer to x∗ (higher quality)

• reducing the total number of iterations to just 15–20 (fast recovery)

Our methods improve upon existing deep learning-based recovery algorithms, e.g., LISTA (Gregor & LeCun'10), by

• learning (much) fewer parameters (faster training)

• adding support detection (faster recovery)

• proving linear convergence and robustness (theoretical guarantee!)

5 / 32

Page 6:

This talk is based on two recent papers:

• Xiaohan Chen*, Jialin Liu*, Zhangyang Wang, Wotao Yin. “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds.” NIPS'18.

• Jialin Liu*, Xiaohan Chen*, Zhangyang Wang, Wotao Yin. “ALISTA: Analytic weights are as good as learned weights in LISTA.” To appear in ICLR'19.

* denotes equal-contribution first authors.

6 / 32

Page 7:

Outline

• Review LASSO model and ISTA method

• LISTA: the classic method, then a series of parameter eliminations

• Theoretical results

• How to make it robust

7 / 32

Page 8:

LASSO and ISTA

LASSO model:

x_lasso ← minimize_x  (1/2)‖b − Ax‖₂² + λ‖x‖₁

where λ is a model parameter, tuned by hand or cross validation.

Forward–backward splitting gives ISTA:

x(k+1) = η_{λ/L}( x(k) + (1/L) Aᵀ(b − Ax(k)) )

It converges sublinearly to x_lasso (with an eventual linear speed), not to x∗.

FPC (fixed-point continuation): faster by using a large λ and scheduling its reduction. It has proven finite support detection and eventual linear convergence.
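For concreteness, here is a minimal NumPy sketch of the ISTA iteration above (A, b, and λ are placeholders; L is taken as ‖A‖₂², an upper bound on the Lipschitz constant of the smooth part's gradient):

```python
import numpy as np

def soft_threshold(v, theta):
    # entrywise soft-thresholding: eta_theta(v) = sign(v) * max(|v| - theta, 0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(A, b, lam, num_iters=500):
    # ISTA for  minimize_x  0.5 * ||b - A x||_2^2 + lam * ||x||_1
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2      # spectral norm squared = Lipschitz constant of the gradient
    x = np.zeros(n)
    for _ in range(num_iters):
        x = soft_threshold(x + (1.0 / L) * A.T @ (b - A @ x), lam / L)
    return x
```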

8 / 32

Page 9:

Relax ISTA

Rewrite ISTA as

x(k+1) = η_θ( W₁ b + W₂ x(k) ),

where W₁ = (1/L)Aᵀ, W₂ = I_n − (1/L)AᵀA, and θ = λ/L.
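A quick NumPy check of this rewriting on random data (purely illustrative; the dimensions, seed, and λ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 20, 50, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

L = np.linalg.norm(A, 2) ** 2
W1 = A.T / L
W2 = np.eye(n) - A.T @ A / L
theta = lam / L

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# one ISTA step, written both ways, gives the same result
ista_step    = soft_threshold(x + (1.0 / L) * A.T @ (b - A @ x), lam / L)
relaxed_step = soft_threshold(W1 @ b + W2 @ x, theta)
assert np.allclose(ista_step, relaxed_step)
```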

9 / 32

Page 10:

Gregor & LeCun’10: Learned ISTA (LISTA)

Unfold K iterations of ISTA

Free W₁^k, W₂^k and θ^k, k = 0, …, K − 1, as parameters.

Learn them from a training set D = {(b_i, x∗_i)}:

minimize_{W₁^k, W₂^k, θ^k}  ∑_{(b, x∗)∈D} ‖x^K(b) − x∗‖₂².
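The unrolled network can be trained like any feed-forward model. Below is a minimal PyTorch sketch of a K-layer LISTA with free W₁^k, W₂^k, θ^k, initialized at the ISTA values; the class and variable names are illustrative and not taken from the papers' released code:

```python
import torch
import torch.nn as nn

def soft_threshold(v, theta):
    return torch.sign(v) * torch.relu(torch.abs(v) - theta)

class LISTA(nn.Module):
    def __init__(self, A, K=16):
        # A: torch.Tensor of shape (m, n)
        super().__init__()
        m, n = A.shape
        L = torch.linalg.matrix_norm(A, ord=2) ** 2
        # initialize every layer at the ISTA values: W1 = A^T / L, W2 = I - A^T A / L
        self.W1 = nn.ParameterList([nn.Parameter(A.t() / L) for _ in range(K)])
        self.W2 = nn.ParameterList([nn.Parameter(torch.eye(n) - A.t() @ A / L) for _ in range(K)])
        self.theta = nn.ParameterList([nn.Parameter(torch.tensor(0.1)) for _ in range(K)])

    def forward(self, b):
        # b: (batch, m); returns x^K: (batch, n)
        x = torch.zeros(b.shape[0], self.W2[0].shape[0], device=b.device)
        for W1, W2, theta in zip(self.W1, self.W2, self.theta):
            x = soft_threshold(b @ W1.t() + x @ W2.t(), theta)
        return x

# training: minimize  sum_i ||x^K(b_i) - x*_i||_2^2  over the synthetic training set, e.g.
#   loss = ((model(b_batch) - x_star_batch) ** 2).sum()
```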

10 / 32

Page 11:

Just generate synthetic sparse signals, train it like a neural network.

Training is very slow, but K = 16 layers are enough, and the denoising quality is better.

[Figure: NMSE (dB) vs. iteration (0–800) for ISTA with λ = 0.1, 0.05, 0.025, and for LISTA.]

11 / 32

Page 12:

However, it does not scale

To run K iterations, the total number of parameters is

O(n²K + mnK).

Too many parameters and too many hours to learn!

12 / 32

Page 13:

Coupling W₁, W₂

If we need x^K → x∗ uniformly for all sparse signals with no measurement noise, then we must have:

• W₂^k + W₁^k A → I,
• θ^k → 0.

True under learning:

[Figure: learned values across layers k = 1, …, 16, with W₂^k + W₁^k A approaching I and θ^k approaching 0.]

13 / 32

Page 14:

Therefore, we enforce the following coupling in all layers:

W₂^k = I_n − W₁^k A,

yielding the iteration:

x(k+1) = η_{θ^k}( x(k) + W₁^k (b − Ax(k)) ).

Parameter reduction

O(n²K + mnK) → O(mnK),

significant especially if m ≪ n. It also helps stabilize training.

14 / 32

Page 15:

Support selection

Inspired by FPC (Hale, Yin, Zhang'08) and Linearized Bregman (Osher et al.'10).

Idea: at each iteration, let the largest components bypass soft-thresholding.

The fraction of largest entries selected is hand-tuned.

We obtained both empirical and theoretical improvements.
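A minimal NumPy sketch of one layer combining the coupled update from the previous slide with support selection, where the top-p% largest-magnitude entries bypass the soft-thresholding (the function names and the exact selection rule are illustrative assumptions, not taken from the papers' code):

```python
import numpy as np

def ss_soft_threshold(v, theta, p):
    # soft-threshold all entries except the p% of largest magnitude,
    # which bypass the shrinkage and are kept as-is
    n = v.size
    k = int(np.ceil(p / 100.0 * n))
    keep = np.argsort(np.abs(v))[-k:] if k > 0 else np.array([], dtype=int)
    out = np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
    out[keep] = v[keep]
    return out

def lista_cpss_layer(x, b, A, W1, theta, p):
    # coupled LISTA-CP update with support selection:
    #   x^{k+1} = eta^{ss}_{theta^k}( x^k + W1^k (b - A x^k) )
    return ss_soft_threshold(x + W1 @ (b - A @ x), theta, p)
```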

15 / 32

Page 16:

Empirical results

• We compare results using normalized MSE (NMSE) in dB.

NMSE(x, x∗) = 20 log₁₀ ( ‖x − x∗‖₂ / ‖x∗‖₂ )

• Notation:
  • Original LISTA: LISTA;
  • LISTA with weight coupling: LISTA-CP;
  • LISTA with support selection: LISTA-SS;
  • LISTA with both structures: LISTA-CPSS.

• Setting:
  • m = 250, n = 500, sparsity s ≈ 50.
  • A_ij ∼ N(0, 1/√m), i.i.d.; A is column-normalized.
  • Nonzero magnitudes are sampled from the standard Gaussian.
  • Measurement noise levels are specified by the signal-to-noise ratio (SNR).
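A sketch of the NMSE metric and the synthetic setup above (the fixed-size support and the SNR-based noise scaling are illustrative assumptions; the initial scale of A is immaterial since its columns are normalized afterwards):

```python
import numpy as np

def nmse_db(x, x_star):
    # NMSE(x, x*) = 20 * log10( ||x - x*||_2 / ||x*||_2 )
    return 20.0 * np.log10(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))

def sample_problem(m=250, n=500, s=50, snr_db=np.inf, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # A with i.i.d. Gaussian entries, then column-normalized
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    # sparse x*: random support of size s, standard Gaussian magnitudes (illustrative)
    x_star = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x_star[support] = rng.standard_normal(s)
    b = A @ x_star
    if np.isfinite(snr_db):
        # scale the noise so that 20*log10(||b|| / ||noise||) = snr_db
        noise = rng.standard_normal(m)
        noise *= np.linalg.norm(b) / (np.linalg.norm(noise) * 10 ** (snr_db / 20.0))
        b = b + noise
    return A, b, x_star
```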

16 / 32

Page 17:

Weight coupling

[Figure: NMSE (dB) vs. layer (0–16) for ISTA, FISTA, AMP, LISTA, and LISTA-CP.]

Weight coupling stabilizes intermediate results. No change in final recovery quality.

17 / 32

Page 18:

(Adding) support selection

Noiseless case (SNR=∞)

[Figure: NMSE (dB) vs. layer (0–16) for ISTA, FISTA, AMP, LISTA, LAMP, LISTA-CP, LISTA-SS, LISTA-CPSS; LISTA-SS and LISTA-CPSS reach much lower NMSE than LISTA and LISTA-CP.]

18 / 32

Page 19:

(Adding) support selection

Noisy case (SNR=30)

[Figure: NMSE (dB) vs. layer (0–16) for ISTA, FISTA, AMP, LISTA, LAMP, LISTA-CP, LISTA-SS, LISTA-CPSS in the noisy case.]

19 / 32

Page 20:

Natural image compressive sensing reconstruction

[Figure: (a) ground truth; reconstructions at (b) 20%, (c) 30%, (d) 40%, (e) 50%, (f) 60% sample rates.]

20 / 32

Page 21:

Theory: convergence analysis

Theorem (Convergence of LISTA-CP). Suppose K = ∞ and let {x(k)}_{k=1}^∞ be generated by LISTA-CP. There exists a sequence of parameters Θ(k) = {W₁^i, θ^i}_{i=0}^{k−1} such that

‖x(k)(Θ(k), b, x₀) − x∗‖₂ ≤ C₁ exp(−ck) + C₂ σ,  ∀k = 1, 2, …,

holds for all (x∗, ε) that are sparse and bounded, where c, C₁, C₂ > 0 are constants that depend only on A and the distribution of x∗, and σ is the noise level.

The error bound consists of two parts:

• the error that converges linearly to zero;
• the irreducible error caused by the measurement noise.

21 / 32

Page 22:

Theory: convergence analysis

Theorem (Convergence of LISTA-CPSS). Suppose K = ∞ and let {x(k)}_{k=1}^∞ be generated by LISTA-CPSS. There exists a sequence of parameters Θ(k) = {W₁^i, θ^i}_{i=0}^{k−1} such that

‖x(k)(Θ(k), b, x₀) − x∗‖₂ ≤ C₁ exp(−∑_{t=0}^{k−1} c^t_ss) + C_ss σ,  ∀k = 1, 2, …,

holds for all (x∗, ε) satisfying some assumptions, where c^k_ss ≥ c for all k, c^k_ss > c for large enough k, and C_ss < C₂.

The convergence rate is better: c^k_ss > c for large enough k, and the acceleration is more significant in deeper layers. The recovery error is also better: C_ss < C₂.

22 / 32

Page 23:

Tie W1 across the iterations

In the proofs, we chose one W₁^k independent of the layer k.

So, we use just one W for all iterations:

O(mnK)→ O(mn),

yielding tied LISTA (TiLISTA):

x(k+1) = η_{θ^k}( x(k) + γ^k Wᵀ(b − Ax(k)) ).

We learn the step sizes {γ^k}, the thresholds {θ^k}, and just one matrix W. Tied LISTA works as well as LISTA.

23 / 32

Page 24:

Analytic LISTA (ALISTA)

The proofs also reveal that W needs to have small mutual coherence with A. So, we tried to solve for W independently of the training data.

Two steps:

1. Pre-compute W:

   W ∈ argmin_{W ∈ R^{m×n}} ‖WᵀA‖²_F,  s.t. (W_{:,j})ᵀ A_{:,j} = 1, ∀j = 1, 2, …, n,

   which is a standard convex quadratic program and easy to solve.

2. With W fixed, learn {γ^k, θ^k} from data (by back-propagation).
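Because ‖WᵀA‖²_F = ∑_j (W_{:,j})ᵀ(AAᵀ)(W_{:,j}), the QP decouples over the columns of W. Assuming AAᵀ is invertible, one way to compute W in closed form is sketched below; this is our reading of the QP, not code from the paper. The second function shows the resulting ALISTA/TiLISTA-style update:

```python
import numpy as np

def alista_weight(A):
    # Solve  min_W ||W^T A||_F^2  s.t.  (W[:, j])^T A[:, j] = 1 for every column j.
    # With G = A A^T invertible, the minimizer of w^T G w subject to a_j^T w = 1 is
    #   w_j = G^{-1} a_j / (a_j^T G^{-1} a_j).
    G = A @ A.T
    Ginv_A = np.linalg.solve(G, A)        # column j is G^{-1} a_j
    denom = np.sum(A * Ginv_A, axis=0)    # a_j^T G^{-1} a_j for each j
    return Ginv_A / denom                 # divide each column by its constraint value

def alista_step(x, b, A, W, gamma, theta):
    # one ALISTA layer:  x <- eta_theta( x + gamma * W^T (b - A x) )
    v = x + gamma * (W.T @ (b - A @ x))
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```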

24 / 32

Page 25:

Analytic LISTA (ALISTA)

For the resulting ALISTA network:

1. The layer-wise weight W depends on the model A, but not on the training data.
2. The step sizes γ^k and thresholds θ^k are learned from data, but they are only a small number of scalars.

25 / 32

Page 26:

Numerical evaluation

Noiseless case (SNR = ∞): [Figure: NMSE (dB) vs. layer (0–16) for ISTA, FISTA, LISTA, LISTA-CPSS, TiLISTA, and ALISTA.]

Noisy case (SNR = 30 dB): [Figure: NMSE (dB) vs. layer (0–16) for the same methods.]

26 / 32

Page 27:

Numbers of parameters to train

K: number of layers. A has M rows and N columns.

Original LISTA:  O(KN² + KMN + K)
LISTA-CPSS:      O(KMN + K)
TiLISTA:         O(MN + K)
ALISTA:          O(K)

A 16-layer ALISTA network takes only around 0.1 hours (6 minutes) of training to achieve performance comparable to LISTA-CPSS, which takes around 1.5 hours to train.

27 / 32

Page 28:

Extension to convolutional A

Our main results extend directly to very large convolutions (circulant matrices), so they can handle large images.

Problem: forming a full matrix W is impossible, even for 100 × 100 imaging problems.

Approach: use a convolutional W, find a nearly optimal one, and minimize the coherence by FFTs.

Theoretical guarantee: the approximation is accurate when the image (we consider 2D convolutions) is large enough.

28 / 32

Page 29:

An end-to-end robust model

Mutual coherence minimization can also be solved by unrolling an algorithm!

The coherence minimization can be relaxed as

argmin_{W ∈ R^{m×n}} ‖Q ⊙ (AᵀW − I_n)‖²_F,

where ⊙ is the Hadamard (entrywise) product and Q is a weight matrix that puts more penalty on the diagonal errors. It can be solved by gradient descent:

W(k+1) = W(k) − γ(k) A (Q² ⊙ (AᵀW(k) − I_n)).

Figure: One Layer of the Encoder.
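A minimal NumPy sketch of this gradient descent (the initialization, step size, and iteration count are illustrative assumptions; unrolling a fixed number of these steps, one per layer, gives the encoder):

```python
import numpy as np

def coherence_descent(A, Q, num_steps=100, gamma=None):
    # Gradient descent for  min_W || Q ⊙ (A^T W - I_n) ||_F^2  using the update
    #   W <- W - gamma * A (Q^2 ⊙ (A^T W - I_n)),   Q^2 = entrywise square of Q.
    m, n = A.shape
    Q2 = Q ** 2
    if gamma is None:
        # conservative step size from a Lipschitz-type bound (an assumption)
        gamma = 1.0 / (np.max(Q2) * np.linalg.norm(A, 2) ** 2)
    W = A.copy()                                   # simple initialization (illustrative)
    for _ in range(num_steps):
        W = W - gamma * A @ (Q2 * (A.T @ W - np.eye(n)))
    return W

# Unrolling these steps with learnable step sizes gamma^(k) gives one encoder layer per step.
```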

29 / 32

Page 30:

Robust ALISTA: An end-to-end robust model

We feed the encoder perturbed models Ã = A + ε_A so that W is robust, to some extent, to model perturbations.
The encoder takes Ã and returns W; it is obtained by unrolling the gradient descent on the previous slide.
The decoder takes W, Ã, b and returns x; it is the ALISTA model.

Figure: Robust ALISTA: cascaded Encoder-Decoder Structure.

30 / 32

Page 31:

Numerical results

We perturb A₀ element-wise with Gaussian noise of standard deviation σ ≤ 0.03. The perturbed A is then column-normalized. The testing model is A, perturbed from A₀.

The W matrices in the non-robust LISTA methods are obtained using A₀.

[Figure: NMSE (dB) vs. perturbation level σ (roughly 0 to 0.035).]

31 / 32

Page 32:

Summary

There is huge room for speed improvement by adapting an algorithm to a subset of optimization problems.

We can integrate data-driven (slow, adaptive) and analytic (fast, universal) approaches to obtain fast and adaptive algorithms.

While optimization helps deep learning, deep learning (ideas) can also help optimization.
This is part of the bigger picture called “differentiable programming”, a hot and rising field in deep learning.

Thank you!

32 / 32