Short-and-Sparse Deconvolution – A Geometric Approach
Yenson Lau†,*, Qing Qu‡,*, Han-Wen (Henry) Kuo†, Pengcheng Zhou†, Yuqian Zhang§, and John Wright†
†Columbia University, ‡New York University, §Cornell University, *denotes equal contribution
Short-and-Sparse (SaS) Model
•Model signals containing repeated (short) motifs:
[Figure: example data sets, each approximated (≈) by a short motif convolved (⊛) with a sparse signal.]
$$ y \approx \underbrace{a_0}_{\text{short kernel}} \circledast \underbrace{x_0}_{\text{sparse signal}} $$
Problem: SaS Deconvolution (SaSD)
Given the convolution $y = a_0 \circledast x_0 \in \mathbb{R}^m$ between a short kernel $a_0 \in \mathbb{R}^{n_0}$ ($n_0 \ll m$) and a sparse signal $x_0 \in \mathbb{R}^m$, recover both $a_0$ and $x_0$.
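To make the problem statement concrete, here is a minimal NumPy sketch that synthesizes SaS data. It assumes cyclic convolution (computed with the FFT) and a Bernoulli-Gaussian sparse signal with rate `theta`; neither choice is stated on the poster, and the names `cconv` and `make_sas_data` are hypothetical.

```python
import numpy as np

def cconv(a, x):
    """Cyclic convolution of a short kernel a with a length-m signal x.

    np.fft.fft(a, m) zero-pads a to length m before transforming.
    """
    m = len(x)
    return np.real(np.fft.ifft(np.fft.fft(a, m) * np.fft.fft(x)))

def make_sas_data(n0=16, m=512, theta=0.05, seed=0):
    """Draw a unit-norm kernel a0 and a sparse x0, then form y = a0 (*) x0.

    Assumption: x0 is Bernoulli(theta)-Gaussian; the poster only says "sparse".
    """
    rng = np.random.default_rng(seed)
    a0 = rng.standard_normal(n0)
    a0 /= np.linalg.norm(a0)                # place the kernel on the unit sphere
    support = rng.random(m) < theta         # random sparse support
    x0 = support * rng.standard_normal(m)   # Gaussian amplitudes on the support
    return a0, x0, cconv(a0, x0)

a0, x0, y = make_sas_data()
```

Only `y` would be observed in the SaSD problem; `a0` and `x0` are kept here so that a recovery method can be checked against the ground truth.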
Two Intrinsic Properties of SaSD
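• Symmetry: all scalings and shifts of $(a_0, x_0)$ are equivalent solutions. For any $\alpha \neq 0$ and any shift $\ell$,
$$ y = \alpha\, s_\ell[a_0] \circledast (1/\alpha)\, s_{-\ell}[x_0], $$
where $s_\ell[\cdot]$ denotes a shift by $\ell$ samples.
- The intrinsic shift symmetry leads to nonconvex problems.
• Sparsity-coherence tradeoff. For $a_0 \in \mathbb{S}^{n_0-1}$, a larger shift-coherence of $a_0$ (in other words, a smoother kernel),
$$ \mu(a_0) = \max_{\ell \neq 0} \left| \langle a_0, s_\ell[a_0] \rangle \right| \in (0, 1], $$
means that $x_0$ must be sparser to be recovered, and vice versa.
- In practice, kernels tend to be smooth, with $\mu(a_0) \approx 1$.

[Figure: example kernels $a_0$ and sparse signals $x_0$ for three kernel types.]

Kernel type                      Spiky                 Generic               Tapered generic lowpass
Shift-coherence $\mu$            $\approx 0$           $\approx n_0^{-1/2}$  $\approx$ const.
Allowable sparsity $\theta$      $\approx n_0^{-1/2}$  $\approx n_0^{-3/4}$  $\approx n_0^{-1}$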
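Both properties can be checked numerically. The sketch below assumes cyclic shifts in $\mathbb{R}^m$ for the symmetry check and truncated (non-cyclic) shifts of the length-$n_0$ kernel for the coherence; the function names are hypothetical.

```python
import numpy as np

def cconv(a, x):
    """Cyclic convolution via the FFT (a is zero-padded to len(x))."""
    m = len(x)
    return np.real(np.fft.ifft(np.fft.fft(a, m) * np.fft.fft(x)))

def shift_coherence(a0):
    """max over nonzero shifts l of |<a0, s_l[a0]>| for the normalized kernel.

    Uses truncated shifts, i.e. the linear autocorrelation at nonzero lags.
    Assumes len(a0) >= 2.
    """
    a0 = a0 / np.linalg.norm(a0)
    autocorr = np.correlate(a0, a0, mode="full")   # lags -(n0-1), ..., n0-1
    zero_lag = len(a0) - 1
    return np.max(np.abs(np.delete(autocorr, zero_lag)))

# Scaled-shift symmetry: (alpha * s_l[a0], s_{-l}[x0] / alpha) produces the same y.
rng = np.random.default_rng(0)
m, n0 = 64, 5
a_pad = np.concatenate([rng.standard_normal(n0), np.zeros(m - n0)])  # kernel embedded in R^m
x0 = rng.standard_normal(m) * (rng.random(m) < 0.1)
alpha, ell = -2.5, 3
y = cconv(a_pad, x0)
y_alt = cconv(alpha * np.roll(a_pad, ell), np.roll(x0, -ell) / alpha)
same_observation = np.allclose(y, y_alt)

# Coherence of a spiky kernel versus a very smooth (constant) kernel.
mu_spiky = shift_coherence(np.array([1.0, 0.0, 0.0, 0.0]))
mu_smooth = shift_coherence(np.ones(10))
```

The spiky kernel has zero overlap with its shifts, while the constant kernel overlaps almost completely with its one-sample shift, which illustrates the two ends of the tradeoff.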
Problem Formulation
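• Bilinear Lasso (BL). Natural nonconvex objective:
$$ \min_{a \in \mathbb{S}^{n-1},\, x} \Psi_{BL}(a, x) := \underbrace{\tfrac{1}{2} \| y - a \circledast x \|_2^2}_{\text{data fidelity}} + \underbrace{\lambda \cdot \| x \|_1}_{\text{sparsity}}. $$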
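- Optimize with $a \in \mathbb{R}^n$ and $n = 3n_0 - 2$, to eliminate boundary effects (shift-truncations);
- Break the scale symmetry with the spherical constraint $a \in \mathbb{S}^{n-1}$.
• Approximate Bilinear Lasso (ABL) [2]
$$ \Psi_{ABL}(a, x) := \tfrac{1}{2} \| x \|_2^2 - \langle y, a \circledast x \rangle + \tfrac{1}{2} \| y \|_2^2 + \lambda \| x \|_1, $$
- $\min_x \Psi_{ABL}(a, x)$ has a closed-form solution;
- $\Psi_{ABL}(a, x) \approx \Psi_{BL}(a, x)$ when $\mu(a_0) \searrow 0$. The approximation breaks down when $\mu(a_0)$ is large.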
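The two objectives can be sketched in NumPy as follows, again assuming cyclic convolution. The closed form used for the ABL minimizer over $x$ (soft-thresholding the correlation of $a$ with $y$ at level $\lambda$) follows from the objective being separable in $x$; the poster states only that a closed form exists, and all function names are hypothetical.

```python
import numpy as np

def cconv(a, x):
    """Cyclic convolution via the FFT (a is zero-padded to len(x))."""
    m = len(x)
    return np.real(np.fft.ifft(np.fft.fft(a, m) * np.fft.fft(x)))

def ccorr(a, y):
    """Adjoint of x -> cconv(a, x): cyclic cross-correlation of a with y."""
    m = len(y)
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a, m)) * np.fft.fft(y)))

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def psi_bl(a, x, y, lam):
    """Bilinear Lasso: 0.5 * ||y - a (*) x||^2 + lam * ||x||_1."""
    r = y - cconv(a, x)
    return 0.5 * r @ r + lam * np.abs(x).sum()

def psi_abl(a, x, y, lam):
    """Approximate Bilinear Lasso: ||a (*) x||^2 is replaced by ||x||^2."""
    return 0.5 * x @ x - y @ cconv(a, x) + 0.5 * y @ y + lam * np.abs(x).sum()

def abl_argmin_x(a, y, lam):
    """Minimizer of psi_abl over x for fixed a.

    Since <y, a (*) x> = <ccorr(a, y), x>, the problem separates per entry
    and is solved by soft-thresholding ccorr(a, y) at level lam.
    """
    return soft_threshold(ccorr(a, y), lam)
```

For a kernel whose shifts are orthonormal (for example a single spike), $\|a \circledast x\|_2 = \|x\|_2$ and the two objectives coincide, which is the $\mu(a_0) \searrow 0$ regime mentioned above.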
Nonconvex Optimization Landscape
• Nonconvex landscape over the sphere. Study
$$ \Psi_{BL}(a) := \min_x \Psi_{BL}(a, x), \qquad \Psi_{ABL}(a) := \min_x \Psi_{ABL}(a, x). $$
- The objectives are convex w.r.t. $x$;
- $\dim(a) \ll \dim(x)$; the space of $a$ is where measure concentrates.
[Figure: (a) $\Psi_{ABL}(a)$, $\mu(a_0) \approx 0$; (b) $\Psi_{BL}(a)$, $\mu(a_0) \approx 0$; (c) $\Psi_{BL}(a)$, $\mu(a_0) \approx 1$.]
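• Benign geometry over subspaces of shifts [2]. The geometry is benign over
$$ \mathcal{S}_I := \Big\{ \sum_{\ell \in I} \alpha_\ell\, s_\ell[a_0] : \alpha_\ell \in \mathbb{R} \Big\} \cap \mathbb{S}^{n-1} $$
if $|I| \in O(n_0 \theta)$, $\lambda \approx (n_0 \theta)^{-1/2}$, and $\theta n_0 \lesssim \mu^{-1/2}(a_0)$.
[Figure: $\Psi_{ABL}(a)$ on the sphere near a single shift $s_{\ell_1}[a_0]$, near the subspace $\mathcal{S}_{\{\ell_1, \ell_2\}}$, and over $\mathcal{S}_{\{\ell_1, \ell_2, \ell_3\}}$.]
- Every local minimizer is a shift of the kernel $a_0$;
- Saddle points exhibit negative curvature in symmetry-breaking directions;
- Theoretically justified for $\Psi_{ABL}(a)$ with $\mu(a_0) \approx 0$ [2]; less well understood theoretically for $\Psi_{BL}(a)$ or when $\mu(a_0) \approx 1$.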
Convolutional Dictionary Learning
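• To recover multiple unknown kernels, we optimize
$$ \min_{a_k,\, x_k} \; \tfrac{1}{2} \Big\| y - \sum_{k=1}^{N} a_k \circledast x_k \Big\|_2^2 + \lambda \sum_{k=1}^{N} \| x_k \|_1, \quad \text{s.t. } \| a_k \| = 1, $$
by adapting similar ideas to those used for recovering a single kernel.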
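As a small illustration, the multi-kernel objective can be evaluated as below. This is a sketch under the same cyclic-convolution assumption as the single-kernel case; `cdl_objective` and `normalize_kernels` are hypothetical names.

```python
import numpy as np

def cconv(a, x):
    """Cyclic convolution via the FFT (a is zero-padded to len(x))."""
    m = len(x)
    return np.real(np.fft.ifft(np.fft.fft(a, m) * np.fft.fft(x)))

def normalize_kernels(kernels):
    """Enforce the constraint ||a_k|| = 1 on every kernel."""
    return [a / np.linalg.norm(a) for a in kernels]

def cdl_objective(kernels, maps, y, lam):
    """0.5 * ||y - sum_k a_k (*) x_k||^2 + lam * sum_k ||x_k||_1."""
    recon = sum(cconv(a, x) for a, x in zip(kernels, maps))
    r = y - recon
    return 0.5 * r @ r + lam * sum(np.abs(x).sum() for x in maps)
```

With the true kernels and activation maps the data-fidelity term vanishes, so the objective reduces to the sparsity penalty alone.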
Optimization from Geometric Intuitions
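• Initialize. Use a window of $y$, which is a sum of truncated shifts of $a_0$.
[Figure: data $y$ = kernel $a_0$ ∗ sparse $x_0$. Because $y$ consists of shifts of $a_0$, the windowed data $a^{(-1)} \approx \alpha_i s_i[a_0] + \alpha_j s_j[a_0]$, so the initialization $a^{(0)}$ lies near the subspace $\mathcal{S}_{\{i,j\}}$.]
The initial point is
$$ a^{(0)} = -P_{\mathbb{S}^{n-1}} \nabla \Psi_{BL}\big( P_{\mathbb{S}^{n-1}}( a^{(-1)} ) \big), $$
where $P_{\mathbb{S}^{n-1}}$ denotes projection onto the sphere.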
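The first half of this initialization can be sketched as follows. The code only builds the projected window $P_{\mathbb{S}^{n-1}}(a^{(-1)})$; it omits the subsequent gradient step. Zero-padding the length-$n_0$ window to length $n = 3n_0 - 2$ is an assumption made here to match the lifted kernel dimension, and `init_from_window` is a hypothetical name.

```python
import numpy as np

def init_from_window(y, n0, start):
    """Take n0 consecutive samples of y, zero-pad to 3*n0 - 2, and normalize.

    Assumptions: the window is padded symmetrically with n0 - 1 zeros on each
    side, and it is not identically zero.
    """
    window = y[start:start + n0]
    padded = np.concatenate([np.zeros(n0 - 1), window, np.zeros(n0 - 1)])
    return padded / np.linalg.norm(padded)   # projection onto the unit sphere

# Toy usage on a made-up signal.
y_demo = np.sin(0.3 * np.arange(50)) + 2.0
a_start = init_from_window(y_demo, n0=6, start=10)
```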
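• Alternating descent method (ADM)
1. Fix $a$ and take a proximal gradient step on $x$:
   $x \leftarrow \mathrm{prox}\big( x - \tau \cdot \nabla_x \Psi_{BL}(a, x) \big)$
2. Fix $x$ and take a Riemannian gradient step on $a$:
   $a \leftarrow P_{\mathbb{S}^{n-1}}\big( a - t \cdot \mathrm{grad}_a \Psi_{BL}(a, x) \big)$
3. Alternate until convergence.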
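The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes cyclic convolution, takes the prox to be soft-thresholding at level $\tau\lambda$, applies the gradient to the smooth data-fidelity term only, and picks the step sizes $\tau$ and $t$ as inverse Lipschitz bounds. All of these choices, and the name `adm`, are assumptions.

```python
import numpy as np

def cconv(a, x):
    m = len(x)
    return np.real(np.fft.ifft(np.fft.fft(a, m) * np.fft.fft(x)))

def ccorr(a, y):
    """Adjoint of x -> cconv(a, x)."""
    m = len(y)
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a, m)) * np.fft.fft(y)))

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def adm(y, a_init, lam, iters=200):
    """Alternate a proximal gradient step on x with a Riemannian step on a."""
    m, n = len(y), len(a_init)
    a = a_init / np.linalg.norm(a_init)
    x = np.zeros(m)
    for _ in range(iters):
        # Step 1: proximal gradient on x (step size 1/L, L = max |fft(a)|^2).
        tau = 1.0 / max(np.max(np.abs(np.fft.fft(a, m)) ** 2), 1e-12)
        grad_x = ccorr(a, cconv(a, x) - y)
        x = soft_threshold(x - tau * grad_x, tau * lam)
        # Step 2: Riemannian gradient on a.
        r = cconv(a, x) - y
        grad_a = ccorr(x, r)[:n]                 # Euclidean gradient in the n kernel entries
        rgrad = grad_a - (a @ grad_a) * a        # project onto the tangent space at a
        t = 1.0 / max(np.max(np.abs(np.fft.fft(x)) ** 2), 1e-12)
        a = a - t * rgrad
        a /= np.linalg.norm(a)                   # retract back to the sphere
    return a, x

# Toy usage: synthetic data, random unit-norm starting kernel of length 3*n0 - 2.
rng = np.random.default_rng(0)
n0, m = 4, 128
a0 = rng.standard_normal(n0)
a0 /= np.linalg.norm(a0)
x0 = rng.standard_normal(m) * (rng.random(m) < 0.05)
y = cconv(a0, x0)
a_hat, x_hat = adm(y, rng.standard_normal(3 * n0 - 2), lam=0.1, iters=100)
```

Because of the shift symmetry, `a_hat` should at best be compared with $a_0$ up to a shift and a sign; a random start, as used here for brevity, carries no recovery guarantee.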
• Heuristics for dealing with coherent problems
[Figure: (a) without momentum; (b) with momentum.]
- Momentum acceleration: deal with ill-conditioning by adding a momentum term to the descent steps for both $a$ and $x$,
$$ z^{(k+1)} \leftarrow z^{(k)} \underbrace{- \alpha \nabla f(z^{(k)})}_{\text{gradient direction}} + \underbrace{\beta \big( z^{(k)} - z^{(k-1)} \big)}_{\text{inertial term}}. $$
- Homotopy method: adaptively shrink $\lambda$ along the solution path $(a, x)$ of ADM when optimizing $\Psi_{BL}(a, x)$.
[Figure: $\Psi_{BL}(a)$ for (a) $\lambda = 5 \times 10^{-1}$, (b) $\lambda = 5 \times 10^{-2}$, (c) $\lambda = 5 \times 10^{-3}$.]
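The effect of the inertial term on an ill-conditioned problem can be seen on a toy quadratic. The sketch below applies the momentum update to $f(z) = \tfrac{1}{2} z^\top D z$ with condition number 100 rather than to the SaSD objective; the test function and the choice of $\alpha$ and $\beta$ (the classical heavy-ball tuning for quadratics) are assumptions for illustration only.

```python
import numpy as np

def heavy_ball(grad, z0, alpha, beta, iters):
    """z <- z - alpha * grad(z) + beta * (z - z_prev); beta = 0 is plain gradient descent."""
    z_prev = z0.copy()
    z = z0.copy()
    for _ in range(iters):
        z_next = z - alpha * grad(z) + beta * (z - z_prev)
        z_prev, z = z, z_next
    return z

# Ill-conditioned quadratic: curvatures 1 and 100 along the two axes.
d = np.array([1.0, 100.0])

def grad(z):
    return d * z

z0 = np.array([1.0, 1.0])
L, mu = d.max(), d.min()

# Plain gradient descent with step 1/L.
z_gd = heavy_ball(grad, z0, alpha=1.0 / L, beta=0.0, iters=100)

# Heavy-ball tuning for quadratics (assumed here, not taken from the poster).
alpha_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta_hb = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
z_hb = heavy_ball(grad, z0, alpha=alpha_hb, beta=beta_hb, iters=100)

err_gd = np.linalg.norm(z_gd)   # distance to the minimizer z = 0
err_hb = np.linalg.norm(z_hb)
```

After the same number of iterations the momentum iterate is far closer to the minimizer, because plain gradient descent is slowed by the low-curvature direction.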
• Numerical comparison
[Figure: (a) incoherent kernel; (b) coherent kernel; (c) incoherent kernel; (d) coherent kernel.]
Applications
• Calcium imaging
[Figure: raw vs. estimated calcium sequence; estimated spike train $x$; estimated kernel $a$.]
[Figure: calcium image $Y$; kernels $A_k$ ($k = 1, 2$); activation maps $X_k$ ($k = 1, 2$); reconstructions $Y_k = A_k \circledast X_k$ ($k = 1, 2$).]
• Defect detection in scanning tunneling microscopy (STM)
[Figure: STM image $Y$; kernels $A_k$ ($k = 1, 2$); activation maps $X_k$ ($k = 1, 2$); reconstructions $Y_k = A_k \circledast X_k$ ($k = 1, 2$).]
• Super-resolution microscopy imaging
[Figure: original image vs. deconvolved image.]
• Spike sorting
References
[1] Y. Lau, Q. Qu, H. Kuo, P. Zhou, Y. Zhang, and J. Wright, “Short-and-sparse deconvolution – A geometric approach”, arXiv preprint arXiv:1908.10959, submitted to ICLR’20, 2019.
[2] H. Kuo, Y. Lau, Y. Zhang, and J. Wright, “Geometry and symmetry in short-and-sparse deconvolution”, ICML, 2019.
[3] Q. Qu, Y. Zhai, X. Li, Y. Zhang, and Z. Zhu, “Geometric analysis of nonconvex optimization landscapes for overcomplete learning”, submitted to ICLR’20, 2019.
https://deconvlab.github.io/
https://github.com/qingqu06/sparse_deconvolution