
The Lost Honour of ℓ2-Based Regularization

Uri Ascher

Department of Computer Science, University of British Columbia

October 2012

Based on joint work with Kees van den Doel and Eldad Haber (UBC)

Outline

Motivation-introduction

Poor data

Highly ill-conditioned large problems

Conclusions

Outline: Motivation-Introduction

Sparse solutions

Piecewise smooth surface reconstruction

ℓ1-based and ℓ2-based regularizations

JPEG image compression

Represent the raw image using the discrete cosine transform (DCT) or a wavelet transform.

Retain only the largest coefficients, setting the rest to 0.

In the talk's example, the compression factor is ≈ 1/16.
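As a rough, hedged illustration of the retain-the-largest-coefficients idea (my own sketch, not the codec from the talk), the following NumPy/SciPy snippet keeps a fraction of the largest-magnitude DCT coefficients of an image array and reconstructs from them; the array img and the keep fraction are placeholder assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_dct(img, keep=1.0 / 16):
    """Keep only the largest-magnitude DCT coefficients of a 2-D image array."""
    coeffs = dctn(img, norm="ortho")                  # 2-D discrete cosine transform
    k = max(1, int(keep * coeffs.size))               # number of coefficients to retain
    thresh = np.sort(np.abs(coeffs), axis=None)[-k]   # k-th largest magnitude
    sparse = np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
    return idctn(sparse, norm="ortho")                # reconstruct from the sparse transform

# Hypothetical usage: rec = compress_dct(img) for a float image array img.
```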

How to obtain sparse representation?

Mathematically, consider the problem

min_u ∥Wu∥_p   s.t.   ∥Ju − b∥₂ ≤ ϵ,

where J is an m × n real matrix, m ≤ n, and ϵ ≥ 0 depends on the noise. A related formulation:

min_u (1/2)∥Ju − b∥₂² + β R(u).

Ideally we want p = 0, meaning the minimum number of nonzero components, i.e., to find the sparsest model representation. But this problem is NP-hard.

Using p = 2, i.e., ℓ2 norm, z = Wu typically is not sparse.

Using p = 1, i.e., ℓ1 norm, z = Wu often is sparse!
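For the penalized formulation with W = I, a minimal iterative soft-thresholding (ISTA) sketch shows how the ℓ1 penalty drives components of the iterate exactly to zero; the step size, β, and iteration count below are illustrative assumptions, not one of the solvers discussed later in the talk.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (componentwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(J, b, beta, n_iter=500):
    """Minimize 0.5*||J u - b||_2^2 + beta*||u||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(J, 2) ** 2       # Lipschitz constant of the data-fit gradient
    u = np.zeros(J.shape[1])
    for _ in range(n_iter):
        grad = J.T @ (J @ u - b)        # gradient of the smooth term
        u = soft_threshold(u - grad / L, beta / L)
    return u
```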

Image denoising in 2D

Find u(x, y) that cleans a given image b(x, y).

Cameraman 256 × 256. Left: clean image. Right: with noise added.

Image deblurring in 2D

Find u(x, y) that sharpens a given image b(x, y).

Boat 256 × 256: clean and blurred.

Tikhonov regularization: encourage piecewise smoothness

Denote by u the discretized mesh function of u(x, y), and likewise the data b.

Tikhonov: min_u (1/2)∥Ju − b∥² + β R(u).

Consider R(u) = a discretization of (1/p) ∫_Ω |∇u|^p.

Using p = 2, i.e., the ℓ2 norm on the gradient, the solution is smeared across discontinuities, i.e., where u(x, y) has jumps.

Using p = 1, i.e., the ℓ1 norm on the gradient (total variation), the solution is sharper near jumps but often blocky: sparse gradient!
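To make the discretized regularizer concrete, here is a hedged sketch of R(u) ≈ (1/p) ∫_Ω |∇u|^p on a uniform 2-D grid using forward differences; the grid spacing h and any test image are placeholder assumptions.

```python
import numpy as np

def grad_regularizer(u, p, h=1.0):
    """Discretization of R(u) = (1/p) * integral over Omega of |grad u|^p."""
    ux = np.diff(u, axis=1) / h                        # x-derivative (forward differences)
    uy = np.diff(u, axis=0) / h                        # y-derivative (forward differences)
    gmag = np.sqrt(ux[:-1, :] ** 2 + uy[:, :-1] ** 2)  # gradient magnitude on common cells
    return (h * h / p) * np.sum(gmag ** p)

# p = 2 penalizes jumps quadratically (smearing); p = 1 gives total variation (blocky).
```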

Huber function regularization

[Ascher, Haber & Huang, 2006; Haber, Heldman & Ascher, 2007; Huang & Ascher, 2008]: modify total variation.

Use the Huber function

ρ(s) = s  for s ≥ γ,    ρ(s) = s²/(2γ) + γ/2  for s < γ.

Then ∇R(u) ← −∇ · ( min(1/γ, 1/|∇u|) ∇u ) – a particular anisotropic diffusion operator.

Choose γ adaptively, close to total variation:

γ = (h/|Ω|) ∫_Ω |∇u|.
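A hedged sketch of the Huber function, the diffusion weight min(1/γ, 1/|∇u|), and the adaptive γ above; the gradient-magnitude array, mesh width h, and domain area are placeholders supplied by the caller.

```python
import numpy as np

def huber(s, gamma):
    """Huber function: linear for s >= gamma, quadratic below (s = |grad u| >= 0)."""
    return np.where(s >= gamma, s, s ** 2 / (2.0 * gamma) + gamma / 2.0)

def diffusion_weight(grad_mag, gamma):
    """Weight min(1/gamma, 1/|grad u|) in the anisotropic diffusion operator."""
    return np.minimum(1.0 / gamma, 1.0 / np.maximum(grad_mag, 1e-12))

def adaptive_gamma(grad_mag, h, area):
    """gamma = (h/|Omega|) * integral of |grad u|, approximated by a grid sum."""
    return (h / area) * np.sum(grad_mag) * h * h
```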

Image deblurring in 2D: example

type = ‘disk’, radius = 5, η = 1, β = 10⁻⁴.

Surface triangle mesh denoising

[Huang & Ascher, 2008; 2009]: use an even sharper (non-convex) regularizer.

Left: true. Center: corrupted. Right: reconstructed

Left: corrupted. Right: reconstructed

p < 1

For sharper reconstruction one can consider ℓp, 0 < p < 1, instead of p = 1.

However, convexity is lost, yielding both computational and theoretical difficulties [Saab, Chartrand & Yilmaz, 2008; Chartrand, 2009].

[Levin, Fergus, Durand & Freeman, 2007]: choose p = 0.8 for an image deblurring application.

But as it turns out, there is nothing special about p = 0.8, and in most graphics applications p = 1 is fine (and easier to work with).

Virtues of ℓ1-based regularization

Lots of exciting recent work on sparse recovery of signals, including compressive sensing, using ℓ1. Texts: [Mallat, 2009; Elad, 2010]. Some key papers: [Donoho & Tanner, 2005; Donoho, 2006; Candes, Wakin & Boyd, 2008; Juditsky & Nemirovski, 2008].

Lots of exciting recent work on total variation-based methods. Texts: [Chan & Shen, 2005; Osher & Fedkiw, 2003]. Some key papers: [Rudin, Osher & Fatemi, 1992].

Perhaps ℓ1-based regularization methodology ought to altogether replace the veteran ℓ2-based regularization approaches.

A step too far?! This does not quite agree with our experience in several situations.

Notation

We saw four regularization methods, with variations:

(1/2)∥Wu∥₂² – denote by L2
∥Wu∥₁ – denote by L1
(1/2)∫_Ω |∇u|² – denote by L2G
∫_Ω |∇u| – denote by L1G (TV)

Another image deblurring example

Goal: recover clean image given noisy, blurred data.

Blurring kernel: e^(−∥x∥₂²/(2σ)) with σ = 0.01; the blurred data is further corrupted by 1% white noise.

Left: ground truth. Right: Blurred and noisy image

Another image deblurring example, cont.

Try (i) RestoreTools [Hansen, Nagy & O'Leary, 2006], which is an L2-type recovery strategy (p = 2, W = I); (ii) GPSR [Figueiredo, Nowak & Wright, 2007], which employs a wavelet L1 recovery algorithm; and (iii) a straightforward L1G code.

Left: L2. Center: L1. Right: L1G.

Note that the L2 recovery costs a tiny fraction of the L1 solve, and the result is not worse.

Outline

Motivation-introduction

Poor data

Highly ill-conditioned large problems

Conclusions

Outline: poor data

Outliers

Relatively high noise level

Rare (sparse) data

Poor data: outliers, high noise level, rare data

Outliers: ℓ1-based data fitting is good for outliers! (Known for decades; for point clouds, see LOP [Lipman, Cohen-Or, Levin & Tal-Ezer, 2007].) One can see this by considering the overdetermined n × m problem

min_x ∥Jx − y∥₁,

with J "long and skinny". This can be cast as a linear programming problem (a small sketch follows below): the optimal solution x gives at least m zero residual rows. Outliers usually correspond to non-basic residual rows, hence their values matter less.

High noise level: ℓ1-based fitting no longer has a particular advantage.

Rare (sparse) data, given only at a few locations: ℓ1-based fitting no longer has a particular advantage.

Note: what counts as "poor" depends on the circumstances.
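To make the linear-programming casting concrete, here is a hedged sketch (not code from the talk) that solves min_x ∥Jx − y∥₁ with scipy.optimize.linprog by introducing slack variables t ≥ |Jx − y|; the line-fitting test data and the injected outlier are made up.

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit(J, y):
    """Solve min_x ||J x - y||_1 as an LP with slacks t: -t <= Jx - y <= t."""
    n, m = J.shape
    c = np.concatenate([np.zeros(m), np.ones(n)])          # minimize sum of slacks
    A_ub = np.block([[J, -np.eye(n)], [-J, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * m + [(0, None)] * n           # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:m]

# Tiny illustration: fit a line to data with one gross outlier.
rng = np.random.default_rng(0)
J = np.column_stack([np.ones(20), np.linspace(0.0, 1.0, 20)])
y = J @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(20)
y[3] += 10.0                                                 # the outlier
x_l1 = l1_fit(J, y)                                          # barely affected by the outlier
x_l2 = np.linalg.lstsq(J, y, rcond=None)[0]                  # pulled toward the outlier
```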

Rare (sparse) data: a simple example

Recover a signal u∗(t) on [0, 1], discretized at n = 512 equidistant points, from m ≪ n data pairs (tᵢ, uᵢ), i = 1, …, m.

Draw the locations tᵢ at random from the n-mesh. Calculate uᵢ as u∗(tᵢ) plus 5% white noise.

Tune β in min_u (1/2)∥Ju − b∥² + β R(u) by the discrepancy principle (a small sketch of this setup follows below).
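A hedged sketch of this setup for the L2G regularizer: J selects the m sampled mesh points, D is a first-difference matrix, and β is bisected until the residual matches the noise level (the discrepancy principle). The test signal, sizes, and target are illustrative assumptions, not the talk's exact experiment.

```python
import numpy as np

n, m, noise = 512, 28, 0.05
t = np.linspace(0.0, 1.0, n)
u_true = np.sign(np.sin(2.0 * np.pi * t))            # made-up piecewise-constant signal
rng = np.random.default_rng(1)
idx = np.sort(rng.choice(n, size=m, replace=False))
J = np.eye(n)[idx]                                   # sampling (selection) operator
b = u_true[idx] + noise * rng.standard_normal(m)     # noisy data
D = np.diff(np.eye(n), axis=0)                       # first-difference matrix (L2G)

def solve(beta):
    """Minimize 0.5*||J u - b||^2 + 0.5*beta*||D u||^2 via the normal equations."""
    return np.linalg.solve(J.T @ J + beta * D.T @ D, J.T @ b)

target = noise * np.sqrt(m)                          # discrepancy-principle residual level
lo, hi = 1e-6, 1e2
for _ in range(50):                                  # bisection on beta (log scale)
    beta = np.sqrt(lo * hi)
    r = np.linalg.norm(J @ solve(beta) - b)
    lo, hi = (beta, hi) if r < target else (lo, beta)
u_rec = solve(beta)
```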

[Figure: recovered u vs. t; legend: True, Data, L1G, L2G. Using m = 9 data pairs, with β_L1G = 0.08, β_L2G = 0.04.]

For m = 9, there is no significant quality difference between the ℓ1-based and ℓ2-based reconstructions.

Rare (sparse) data: a simple example, cont.

[Figure: recovered u vs. t for two choices of β_L2G; legend: True, Data, L1G, L2G. Using m = 28 data pairs, with β_L1G = 0.08. Left: β_L2G = 0.002. Right: β_L2G = 0.02.]

For m=28, the ℓ1-based method performs better than the ℓ2-based one.

Example: edge-aware resampling (EAR)

[Huang, Wu, Gong, Cohen-Or, Ascher & Zhang, 2012; Huang, Li, Zhang, Ascher & Cohen-Or, 2009]

Left: point cloud for a fin shape with a 150° angle at the edge; 1.0% noise. Right: profile of normals using classical PCA.

EAR example, cont.

Alternative: ℓ1 for normals [Avron, Sharf, Greif & Cohen-Or, 2010].

Left: normals after applying ℓ1-minimization. Right: normals after applying EAR.

Outline

Motivation-introduction

Poor data

Highly ill-conditioned large problems

Conclusions

Outline: highly ill-conditioned large problems

Computational difficulty with L1

Computed myography and inverse potential

Nonlinearity, EIT and DC resistivity

Assessment

Computational difficulty with L1

min_u (1/2)∥Ju − b∥² + β R(u)

Several famous codes for L1 employ gradient projection with acceleration (because such methods extend well to constrained non-smooth problems).

However, such methods have poor convergence properties for large, highly ill-conditioned problems.

The situation is better for L1G (total variation and variants), although L2G is still faster; a small lagged-diffusivity sketch for L1G follows below.
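One reason L1G is more tractable than generic L1 solvers is that it admits smooth inner solves; below is a hedged 1-D lagged-diffusivity (IRLS-style) sketch for min_u ½∥u − b∥² + β∥Du∥₁, with the smoothing parameter eps and the iteration count as illustrative assumptions (this is not one of the codes referred to in the talk).

```python
import numpy as np

def tv_denoise_1d(b, beta, eps=1e-3, n_outer=30):
    """Approximate min_u 0.5*||u - b||^2 + beta*||D u||_1 by lagged diffusivity."""
    n = b.size
    D = np.diff(np.eye(n), axis=0)                     # first-difference operator
    u = b.copy()
    for _ in range(n_outer):
        w = 1.0 / np.sqrt((D @ u) ** 2 + eps ** 2)     # lagged diffusivity weights
        A = np.eye(n) + beta * D.T @ (w[:, None] * D)  # weighted smooth operator
        u = np.linalg.solve(A, b)                      # linear inner solve
    return u
```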

Source localization in electromyography

[van den Doel, Ascher & Pai, 2008, 2011]

Determine the electric activity of individual muscles in a human limb using sEMG (surface electromyography).

Applications: prosthetic control, muscle function assessment.

Example: MRI data segmented into geometry model

3D model of the upper arm constructed from MRI data of a human subject, segmented into different anatomical regions: brachialis, biceps, triceps, fat and skin, and bone.

Reconstructions

CMG problem

Potential problem

−∇ · (σ∇v) = u in Ω,

subject to Neumann boundary conditions.

Inverse problem: recover the source u from measurements of v on the boundary.

In the previous notation, J = QA⁻¹, with A the discretization matrix of the potential problem and Q the data projection operator.

Highly ill-posed: many different sources explain the data. Regularization should select the solution that agrees with a priori information.

The model consists of a set of discrete tripoles: use L1 regularization?

Typical 3D calculation: 50,000 finite elements. L1 codes based on gradient projection are hopeless!

Inverse potential problem

Study the simpler problem with σ ≡ 1 on a square (2D):

−∆v = u(x), x ∈ Ω.

Generate the data using a 64 × 64 grid and pollute it by 1% white noise. The ground truth for the sought source u consists of various tripole distributions.

Left: ground truth. Center: L2G. Right: L1G.

More example instances

Left: ground truth. Center: L2G. Right: L1G.

Next, consider a point charge pair for a source:

Left: ground truth. Center: L2G. Right: L1G.

L1 and L2 for a point charge pair

Left: L2. Center: L1. Right: weighted L1.

Indeed, the L1 methods give sparse solutions, but not very sparse and not very good ones.

Simple analysis

min_u (1/2)∥Ju − b∥² + β∥Wu∥₁

Singular value decomposition: J = UΣVᵀ, with Σ = diag(σᵢ), σ₁ ≥ ⋯ ≥ σₘ ≥ 0, and U, V orthogonal.

Special case: W = Vᵀ. Then for z = Wu, c = Uᵀb, consider

min_z (1/2)∥Σz − c∥² + β∥z∥₁.

The representation of the true model, z∗ = Wu∗, satisfies σᵢ z∗ᵢ = cᵢ + εᵢ.

Consider a sparse representation: z∗ᵢ = 1 if i ∈ T, z∗ᵢ = 0 if i ∉ T. Under what conditions can we expect to stably calculate z with the same sparsity (i.e., zᵢ = 0 iff i ∉ T)? The minimization in z is separable; a small sketch follows below.
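Because Σ is diagonal, the problem in z separates by component and has a closed-form soft-thresholding solution; the sketch below (my own illustration, with a made-up decaying spectrum) shows how tiny singular values make the corresponding components hard to recover stably.

```python
import numpy as np

def solve_separable_l1(sigma, c, beta):
    """Closed-form minimizer of sum_i 0.5*(sigma_i*z_i - c_i)^2 + beta*|z_i|."""
    z = np.zeros_like(c)
    nz = sigma > 0
    z[nz] = np.sign(c[nz]) * np.maximum(sigma[nz] * np.abs(c[nz]) - beta, 0.0) / sigma[nz] ** 2
    return z

# Illustrative (made-up) spectrum of a highly ill-conditioned J:
sigma = np.logspace(0, -6, 20)                        # rapidly decaying singular values
z_true = np.zeros(20)
z_true[[0, 15]] = 1.0                                 # sparse true representation
c = sigma * z_true + 1e-4 * np.random.default_rng(2).standard_normal(20)
z = solve_separable_l1(sigma, c, beta=1e-5)
# With sigma[15] tiny relative to the noise, no beta recovers z_true's support stably.
```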

Simple analysis for L1

Assume noise with mean 0 and covariance ρ²I, and define

σ₊ = max_{i∉T} σᵢ,   σ₋ = min_{i∈T} σᵢ.

Theorem: Using L1, the true and reconstructed models z∗ and z are expected to have the same zero structure only if either σ₊ ≤ σ₋ or

ρ ≤ σ₋² / √(σ₊² − σ₋²).

(Can't get the sparse computed z stably for just any sparse z∗.)

Further difficulties arise in the latter case in determining the regularization parameter β.

Nonlinear inverse problems

Nonlinear objective function and ℓ1 constraints.

Note: when the constraint (the solid red curve in the illustration) is nonlinear, it does not necessarily intersect the level set of ∥u∥₁ at a vertex, so the solution is not necessarily sparse.

Still, L1G can be very useful due to less smearing across jumps!

EIT and DC resistivity

[Haber, Heldman & Ascher, 2007; van den Doel & Ascher, 2011; Haber, Chung & Herrmann, 2011; Roosta-Khorasani, van den Doel & Ascher, 2012]

Forward problem:

∇ · (σ∇vⁱ) = qⁱ,  x ∈ Ω,  i = 1, …, s,   with  ∂vⁱ/∂ν |_∂Ω = 0.

Take Ω to be a square. Construct s data sets, choosing different current patterns:

qⁱ(x) = δ(x, p_L^{i_L}) − δ(x, p_R^{i_R}),

where p_L^{i_L} and p_R^{i_R} are located on the left and right boundaries, respectively. Place each at √s different, equidistant locations, 1 ≤ i_L, i_R ≤ √s.

Predict the data by measuring the field at specified locations:

F(u) = (F₁(u), …, F_s(u))ᵀ.

EIT and DC resistivity: incorporating bounds

Often we have a priori information that σmin ≤ σ(x) ≤ σmax.

Predict the data by measuring the field at specified locations: F(u) = (F₁(u), …, F_s(u))ᵀ, where u(x) = ψ⁻¹(σ(x)), with the transfer function

ψ(t) = 0.5 (σmax − σmin) tanh(t) + 0.5 (σmax + σmin).

In the following experiments use σmin = 1, σmax = 10, and calculate on a 64 × 64 grid. A small sketch of ψ and its inverse follows below.
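A hedged sketch of this bound-enforcing change of variables (my own illustration of the formula above, with the σ bounds from the slide): optimizing over an unconstrained u keeps σ = ψ(u) strictly inside (σmin, σmax).

```python
import numpy as np

sigma_min, sigma_max = 1.0, 10.0      # a priori bounds from the slide

def psi(t):
    """Transfer function: maps unconstrained t to sigma in (sigma_min, sigma_max)."""
    return 0.5 * (sigma_max - sigma_min) * np.tanh(t) + 0.5 * (sigma_max + sigma_min)

def psi_inv(sigma):
    """Inverse transfer function: u(x) = psi_inv(sigma(x))."""
    s = (2.0 * sigma - sigma_max - sigma_min) / (sigma_max - sigma_min)
    return np.arctanh(s)

# The optimizer works with u; sigma = psi(u) then satisfies the bounds automatically.
```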

EIT and DC resistivity example: multiple right-hand sides

Note: having many experiments helps here, both theoretically and practically! Thus we may want s large, although this brings major computational difficulties.

Use stochastic methods for model reduction when s is large (a small sketch of one such device follows below).

Synthesize the experiments' data by using the "true" u(x), calculating the data on a twice-as-fine grid, and adding 3% Gaussian noise.

Generalized Tikhonov:

min_u (1/2)∥F(u) − b∥² + β R(u).
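One common stochastic model-reduction device, in the spirit of the simultaneous-sources and sample-average ideas in the cited work (though not necessarily the exact scheme used here), replaces the s right-hand sides by a few random ±1 combinations; the sketch below shows only that combination step, with hypothetical array names and sizes.

```python
import numpy as np

def combine_sources(Q, B, k, seed=0):
    """Replace s sources/data columns by k random +-1 combinations of them.

    Q: (n, s) array of source vectors q^i as columns; B: matching data columns.
    Returns reduced sources and data, giving a much cheaper misfit estimate.
    """
    rng = np.random.default_rng(seed)
    W = rng.choice([-1.0, 1.0], size=(Q.shape[1], k))   # random mixing weights
    return Q @ W, B @ W

# Hypothetical usage, reducing s = 64 experiments to k = 4 simultaneous sources:
# Q_red, B_red = combine_sources(Q, B, k=4)
```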

L2G vs L1G: results

Left: true. Center: s = 4, L2G. Right: s = 4, L1G.

Left: true. Center: s = 64, L2G. Right: s = 64, L1G.

L2G vs L1G: results, cont.

Decrease the noise to 1% and increase √s.

[Figure: conductivity reconstructions (color scale roughly 2–9). Left: s = 1024, L2G. Right: s = 1024, L1G.]

L2G vs L1G results: observations

Indeed, better results for larger s.

For s = 64 the L1G results are somewhat better than L2G.

Upon increasing s further, using finer grids, and having less noise, L1G becomes significantly better than L2G.

Expensive computing may be encountered when trying to obtain more accurate solutions, even more so in 3D. Adaptive grids (meshes) and random sampling of right-hand-side combinations help decrease the pain significantly!

ℓ1-based regularization for large, highly ill-conditioned problems?

Yes, L1G, aka total variation, is worth considering!

Although it does not always deliver, and it is often not cheap to work with.

Bigger doubts linger regarding L1 (when relevant), not only because of the possibly prohibitive cost:

For the inverse potential problem, one can show on physical grounds that the γ-criterion of Juditsky & Nemirovski is violated.
Furthermore, our numerical results for the inverse potential problem do not show positive evidence.
One can show that in some situations, using L1 can be highly restrictive and ineffective, regardless of cost.

Conclusions

In many situations, ℓ1-based regularization is well worth using. Such techniques can provide exciting advances (e.g., in model reduction).

However, such techniques are not suitable for all problems, and it is dangerous (and may consume many student-years) to apply them blindly.

In practice, always consider using ℓ2-based regularization first, because such methods are simpler and more robust. Only upon deciding that these are not sufficiently good for the given application should one proceed to examine ℓ1-based alternatives (when this makes sense).
