Image Processing and Computational Mathematics
Tony F. Chan Hong Kong University of Science and Technology
Talk prepared in collaboration with
Xavier Bresson, Department of Computer Science, City University of Hong Kong
60th SIAM Annual Meeting
July 10, 2012
Input also from Ernie Esser, UC Irvine and Luminita Vese, UCLA.
(SIAM founded in 1952. TC since January 1952.)
- Relatively new to the SIAM community: SIIMS founded 2007. Now the 2nd highest impact journal in all of Applied Math (out of 245); #1 in Imaging Science.
- Imaging journals: SIAM SIIMS, IEEE TIP, JMIV, IJCV
- Imaging conferences: SIAM Imaging Science conference, Scale-Space conference, IEEE ICIP
- The new "CFD". Part of "Data Science".
Imaging science

[Figure: cover of IEEE Transactions on Image Processing, a publication of the IEEE Signal Processing Society, Volume 7, Number 3, March 1998]

What SIAM brings to IP
Image Processing and Applied Maths

Math ↔ IP:
- PDEs (CFD, viscosity solutions)
- Calculus of variations
- Differential geometry (level set method)
- Smooth & non-smooth functional analysis (wavelets/Besov, total variation/BV)
- Compressed sensing, L1 optimization (Bregman)
- Convexification for geometric problems
- Relaxation for graph optimization problems
Timeline (1985-2010)

- Denoising/segmentation: Perona-Malik anisotropic diffusion; Rudin-Osher-Fatemi TV-L2; Chan-Golub-Mulet Newton
- Segmentation: Mumford-Shah; Kass-Witkin-Terzopoulos snakes; Caselles-Kimmel-Sapiro geodesic active contours; Chan-Vese ACWE; Vese-Chan multiphase
- Inpainting/CS: Chan-Shen; Chan-Shen-Zhou TV wavelet inpainting; Bertalmio-Sapiro-Caselles-Ballester; Candes-Romberg-Tao and Donoho (CS)
- Geometric convexification: Chan-Esedoglu TV-L1; Chan-Esedoglu-Nikolova CMS; Bresson-Esedoglu-Osher et al.
- TV optimization: Rudin-Osher-Fatemi gradient flow; Osher-Burger-Goldfarb-Xu-Yin Bregman; Goldstein-Osher split Bregman
- IP journals/conferences: IEEE special issue; SIAG IS; SIAM SIIMS; SIAM IS conference; Scale-Space conference
IP challenges
- Design nonlinear models to handle image discontinuities (edges)
- Make the most of geometry
- Develop fast algorithms to deal with large data volumes (RGB, time, multispectral components)

[Figure: edges; image courtesy Ron Kimmel]
PDEs in Image Processing
- Scale space filtering [Witkin 1983]
- Optimal approximations by piecewise smooth functions and associated variational problems [Mumford, Shah 1989]
- Scale-space and edge detection using anisotropic diffusion [Perona, Malik 1990]
- Motion of level sets by mean curvature [Evans, Spruck 1991]
- Nonlinear total variation based noise removal algorithms [Rudin, Osher, Fatemi 1992]
- Axioms and fundamental equations of image processing [Alvarez, Guichard, Lions, Morel 1993]
- Feature-oriented image enhancement using shock filters [Osher, Rudin 1990]
Perona & Malik '90

Model properties:
- Remove noise while preserving edges
- Introduce feedback: adapt the diffusivity g to the evolving image u(x,t)

Nonlinear diffusion equation:
∂t u = div( g(|∇u|²) ∇u )
where |∇u|² acts as a fuzzy edge detector and the diffusivity g is, e.g.,
g(|∇u|²) = 1 / (1 + |∇u|²/λ²)

Existence of a unique smooth solution (using Gaussian regularization in g): [Catte, Lions, Morel, Coll 1992]
Rudin, Osher & Fatemi '92

Variational model:
min_u TV(u) + (α/2) ||Ku − u0||²
- Allows for edge capturing (discontinuities along level curves)
- TVD schemes popular for shock capturing
- TV controls both the size of discontinuities and the geometry of boundaries

Gradient flow (anisotropic diffusion + data fidelity):
∂t u = div( ∇u/|∇u| ) − α K*(Ku − u0), with ∂u/∂n = 0 on the boundary
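As a sketch of this gradient flow, here is a 1-D numpy version with K = identity and an ε-regularized |∇u| (both simplifications are mine, not from the talk; α, dt, ε are illustrative):

```python
import numpy as np

def rof_denoise(f, alpha=10.0, dt=0.02, eps=0.1, steps=300):
    """Explicit gradient flow for 1-D ROF with K = identity:
    du/dt = d/dx( u_x / sqrt(u_x^2 + eps^2) ) - alpha*(u - f),
    with Neumann (zero-flux) boundary conditions."""
    u = f.astype(float).copy()
    for _ in range(steps):
        ux = np.diff(u)                              # forward differences
        flux = ux / np.sqrt(ux ** 2 + eps ** 2)      # regularized u_x/|u_x|
        # backward divergence with zero flux at both ends
        div = np.diff(np.concatenate(([0.0], flux, [0.0])))
        u = u + dt * (div - alpha * (u - f))
    return u
```

The ε keeps the flux well defined where u is flat; the explicit step size must stay below roughly ε/2 for stability, which is why dt is small here.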
Inpainting

Unifying TV restoration and inpainting processes (E: known region, D: inpainting domain):
min_u ∫_{E∪D} |∇u| + (λ/2) ∫_E |u − u0|²

Euler-Lagrange equation:
−div( ∇u/|∇u| ) + λ_E (u − u0) = 0, where λ_E(x) = λ for x ∈ E and 0 for x ∈ D

[Example image from Geary Gallery]
Texture synthesis alone is not reliable in non-repeating regions.

[Panels: original image; texture synthesis; elastica inpainting]

Combining texture synthesis (Efros & Leung '97) & elastica inpainting (Chan, Kang, Shen 2001): (Chan, Ni, Roble SIGGRAPH 2007)

[Panels: original; result. Image provided by Doug Enright]
PDEs & differential geometry in IP
- Level set method: fronts propagating with curvature-dependent speed, algorithms based on Hamilton-Jacobi formulations [Osher & Sethian '88]
- Image segmentation: [Mumford & Shah '89], [Chan & Vese '01], [Vese & Chan '02], [Chung & Vese '03], [Lie, Lysaker & Tai '06]
Mumford & Shah '89

Mumford-Shah optimization problem:
min_{s,C} ∫_Ω (s − u0)² + λ ∫_{Ω\C} |∇s|² + γ|C|

Conjecture [MS '89] (still open): there exists (at least) one minimizer such that
1) s ∈ C¹(Ω \ C)
2) C is made up of a finite union of C¹-regular arcs, with some special cases: crack tips and triple junctions (where three branches meet at 120° angles)

[Panels: original image; discontinuity part (edges); smooth part. Hewer, Kenney & Manjunath]
Chan & Vese '01
- Two-phase piecewise constant Mumford-Shah model (geometric length regularization + data fidelity):
min_{C,c1,c2} γ|C| + ∫_{inside(C)} |u0 − c1|² + ∫_{outside(C)} |u0 − c2|²
- Powerful technique: level set method [Osher & Sethian '88]. C is the boundary of an open domain, represented implicitly as the zero level set C = {(x,y) : φ(x,y) = 0}, with φ > 0 inside C and φ < 0 outside C.
Normal: n = ∇φ/|∇φ|; curvature: κ = div(∇φ/|∇φ|)

Variational formulation and level set method [Zhao, Chan, Merriman and Osher '96]

The Heaviside function: H(φ) = 1 if φ ≥ 0, 0 if φ < 0
Length: |C| = ∫_Ω |∇H(φ)|; area inside C: ∫_Ω H(φ)

The level set formulation of the active contour model:
min_{φ,c1,c2} γ ∫_Ω |∇H(φ)| + ∫_Ω |u0 − c1|² H(φ) + ∫_Ω |u0 − c2|² (1 − H(φ))

The Euler-Lagrange equation (gradient flow):
∂t φ = δ_ε(φ) [ γ div(∇φ/|∇φ|) − |u0 − c1|² + |u0 − c2|² ]

[Example segmentations: Europe; galaxy]
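A compact numpy sketch of the level-set gradient flow above (my own simplified discretization: smoothed arctan Heaviside, central-difference curvature, circular initialization; all parameter values are illustrative):

```python
import numpy as np

def chan_vese(u0, gamma=0.1, dt=0.5, eps=1.0, steps=150):
    """Sketch of the Chan-Vese active-contours-without-edges flow:
    phi_t = delta_eps(phi) [ gamma*kappa - (u0-c1)^2 + (u0-c2)^2 ]."""
    ny, nx = u0.shape
    y, x = np.mgrid[:ny, :nx]
    # initial level set: signed distance to a centred circle
    phi = 0.25 * min(ny, nx) - np.sqrt((y - ny / 2.) ** 2 + (x - nx / 2.) ** 2)
    for _ in range(steps):
        H = 0.5 * (1 + (2 / np.pi) * np.arctan(phi / eps))   # smoothed Heaviside
        c1 = (u0 * H).sum() / max(H.sum(), 1e-8)             # mean inside
        c2 = (u0 * (1 - H)).sum() / max((1 - H).sum(), 1e-8) # mean outside
        # curvature div(grad phi / |grad phi|) via central differences
        py, px = np.gradient(phi)
        norm = np.sqrt(px ** 2 + py ** 2) + 1e-8
        kappa = np.gradient(py / norm)[0] + np.gradient(px / norm)[1]
        delta = (eps / np.pi) / (eps ** 2 + phi ** 2)        # smoothed delta
        phi = phi + dt * delta * (gamma * kappa - (u0 - c1) ** 2 + (u0 - c2) ** 2)
    return phi > 0, c1, c2
```

The two region means c1, c2 are re-estimated at every step, and δ_ε confines the update to a band around the contour, exactly as in the Euler-Lagrange equation above.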
Extension
- Image segmentation using a multilayer level-set approach [Chung & Vese '03]: regions Ω_i = {x : l_i ≤ φ(x) ≤ l_{i+1}} with contours {x : φ(x) = l_i}
- A variant of the level set method and applications to image segmentation [Lie, Lysaker & Tai '06]
- A multiphase level set framework for image segmentation using the Mumford and Shah model [Vese & Chan '02]: n level set functions → 2ⁿ regions
Geometric harmonic analysis in IP
- JPEG (DCT), JPEG2000 (DWT)
- Wavelets [Mallat '89; Coifman, Meyer, Wickerhauser '92]
- Curvelets [Starck, Candes, Donoho '02]; bandelets [Le Pennec, Mallat '05]
- Denoising by soft-thresholding [Donoho '95]
- Besov spaces and wavelet shrinkage [Chambolle, DeVore, Lee, Lucier '98]
- ENO wavelets [Chan, Zhou '03]
- Will not elaborate in this talk.
BV, TV analysis in IP
- Image recovery via total variation minimization and related problems [Chambolle, Lions 1997]
- Functions of bounded variation and free discontinuity problems [Ambrosio, Fusco, Pallara 2000]
- Oscillating patterns in image processing and nonlinear evolution equations [Meyer 2001]
- TV-L1: aspects of total variation regularized L1 function approximation [Chan, Esedoglu 2005]
- The discontinuity set of solutions of the TV denoising problem and some extensions [Caselles, Chambolle, Novaga 2007]
Structure-texture decomposition [Meyer '01]
- [Rudin, Osher, Fatemi '92] decomposes an image f into u ∈ BV and v ∈ L²:
min_{(u,v)∈BV×L² : f=u+v} ∫|∇u| + λ||v||²_2
- [Meyer '01] decomposes f into u ∈ BV and v in the dual space G:
min_{(u,v)∈BV×G : f=u+v} ∫|∇u| + λ||v||_G

Norm of G:
||v||_G := inf{ ||g||_{L∞} : v = div(g), g = (g1, g2), g1, g2 ∈ L∞, |g(x)| = √(|g1|² + |g2|²)(x) }
⇒ a signal with large oscillations has a small G-norm
- Numerical implementation [Vese & Osher '03; Aujol & Chambolle '05]

[Comparison figure: ROF vs. Meyer decomposition]
TV-L1 [Chan-Esedoglu '05]
- [Rudin, Osher, Fatemi '92] preserves edges but loses contrast [Strong-Chan '96, Bellettini-Caselles-Novaga '02].
Theorem (contrast loss for ROF): if u0 = 1_D with D convex and smooth, then u = (1 − |∂D|/(2λ|D|)) 1_D.

TV-L1:
min_{(u,v)∈BV×L¹ : f=u+v} ∫|∇u| + λ||v||_1

Model properties:
- robust to contrast and geometry perturbations in the presence of noise
- does not perturb a clean image in the absence of noise
- cleaner multiscale image decomposition than ROF
- data-driven scale selection (detection of meaningful objects in images)

See also [Tadmor-Nezzar-Vese '03].
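To make the contrast-loss formula concrete, consider a disk D = B_r (a standard worked example, not on the slide):

```latex
u_0 = \mathbf{1}_{B_r}, \qquad |\partial D| = 2\pi r, \qquad |D| = \pi r^2,
\]
\[
u \;=\; \Bigl(1 - \frac{|\partial D|}{2\lambda|D|}\Bigr)\mathbf{1}_{B_r}
  \;=\; \Bigl(1 - \frac{2\pi r}{2\lambda\,\pi r^2}\Bigr)\mathbf{1}_{B_r}
  \;=\; \Bigl(1 - \frac{1}{\lambda r}\Bigr)\mathbf{1}_{B_r}.
\]
```

So the contrast loss 1/(λr) decreases with the radius: small features (r < 1/λ) are removed entirely, larger ones survive with reduced contrast. This scale-dependent contrast loss is what TV-L1 avoids.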
Scale-space generated with ROF and TV-L1; multiscale decomposition with TV-L1

[Figures comparing the ROF and TV-L1 scale spaces as the scale 1/λ varies]
Image Processing and Applied Maths

Math ↔ IP:
- PDEs (CFD, viscosity solutions)
- Calculus of variations
- Differential geometry (level set method)
- Smooth & non-smooth functional analysis (wavelets/Besov, total variation/BV)
- Compressed sensing, L1 optimization (Bregman)
- Convexification for geometric problems
- Relaxation for graph optimization problems
TV Wavelet inpainting [Chan, Shen & Zhou '04]

[Original image downloaded from the internet; damaged image: 50% of the wavelet coefficients (including low frequencies) randomly lost. Can you recognize this person?]

- Model I (for noise-free images):
min_{β_{j,k}} ∫|∇u| s.t. u = Σ_{j,k} β_{j,k} Φ_{j,k}, with β_{j,k} = β⁰_{j,k} for the observed coefficients
No parameters. Problem dimension << number of coefficients.

- Model II (for noisy images):
min_{β_{j,k}} ∫|∇u| + λ ||u − u0||² (u0: observed image)
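A toy 1-D version of Model I can be sketched with a one-level Haar transform: run TV descent on u, then project back onto the affine constraint that the observed coefficients keep their values. The transform, masks, and all parameters below are my own illustrative choices, not from the paper.

```python
import numpy as np

def haar(x):
    """One-level orthonormal 1-D Haar transform; len(x) must be even."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def ihaar(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def tv_wavelet_inpaint(a0, d0, known_a, known_d, dt=0.05, eps=0.2, steps=3000):
    """Model-I-style sketch: minimize TV(u) subject to the observed Haar
    coefficients staying fixed (the lost ones are free). Projected gradient:
    one TV descent step, then re-impose the known coefficients."""
    u = ihaar(np.where(known_a, a0, 0.0), np.where(known_d, d0, 0.0))
    for _ in range(steps):
        ux = np.diff(u)
        flux = ux / np.sqrt(ux ** 2 + eps ** 2)
        div = np.diff(np.concatenate(([0.0], flux, [0.0])))
        u = u + dt * div                      # TV descent step
        a, d = haar(u)                        # project onto the constraint set
        a = np.where(known_a, a0, a)
        d = np.where(known_d, d0, d)
        u = ihaar(a, d)
    return u
```

Since the constraint set is affine, resetting the known coefficients after each descent step is an exact projection, so this is plain projected gradient descent.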
Restoration result

[Original image; damaged image with 50% of the wavelet coefficients randomly lost, PSNR = 10.9; Model II recovered image, keeping undamaged coefficients unchanged, PSNR = 18.8]

PSNR vs. % coefficients retained. X-axis: percentage of randomly retained wavelet coefficients. Y-axis: improvement in PSNR after inpainting by Model I (green) and Model II (red). As fewer coefficients are lost, the improvement in PSNR becomes larger.

An extreme example. Upper left: original clean square. Upper right: all but one nonzero coefficient in the low-low frequency subband lost, while all high frequencies are kept (PSNR = 11.2 dB). Lower left: recovered by Model I, a perfect reconstruction (PSNR = 61 dB). Lower right: cross-section at x = 128; the inpainted profile is visually indistinguishable from the original. A preview of compressed sensing!
Compressed Sensing in IP [Candes, Romberg & Tao '06; Donoho '06]

Reconstruct the signal u ∈ R^n from m measurements f ∈ R^m with m << n:
f = Au, u ∈ R^n, f ∈ R^m, A ∈ R^{m×n} (the system is underdetermined)
Here A = RΦ, with Φ a sensing basis (wavelet, Fourier) and R a measurement extractor.

Assume u is sparse; then the problem is:
min_u ||Ψu||_0 s.t. f = Au

Theorem: for m ≥ c·k·log n (k the sparsity, under reasonable assumptions), the signal u can be reconstructed exactly by solving the tight L1 relaxation:
min_u ||Ψu||_1 s.t. f = Au

With ||Ψu||_1 = TV(u) = ∫|∇u|, the CS problem is equivalent to the TV wavelet inpainting problem.
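One standard solver for L1 problems of this type is iterative soft-thresholding (ISTA), shown here for the unconstrained Lagrangian form rather than the equality-constrained basis pursuit on the slide (a sketch; lam and the iteration count are illustrative):

```python
import numpy as np

def ista(A, f, lam=0.05, steps=500):
    """Iterative soft-thresholding for  min_u lam*||u||_1 + 0.5*||Au - f||^2,
    the Lagrangian relaxation of basis pursuit."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    u = np.zeros(A.shape[1])
    for _ in range(steps):
        g = A.T @ (A @ u - f)              # gradient of the smooth part
        v = u - g / L
        u = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # soft threshold
    return u
```

With a random Gaussian A and a signal that is sparse enough relative to m, the L1 solution recovers the true support, which is the content of the theorem above.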
Reconstruction
- CS reconstruction of an MR image using 30% of the k-space data [Goldstein & Osher '09]
- CS reconstruction of an MR image using 30% of the Fourier-space data [Zhang, Burger, Bresson & Osher '10]
Convexification of geometric problems in IP
- Basic IP problems (segmentation, registration, surface reconstruction) are defined as non-convex optimization problems.
- Curse of non-convexity: local minimizers; dependence on initialization to capture a meaningful solution.
- Convexification: change the non-convex geometric problem into a convex relaxation that admits a tight or exact relaxation [Strang '83; Chan, Esedoglu & Nikolova '06; Bresson, Esedoglu, Vandergheynst, Thiran, Osher '07]
TV-L1 and shape denoising

Shape denoising (non-convex):
min_Σ Per(Σ) + λ|Σ Δ S|

TV-L1 (convex), with u0 = 1_S:
min_u ∫_Ω |∇u| + λ ∫_Ω |u − u0|

Theorem: for any minimizer u* of TV-L1, the thresholded function 1_{u*>μ} is a global minimizer of the shape denoising problem.

Proof: by the coarea and layer-cake formulas,
TVL1(u) = ∫_μ [ Per(Σ(μ)) + λ|Σ(μ) Δ S| ] dμ, where Σ(μ) = {u > μ}

We get the same geometric problem for each level set of u: equivalence between the convex problem (min over functions) and the non-convex problem (min over geometric sets).
Image segmentation

Segmentation (non-convex) [Chan, Vese '01]:
min_{Σ,c1,c2} γ Per(Σ) + ∫_Σ |u0 − c1|² + ∫_{Ω\Σ} |u0 − c2|²

Level set method (non-convex):
min_{φ,c1,c2} γ ∫ |∇H_ε(φ)| + ∫_Ω H_ε(φ)|u0 − c1|² + ∫_Ω (1 − H_ε(φ))|u0 − c2|²

Why is non-convexity undesirable? Because some local minimizers are not meaningful.
Convexification [Chan, Esedoglu, Nikolova '06; Strang '83]

min_{0≤u≤1, c1,c2} F_seg = γ ∫_Ω |∇u| + ∫_Ω |u0 − c1|² u + ∫_Ω |u0 − c2|² (1 − u)

Theorem: for any minimizer u* of F_seg (with c1, c2 fixed), the thresholded function 1_{u*>μ} is a global minimizer of the segmentation problem.

Proof: by the coarea and layer-cake formulas,
F_seg(u) = ∫_μ [ γ Per(Σ(μ)) + ∫_{Σ(μ)} |u0 − c1|² + ∫_{Ω\Σ(μ)} |u0 − c2|² ] dμ

We get the same geometric problem for each level set of u.

Full convex relaxation of [Chan & Vese '01] w.r.t. (Σ, c1, c2): [Brown-Chan-Bresson '12]
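The two formulas invoked in the proof, written out for u : Ω → [0,1] with Σ(μ) = {x : u(x) > μ}:

```latex
\int_\Omega |\nabla u|\,dx \;=\; \int_0^1 \operatorname{Per}\bigl(\Sigma(\mu)\bigr)\,d\mu
\qquad\text{(coarea formula)},
\]
\[
\int_\Omega g(x)\,u(x)\,dx \;=\; \int_0^1\!\!\int_{\Sigma(\mu)} g(x)\,dx\,d\mu,
\quad\text{since } u(x) = \int_0^1 \mathbf{1}_{\Sigma(\mu)}(x)\,d\mu
\qquad\text{(layer cake)}.
\]
```

Applying the first to the TV term and the second with g = |u0 − c1|² (and its complement to the (1 − u) term, since 1 − u = ∫₀¹ 1_{Ω\Σ(μ)} dμ) decomposes F_seg into independent geometric problems, one per level set Σ(μ).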
Generalization
- Growing literature reformulating most fundamental non-convex IP problems as convex problems. No theoretical guarantee of exact relaxation, but numerics show tight approximate solutions ⇒ the new level set method?
- Registration (optical-flow-based registration [Horn & Schunck '81]):
min_u ∫_Ω |∇u| + ∫_Ω |I1 − I2(u)|²
- Surface reconstruction [Savadjiev, Ferrie & Siddiqi '03]:
min_Σ Per(Σ) − ∫_{∂Σ} ⟨N_Σ, N_p⟩
- Multi-class segmentation [Potts '52]:
min_{Σ_i, 1≤i≤n} Σ_{i=1}^n Per_{w_p}(Σ_i) + Area_{w_i^a}(Σ_i), s.t. ∪_{i=1}^n Σ_i = Ω, Σ_i ∩ Σ_j = ∅ ∀ i ≠ j
L1 relaxation of graph cut problems

Some graph cut problems are geometric problems ⇒ use L1 relaxation ideas to design better graph algorithms.

Data clustering can be cast as a balanced graph cut problem [Cheeger '70]:
min_Ω Cut(Ω, Ωᶜ) / min(|Ω|, |Ωᶜ|), with Cut(Ω, Ωᶜ) = Σ_{i∈Ω, j∈Ωᶜ} w_{ij}

Balanced cut problems are NP-hard ⇒ continuous relaxations are required:

1) L2 relaxation (Rayleigh quotient) [Shi, Malik '00]:
min_{f:V→R} ||f||²_{H¹} / ||f − mean(f)||²_2, where ||f||²_{H¹} = Σ_{i,j} w_{ij} |f_i − f_j|²
but the L2 relaxation is not tight.

2) L1 relaxation [Szlam-Bresson '10; Chung '98]:
min_{f:V→R} ||f||_{TV} / ||f − median(f)||_1, where ||f||_{TV} = Σ_{i,j} w_{ij} |f_i − f_j|
⇒ exact relaxation, but a non-convex optimization problem.
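The L2 relaxation can be sketched in a few lines of numpy: the Rayleigh-quotient minimizer orthogonal to the constant vector is the second eigenvector of the graph Laplacian L = D − W, which we threshold at its median to get a balanced two-way split (a toy sketch of the Shi-Malik idea, not their normalized-cut variant):

```python
import numpy as np

def l2_relaxed_cut(W):
    """Spectral (L2) relaxation of a balanced cut: take the Fiedler vector
    of the unnormalized graph Laplacian and threshold at the median."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                 # graph Laplacian
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    f = vecs[:, 1]                     # Fiedler vector (2nd smallest eigenvalue)
    return f > np.median(f)
```

On a graph made of two dense clusters joined by a weak bridge, the Fiedler vector is nearly constant on each cluster with opposite signs, so the threshold recovers the two clusters; the L1 relaxation sharpens this further.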
Unsupervised and transductive learning problems
- Unsupervised clustering of high-dimensional data [Szlam-Bresson '10]:
min_{Ω1,...,ΩK} Σ_{k=1}^K cut(Ωk, Ωkᶜ) / min(|Ωk|, |Ωkᶜ|), s.t. ∪_{k=1}^K Ωk = V and Ωi ∩ Ωj = ∅ ∀ i ≠ j
- Transductive clustering [Bresson, Tai, Chan & Szlam '12]: the same problem with given labels {l_k}_{k=1}^K

MNIST: clustering of digit images, 70,000 images of size 28 × 28 (dim = 784). Classification error: [Shi & Malik] 19.1%; [Szlam & Bresson] 13.1%.

⇒ L1 works significantly better than L2 for small sets of labels.
Non-smooth/TV optimization in IP
- Nonlinear total variation based noise removal algorithms [Rudin, Osher, Fatemi 1992]: discretized PDEs
- A nonlinear primal-dual method for total variation-based image restoration [Chan, Golub, Mulet 1996]: primal-dual algorithm
- An algorithm for total variation minimization and applications [Chambolle 2004]: dual algorithm
- Smooth minimization of non-smooth functions [Nesterov 2005]
- The split Bregman method for L1 regularized problems [Goldstein, Osher 2009]: ADMM for L1 problems
- A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science [Esser, Zhang, Chan 2009]
- A first-order primal-dual algorithm for convex problems with applications to imaging [Chambolle, Pock 2011]: accelerated primal-dual algorithm
L1 Minimization and Convex Optimization
- Many IP problems are formulated as L1 problems:
Basis pursuit: min_u ||u||_1 s.t. Au = f
TV denoising: min_u ||∇u||_1 + (λ/2)||u − f||²
Geometric convexification: min_u ||∇u||_1 + λ⟨u, w(x)⟩
- Challenges: such problems are large scale and nondifferentiable, but convex and separable.
- Renewed interest in classical convex optimization methods for L1 problems: new applications, improved algorithms, and new theoretical developments.
Algorithm Connections, Old and New

Augmented Lagrangian / method of multipliers ↔ Bregman iteration (adds back the noise; very effective for L1 minimization problems) [Osher, Burger, Goldfarb, Xu, Yin 2005]

Example: basis pursuit min_u ||u||_1 s.t. Au = f
Bregman iteration:
u^{k+1} = argmin_u μ||u||_1 + (1/2)||Au − f^k||², f^{k+1} = f + (f^k − Au^{k+1})
Method-of-multipliers form:
u^{k+1} = argmin_u μ||u||_1 + ⟨p^k, f − Au⟩ + (1/2)||Au − f||², p^{k+1} = p^k + (f − Au^{k+1})

Alternating Direction Method of Multipliers (ADMM) / Douglas-Rachford splitting ↔ Split Bregman (effective for TV minimization problems and much more generally applicable) [Goldstein, Osher 2008]

Example: TV denoising min_u ||∇u||_1 + (λ/2)||u − f||²
Splitting: min_{u,z} ||z||_1 + (λ/2)||u − f||² s.t. z = ∇u
ADMM:
(u^{k+1}, z^{k+1}) = argmin_{u,z} ||z||_1 + (λ/2)||u − f||² + ⟨p^k, z − ∇u⟩ + (δ/2)||∇u − z||²
p^{k+1} = p^k + δ(z^{k+1} − ∇u^{k+1})
Split Bregman alternates the joint minimization:
z^{k+1} = argmin_z ||z||_1 + (δ/2)||z − ∇u^k + p^k/δ||²
u^{k+1} = argmin_u (λ/2)||u − f||² + (δ/2)||∇u − z^{k+1} − p^k/δ||²
p^{k+1} = p^k + δ(z^{k+1} − ∇u^{k+1})
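The alternating split Bregman steps above can be written out for 1-D TV denoising in a few lines of numpy (a sketch using a scaled dual variable b in place of p; D is the forward-difference matrix, and λ, δ are illustrative):

```python
import numpy as np

def split_bregman_tv(f, lam=10.0, delta=5.0, steps=100):
    """Split Bregman / ADMM for 1-D TV denoising
    min_u ||D u||_1 + (lam/2)||u - f||^2, with splitting z = D u:
      z-step: soft-threshold(D u + b, 1/delta)
      u-step: solve (lam I + delta D^T D) u = lam f + delta D^T (z - b)
      b-step: b += D u - z   (scaled dual update)."""
    n = len(f)
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)       # forward differences
    M = lam * np.eye(n) + delta * (D.T @ D)            # u-step normal equations
    u = f.astype(float).copy()
    z = np.zeros(n - 1)
    b = np.zeros(n - 1)
    for _ in range(steps):
        w = D @ u + b
        z = np.sign(w) * np.maximum(np.abs(w) - 1.0 / delta, 0.0)
        u = np.linalg.solve(M, lam * f + delta * (D.T @ (z - b)))
        b = b + D @ u - z
    return u
```

The z-step is a closed-form soft threshold and the u-step is a linear solve with a fixed matrix, which is why the method is so effective for TV problems: the nonsmooth and smooth parts are handled separately.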
Algorithm Connections, Old and New
- Uzawa's method ↔ linearized Bregman (efficient for large-scale problems) [Darbon, Osher, Yin, Goldfarb 2007]
- Proximal forward-backward splitting ↔ Fixed Point Continuation (FPC) and many related iterative thresholding methods for L1 minimization [Hale, Yin, Zhang 2007]
- Arrow-Hurwicz ↔ Primal-Dual Hybrid Gradient (PDHG) (excellent for TV denoising) [Zhu, Chan 2008]
- Preconditioned method of multipliers / proximal method of multipliers ↔ Bregman Operator Splitting (BOS) (proposed for nonlocal TV; very generally applicable) [Zhang, Burger, Bresson, Osher 2009]
- Quadratic penalty ↔ penalty method (reformulates the total variation penalty as a constrained optimization problem) [Wang, Yin, Zhang 2007]
- Preconditioned ADMM (linearization of the penalty; also related to surrogate functions and optimization transfer) ↔ split inexact Uzawa, modified PDHG, Chambolle/Pock method, He/Yuan variants... (many versions of this very versatile method) [Zhang, Burger, Osher, Esser, Chan, Chambolle, Pock... 2009]
- Newton-like methods ↔ CGM [Chan, Golub & Mulet '95]; semismooth Newton for TV [Hintermüller, Stadler] (uses second-order information; can be superlinearly convergent)