
Page 1

John Wright

Electrical Engineering

Columbia University

In Search of Provably Effective Visual Data Representations

Page 2

Images: > 1M pixels

Videos: > 1B voxels

Web data: > 10B websites

Datasets are massive, high-dimensional…

Page 3

… intrinsic structures are low-dimensional

How can we exploit low-dimensional structure in high-dimensional data?

Page 4

Images: recognition, inpainting, denoising

Videos: compression, transmission, stabilization, repair

Web data: indexing, ranking, search, collaborative filtering, …

Good solutions enable many applications

Page 5

Real application data often contain missing observations, corruption or even malicious errors and noise. Classical algorithms (e.g., least squares, PCA) break down …

But … it is not easy…


Page 6

This talk


How do we develop provably correct and efficient algorithms for recovering low-dimensional structure from corrupted high-dimensional observations?

Page 7

Given $D = L_0 + S_0$ with $L_0$ low-rank and $S_0$ sparse, recover $L_0$.

Numerous approaches to robust PCA in the literature:

• Multivariate trimming [Gnanadesikan + Kettenring ’72]
• Random sampling [Fischler + Bolles ’81]
• Alternating minimization [Ke + Kanade ’03]
• Influence functions [de la Torre + Black ’03]

Can we give an efficient, provably correct algorithm?


Formulation – Robust PCA?

Page 8

Why is the problem hard?

Some very sparse matrices are also low-rank:

Certain sparse error patterns make recovering the low-rank component impossible:

Can we recover low-rank matrices that are incoherent with the standard basis?

Can we correct sparse errors whose support is not adversarial?

Page 9

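Can we recover low-rank matrices that are incoherent with the standard basis from almost all errors?

Incoherence condition [Candès + Recht ’08]: write the rank-$r$, $n_1 \times n_2$ low-rank matrix as $L_0 = U \Sigma V^*$. The condition constrains the singular vectors; the singular values are arbitrary.

• Singular vectors of $L_0$ not too spiky:
$$\max_i \|U^* e_i\|^2 \le \frac{\mu r}{n_1}, \qquad \max_i \|V^* e_i\|^2 \le \frac{\mu r}{n_2}.$$

• Left and right singular vectors not too cross-correlated:
$$\|U V^*\|_\infty \le \sqrt{\frac{\mu r}{n_1 n_2}}.$$

Uniform model on the error support (the locations of the sparse errors are chosen uniformly at random); signs and magnitudes arbitrary.

When is there hope? (In)coherence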
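As a sketch (assuming NumPy; the helper name `incoherence` is ours, not from the talk), the smallest constant μ for which a given matrix satisfies the incoherence conditions of [Candès + Recht ’08] (singular vectors not too spiky, and not too cross-correlated) can be read off its SVD. A matrix whose energy is spread out has μ near 1; a matrix that is simultaneously low-rank and sparse, the problematic case from the previous slide, has μ as large as $n_1 n_2$.

```python
import numpy as np


def incoherence(L, tol=1e-10):
    """Smallest mu for which L = U S V^T (rank r, n1 x n2) satisfies
        max_i ||U^T e_i||^2 <= mu r / n1,
        max_i ||V^T e_i||^2 <= mu r / n2,
        ||U V^T||_inf       <= sqrt(mu r / (n1 n2)).
    Returns (mu, r)."""
    n1, n2 = L.shape
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))                       # numerical rank
    U, V = U[:, :r], Vt[:r, :].T
    mu_U = n1 / r * np.max(np.sum(U**2, axis=1))          # largest row energy of U
    mu_V = n2 / r * np.max(np.sum(V**2, axis=1))          # largest row energy of V
    mu_UV = n1 * n2 / r * np.max(np.abs(U @ V.T)) ** 2    # joint condition
    return max(mu_U, mu_V, mu_UV), r


if __name__ == "__main__":
    n = 10
    flat = np.ones((n, n))        # rank one, energy spread over all entries
    spiky = np.zeros((n, n))
    spiky[0, 0] = 1.0             # rank one AND one-sparse
    print(incoherence(flat))      # mu close to 1: incoherent
    print(incoherence(spiky))     # mu about n**2: maximally coherent
```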

Page 10

Naïve optimization approach

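Look for a low-rank $L$ that agrees with the data $D$ up to some sparse error $S$:
$$\min_{L,S} \; \operatorname{rank}(L) + \gamma \|S\|_0 \quad \text{subject to} \quad L + S = D.$$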

… and how should we solve it?

Page 11

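Convex relaxation:
$$\min_{L,S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad L + S = D,$$
where the nuclear norm $\|L\|_*$ (sum of singular values) replaces the rank and $\|S\|_1$ (sum of absolute entries) replaces the number of nonzero entries.

Nuclear norm heuristic: [Fazel, Hindi, Boyd ’01], [Recht, Fazel, Parrilo ’08]. Convex surrogate for LR+S: [Chandrasekaran, Sanghavi, Parrilo, Willsky ’11].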
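A minimal sketch of one standard way to solve this convex program (principal component pursuit) with an augmented Lagrangian, alternating-directions iteration: each pass applies singular value thresholding to update the low-rank part, then entrywise soft thresholding to update the sparse part. The defaults λ = 1/√max(n₁, n₂) and μ = n₁n₂ / (4‖D‖₁) follow common practice in the RPCA literature; treat them, the function names, and the stopping rule as assumptions rather than the talk's own implementation.

```python
import numpy as np


def shrink(X, tau):
    """Entrywise soft thresholding: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def pcp(D, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: min ||L||_* + lam ||S||_1  s.t.  L + S = D."""
    n1, n2 = D.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(D).sum()) if mu is None else mu
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                       # multiplier for the constraint L + S = D
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)      # low-rank update
        S = shrink(D - L + Y / mu, lam / mu)   # sparse update
        residual = D - L - S
        Y += mu * residual                     # dual ascent step
        if np.linalg.norm(residual) <= tol * norm_D:
            break
    return L, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r = 80, 2
    L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    S0 = np.zeros((n, n))
    mask = rng.random((n, n)) < 0.05                       # 5% of entries corrupted
    S0[mask] = 5.0 * rng.choice([-1.0, 1.0], size=mask.sum())
    L, S = pcp(L0 + S0)
    print("relative error in L:", np.linalg.norm(L - L0) / np.linalg.norm(L0))
```

The fixed μ keeps the sketch short at the cost of speed; practical solvers typically increase μ over the iterations and use partial SVDs.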


Page 12

“Convex optimization recovers matrices of rank $O(n/\log^2 n)$ from errors corrupting $O(n^2)$ entries”

[Candès, Li, Ma, and W., ’09].

Theory – Correct recovery

Page 13

Video = Low-rank appx. + Sparse error

Static camera surveillance video

200 frames, 144 x 172 pixels,

Significant foreground motion

RPCA

Applications – Background modeling from video

Real-time (recursive) versions: Vaswani et al., Balzano et al. ’12.
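Before RPCA can be applied to a surveillance clip, each frame is flattened into one column of the data matrix, so a 200-frame, 144 × 172 clip becomes a 24,768 × 200 matrix D. A static background then contributes a rank-one matrix and moving foreground objects a sparse one. The sketch below (NumPy; a hypothetical toy clip, not the talk's data, with function names of our choosing) builds D and checks exactly that structure.

```python
import numpy as np


def frames_to_matrix(frames):
    """Stack T frames of shape (H, W) as the columns of an (H*W) x T matrix D."""
    T = frames.shape[0]
    return frames.reshape(T, -1).T


def synthetic_video(T=20, H=16, W=20, size=4, seed=0):
    """Hypothetical toy clip: a fixed background plus a bright square
    that slides horizontally (the foreground motion)."""
    rng = np.random.default_rng(seed)
    background = rng.uniform(0.0, 1.0, (H, W))
    frames = np.repeat(background[None, :, :], T, axis=0)
    for t in range(T):
        col = t % (W - size)
        frames[t, 5:5 + size, col:col + size] = 1.5   # brighter than any background pixel
    return frames, background


if __name__ == "__main__":
    frames, background = synthetic_video()
    D = frames_to_matrix(frames)
    B = frames_to_matrix(np.repeat(background[None, :, :], frames.shape[0], axis=0))
    print("D shape:", D.shape)
    print("rank of background part:", np.linalg.matrix_rank(B))
    print("fraction of foreground entries:", np.mean(D != B))
```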

Page 14

Peng, Ganesh, W., Ma. CVPR 2010

Application – Faces from the internet

Page 15

Input: faces detected by a face detector ($D$)

Average

Peng, Ganesh, W., Ma. CVPR 2010

Application – Faces detected

Page 16

Output: aligned faces ($D \circ \tau$)

Average

Peng, Ganesh, W., Ma. CVPR 2010

Application – Faces aligned

Page 17

Output: clean, low-rank faces ($A$)

Average

Peng, Ganesh, W., Ma. CVPR 2010

Application – Faces repaired and cleaned

Page 18

[Figure: 20 subjects: Gloria Macapagal Arroyo, Jennifer Capriati, Laura Bush, Serena Williams, Barack Obama, Ariel Sharon, Arnold Schwarzenegger, Colin Powell, Donald Rumsfeld, George W Bush, Gerhard Schroeder, Hugo Chavez, Jacques Chirac, Jean Chretien, John Ashcroft, Junichiro Koizumi, Lleyton Hewitt, Luiz Inacio Lula da Silva, Tony Blair, Vladimir Putin]

Average face before alignment & repairing

Peng, Ganesh, W., Ma. CVPR 2010

Application – Celebrities from the internet

Page 19

Average face after alignment & repair

Peng, Ganesh, W., Ma. CVPR 2010

Application – Celebrities from the internet

Page 20

BIG PICTURE – Parallelism of Sparsity and Low-Rank

Sparse vector – degeneracy of an individual signal; measure: L0 norm; convex surrogate: L1 norm.

Low-rank matrix – degeneracy of correlated signals; measure: rank; convex surrogate: nuclear norm.

Problem settings: compressed sensing, error correction, domain transform, mixed structures.
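The parallel can be made literal: the rank is the L0 "norm" of the vector of singular values, and the nuclear norm is its L1 norm. A small NumPy sketch (the helper names are ours):

```python
import numpy as np


def l0(v, tol=0.0):
    """Number of entries with magnitude above tol (the 'L0 norm')."""
    return int(np.sum(np.abs(v) > tol))


def l1(v):
    """Sum of absolute values."""
    return float(np.sum(np.abs(v)))


def rank(X, rtol=1e-10):
    """Rank = L0 'norm' of the singular values (relative tolerance rtol)."""
    s = np.linalg.svd(X, compute_uv=False)
    return l0(s, tol=rtol * s.max())


def nuclear_norm(X):
    """Nuclear norm = L1 norm of the singular values."""
    return l1(np.linalg.svd(X, compute_uv=False))


if __name__ == "__main__":
    x = np.array([0.0, 3.0, 0.0, -4.0])
    X = np.diag([3.0, -4.0, 0.0])          # singular values 4, 3, 0
    print(l0(x), l1(x), rank(X), nuclear_norm(X))
```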

Page 21

A SUITE OF POWERFUL REGULARIZERS

• [Zhou et al. ’09] – spatially contiguous sparse errors via MRF

• [Bach ’10] – structured relaxations from submodular functions

• [Negahban+Yu+Wainwright ’10] – geometric analysis of recovery

• [Becker+Candès+Grant ’10] – algorithmic templates

• [Xu+Caramanis+Sanghavi ’11] – column-sparse errors via the $\ell_{2,1}$ norm

• [Recht+Parrilo+Chandrasekaran+Willsky ’11] – compressive sensing of various structures

• [Candès+Recht ’11] – compressive sensing of decomposable structures

• [McCoy+Tropp ’11] – decomposition of sparse and low-rank structures

• [Wright+Ganesh+Min+Ma, ISIT ’12] – superposition of decomposable structures
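For column-sparse errors (as in [Xu+Caramanis+Sanghavi ’11]), the regularizer sums the Euclidean norms of the columns, and its proximal operator shrinks whole columns at once, zeroing those with small norm: the column analogue of entrywise soft thresholding. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np


def l21_norm(X):
    """Sum of the Euclidean norms of the columns of X."""
    return float(np.sum(np.linalg.norm(X, axis=0)))


def prox_l21(X, tau):
    """Proximal operator of tau * ||X||_{2,1}: shrink each column's norm by tau,
    zeroing columns whose norm is at most tau."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale   # broadcasts one scale factor per column


if __name__ == "__main__":
    X = np.array([[3.0, 0.1],
                  [4.0, 0.0]])
    print(l21_norm(X))        # column norms 5 and 0.1, summed
    print(prox_l21(X, 1.0))   # first column scaled by 0.8, second column zeroed
```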

For robust recovery of a family of low-dimensional structures:

Take home message: Let the data and application tell you the structure…

Page 22

Can we provably detect object instances under varying lighting?

Conceptual Problem:

Given (information about) an object, can we provably detect or recognize the object in a new image?

What information do we need?

What computational complexity?

What algorithmic performance guarantee?

Page 23

The goal – practical?

Many engineering approaches to mitigate illumination: quotient images [Riklin-Raviv, Shashua ’99], [Wang, Li, Wang ’04]

Nonlinear features: SIFT [Lowe ’99], HOG [Dalal, Triggs ’05], LBP [Ojala, Pietikäinen, Harwood ’94], [Ahonen, Hadid, Pietikäinen ’06]

Total variation minimization [Yin et al.]

Or use heuristic / intuitive physics: [Murase, Nayar ’96], [Belhumeur, Kriegman ’98], [Basri, Jacobs ’03], [Ramamoorthi ’01], [Basri, Frolova ’04], [W. et al. ’09], and many others.

[Figure examples: Wang, Li, Wang ’04; Ahonen, Hadid, Pietikäinen ’06]

Page 24

Labeled Faces in the Wild results http://vis-www.cs.umass.edu/lfw/results.html

Page 26

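Ex: Face recognition with $\ell^1$-regression:

Linear subspace model for images of the same face under varying illumination: stack the training images of subject $i$ as the columns of
$$A_i = [\,a_{i,1}, \dots, a_{i,n_i}\,].$$

If the test image $y$ is also of subject $i$, then
$$y \approx A_i x_i \quad \text{for some } x_i \in \mathbb{R}^{n_i}.$$

So we can represent any test image with respect to the entire training set as
$$y = A x_0, \qquad A = [\,A_1, \dots, A_k\,], \qquad x_0 = [\,0, \dots, 0, x_i^T, 0, \dots, 0\,]^T,$$
with $y$ the test image, $A$ the combined training dictionary, and $x_0$ the sparse coefficients.

[W., Yang, Ganesh, Sastry, Ma, ’09]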

Page 27

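With occlusion, $y = A x_0 + e_0$: an underdetermined system of $m$ linear equations ($m$ pixels) in $n + m$ unknowns ($n$ training images):
$$y = [\,A \;\; I\,] \begin{bmatrix} x_0 \\ e_0 \end{bmatrix}.$$

The solution is not unique … but
• $x_0$ should be sparse: ideally, only supported on images of the same subject;
• $e_0$ is expected to be sparse: occlusion only affects a subset of the pixels.

Seek the sparsest solution:
$$\min_{x,e} \; \|x\|_0 + \|e\|_0 \quad \text{subject to} \quad A x + e = y,$$
and take its convex relaxation:
$$\min_{x,e} \; \|x\|_1 + \|e\|_1 \quad \text{subject to} \quad A x + e = y.$$

Ex: Face recognition with $\ell^1$-regression: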
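A minimal sketch of this $\ell^1$-regression classifier, assuming SciPy's `linprog` with the HiGHS solver: the relaxed problem is rewritten as a linear program by splitting the unknowns into nonnegative parts, and the test image is assigned to the subject whose coefficients best explain the de-occluded image $y - e$. The toy "faces" (one random low-dimensional subspace per subject) and all function names are hypothetical stand-ins, not real face data or the authors' code.

```python
import numpy as np
from scipy.optimize import linprog


def l1_min(B, y):
    """Solve min ||w||_1 subject to B w = y as a linear program,
    writing w = u - v with u, v >= 0 and minimizing sum(u + v)."""
    n = B.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([B, -B]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


def src_classify(A, labels, y):
    """Solve min ||x||_1 + ||e||_1 s.t. A x + e = y, then return the label whose
    coefficients alone best explain y - e (plus x and e)."""
    m, n = A.shape
    w = l1_min(np.hstack([A, np.eye(m)]), y)
    x, e = w[:n], w[n:]
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - e - A[:, labels == c] @ x[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))], x, e


def toy_training_set(m=40, n_subjects=3, per_subject=4, dim=2, seed=1):
    """Hypothetical stand-in for face data: each subject's images lie in its own
    random dim-dimensional subspace of R^m; columns are scaled to unit norm."""
    rng = np.random.default_rng(seed)
    blocks = [rng.standard_normal((m, dim)) @ rng.standard_normal((dim, per_subject))
              for _ in range(n_subjects)]
    A = np.hstack(blocks)
    labels = np.repeat(np.arange(n_subjects), per_subject)
    return A / np.linalg.norm(A, axis=0), labels


if __name__ == "__main__":
    A, labels = toy_training_set()
    y = A[:, labels == 0] @ np.array([1.0, 0.5, -0.5, 0.25])   # an image of subject 0
    y[[3, 17]] += 0.5                                           # occlude two pixels
    predicted, x, e = src_classify(A, labels, y)
    print("predicted subject:", predicted)
```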

Page 28

Ex: Face recognition with $\ell^1$-regression:

If the training dictionary is “nice” and the model fits, we can make strong statements about the performance of $\ell^1$-regression.

Page 29

Geometry of illumination variations

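Distant illumination is identified with a Riemann-integrable, nonnegative function on the sphere of incident directions.

Assume a linear sensor response. The image can then often be written as a superposition of images under point illumination, weighted by this function.

What is the set of possible images of the object?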

Page 30

As the illumination ranges over all nonnegative functions, the set of possible images is a convex cone: nonnegative combinations of illuminations produce nonnegative combinations of images. [Belhumeur + Kriegman ’98]

Geometry of illumination variations
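To make the cone concrete, here is a minimal sketch under strong assumptions: a Lambertian object with attached shadows only (no cast shadows, so this is the easy convex case rather than the nonconvex setting the talk targets) and a finite sample of point-light directions. Every image under a nonnegative mix of those lights is a nonnegative combination of the point-light images, and detection reduces to a nonnegative least squares distance to the cone. The normals, albedos, directions and function names are synthetic and hypothetical; `scipy.optimize.nnls` is assumed as the solver.

```python
import numpy as np
from scipy.optimize import nnls


def unit_vectors(n, rng):
    """n random directions on the unit sphere in R^3."""
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def point_light_images(normals, albedo, directions):
    """Column j: Lambertian image under a unit point source from directions[j].
    Only attached shadows max(0, <n, u>) are modeled; cast shadows are ignored."""
    return albedo[:, None] * np.maximum(normals @ directions.T, 0.0)


def distance_to_cone(Phi, y):
    """Euclidean distance from image y to the cone {Phi c : c >= 0}."""
    coeffs, rnorm = nnls(Phi, y)
    return rnorm, coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normals = unit_vectors(200, rng)                 # hypothetical per-pixel normals
    albedo = rng.uniform(0.2, 1.0, 200)
    Phi = point_light_images(normals, albedo, unit_vectors(50, rng))
    y_in = Phi @ rng.uniform(0.0, 1.0, 50)           # nonnegative mix of point lights
    y_out = rng.uniform(0.0, 1.0, 200)               # unrelated image
    print(distance_to_cone(Phi, y_in)[0], distance_to_cone(Phi, y_out)[0])
```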

Page 31

Geometry of illumination variations

Ambient cone model. Illumination is the sum of an ambient (constant) component and an arbitrary directional component, with a corresponding cone of possible images.

Page 32

Extreme rays

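The extreme rays of the cone can be interpreted as images under point illumination from a single direction. Also set aside the ambient image (the image under ambient illumination alone).

We can build an $\varepsilon$-approximation to the cone using these quantities: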

Lemma. Suppose , with Riemann integrable. Then

Page 33

Approximating a convex cone

Approximation of general convex bodies in high dimensions is a disaster:

Theorem 2. [Bronstein, Ivanov ’76] Let $K \subset \mathbb{R}^d$ be a convex body. There exists an $\varepsilon$-approximation to $K$ in Hausdorff distance with $O(\varepsilon^{-(d-1)/2})$ vertices. For the unit sphere, this is optimal to within a constant.

Is there any special structure we can use here?

Extreme rays lie on a low-dimensional submanifold of the image space!
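To see the disaster concretely, drop the constants in the vertex bound and count the digits of $\varepsilon^{-(d-1)/2}$. Even at $\varepsilon = 0.1$, a single 144 × 172 image ($d = 24{,}768$) gives a vertex count with about 12,384 digits. A tiny Python sketch (the function name is ours; it works in log space because the raw number overflows a float):

```python
import math


def log10_vertex_bound(eps, d):
    """log10 of eps ** (-(d - 1) / 2): the order of the vertex count in the
    Bronstein-Ivanov bound with constants dropped, computed in log space."""
    return (d - 1) / 2 * math.log10(1.0 / eps)


if __name__ == "__main__":
    for d in (3, 10, 100, 144 * 172):   # the last is one 144 x 172 image
        print(f"d = {d:>6}: about 10^{log10_vertex_bound(0.1, d):.1f} vertices")
```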

Page 34

Nine? [Lee + Kriegman ’05]. Thirty-eight? [Wagner, W., Zhou, Ganesh, Ma ’12]

Sufficient? Good recognition rates on moderate-size datasets (116 subjects):

Normal: 99.4% recognition rate
Glasses: 98.3% recognition rate
Sunglasses: 81.0% recognition rate
Others: 53.5% recognition rate

How many point illuminations?

Page 35

Low-light scenarios are harder. [Figure: ambient level for the cone]

Non-convex objects are harder. Two notions of convexity defect:
• Local – maximum fraction of viewing directions that are obstructed.
• Global – maximum length of a shadow-casting edge boundary.

What is the difficulty here?

Page 36

Covering numbers

If we define an illumination covering number ,

This implies in general, and for convex objects .

Page 37

A different norm? Better rates in $\ell^1$:

Suggests using $\ell^1$ regression [W., Yang, Ganesh, Sastry, Ma ’09]

Prune? Reduce number of samples … but in any , need to get started: In general, cannot be too small due to moving shadow boundaries:

How can we build efficient detectors?

vertices on convex hull of radius- digital circle .

Page 38

How can we build efficient detectors?

Reduce dimensionality? Seek an approximation of the cone that is much simpler, but still guaranteed to suffice for the task. What sense of approximation? For guaranteed detection, control the Hausdorff discrepancy.

Simpler in what sense?
• Efficient storage: specify the approximation with only a few real numbers.
• Efficient evaluation: compute products with it quickly.

Computing such products is the dominant cost in state-of-the-art nonnegative least squares algorithms (e.g., [Dhillon et al. ’10]).
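A sketch of why a low-rank plus sparse approximation of the dictionary helps: if the dense dictionary is replaced by $UV^T + S$, products with it and with its transpose never require forming the dense matrix, and these products are the operation an iterative NNLS solver repeats. This assumes SciPy's `LinearOperator`; the factor names U, V, S and the function name are hypothetical, and the sketch only checks that the factored products agree with the dense ones.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator


def low_rank_plus_sparse_operator(U, V, S):
    """Represent A_hat = U @ V.T + S without forming it. With U (m x r),
    V (n x r) and S sparse, a product costs O((m + n) r + nnz(S))
    instead of O(m n) for a dense m x n dictionary."""
    S = sp.csr_matrix(S)
    m, n = U.shape[0], V.shape[0]
    return LinearOperator(
        (m, n),
        matvec=lambda x: U @ (V.T @ x) + S @ x,       # A_hat x
        rmatvec=lambda y: V @ (U.T @ y) + S.T @ y,    # A_hat^T y
        dtype=U.dtype,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, r = 300, 120, 5
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    S = sp.random(m, n, density=0.01, format="csr", random_state=0)
    op = low_rank_plus_sparse_operator(U, V, S)
    x = rng.standard_normal(n)
    print(np.allclose(op.matvec(x), (U @ V.T + S.toarray()) @ x))
```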

Page 39

LR+S captures intuitive physics

Calculations with the Lambertian sphere suggest that sets of images of smooth, near-convex objects should be approximately low-rank: [Basri+Jacobs ’03, Ramamoorthi ’04].

Cast shadows are often sparse [W., Yang, … Ma, ’09], [Candès, Li, Ma, W. ’11]:

Observations used in photometric stereo [Wu et al. ’11]:

How can we exploit this physical intuition, while guaranteeing quality of approximation?

Page 40

Cone-preserving dimensionality reduction

A low-rank and sparse decomposition that provably preserves detection performance.

Would like to solve

… but the constraint is very complicated. Relax to:

Page 41

Results: Complexity Reduction

Hausdorff distance bound

Relative complexity

Covering radius: a smaller covering radius means greater redundancy.

Page 42

Summary so far …

• Sufficient sample densities: some first results for nonconvex objects.

• Cone-preserving complexity reduction. Both exploit low-dimensional structure …

Page 43

Learning simple signal models?

$Y = AX$, with $X$ sparse: most of the entries of $X$ are zero.

Good model for many types of imagery data, especially if we can learn the dictionary $A$.

[Figure: data matrix = dictionary × sparse coefficients]

Page 44

Ambiguities: or ?

Learning the basis of sparsity?

D = L+ S

Subspaces spanned by $k$ columns of the dictionary

Peculiar geometry:

Page 45

When is dictionary learning well-posed?

Subspaces spanned by $k$ columns of the dictionary

k+1 points per subspace

Solution is unique:

Page 46

How can we learn a good dictionary?

Alternating directions to minimize a sparsity surrogate [Engan et al. ’99], [Aharon et al. ’05], [Yaghoobi ’10]

Recently: supervised variants [Mairal et al. ’08], structured dictionaries [Rubinstein et al. ’10], highly scalable variants [Mairal et al. ’10] … and many, many more.
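A minimal sketch of the alternating approach in the spirit of the method of optimal directions [Engan et al. ’99]: sparse coding by orthogonal matching pursuit, then a least-squares dictionary update with column renormalization (K-SVD [Aharon et al. ’05] uses a different, atom-by-atom update). The synthetic data, sizes, and function names are assumptions; like all such alternating schemes, it carries no global guarantee.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse code of y in dictionary A."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


def mod_learn(Y, n_atoms, k, n_iter=20, seed=0):
    """Alternate OMP sparse coding with a least-squares (MOD) dictionary update.
    Returns the dictionary and the relative coding error at each iteration."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    A = rng.standard_normal((n, n_atoms))
    A /= np.linalg.norm(A, axis=0)
    errors = []
    for _ in range(n_iter):
        X = np.column_stack([omp(A, Y[:, i], k) for i in range(p)])   # sparse coding
        errors.append(np.linalg.norm(Y - A @ X) / np.linalg.norm(Y))
        A = Y @ np.linalg.pinv(X)                                      # dictionary update
        dead = np.linalg.norm(A, axis=0) < 1e-10                       # unused atoms
        A[:, dead] = Y[:, rng.choice(p, int(dead.sum()), replace=False)]
        A /= np.linalg.norm(A, axis=0)
    return A, errors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, n_atoms, k, p = 10, 20, 2, 300
    A0 = rng.standard_normal((n, n_atoms))
    A0 /= np.linalg.norm(A0, axis=0)
    X0 = np.zeros((n_atoms, p))
    for i in range(p):                                    # exactly k nonzeros per column
        X0[rng.choice(n_atoms, k, replace=False), i] = rng.standard_normal(k)
    A, errors = mod_learn(A0 @ X0, n_atoms, k)
    print("relative error, first vs last iteration:", errors[0], errors[-1])
```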

Page 47

Provable solution in the complete case

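Rows of $X$ are sparse vectors in a known subspace: when $Y = AX$ with $A$ square ($n \times n$) and invertible, the row space of $Y$ equals the row space of $X$.

If the number of samples satisfies $p \ge C n \log n$, then with high probability the rows of $X$ are the sparsest vectors in $\operatorname{row}(Y)$.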
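Concretely, the sparsest vectors in the row space can be sought by $\ell^1$ minimization. The sketch below solves one linear program, min ‖wᵀY‖₁ subject to rᵀw = 1 with r a column of Y, modeled on the single-column variant of ER-SpUD [Spielman, Wang, W.]; the full method solves many such programs and greedily keeps n linearly independent candidates, which is omitted here. It assumes SciPy's `linprog`, a random square A, and a Bernoulli-Gaussian X; the names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def sparsest_in_row_space(Y, r):
    """Solve  min_w ||w^T Y||_1  subject to  r^T w = 1  as a linear program.

    Variables are w (free, length n) and t (nonnegative, length p), with
    -t <= Y^T w <= t, so sum(t) equals ||w^T Y||_1 at the optimum."""
    n, p = Y.shape
    c = np.concatenate([np.zeros(n), np.ones(p)])
    eye = np.eye(p)
    A_ub = np.block([[Y.T, -eye], [-Y.T, -eye]])
    A_eq = np.concatenate([r, np.zeros(p)])[None, :]
    bounds = [(None, None)] * n + [(0, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * p), A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:n]
    return w, w @ Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, theta = 8, 200, 0.1
    A = rng.standard_normal((n, n))                                   # complete dictionary
    X = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)    # Bernoulli-Gaussian
    Y = A @ X
    k = int(np.argmax(np.linalg.norm(Y, axis=0)))                     # a nonzero column
    w, s = sparsest_in_row_space(Y, Y[:, k])
    print("nonzeros in candidate row:", int(np.sum(np.abs(s) > 1e-6)))
    print("nonzeros per row of X:", (X != 0).sum(axis=1))
```

The constraint keeps the solution away from zero. Any row $X_j$ with $X_{jk} \ne 0$, rescaled by $1/X_{jk}$, is feasible, so the optimum is never worse than the best such row; the talk's result says that, in a suitable sparsity regime, it is with high probability a scaled row of $X$.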

Page 48

Uniqueness – square dictionaries

[Spielman, Wang, W. ‘11]: Decomposition essentially unique from random observations.

[Aharon, Elad, Bruckstein ‘05]: Decomposition is essentially unique from strategically located observations.

Overcomplete:

Square:

Page 49

Provable solution in the complete case

If the expected number of nonzeros per column is below order $\sqrt{n}$, the algorithm succeeds with high probability. Sample requirement: $p = O(n^2 \log^2 n)$.

Page 50

Summary and Open Questions

• First results on quality of approximation for varying illumination and nonconvex objects.
• Dedicated cone-preserving dimensionality reduction.
• Many promising tools for exploiting low-dimensional structure in high-dimensional data.
• More general decompositions.
• Convexifications of the dictionary learning problem.
• Many open questions: optimality, practicality, operational constants.

Page 51

Guaranteed Illumination Models for Nonconvex Objects, Zhang, Mu, Kuo, W., arXiv ’13

Compressive Principal Component Pursuit, W., Ganesh, Min, Ma, I&I ’13

Local Correctness of $\ell^1$-Minimization for Dictionary Learning, Geng, W., arXiv ’11

Exact Recovery of Sparsely-Used Dictionaries, Spielman, Wang, W., COLT ’12

Robust Principal Component Analysis?, Candès, Li, Ma, W., JACM ’11

Robust Face Recognition via Sparse Representation, W., Yang, Ganesh, Sastry, Ma, PAMI ’09

Towards a Practical Automatic Face Recognition …, Wagner, W., Ganesh, Zhou, Mobahi, Ma, PAMI ’12

Provable Representations for Visual Data Thanks to …