Image Parsing: Unifying Segmentation and Detection

Z. Tu, X. Chen, A.L. Yuille and S-C. Zhu

ICCV 2003 (Marr Prize) & IJCV 2005

Sanketh Shetty

Outline

• Why Image Parsing?
• Introduction to Concepts in DDMCMC
• DDMCMC applied to Image Parsing
• Combining Discriminative and Generative Models for Parsing
• Results
• Comments

Image Parsing

Image I

Parse Structure W

Optimize p(W|I)
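
Written out with the generative factorization that appears later in these slides (likelihood p(I|W) and prior p(W)), the objective is the usual Bayesian posterior. The symbol W^* for the maximizer is notation added here, not taken from the slides.

```latex
W^{*} = \arg\max_{W} \, p(W \mid I),
\qquad
p(W \mid I) = \frac{p(I \mid W)\, p(W)}{p(I)} \;\propto\; p(I \mid W)\, p(W)
```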

Properties of Parse Structure

• Dynamic and reconfigurable
  – Variable number of nodes and node types
• Defined by a Markov Chain
  – Data Driven Markov Chain Monte Carlo (earlier work in segmentation, grouping and recognition)

Key Concepts

• Joint model for Segmentation & Recognition
  – Combine different modules to obtain cues
• Fully generative explanation for Image generation
  – Uses Generative and Discriminative Models + DDMCMC framework
  – Concurrent Top-Down & Bottom-Up Parsing

Pattern Classes

62 characters

Faces

Regions

• Key Concepts:
  – Markov Chains
  – Markov Chain Monte Carlo
    • Metropolis-Hastings [Metropolis 1953, Hastings 1970]
    • Reversible Jump [Green 1995]
  – Data Driven Markov Chain Monte Carlo

MCMC: A Quick Tour

Markov Chains

Notes: Slides by Zhu, Dellaert and Tu at ICCV 2005

Markov Chain Monte Carlo


Metropolis-Hastings Algorithm


Proposal Distribution

Invariant Distribution
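
The roles of the proposal distribution and the invariant distribution can be seen in a minimal Metropolis-Hastings sketch. This is a generic textbook version, not code from the paper; the standard normal target, the Gaussian random-walk proposal, the step size and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random

def metropolis_hastings(log_target, propose, log_q, x0, n_steps, rng):
    """Run n_steps of Metropolis-Hastings and return the visited states.

    log_target(x)       -- log of the unnormalized invariant distribution
    propose(x, rng)     -- draw a candidate x' from the proposal q(x' | x)
    log_q(x_to, x_from) -- log q(x_to | x_from), up to a shared constant
    """
    x = x0
    chain = []
    for _ in range(n_steps):
        x_new = propose(x, rng)
        # log of [pi(x') q(x | x')] / [pi(x) q(x' | x)]
        log_ratio = (log_target(x_new) + log_q(x, x_new)
                     - log_target(x) - log_q(x_new, x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new          # accept the move
        chain.append(x)        # on rejection the old state is repeated
    return chain

STEP = 1.0  # proposal width (arbitrary illustrative choice)

def log_target(x):
    return -0.5 * x * x        # standard normal, unnormalized

def propose(x, rng):
    return x + rng.gauss(0.0, STEP)

def log_q(x_to, x_from):
    return -0.5 * ((x_to - x_from) / STEP) ** 2

rng = random.Random(0)
chain = metropolis_hastings(log_target, propose, log_q, 5.0, 20000, rng)
kept = chain[2000:]            # discard an assumed burn-in period
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
print(f"sample mean={mean:.3f}, sample variance={var:.3f}")
```

Because the random-walk proposal is symmetric, the two `log_q` terms cancel here; they are kept so the same function also works with asymmetric proposals.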


Reversible Jumps MCMC

• Many competing models to explain data– Need to explore this complicated state space


DDMCMC Motivation


Generative Model: p(I|W) p(W)

State Space

Discriminative Model: q(w_j | I)

Dramatically reduces the search space by focusing sampling on highly probable states.
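
A toy experiment can illustrate why data-driven proposals help. The sketch below is not the paper's model: the 1000-state space, the peaked "posterior", and the imperfect bottom-up "cue" (including a deliberate false alarm at state 350) are all hypothetical. It runs the same independence Metropolis-Hastings sampler twice, once with a uniform proposal and once with the cue as proposal, and compares acceptance rates.

```python
import random
from itertools import accumulate

N_STATES = 1000
N_STEPS = 20000

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical posterior p(W|I): two states explain the image, the rest barely do.
post_w = [1e-5] * N_STATES
post_w[100], post_w[700] = 5.0, 3.0
posterior = normalize(post_w)

# Hypothetical bottom-up cue q(W|I): cheap and imperfect (350 is a false alarm).
cue_w = [0.01] * N_STATES
for s in (100, 350, 700):
    cue_w[s] = 300.0
data_driven = normalize(cue_w)
uniform = normalize([1.0] * N_STATES)

def independence_sampler(target, proposal, x0, n_steps, rng):
    """Metropolis-Hastings where candidates are drawn from a fixed proposal."""
    states = range(len(target))
    cum = list(accumulate(proposal))
    x, accepted, visits = x0, 0, [0] * len(target)
    for _ in range(n_steps):
        x_new = rng.choices(states, cum_weights=cum)[0]
        # acceptance ratio [p(x') q(x)] / [p(x) q(x')]
        ratio = (target[x_new] * proposal[x]) / (target[x] * proposal[x_new])
        if rng.random() < min(1.0, ratio):
            x, accepted = x_new, accepted + 1
        visits[x] += 1
    return accepted / n_steps, visits

uni_rate, uni_visits = independence_sampler(
    posterior, uniform, 0, N_STEPS, random.Random(1))
dd_rate, dd_visits = independence_sampler(
    posterior, data_driven, 0, N_STEPS, random.Random(1))
print(f"acceptance rate, uniform proposal:     {uni_rate:.3f}")
print(f"acceptance rate, data-driven proposal: {dd_rate:.3f}")
```

Both chains target the same posterior; only the proposal differs. The false alarm at state 350 is proposed often but almost always rejected, which is the sense in which the generative model verifies the discriminative cues.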

DDMCMC Framework

• Moves:
  – Node Creation
  – Node Deletion
  – Change Node Attributes

Transition Kernel

Satisfies the detailed balance equation
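
The slide's own formula did not survive extraction. In standard form, detailed balance for a sub-kernel and the Metropolis-Hastings acceptance probability that enforces it can be written as below. The symbols K_a (sub-kernel for move type a), Q_a (its proposal) and \alpha_a are notation assumed here.

```latex
p(W \mid I)\, K_a(W' \mid W : I) \;=\; p(W' \mid I)\, K_a(W \mid W' : I)

\alpha_a(W \to W') \;=\; \min\!\left(1,\;
  \frac{p(W' \mid I)\, Q_a(W \mid W' : I)}
       {p(W \mid I)\, Q_a(W' \mid W : I)}\right)
```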

Full Transition Kernel
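
The formula for this slide is missing from the extracted text. A common way to combine sub-kernels, which is believed to match the paper's construction, is a mixture in which move type a is selected with probability \rho(a : I); the notation is an assumption here.

```latex
K(W' \mid W : I) \;=\; \sum_{a} \rho(a : I)\, K_a(W' \mid W : I),
\qquad \sum_{a} \rho(a : I) = 1
```

If every K_a satisfies detailed balance with respect to p(W|I), the mixture K leaves p(W|I) invariant as well.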

Convergence to p(W|I)

Monotonically at a geometric rate
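
A small numerical check makes the convergence claim concrete. The sketch below is a toy, not the paper's analysis: it builds a Metropolis kernel for a hypothetical 3-state target, pushes a point-mass initial distribution through the kernel, and tracks the total variation distance to the target at each step. Total variation is used here as a convenient measure; the choice of metric is an assumption.

```python
TARGET = [0.6, 0.3, 0.1]   # hypothetical invariant distribution
N_ITERS = 15

def metropolis_kernel(pi):
    """Transition matrix: propose one of the other states uniformly, then accept/reject."""
    n = len(pi)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i][j] = (1.0 / (n - 1)) * min(1.0, pi[j] / pi[i])
        K[i][i] = 1.0 - sum(K[i])   # rejected proposals stay in place
    return K

def push_forward(mu, K):
    """One step of the chain applied to a distribution: mu_next = mu K."""
    n = len(mu)
    return [sum(mu[i] * K[i][j] for i in range(n)) for j in range(n)]

def total_variation(mu, pi):
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, pi))

K = metropolis_kernel(TARGET)
mu = [0.0, 0.0, 1.0]        # start far from the target
tv = [total_variation(mu, TARGET)]
for _ in range(N_ITERS):
    mu = push_forward(mu, K)
    tv.append(total_variation(mu, TARGET))

for t, d in enumerate(tv):
    print(f"step {t:2d}: TV distance = {d:.3e}")
```

The distance never increases from one step to the next, and the starting point stops mattering after a handful of iterations.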

Criteria for Designing Transition Kernels

Image Generation Model

Regions:
• Constant Intensity
• Textures
• Shading

State of parse graph

62 characters

Faces

3 Regions

Uniform

Designed to penalize high model complexity

Shape Prior

Faces

3 Regions

Shape Prior: Text

Intensity Models

Intensity Model: Faces

Discriminative Cues Used

• Adaboost Trained
  – Face Detector
  – Text Detector
• Adaptive Binarization Cues
• Edge Cues
  – Canny at 3 scales
• Shape Affinity Cues
• Region Affinity Cues

Transition Kernel Design

• Remember

Possible Transitions

1. Birth/Death of a Face Node
2. Birth/Death of Text Node
3. Boundary Evolution
4. Split/Merge Region
5. Change node attributes
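
The move types listed above can be mimicked on a toy problem to show how reversible move pairs change the number of nodes while still sampling the right posterior. Everything in this sketch is hypothetical: a "parse" is just a set of occupied sites, `GAIN` stands in for image evidence, `PENALTY` stands in for a complexity prior, and the move probabilities are arbitrary. It is not the paper's control algorithm.

```python
import math
import random
from collections import Counter
from itertools import combinations

N_SITES = 4
GAIN = [2.0, 0.2, 1.5, 0.1]      # hypothetical evidence for a node at each site
PENALTY = 1.0                    # hypothetical cost per node (model complexity)
MOVES = ("birth", "death", "change")
P_BIRTH, P_DEATH, P_CHANGE = 0.3, 0.3, 0.4

def log_post(W):
    """Unnormalized log posterior of a parse W (a frozenset of occupied sites)."""
    return sum(GAIN[s] for s in W) - PENALTY * len(W)

def step(W, rng):
    """One Metropolis-Hastings step using a randomly chosen move type."""
    move = rng.choices(MOVES, weights=(P_BIRTH, P_DEATH, P_CHANGE))[0]
    used = sorted(W)
    free = sorted(set(range(N_SITES)) - W)
    if move == "birth":
        if not free:
            return W
        W_new = W | {rng.choice(free)}
        # reverse move is a death that picks the new node among |W_new| nodes
        log_q = math.log(P_DEATH / len(W_new)) - math.log(P_BIRTH / len(free))
    elif move == "death":
        if not used:
            return W
        W_new = W - {rng.choice(used)}
        # reverse move is a birth at one of the N_SITES - |W_new| free sites
        log_q = (math.log(P_BIRTH / (N_SITES - len(W_new)))
                 - math.log(P_DEATH / len(used)))
    else:  # "change": relocate one node, a symmetric proposal
        if not used or not free:
            return W
        W_new = (W - {rng.choice(used)}) | {rng.choice(free)}
        log_q = 0.0
    log_alpha = log_post(W_new) - log_post(W) + log_q
    return W_new if rng.random() < math.exp(min(0.0, log_alpha)) else W

# Exact posterior by enumeration, possible only because the toy space is tiny.
states = [frozenset(c) for k in range(N_SITES + 1)
          for c in combinations(range(N_SITES), k)]
Z = sum(math.exp(log_post(W)) for W in states)
exact = {W: math.exp(log_post(W)) / Z for W in states}

N_STEPS = 60000
rng = random.Random(0)
state = frozenset()              # start from the empty parse
counts = Counter()
for _ in range(N_STEPS):
    state = step(state, rng)
    counts[state] += 1

freq = {W: counts[W] / N_STEPS for W in states}
max_err = max(abs(freq[W] - exact[W]) for W in states)
best = max(states, key=lambda W: freq[W])
print("most visited parse:", sorted(best))
print(f"largest gap to exact posterior: {max_err:.4f}")
```

Each birth is paired with the death that undoes it, so the proposal ratio in the acceptance test accounts for the change in the number of nodes.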

Face/Text Transitions

Region Transitions

Change Node Attributes

Basic Control Algorithm

Results

Comments

• Well motivated but very complicated approach to THE HOLY GRAIL problem in vision
  – Good global convergence results for inference with very minor dependence on initial W.
  – Extensible to larger set of primitives and pattern types.
• Many details of the algorithm are missing and it is hard to understand the motivation for choices of values for some parameters
• Unclear if the p(W|I)’s for configurations with different class compositions are comparable.
• Derek’s comment on Adaboost false positives and their failure to report their exact improvement
• No quantitative results/comparison to other algorithms and approaches
  – It should be possible to design a simple experiment to measure performance on recognition/detection/localization tasks.

Thank You
