consistency potentials for scene understandingsgould/papers/consistency... · 2013-12-02 ·...
TRANSCRIPT
Stephen Gould, ANU
Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
2 December 2013
Consistency Potentials for Scene Understanding:
from Pairwise to Higher-order
2
Multi-class Pixel Labeling
FG/BG segmentation [Boykov and Jolly, 2001;
Rother et al., 2004]
Geometric context [Hoiem et al., 2005]
Semantic segmentation [He et al., 2004; Shotton et
al., 2006; Gould et al., 2009]
Digital photo montage [Agarwala et al., 2004]
Stereo reconstruction [Scharstein and Szeliski,
2005]
Denoising and
Inpainting
Label every pixel in an image with a class label
from some pre-defined set, i.e.,
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
3
Pixelwise Pixel Labeling
x4 x5 x6
x7 x8 x9
x1 x2 x3
y4 y5 y6
y7 y8 y9
y1 y2 y3
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
4
Pixelwise Pixel Labeling
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
• Options for improving accuracy:
– (i) use more features, more data,
more complex models
– (ii) use priors to guide the labeling
towards a more plausible solution
• Most common priors enforce
smoothness (e.g., pairwise)
• Data dependent priors can take
into account image features
5
Introducing (Data Dependent) Priors
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
y2 y1
constraint on
joint assignment
6
Conditional Markov Random Fields
energy function x4 x5 x6
x7 x8 x9
x1 x2 x3
y4 y5 y6
y7 y8 y9
y1 y2 y3
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
• A pseudo-Boolean function is a mapping
• Can be written (uniquely) as a multi-linear polynomial or
(non-uniquely) in posiform
• A binary pairwise MRF is just a quadratic (bilinear)
pseudo-Boolean function (QPBF)
• Submodular QPBFs can be minimized by graph cuts
– identified by negative coefficients on pairwise terms
7
Binary CRFs and Pseudo-Boolean Fcns
[Boros and Hammer, 2001]
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
• construct a graph where
every st-cut corresponds
to a joint assignment to
the variables
• the cost of the cut should
equal the energy of the
assignment
• the minimum-cut then
corresponds to the energy
minimizing assignment
8
Graph-Cuts
t
s
1
0
v u
5 4
3 2
1
3
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
• construct a graph where
every st-cut corresponds
to a joint assignment to
the variables
• the cost of the cut should
equal the energy of the
assignment
• the minimum-cut then
corresponds to the energy
minimizing assignment
9
Graph-Cuts
t
s
1
0
v u
6 7
5
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
• Start with a pixel labeling problem
• Formulate as multi-label CRF inference
• (move-making: α-expansion, αβ-swap, ICM)
– Convert to a sequence of binary pairwise CRF
inference problems
– Write CRF as a quadratic pseudo-Boolean function
– Solve by finding the minimum cut (maximum flow)
• (relaxation)
• (approximation) 10
Energy Minimization via Graph-Cuts
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
[Boykov et al., 2001]
11
Contrast Sensitive Pairwise Smoothness
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
xi xj
yj yi
12
Pairwise Smoothness Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
Image Independent
(unary only)
Pairwise CRF
• Ideal: Suppose an oracle told us which pixels belong
together. Then all we would need to do is predict the
class labels.
• Problem: no over-segmentation algorithm is perfect.
Even if they were, our label predictions may be wrong.
• Solution: use superpixels as soft constraints.
13
Why Not Use Superpixels?
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
• Pairwise Potts Potential:
– Penalize if two pixels
disagree
• Higher-order Potts Potential:
– Penalize if any two pixels in a
clique disagree
– Penalty paid once
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013 14
Generalized Potts Model
yj yi
yj
yi
yb ya
yv yu
[Kohli et al., 2007]
15
Higher-order Smoothness Potentials
% disagreement
penalty
% disagreement
penalty
% disagreement
penalty
Generalized Potts [Kohli et al., 2007]
Robust Potts Model [Kohli et al., 2008;
Ladicky et al., 2009]
Arbitrary Concave [Kohli and Kumar, 2010;
Gould, 2011]
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
16
Binary Lower Linear Envelope MRFs
number of non-zero
labels
penalty
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
17
Inference (Binary Case) z
y1 y2 y3
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
18
Inference (Binary Case)
y1 y2 y3
z1 z2
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
negative
(submodular)
19
Inference (Full CRF---Binary Case) z1 z2
x4 x5 x6
x7 x8 x9
x1 x2 x3
y4 y5 y6
y7 y8 y9
y1 y2 y3
sum of submodular potentials is submodular
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
20
Learning (Binary Case)
[Gould, ICML 2011]
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
difference in
energy functions
21
Learning (Binary Case)
[Gould, ICML 2011]
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
• Sampled lower linear envelope (2nd order)
• Slope (1st order)
• Curvature (0th order)
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013 22
Learning Variants (Binary Case)
23
Weighted Smoothness Potentials
clique “weight”
penalty
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
weighted
clique
membership
Lower Linear Envelope Restricted Boltzmann Machine
24
Aside: Relationship to RBM
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
importantly, higher-order cliques can overlap
25
Defining the Higher-order Cliques
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
oversegment to
produce contiguous
superpixels
cluster (e.g, k-means) to
produce non-contiguous
regions
, … ,
• Aggregation by summation
• Aggregation by minimization
• Move-making (approximate) inference
26
Extending to Multiple Labels
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
[Park and Gould, ECCV 2012]
27
Dual Decomposition Inference
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
master
[Komodakis et al., PAMI 2010]
slave
slave
28
Dual Decomposition Inference (Details)
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
cost of setting i-th variable to
label for k-th linear function
cost of not setting i-th variable to
label for k-th linear function
diffe
rence
[i]
29
Semantic Segmentation Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
30
Semantic Segmentation Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
31
Semantic Segmentation Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
32
Semantic Segmentation Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
33
Semantic Segmentation Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
34
“GrabCut” Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
35
“GrabCut” Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
36
“GrabCut” Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
37
“GrabCut” Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
increasing pairwise prior
incre
asin
g h
igher-
ord
er
prior
38
Higher-order Matching Potentials
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
[Gould, CVPR 2012]
patch A
patch B
pixelwise
weight
weighted pairwise
disagreement
penalty
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013 39
Inference with Matching Potentials
non-submodular pairwise terms
• Problem: non-submodular terms (in move making
steps when labels already agree before the move)
• Solution: approximate with (tight) upper-bound
by setting z = 1 [Gould, CVPR 2012]
40
Cross-Image Consistency Potentials
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
[Rivera and Gould, DICTA 2011]
P Q
41
Cross-Image Results
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013
unary
pair
wis
e
matc
h
+ more
• Priors/constraints provide a mechanism for scene
understanding that simply adding more features cannot
• Many other (higher-order) consistency potentials, e.g.,
– Cardinality [Tarlow et al., 2010], label co-occurrence [Ladicky et al.,
2010], label cost [Delong et al., 2010], densely connected
[Krahenbuhl and Koltun, 2011], connectivity [Vincete et al., 2008]
• Biggest challenge is in learning the parameters of these
– Currently, piecewise learning and cross-validation works best
• Opportunities: higher-order (supermodular) loss
functions [Tarlow and Zemel, 2011; Pletscher and Kohli, 2012]
42
Summary and Challenges for (Higher-
order) Consistency Potentials
Stephen Gould @ Graphical Models for Scene Understanding: Challenges and Perspectives, ICCV 2013