energy minimization - university of chicagottic.uchicago.edu/~rurtasun/courses/cv/lecture15.pdfp =...
Energy Minimization
Raquel Urtasun
TTI Chicago
Feb 28, 2013
Raquel Urtasun (TTI-C) Computer Vision Feb 28, 2013 1 / 48
The ST-mincut problem
Suppose we have a graph G = {V, E, C}, with vertices V, edges E and costs C.
[Source: P. Kohli]
The ST-mincut problem
An st-cut (S,T) divides the nodes between source and sink.
The cost of an st-cut is the sum of the costs of all edges going from S to T.
[Source: P. Kohli]
The ST-mincut problem
The st-mincut is the st-cut with the minimum cost
[Source: P. Kohli]
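The definitions above can be checked on a toy graph. The following is a minimal sketch that enumerates every st-cut by brute force and keeps the cheapest one; the graph, its costs, and the function names are hypothetical, and real solvers use max-flow rather than enumeration.

```python
from itertools import combinations

# Hypothetical toy graph: directed edge (u, v) -> cost.
COSTS = {
    ("s", "a"): 2, ("s", "b"): 9,
    ("a", "b"): 1, ("b", "a"): 2,
    ("a", "t"): 5, ("b", "t"): 4,
}
INNER = ["a", "b"]  # every node except the source "s" and the sink "t"


def cut_cost(S, costs):
    """Sum the costs of all edges that go from S to T (the nodes not in S)."""
    return sum(c for (u, v), c in costs.items() if u in S and v not in S)


def st_mincut(inner, costs):
    """Enumerate every st-cut (S contains s, never t) and return the cheapest."""
    best = None
    for k in range(len(inner) + 1):
        for chosen in combinations(inner, k):
            S = {"s", *chosen}
            cost = cut_cost(S, costs)
            if best is None or cost < best[0]:
                best = (cost, S)
    return best


min_cost, min_S = st_mincut(INNER, COSTS)
print(min_cost, sorted(min_S))
```

Enumeration is exponential in the number of nodes, so it only serves to make the definition of the cut cost concrete.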
Back to our energy minimization
Construct a graph such that
1. Any st-cut corresponds to an assignment of x
2. The cost of the cut is equal to the energy of x: E(x)
[Source: P. Kohli]
St-mincut and Energy Minimization
[Source: P. Kohli]
How are they equivalent?
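With A = θij(0,0), B = θij(0,1), C = θij(1,0), D = θij(1,1), and rows indexed by xi, columns by xj:

$$
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
= A
+ \begin{pmatrix} 0 & 0 \\ C-A & C-A \end{pmatrix}
+ \begin{pmatrix} 0 & D-C \\ 0 & D-C \end{pmatrix}
+ \begin{pmatrix} 0 & B+C-A-D \\ 0 & 0 \end{pmatrix}
$$

if xi = 1 add C − A
if xj = 1 add D − C
B + C − A − D ≥ 0 is true from the submodularity of θij

$$
\theta_{ij}(x_i,x_j) = \theta_{ij}(0,0)
+ \bigl(\theta_{ij}(1,0)-\theta_{ij}(0,0)\bigr)\,x_i
+ \bigl(\theta_{ij}(1,1)-\theta_{ij}(1,0)\bigr)\,x_j
+ \bigl(\theta_{ij}(1,0)+\theta_{ij}(0,1)-\theta_{ij}(0,0)-\theta_{ij}(1,1)\bigr)\,(1-x_i)\,x_j
$$
[Source: P. Kohli]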
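The decomposition above turns directly into a graph. The sketch below is one possible construction, not the code shown in the lecture: it assumes the convention that xi = 0 means node i stays on the source side, uses a plain Edmonds-Karp max-flow, and checks the result against brute force on a hypothetical three-variable energy. All names are illustrative.

```python
from collections import deque
from itertools import product


def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v] (mutated in place).
    Returns the flow value and the set of nodes still reachable from s."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def minimize(unary, pairwise):
    """unary[i] = (cost of x_i=0, cost of x_i=1); pairwise[(i, j)] = (A, B, C, D)."""
    n = len(unary)
    c0 = [u[0] for u in unary]
    c1 = [u[1] for u in unary]
    const = 0
    cap = {v: {} for v in ["s", "t", *range(n)]}

    def add_edge(u, v, w):
        cap[u][v] = cap[u].get(v, 0) + w
        cap[v].setdefault(u, 0)  # reverse residual edge

    for (i, j), (A, B, C, D) in pairwise.items():
        w = B + C - A - D
        assert w >= 0, "pairwise term must be submodular"
        const += A
        c1[i] += C - A           # paid if x_i = 1
        c1[j] += D - C           # paid if x_j = 1
        add_edge(i, j, w)        # cut when x_i = 0 and x_j = 1
    for i in range(n):
        m = min(c0[i], c1[i])    # shift so both capacities are non-negative
        const += m
        add_edge("s", i, c1[i] - m)  # cut when i is on the sink side (x_i = 1)
        add_edge(i, "t", c0[i] - m)  # cut when i is on the source side (x_i = 0)
    flow, source_side = max_flow(cap, "s", "t")
    return flow + const, [0 if i in source_side else 1 for i in range(n)]


def energy(x, unary, pairwise):
    e = sum(unary[i][x[i]] for i in range(len(x)))
    for (i, j), (A, B, C, D) in pairwise.items():
        e += ((A, B), (C, D))[x[i]][x[j]]
    return e


unary = [(0, 4), (3, 1), (2, 2)]
pairwise = {(0, 1): (0, 3, 2, 1), (1, 2): (1, 4, 4, 0)}
best_energy, best_x = minimize(unary, pairwise)
brute = min(energy(x, unary, pairwise) for x in product((0, 1), repeat=3))
print(best_energy, best_x, brute)
```

The constant collects A and the per-node shifts, so the min-cut value plus that constant equals the minimum energy.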
Graph Construction
[Source: P. Kohli]
How to compute the St-mincut?
[Source: P. Kohli]
What does the code look like?
[Source: P. Kohli]
Graph cuts for multi-label problems
Exact transformation to QPBF [Roy and Cox 98] [Ishikawa 03] [Schlesinger et al. 06] [Ramalingam et al. 08]
Very high computational cost
[Source: P. Kohli]
Computing the Optimal Move
[Source: P. Kohli]
Move Making Algorithms
[Source: P. Kohli]
Energy Minimization
Consider pairwise MRFs
$$
E(f) = \sum_{\{p,q\} \in \mathcal{N}} V_{p,q}(f_p, f_q) + \sum_{p} D_p(f_p)
$$
with N defining the interactions between nodes, e.g., pixels
Dp non-negative, but arbitrary.
This is the graph-cuts notation.
Important to notice it’s the same thing.
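To make the notation concrete, here is a minimal sketch that evaluates E(f) for a given labeling. The 1D "image", the data costs D, and the choice of a single Potts-style V shared by all neighbor pairs are hypothetical choices made for illustration.

```python
def energy(f, D, V, neighbors):
    """E(f) = sum over {p,q} in N of V(f_p, f_q) + sum over p of D_p(f_p)."""
    smooth = sum(V(f[p], f[q]) for p, q in neighbors)
    data = sum(D[p][f[p]] for p in range(len(f)))
    return smooth + data


def potts(a, b):
    """Pairwise cost: 2 if the labels differ, 0 otherwise."""
    return 2 * (a != b)


# Hypothetical problem: 4 pixels in a row, 3 labels, D[p][l] = data cost.
D = [[0, 2, 5], [1, 0, 4], [4, 1, 0], [5, 2, 0]]
neighbors = [(0, 1), (1, 2), (2, 3)]  # the neighborhood system N

print(energy([0, 1, 2, 2], D, potts, neighbors))
print(energy([0, 0, 2, 2], D, potts, neighbors))
```

The first labeling has zero data cost but two label changes; the second trades a small data cost for one fewer discontinuity.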
Metric vs Semimetric
Two general classes of pairwise interactions
Metric if it satisfies, for any labels α, β, γ:
V(α, β) = 0 ↔ α = β
V(α, β) = V(β, α) ≥ 0
V(α, β) ≤ V(α, γ) + V(γ, β)
Semi-metric if it satisfies, for any labels α, β:
V(α, β) = 0 ↔ α = β
V(α, β) = V(β, α) ≥ 0
Examples for 1D label set
Truncated quadratic is a semi-metric
V(α, β) = min(K, |α − β|²)
with K a constant.
Truncated absolute distance is a metric
V(α, β) = min(K, |α − β|)
with K a constant.
For multi-dimensional labels, replace | · | by any norm.
Potts model is a metric
V(α, β) = K · T(α ≠ β)
with T(·) = 1 if the argument is true and 0 otherwise.
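These claims can be checked numerically on a small label set. The sketch below tests the metric and semi-metric conditions by exhaustive enumeration; the value K = 10 and the label range 0..5 are arbitrary assumptions.

```python
from itertools import product


def is_semi_metric(V, labels):
    """Symmetric, non-negative, and zero exactly when the labels are equal."""
    return all(
        V(a, b) == V(b, a) >= 0 and ((V(a, b) == 0) == (a == b))
        for a, b in product(labels, repeat=2)
    )


def is_metric(V, labels):
    """A semi-metric that also satisfies the triangle inequality."""
    return is_semi_metric(V, labels) and all(
        V(a, b) <= V(a, c) + V(c, b) for a, b, c in product(labels, repeat=3)
    )


K = 10  # assumed truncation constant


def truncated_quadratic(a, b):
    return min(K, abs(a - b) ** 2)


def truncated_absolute(a, b):
    return min(K, abs(a - b))


def potts(a, b):
    return K * (a != b)


labels = range(6)
for V in (truncated_quadratic, truncated_absolute, potts):
    print(V.__name__, is_semi_metric(V, labels), is_metric(V, labels))
```

The truncated quadratic fails the triangle inequality because, for example, V(0, 2) = 4 exceeds V(0, 1) + V(1, 2) = 2.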
Binary Moves
α-β swap moves work for semi-metrics
α-expansion moves work when V is a metric
Figure: from P. Kohli's tutorial on graph cuts
For certain x1 and x2, the move energy is a submodular QPBF
[Source: P. Kohli]
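Why the two moves need different conditions on V can be seen from the submodularity test of the binary move energy. The following is a short derivation sketch; the binary move variables t_p, t_q and the table notation θ_pq are assumptions introduced here for illustration.

For an α-expansion, let t_p = 1 mean "pixel p switches to α" and t_p = 0 mean "p keeps its label f_p". For neighbors p, q the pairwise table is

$$
\theta_{pq}(0,0) = V(f_p, f_q),\quad
\theta_{pq}(0,1) = V(f_p, \alpha),\quad
\theta_{pq}(1,0) = V(\alpha, f_q),\quad
\theta_{pq}(1,1) = V(\alpha, \alpha) = 0,
$$

so submodularity, θ_pq(0,0) + θ_pq(1,1) ≤ θ_pq(0,1) + θ_pq(1,0), reads

$$
V(f_p, f_q) \le V(f_p, \alpha) + V(\alpha, f_q),
$$

which is the triangle inequality. For an α-β swap between two neighbors in Pα ∪ Pβ, with t = 0 meaning α and t = 1 meaning β, the diagonal entries are V(α, α) = V(β, β) = 0 and the condition becomes

$$
0 \le V(\alpha, \beta) + V(\beta, \alpha),
$$

which already holds for any semi-metric.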
Swap Move
[Source: P. Kohli]
Expansion Move
[Source: P. Kohli]
More formally
Any labeling can be uniquely represented by a partition of image pixels P = {Pl | l ∈ L}, where Pl = {p ∈ P | fp = l} is the subset of pixels assigned label l.
There is a one-to-one correspondence between labelings f and partitions P.
Given a pair of labels α, β, a move from a partition P (labeling f) to a new partition P′ (labeling f′) is called an α-β swap if Pl = P′l for any label l ≠ α, β.
The only difference between P and P′ is that some pixels that were labeled α in P are now labeled β in P′, and vice versa.
Given a label α, a move from a partition P (labeling f) to a new partition P′ (labeling f′) is called an α-expansion if Pα ⊂ P′α and P′l ⊂ Pl for any label l ≠ α.
An α-expansion move allows any set of image pixels to change their labels to α.
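The two definitions can be expressed as simple per-pixel checks. This is a minimal sketch with labelings stored as Python lists; the function names and the example labeling are hypothetical.

```python
def is_swap(f, g, alpha, beta):
    """True if g is within one alpha-beta swap of f: a pixel may only
    change its label from alpha to beta or from beta to alpha."""
    return all(fp == gp or {fp, gp} == {alpha, beta} for fp, gp in zip(f, g))


def is_expansion(f, g, alpha):
    """True if g is within one alpha-expansion of f: a pixel either keeps
    its label or switches to alpha."""
    return all(gp == fp or gp == alpha for fp, gp in zip(f, g))


f = [0, 0, 1, 1, 2, 2]
print(is_swap(f, [1, 0, 0, 1, 2, 2], 0, 1))   # pixels 0 and 2 exchanged 0 <-> 1
print(is_swap(f, [0, 0, 1, 1, 0, 2], 0, 1))   # pixel 4 left label 2
print(is_expansion(f, [0, 0, 0, 1, 0, 2], 0)) # pixels 2 and 4 switched to 0
print(is_expansion(f, [1, 0, 1, 1, 2, 2], 0)) # pixel 0 left label 0
```

Both checks accept g = f, since the empty move is a valid swap and a valid expansion.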
Example
Figure: (a) current partition, (b) local move, (c) α-β swap, (d) α-expansion.
Algorithms
Finding optimal Swap move
Given an input labeling f (partition P) and a pair of labels α, β, we want to find a labeling f̂ that minimizes E over all labelings within one α-β swap of f.
This is done by computing a labeling corresponding to a minimum cut on a graph Gαβ = (Vαβ, Eαβ).
The structure of this graph is dynamically determined by the current partition P and by the labels α, β.
Graph Construction
The set of vertices includes the two terminals α and β, as well as the image pixels p in the sets Pα and Pβ (i.e., fp ∈ {α, β}).
Each pixel p ∈ Pαβ is connected to the terminals α and β by edges called t-links.
Each pair of neighboring pixels p, q ∈ Pαβ is connected by an edge e{p,q} (an n-link).
Computing the Cut
For each pixel p ∈ Pαβ, any cut severs exactly one of its two t-links and leaves the other uncut.
This defines a labeling.
There is a one-to-one correspondence between cuts and labelings within one α-β swap of f.
The cost of the cut is the energy of the labeling (up to an additive constant).
See Boykov et al., "Fast approximate energy minimization via graph cuts", PAMI 2001.
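The correspondence between cut cost and energy can be verified numerically. The sketch below assigns edge weights in the way recalled from Boykov et al. (the weight table is not reproduced in these slides, so treat the exact weights as an assumption to check against the paper): each t-link carries the data cost plus the pairwise costs to neighbors outside Pαβ, and each n-link carries V(α, β). The problem instance is hypothetical.

```python
from itertools import product


def energy(f, D, V, neighbors):
    return (sum(D[p][f[p]] for p in range(len(f)))
            + sum(V(f[p], f[q]) for p, q in neighbors))


def swap_graph_weights(f, D, V, neighbors, alpha, beta):
    """t-link and n-link weights of G_alpha_beta for the current labeling f."""
    in_ab = [p for p, label in enumerate(f) if label in (alpha, beta)]
    t_alpha = {p: D[p][alpha] for p in in_ab}
    t_beta = {p: D[p][beta] for p in in_ab}
    n_links = {}
    for p, q in neighbors:
        if p in t_alpha and q in t_alpha:
            n_links[(p, q)] = V(alpha, beta)
        elif p in t_alpha:            # q keeps its label f[q]
            t_alpha[p] += V(alpha, f[q])
            t_beta[p] += V(beta, f[q])
        elif q in t_alpha:            # p keeps its label f[p]
            t_alpha[q] += V(alpha, f[p])
            t_beta[q] += V(beta, f[p])
    return t_alpha, t_beta, n_links


def cut_cost(g, weights, alpha):
    """Cost of the cut encoding labeling g: each pixel pays the t-link of the
    label it takes, and an n-link is paid when its endpoints disagree."""
    t_alpha, t_beta, n_links = weights
    cost = sum(t_alpha[p] if g[p] == alpha else t_beta[p] for p in t_alpha)
    return cost + sum(w for (p, q), w in n_links.items() if g[p] != g[q])


def V(a, b):
    return min(4, (a - b) ** 2)  # truncated quadratic, a semi-metric


D = [[1, 3, 0], [2, 0, 5], [4, 4, 1], [0, 2, 2], [3, 1, 0]]
neighbors = [(0, 1), (1, 2), (2, 3), (3, 4)]
f = [0, 1, 2, 1, 0]
alpha, beta = 0, 1
weights = swap_graph_weights(f, D, V, neighbors, alpha, beta)
movable = sorted(weights[0])

offsets = set()
for labels in product((alpha, beta), repeat=len(movable)):
    g = list(f)
    for p, label in zip(movable, labels):
        g[p] = label
    offsets.add(cut_cost(g, weights, alpha) - energy(g, D, V, neighbors))
print(offsets)
```

The difference between cut cost and energy is the same for every labeling within one swap of f; it is the part of the energy contributed by pixels outside Pαβ, which the move cannot change.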
Properties
For any cut, then
Finding the optimal α expansion
Given an input labeling f (partition P) and a label α, we want to find a labeling f̂ that minimizes E over all labelings within one α-expansion of f.
This is done by computing a labeling corresponding to a minimum cut on a graph Gα = (Vα, Eα).
The structure of this graph is dynamically determined by the current partition P and by the label α.
This is a different graph from the one used for the α-β swap.
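The outer loop that uses these optimal moves can be sketched as follows. To keep the example short, the optimal expansion is found here by brute force over all binary move vectors instead of a minimum cut, so it only scales to tiny problems; the loop structure (cycle over labels until no move lowers the energy) follows the expansion algorithm as described by Boykov et al., and the problem data are hypothetical.

```python
from itertools import product


def energy(f, D, V, neighbors):
    return (sum(D[p][f[p]] for p in range(len(f)))
            + sum(V(f[p], f[q]) for p, q in neighbors))


def best_expansion(f, alpha, D, V, neighbors):
    """Lowest-energy labeling within one alpha-expansion of f.
    Brute force stands in for the min cut on G_alpha."""
    best = list(f)
    for move in product((False, True), repeat=len(f)):
        g = [alpha if switch else label for switch, label in zip(move, f)]
        if energy(g, D, V, neighbors) < energy(best, D, V, neighbors):
            best = g
    return best


def alpha_expansion(f, labels, D, V, neighbors):
    """Cycle through the labels until no expansion move lowers the energy."""
    f = list(f)
    improved = True
    while improved:
        improved = False
        for alpha in labels:
            g = best_expansion(f, alpha, D, V, neighbors)
            if energy(g, D, V, neighbors) < energy(f, D, V, neighbors):
                f, improved = g, True
    return f


def potts(a, b):
    return 2 * (a != b)


D = [[0, 2, 5], [1, 0, 4], [4, 1, 0], [5, 2, 0]]
neighbors = [(0, 1), (1, 2), (2, 3)]
result = alpha_expansion([0, 0, 0, 0], range(3), D, potts, neighbors)
print(result, energy(result, D, potts, neighbors))
```

The returned labeling is a local minimum with respect to expansion moves: no single α-expansion can lower its energy further.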
Graph Construction
The set of vertices includes the two terminals α and ᾱ, as well as all image pixels p ∈ P.
Additionally, for each pair of neighboring pixels p, q such that fp ≠ fq we create an auxiliary node a{p,q}.
Each pixel p is connected to the terminals α and ᾱ by edges called t-links.
Each pair of neighboring pixels p, q with fp = fq is connected by an n-link.
For each pair of neighboring pixels such that fp ≠ fq, we create a triplet of edges {e{p,a}, e{a,q}, tᾱa}.
The set of edges is then
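Assembling the edges enumerated above (two t-links for every pixel, a triplet for each neighboring pair with different labels, an n-link for each neighboring pair with equal labels) gives the following edge set. This is a reconstruction from the bullets, with notation assumed to follow Boykov et al.:

$$
\mathcal{E}_\alpha =
\bigcup_{p \in \mathcal{P}} \{ t_p^{\alpha},\, t_p^{\bar{\alpha}} \}
\;\cup
\bigcup_{\substack{\{p,q\} \in \mathcal{N} \\ f_p \neq f_q}} \{ e_{\{p,a\}},\, e_{\{a,q\}},\, t_a^{\bar{\alpha}} \}
\;\cup
\bigcup_{\substack{\{p,q\} \in \mathcal{N} \\ f_p = f_q}} e_{\{p,q\}}
$$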
Properties
There is a one-to-one correspondence between cuts and labelings.
The cost of the cut is the energy of the labeling.
See Boykov et al., "Fast approximate energy minimization via graph cuts", PAMI 2001.
Global Minimization Techniques
Ways to get an approximate solution typically
Dynamic programming approximations
Sampling
Simulated annealing
Graph-cuts: imposes restrictions on the type of pairwise cost functions
Message passing: iterative algorithms that pass messages between nodes in the graph.
Now we can solve for the MAP (approximately) in general energies. We can solve for problems other than stereo.
Let's look at data/benchmarks
Benchmarks
Two benchmarks with very different characteristics
(Middlebury) (KITTI)
Middlebury Dataset
Middlebury Stereo Evaluation – Version 2
Laboratory
Lambertian
Rich in texture
Medium-size label set
Largely fronto-parallel
Benchmarks for Stereo and metrics
Middlebury Stereo Evaluation – Version 2
Best methods < 3% errors (for all non-occluded regions)
http://vision.middlebury.edu/stereo/data/
Benchmarks: KITTI Data Collection
Two stereo rigs (1392 × 512 px, 54 cm baseline, 90° opening angle)
Velodyne laser scanner, GPS+IMU localization
6 hours at 10 frames per second!
The KITTI Vision Benchmark Suite
Novel Challenges
Fast guided cost-volume filtering (Rhemann et al., CVPR 2011)
Middlebury, Errors: 2.7%
KITTI, Errors: 46.3%
Error threshold: 1 px (Middlebury) / 3 px (KITTI)
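The error figures above are fractions of pixels whose disparity is off by more than the benchmark's threshold. A minimal sketch of that metric follows; the function name, the flat list representation, and the use of None for pixels without ground truth are assumptions made for illustration.

```python
def disparity_error_rate(estimate, ground_truth, threshold_px):
    """Fraction of pixels with ground truth whose absolute disparity
    error exceeds threshold_px."""
    valid = [(e, g) for e, g in zip(estimate, ground_truth) if g is not None]
    bad = sum(abs(e - g) > threshold_px for e, g in valid)
    return bad / len(valid)


# Hypothetical disparities in pixels; None marks a pixel without ground truth.
estimate = [10.0, 12.5, 30.0, 7.0]
ground_truth = [10.4, 15.0, None, 5.5]

print(disparity_error_rate(estimate, ground_truth, 1))  # Middlebury-style threshold
print(disparity_error_rate(estimate, ground_truth, 3))  # KITTI-style threshold
```

The same estimate scores very differently under the two thresholds, which is worth keeping in mind when comparing the 2.7% and 46.3% figures.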
Novel Challenges
So what is the difference?
Middlebury
Laboratory
Lambertian
Rich in texture
Medium-size label set
Largely fronto-parallel
KITTI
Moving vehicle
Specularities
Sensor saturation
Large label set
Strong slants
Disparities marked in the example images: Middlebury d = 50 px and d = 16 px; KITTI d = 150 px and d = 0 px
Stereo Evaluation
MRFs for stereo
Global methods: define a Markov random field over
Pixel-level
Fronto-parallel planes
Slanted planes
Plane MRFs
First segment the image into small regions, i.e., superpixels
Assume that the 3D world is composed of small frontal/slanted planes
Good representation if the superpixels are small and respect boundaries
$$
E(x_1, \cdots, x_n) = \sum_i C(x_i) + \sum_i \sum_{j \in N_i} C(x_i, x_j)
$$
with xi ∈ ℝ for the fronto-parallel planes, and xi ∈ ℝ³ for the slanted planes
These are continuous variables. Is this a problem?
What can I do to solve this? Discretize the problem
The unary terms are usually an aggregation of the local matching costs over the pixels in that superpixel
The pairwise term is typically a smoothness term
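For the fronto-parallel case, discretization and unary aggregation can be sketched as follows. Each superpixel gets one disparity from a finite candidate set, and its unary cost is the sum of per-pixel matching costs. The per-pixel cost used here (absolute difference to a noisy local disparity estimate) and all names are hypothetical stand-ins for a real matching cost.

```python
def unary_costs(superpixels, pixel_cost, candidates):
    """unary[i][k] = sum over the pixels of superpixel i of the matching
    cost of assigning them the k-th candidate disparity."""
    return [
        [sum(pixel_cost(p, d) for p in pixels) for d in candidates]
        for pixels in superpixels
    ]


# Hypothetical local disparity estimates for 5 pixels.
local_disparity = [1.1, 0.9, 1.2, 3.2, 2.9]


def pixel_cost(p, d):
    return abs(local_disparity[p] - d)


superpixels = [[0, 1, 2], [3, 4]]   # pixel indices per superpixel
candidates = [0, 1, 2, 3, 4]        # discretized disparity values

unary = unary_costs(superpixels, pixel_cost, candidates)
best = [candidates[row.index(min(row))] for row in unary]
print(best)
```

The resulting table plays the role of the data term in a discrete pairwise MRF over superpixels, so the move-making algorithms from the earlier slides apply once a pairwise smoothness cost is added.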
Slanted-plane MRFs
A more sophisticated occlusion model
MRF on continuous variables (slanted planes) and discrete var. (boundary)
Combines depth ordering (segmentation) and stereo
Segment (continuous variable): slanted 3D plane of segment
Boundary (discrete variable): relationship between segments; states: Occlusion, Hinge, Coplanar
Superpixels (UCM [Arbelaez et al. 2011] and SLIC [Achanta et al. 2010])
Takes as input disparities computed by any local algorithm
Energy of PCBP-Stereo
y: the set of slanted 3D planes; o: the set of discrete boundary variables
E(y, o) = Ecolor(o) + Ematch(y, o) + Ecompatibility(y, o) + Ejunction(o)
Ecolor(o): similar color, likely to be coplanar
[Figure: left image; color similarity: similar, dissimilar]
Ematch(y, o): agreement with result of input disparity map
Computed by any matching method (modified semi-global matching)
Disparity map, slanted plane
Truncated quadratic function
On boundary (occlusion): foreground segment owns boundary
Ecompatibility(y, o): boundary labels and slanted planes must match
Preference of boundary label (Coplanar, Hinge, Occlusion)
[Figure: segment pairs i, j for <Occlusion>, <Hinge>, <Coplanar>; labels: on boundary, in both segments; impose penalty]
Ejunction(o): occlusion boundary reasoning
Impossible cases
[Figure labels: Occlusion (Front, Back), Hinge, Coplanar] [Malik 1987]
Penalize impossible junctions