
In Defense of 3D-Label Stereo

Carl Olsson, Johannes Ulen, Yuri Boykov

Centre for Mathematical Sciences, Lund University, Sweden
Department of Computer Science, University of Western Ontario, Canada

Overview

It is commonly believed that higher order smoothness should be modeled using higher order interactions. For example, 2nd-order derivatives for deformable (active) contours are represented by triple cliques. Similarly, the 2nd-order regularization methods in stereo predominantly use MRF models with scalar (1D) disparity labels and triple clique interactions. In this paper we advocate a largely overlooked alternative approach to stereo where 2nd-order surface smoothness is represented by pairwise interactions with 3D-labels, e.g. tangent planes. This general paradigm has been criticized due to perceived computational complexity of optimization in higher-dimensional label space. Contrary to popular beliefs, we demonstrate that representing 2nd-order surface smoothness with 3D labels leads to simpler optimization problems with (nearly) submodular pairwise interactions. Our theoretical and experimental results demonstrate advantages over state-of-the-art methods for 2nd-order smoothness stereo.

Code available at http://www.maths.lth.se/~ulen/

Compared to other works

Woodford et al. [1]

• 1D-labels (disparity/depth)
• Triple cliques approximate 2nd-order derivatives
• Reduction to QPBO fusion moves
• Hard QPBO problem

Li and Zucker [2]

• 3D-labels (tangent planes)
• Local precomputed tangents, 2nd-order regularization
• Belief propagation
• No solution guarantees

This paper

• 3D-labels (tangent planes)
• Pairwise cliques approximate 2nd-order derivative
• QPBO fusion moves
• Submodularity properties lead to simpler problems

• Generalization to higher order interactions

Background

We assign each pixel p a tangent plane. From the tangent planes it is straightforward to extract a corresponding disparity or depth estimate. The underlying energy function is optimized by performing fusion moves on proposed solutions (proposals).
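As a rough illustration of the two steps above, the sketch below stores one plane per pixel and fuses two proposals. It assumes a plane label is a triple (a, b, c) with disparity a*x + b*y + c; that parametrization is an assumption made here, not necessarily the one used by the authors. The binary mask is hand-made and only stands in for the labelling that QPBO would return.

```python
import numpy as np

def plane_disparity(planes, xs, ys):
    """Disparity implied by each pixel's own plane label (a, b, c): a*x + b*y + c."""
    return planes[..., 0] * xs + planes[..., 1] * ys + planes[..., 2]

def fuse(current, proposal, take_proposal):
    """Fusion move: per pixel, keep the current label or switch to the proposal."""
    return np.where(take_proposal[..., None], proposal, current)

h, w = 2, 3
ys, xs = np.mgrid[0:h, 0:w].astype(float)

current = np.zeros((h, w, 3))
current[..., 2] = 5.0      # fronto-parallel plane, disparity 5 everywhere
proposal = np.zeros((h, w, 3))
proposal[..., 0] = 1.0     # slanted plane, disparity equal to x

# Hypothetical binary labelling; in the actual method it is the output of
# a QPBO solve on the fusion energy, which is not implemented here.
mask = xs >= 1

fused = fuse(current, proposal, mask)
disparity = plane_disparity(fused, xs, ys)
```

Each pixel keeps a full 3D label after the fusion, and the scalar disparity is only read off at the end.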

Definition 1. Let D(p) be the disparity at pixel p. Furthermore, let $T_p D : I \to \mathbb{R}$ denote the tangent at the point p, seen as a function over the whole image, that is

$$T_p D(x) = D(p) + \nabla D(p)^T (x - p). \tag{1}$$

We define a regularization between neighboring pixels as

$$V_{pq} = |T_p D(q) - D(q)|. \tag{2}$$

$V_{pq}$ measures the curve's deviation from the tangent plane. Using the Taylor expansion

$$D(q) \approx D(p) + \nabla D(p)^T (q - p) + \frac{1}{2}(q - p)^T \nabla^2 D(p)\,(q - p), \tag{3}$$

where $\nabla^2 D(p)$ is the Hessian at p, we see that

$$V_{pq} \approx \left|\frac{1}{2}(q - p)^T \nabla^2 D(p)\,(q - p)\right|. \tag{4}$$

That is, $V_{pq}$ measures the magnitude of the second derivative of the underlying disparity function at p in the direction q − p.
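The relation between (2) and (4) can be checked numerically. The sketch below uses a hypothetical quadratic disparity function, for which the Taylor expansion (3) is exact, so the tangent deviation and the Hessian term should coincide. The coefficients c, g and H are arbitrary illustrative values.

```python
import numpy as np

# Hypothetical quadratic disparity D(x) = c + g.x + 0.5 * x^T H x
c = 2.0
g = np.array([0.5, -1.0])
H = np.array([[0.4, 0.1],
              [0.1, -0.2]])  # symmetric, so the gradient is g + H x

def D(x):
    return c + g @ x + 0.5 * x @ H @ x

def grad_D(x):
    return g + H @ x

def tangent(p, x):
    """T_p D(x) from equation (1)."""
    return D(p) + grad_D(p) @ (x - p)

def V(p, q):
    """Tangent deviation from equation (2)."""
    return abs(tangent(p, q) - D(q))

p = np.array([3.0, 4.0])
q = np.array([4.0, 4.0])  # horizontal neighbour of p

deviation = V(p, q)
second_order = abs(0.5 * (q - p) @ H @ (q - p))  # right-hand side of (4)
```

For a horizontal neighbour at unit distance, the Hessian term reduces to half of the entry H[0, 0]. For a non-quadratic D the two quantities agree only approximately.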

[Figure 1: two panels showing the image I with pixels p and q, the values D(p), D(q) and T_p D(q), and the gap V_pq between them.]

Fig. 1: Left, rectified cameras: geometric interpretation of the smoothness term for parallel viewing rays. Right, regular cameras: smoothness term when the viewing rays are not parallel.

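To make the energy discontinuity preserving, we add a threshold t to the interaction. Writing $V_{pq}(D,P) = |T_p D(q) - P(q)|$ for the interaction when p takes its tangent from D and q takes its value from P,

$$E_{pq}(D,P) := \min(V_{pq}(D,P),\, t). \tag{5}$$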
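A minimal sketch of the truncated interaction (5) for plane labels follows. It assumes the plane at p is extended to q and compared with the disparity that q's own plane gives there. The (a, b, c) parametrization and the threshold value are illustrative assumptions.

```python
def plane_value(plane, x, y):
    """Disparity of the plane (a, b, c) at pixel (x, y)."""
    a, b, c = plane
    return a * x + b * y + c

def V_pq(label_p, label_q, q):
    """|T_p(q) - D(q)|: plane at p evaluated at q, against q's own disparity."""
    return abs(plane_value(label_p, *q) - plane_value(label_q, *q))

def E_pq(label_p, label_q, q, t):
    """Truncated interaction: the penalty saturates at the threshold t."""
    return min(V_pq(label_p, label_q, q), t)

q = (4.0, 0.0)
slanted = (1.0, 0.0, 0.0)   # disparity equal to x
far_away = (0.0, 0.0, 10.0)  # constant disparity 10

same_plane_cost = E_pq(slanted, slanted, q, t=2.0)  # neighbours on one plane
jump_cost = E_pq(slanted, far_away, q, t=2.0)       # deviation 6, capped at t
```

Two neighbours on the same plane cost nothing regardless of its slant, while a large jump costs no more than t, which is what allows sharp depth discontinuities.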

Theoretical results

Proposition 2. If the proposal P is a plane, then the fusion with any function D is a submodular move for both $E_{pq}$ and $V_{pq}$.

Proof. Since P is a plane we have

$$T_p P(q) = P(q) \tag{6}$$

and therefore $V_{pq}(P,P) = 0$. Furthermore,

\begin{align}
V_{pq}(D,D) &= \left|T_p D(q) - D(q)\right| \tag{7} \\
&= \left|T_p D(q) - P(q) + T_p P(q) - D(q)\right| \tag{8} \\
&\le \left|T_p D(q) - P(q)\right| + \left|T_p P(q) - D(q)\right| \tag{9} \\
&= V_{pq}(D,P) + V_{pq}(P,D) \tag{10}
\end{align}

which shows that submodularity,

$$V_{pq}(D,D) + V_{pq}(P,P) \le V_{pq}(P,D) + V_{pq}(D,P), \tag{11}$$

holds. The proof for $E_{pq}$ is given in the paper. ∎
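The inequality (11) can also be probed numerically. The sketch below draws random planes P and random smooth, non-planar functions D, then records the smallest value of the right-hand side minus the left-hand side, for both the plain and the truncated interaction. The sinusoidal family for D, the sampling ranges and the threshold 0.5 are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_D(w):
    """Hypothetical smooth disparity D(x) = sin(w.x) with its gradient."""
    return (lambda x: np.sin(w @ x)), (lambda x: np.cos(w @ x) * w)

def make_plane(n, c):
    """Plane P(x) = n.x + c with constant gradient n."""
    return (lambda x: n @ x + c), (lambda x: n)

def V(A, B, p, q):
    """|T_p A(q) - B(q)|: tangent at p taken from A, value at q taken from B."""
    fA, dA = A
    fB, _ = B
    return abs(fA(p) + dA(p) @ (q - p) - fB(q))

def submodular_gap(D, P, p, q, t=None):
    """Right-hand side minus left-hand side of (11); non-negative if submodular."""
    if t is None:
        cost = lambda A, B: V(A, B, p, q)
    else:
        cost = lambda A, B: min(V(A, B, p, q), t)
    return cost(D, P) + cost(P, D) - cost(D, D) - cost(P, P)

gaps = []
for _ in range(1000):
    D = make_D(rng.normal(size=2))
    P = make_plane(rng.normal(size=2), rng.normal())
    p = rng.uniform(0, 10, size=2)
    q = p + np.array([1.0, 0.0])
    gaps.append(submodular_gap(D, P, p, q))         # V_pq
    gaps.append(submodular_gap(D, P, p, q, t=0.5))  # E_pq with threshold t

min_gap = min(gaps)
```

A minimum gap that stays non-negative up to rounding error is consistent with Proposition 2. It is a sanity check only, not a substitute for the proof.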

Proposition 3. If both D and P are convex (or alternatively both concave) between p and q, then the interactions $V_{pq}$ and $V_{qp}$ are submodular for the fusion move.
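A small one-dimensional check of Proposition 3 is sketched below with hand-picked functions, which are purely illustrative. Two convex functions give a non-negative submodularity gap in both directions, while one particular convex/concave pair gives a negative gap, suggesting that the curvature condition cannot simply be dropped.

```python
def V(A, dA, B, p, q):
    """1D version of |T_p A(q) - B(q)|."""
    return abs(A(p) + dA(p) * (q - p) - B(q))

def gap(D, dD, P, dP, p, q):
    """V(D,P) + V(P,D) - V(D,D) - V(P,P); non-negative means submodular."""
    mixed = V(D, dD, P, p, q) + V(P, dP, D, p, q)
    pure = V(D, dD, D, p, q) + V(P, dP, P, p, q)
    return mixed - pure

# Hypothetical test functions with their derivatives
D = lambda x: x * x                        # convex
dD = lambda x: 2.0 * x
P_convex = lambda x: 2.0 * (x - 3.0) ** 2 + 1.0
dP_convex = lambda x: 4.0 * (x - 3.0)
P_concave = lambda x: 1.0 - x * x
dP_concave = lambda x: -2.0 * x

p, q = 0.0, 1.0
gap_pq = gap(D, dD, P_convex, dP_convex, p, q)    # interaction V_pq
gap_qp = gap(D, dD, P_convex, dP_convex, q, p)    # interaction V_qp
gap_mixed = gap(D, dD, P_concave, dP_concave, p, q)
```

In the mixed case, the tangent of each function at p happens to hit the other function exactly at q, so both cross terms vanish while both pure terms equal 1.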

Generalization to higher dimensional labels

Label                    Pairwise Interaction  Unary Term                  Submodular Proposals
Depth                    1st derivative        Depth                       Constant functions
Tangent plane            2nd derivative        Depth, 1st derivative       Constant 1st derivative
2nd-order approximation  3rd derivative        Depth, 1st, 2nd derivative  Constant 2nd derivative
...                      ...                   ...                         ...

Fig. 2: Characterization of pairwise interactions, unary terms and submodular proposals for different types of labels.

Results

[Panels: Image; Only data term; With regularization.]

Fig. 3: Result using regular cameras, picture of Skansen Lejonet in Gothenburg, Sweden.

[Panels: Image; Only data term; With regularization.]

Fig. 4: Result using regular cameras, picture of Orebro castle, Sweden.

(a) Image (b) Our (c) Woodford (d) Woodford 1op

(e) Ground truth (f) Our unlabelled (g) Woodford unlabelled (h) Woodford 1op unlabelled

Fig. 5: Result using rectified cameras. (b-d) are estimated disparity maps after fusing the 14 SegPln proposals. In (f-h) we present the unlabelled variables summed over all 14 proposals, scaled 0–14. A white pixel would mean that fusing a proposal for this pixel failed for every single proposal.

              Tsukuba   Venus     Teddy     Cones
Our           0.065 %   0.0264 %  0.127 %   0.0847 %
Woodford      30.0 %    30.6 %    27.6 %    27.3 %
Woodford 1op  0 %       0 %       0 %       0.0411 %

Fig. 6: Percentage of unlabelled variables for the 14 SegPln proposals on Middlebury.

              Tsukuba  Venus  Teddy  Cones  Average
Our           21.3     25.5   29.4   36.5   28.2
Woodford      106      139    143    181    142
Woodford/Our  4.96     5.47   4.87   4.96   5.07

Fig. 7: Running time (s) using the convergence criteria in Woodford [1].

          Tsukuba               Venus                  Teddy                 Cones                 Average
          Non occ  All   Disc   Non occ  All    Disc   Non occ  All   Disc   Non occ  All   Disc
Our       4.49     5.52  12.3   0.298    0.648  3.99   7.71     11.2  17.8   9.78     15.4  18.3   8.95
Woodford  4.83     5.99  13.9   0.536    0.921  6.39   8.16     11.8  19.3   9.74     15.6  18.4   9.63

Fig. 8: Scores on Middlebury using the same proposals, lower is better. All values are % of pixels being ≥ 1 pixel incorrect for each of the three classes. The classes are non occluded regions, all pixels and regions near depth discontinuities.

References

[1] O. Woodford, P. Torr, I. Reid, and A. Fitzgibbon, “Global stereo reconstruction under second order smoothness priors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.

[2] G. Li and S. Zucker, “Differential geometric inference in surface stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 72–86, 2010.

IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013
