A Linear Approach to Matching Cuboids in RGBD Images

Hao Jiang, Boston College

Jianxiong Xiao, Massachusetts Institute of Technology

Abstract

We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on an RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs and small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the globally optimal solution. Our experiments on Kinect RGBD images of a variety of indoor scenes show that the proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.

1. Introduction

Finding three-dimensional structures and shapes from images is a key task in computer vision. Nowadays, we can obtain reliable depth maps using low-cost RGBD cameras from the digital consumer market, e.g. Microsoft Kinect, Asus Xtion and Primesense. Just like digital cameras that capture raw RGB data, these devices capture raw depth maps along with RGB color images. RGBD images provide a pointwise representation of a 3D space. We would like to extract structures from such data.

Recently, there have been a few heroic efforts to extract structures from RGBD images, e.g. [13, 12]. However, most of these approaches group pixels into surface segments, i.e. the counterpart of image segmentation for RGB images. Although some noteworthy studies [8] infer support relations in scenes, there is still very little volumetric reasoning in 3D space. Such reasoning should matter even more when depth is available than with pure image information, for which volumetric reasoning is well studied [3, 9]. In this paper, we design an efficient algorithm to match cuboid structures in an indoor scene using RGBD images, as illustrated in Fig. 1.

Figure 1. Given a color image and depth map, we match cuboid-shaped objects in the scene. (a) and (b): The color image and aligned depth map from Kinect. (c): The cuboids detected by the proposed method and projected onto the color image. (d): The cuboids in the scene viewed from another perspective. These cuboids reveal important structures of the scene.

A cuboid detector has many important applications. It is a key technique to enable a robot to manipulate box objects [1]. Cuboids also often appear in man-made structures [3, 6, 9]. A cuboid detector thus facilitates finding these structures. Detecting cuboids from RGBD images is challenging due to heavy object occlusion, missing data and strong clutter. We propose an efficient and reliable linear method to make a first step towards solving this problem. Our contribution is a novel linear approach that efficiently optimizes multiple cuboid matching in RGBD images. The proposed method works on cluttered scenes in unconstrained settings. Our experiments on thousands of images in the NYU Kinect dataset [8] and other images show that the proposed method is efficient and reliable.

2. Method

We optimize the matching of multiple cuboids in an RGBD image from Kinect. Our goal is to find a set of cuboids that match the RGBD image and at the same time satisfy the spatial interaction constraint. We construct a set of cuboid candidates and select the optimal subset. The cuboid configuration is denoted by x. We formulate cuboid matching as the following optimization problem,

\[
\min_{x} \left\{ U(x) + \lambda P(x) + \mu N(x) - \gamma A(x) + \xi O(x) \right\} \tag{1}
\]

s.t. cuboid configuration x satisfies global constraints.

Figure 2. Row 1: From left to right are the color image, the normal image with the three channels containing the x, y and z components, the superpixels obtained using both the color and normal images, and the superpixels obtained using the normal image only. The two superpixel maps are extracted with fixed parameters in this paper. Row 2: Left shows the cuboids constructed using neighboring planar patches and projected on the image; right shows the top 200 cuboid candidates. Row 3: 3D view of the cuboids in the color image in row 2. The red and blue dots are the points from the neighboring surface patches. Row 4: The normalized poses of these cuboids with three edges parallel to the x, y and z axes.
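The construction in rows 2 to 4 of Figure 2 can be pictured with a small sketch. It is not the authors' procedure (see [14] for that); it only assumes that two neighboring planar patches are given as 3D point arrays and lie on roughly perpendicular faces of a box. The function names `fit_plane_normal`, `cuboid_frame` and `axis_aligned_extent` are hypothetical. The sketch fits a plane normal to each patch, builds an orthonormal frame from the two normals, and rotates the points so that the box edges become parallel to the coordinate axes.

```python
import numpy as np

def fit_plane_normal(points):
    """Least-squares plane normal of an (N, 3) point array via SVD."""
    centered = points - points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]

def cuboid_frame(patch_a, patch_b):
    """Orthonormal frame (rows are axes) from two non-parallel planar patches."""
    n1 = fit_plane_normal(patch_a)
    n2 = fit_plane_normal(patch_b)
    # Remove the component of n2 along n1 so the two axes are exactly orthogonal.
    # Assumption: the patches are not parallel, otherwise the norm below is ~0.
    n2 = n2 - np.dot(n1, n2) * n1
    n2 = n2 / np.linalg.norm(n2)
    n3 = np.cross(n1, n2)
    return np.stack([n1, n2, n3])

def axis_aligned_extent(patch_a, patch_b):
    """Rotate both patches into the cuboid frame and return the box side lengths."""
    frame = cuboid_frame(patch_a, patch_b)
    pts = np.vstack([patch_a, patch_b]) @ frame.T
    return pts.max(axis=0) - pts.min(axis=0)
```

The extent returned here is only the bounding box of the observed points in the rotated frame; it says nothing about how well the box matches the image, which is what the unary matching cost in the optimization is meant to capture.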

Figure 3. Top 200 cuboid candidates and the cuboids matched by the proposed method and projected on the color images.

Here U(x) is the unary term that quantifies the local matching costs of the cuboids, P(x) is a pairwise term that quantifies the intersection between pairs of cuboids, N(x) is the number of matched cuboids in the scene, A(x) quantifies the covered area of the projected cuboids on the image plane, and O(x) penalizes the occlusions among the cuboids. The coefficients λ, μ, γ and ξ control the relative weights of the terms. By minimizing the objective function, we prefer a multiple cuboid matching that has low local matching cost and small object intersection and occlusion, and that covers a large area in the image with a small number of cuboids. Besides the soft constraints specified by the objective function, we further enforce that the optimal cuboid configuration x satisfies hard constraints on cuboid intersection and occlusion. This optimization is a combinatorial search problem, and we propose an efficient linear solution.
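To make the structure of the objective concrete, the following toy sketch evaluates it for a binary selection vector over a handful of candidates and finds the minimizer by exhaustive enumeration. This is an illustration only and not the proposed solver: the method described here uses a mixed integer linear program with branch and bound, whereas enumeration is feasible only for very few candidates. Several simplifications are assumptions of the sketch: the covered area A(x) is approximated by a sum of per-candidate areas, intersection and occlusion are given as precomputed pairwise tables, the hard constraints are not modeled, and the names `objective` and `best_selection` and the default weights are hypothetical.

```python
from itertools import product

def objective(x, unary, pair_overlap, area, pair_occlusion,
              lam=1.0, mu=0.1, gamma=1.0, xi=1.0):
    """Evaluate U + lam*P + mu*N - gamma*A + xi*O for a binary selection x."""
    chosen = [i for i, keep in enumerate(x) if keep]
    u = sum(unary[i] for i in chosen)                       # local matching cost
    p = sum(pair_overlap[i][j] for i in chosen for j in chosen if i < j)
    o = sum(pair_occlusion[i][j] for i in chosen for j in chosen if i < j)
    n = len(chosen)                                         # number of cuboids
    a = sum(area[i] for i in chosen)                        # simplified coverage
    return u + lam * p + mu * n - gamma * a + xi * o

def best_selection(unary, pair_overlap, area, pair_occlusion, **weights):
    """Brute-force search over all 2^n selections (toy sizes only)."""
    n = len(unary)
    return min(
        product((0, 1), repeat=n),
        key=lambda x: objective(x, unary, pair_overlap, area,
                                pair_occlusion, **weights),
    )

# Three candidates; candidates 0 and 2 intersect heavily.
unary = [0.2, 0.3, 0.1]
area = [1.0, 0.8, 0.6]
overlap = [[0, 0, 0.9], [0, 0, 0], [0, 0, 0]]
occlusion = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
selection = best_selection(unary, overlap, area, occlusion)
```

In this toy instance the third candidate looks attractive on its own, but its intersection with the first one outweighs the gain, so the search keeps only the first two. The pairwise terms are products of two binary variables; a standard way to make such a product linear, when its cost is nonnegative, is an auxiliary variable z_ij with z_ij ≥ x_i + x_j − 1 and z_ij ≥ 0. Whether the formulation in [14] uses exactly this construction is not stated in this text.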

Figure 4. Sample cuboid matching results with the proposed method on our captured data.

Figure 5. Comparison with the 2D cuboid detection method [3]. The proposed method’s result (Row 1) is more reliable than the result from 2D images only (Row 2).

3. Conclusion

We describe a novel method to match cuboids in RGBD images. The proposed linear method guarantees a globally optimal cuboid matching. For more details, please refer to [14].

References

[1] R. Truax, R. Platt, and J. Leonard, “Using Prioritized Relaxations to Locate Objects in Points Clouds for Manipulation”, ICRA’11.

[2] R. Schnabel, R. Wessel, R. Wahl, and R. Klein, “Shape Recognition in 3D Point-Clouds”, WSCG’08.

[3] J. Xiao, B.C. Russell, and A. Torralba, “Localizing 3D Cuboids in Single-view Images”, NIPS’12.

[4] D.C. Lee, A. Gupta, M. Hebert, and T. Kanade, “Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces”, NIPS’10.

[5] S.J. Dickinson, D. Metaxas, and A. Pentland, “The Role of Model-Based Segmentation in the Recovery of Volumetric Parts From Range Data”, TPAMI, vol.19, no.3, 1997.

[6] J. Xiao and Y. Furukawa, “Reconstructing the world’s museums”, ECCV’12.

[7] M. Bleyer, C. Rhemann, and C. Rother, “Extracting 3D Scene-Consistent Object Proposals and Depth from Stereo Images”, ECCV’12.

[8] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor Segmentation and Support Inference from RGBD Images”, ECCV’12.

[9] J. Xiao, B.C. Russell, J. Hays, K.A. Ehinger, A. Oliva, and A. Torralba, “Basic level scene understanding: from labels to structure and beyond”, SIGGRAPH Asia’12.

[10] P.F. Felzenszwalb and D.P. Huttenlocher, “Efficient Graph-Based Image Segmentation”, IJCV, vol.59, no.2, 2004.

[11] A. Gupta, A.A. Efros, and M. Hebert, “Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics”, ECCV’10.

[12] H.S. Koppula, A. Anand, T. Joachims, and A. Saxena, “Semantic Labeling of 3D Point Clouds for Indoor Scenes”, NIPS’11.

[13] N. Silberman and R. Fergus, “Indoor Scene Segmentation using a Structured Light Sensor”, ICCV Workshop on 3D Representation and Recognition, 2011.

[14] H. Jiang and J. Xiao, “A Linear Approach to Matching Cuboids in RGBD Images”, CVPR’13.
