a linear approach to matching cuboids in rgbd...

2
A Linear Approach to Matching Cuboids in RGBD Images Hao Jiang Boston College [email protected] Jianxiong Xiao Massachusetts Institute of Technology [email protected] Abstract We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on a RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the global optimal solution. Our experiments on the Kinect RGBD im- ages of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object ap- pearance variations, occlusions and strong clutter. 1. Introduction Finding three-dimensional structures and shapes from images is a key task in computer vision. Nowadays, we can obtain reliable depth map using low cost RGBD cameras from the digital consumer market, e.g. Microsoft Kinect, Asus Xtion and Primesense. Just like digital cameras that capture raw RGB data, these devices capture raw depth maps along with RGB color images. RGBD images pro- vide a pointwise representation of a 3D space. We would like to extract structures from such data. Recently, there are a few heroic efforts in extracting structures in RGBD images, e.g. [13, 12]. However, most of these approaches group pixels into surface segments, i.e. the counterpart of image segmentation for RGB images. Al- though some noteworthy studies [8] infer support relations in scenes, there is still very little volumetric reasoning used in the 3D space, which should be even more important as the depth is available, than pure image information with which volumetric reasoning is well studied [3, 9]. In this paper, (a) (b) (c) (d) Figure 1. Given a color image and depth map we match cuboid- shaped objects in the scene. (a) and (b): The color image and aligned depth map from Kinect. (c): The cuboids detected by the proposed method and projected onto the color image. (d): The cuboids in the scene viewed from another perspective. These cuboids reveal important structures of the scene. we design an efficient algorithm to match cuboid structures in an indoor scene using the RGBD images, as illustrated in Fig. 1. A cuboid detector has many important applications. It is a key technique to enable a robot to manipulate box ob- jects [1]. Cuboids also often appear in man-made struc- tures [3, 6, 9]. A cuboid detector thus facilitates finding these structures. Detecting cuboids from RGBD images is challenging due to heavy object occlusion, missing data and strong clutter. We propose an efficient and reliable lin- ear method to make a first step towards solving this prob- lem. Our contribution is a novel linear approach that effi- ciently optimizes multiple cuboid matching in RGBD im- ages. The proposed method works on cluttered scenes in unconstrained settings. Our experiments on thousands of images in the NYU Kinect dataset [8] and other images show that the proposed method is efficient and reliable. 2. Method We optimize the matching of multiple cuboids in a RGBD image from Kinect. Our goal is to find a set of cuboids that match the RGBD image and at the same time satisfy the spatial interaction constraint. We construct a set of cuboid candidates and select the optimal subset. The cuboid configuration is denoted by x. We formulate cuboid matching into the following optimization problem, min x {U (x)+ λP (x)+ μN (x) γA(x)+ ξO(x)} (1) s.t. Cuboid configuration x satisfies global constraints.

Upload: vukhuong

Post on 29-Apr-2018

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: A Linear Approach to Matching Cuboids in RGBD Imagessunw.csail.mit.edu/2013/papers/Jiang_44_SUNw.pdf · A Linear Approach to Matching Cuboids in RGBD Images ... fies the covered

A Linear Approach to Matching Cuboids in RGBD Images

Hao JiangBoston College

[email protected]

Jianxiong XiaoMassachusetts Institute of Technology

[email protected]

Abstract

We propose a novel linear method to match cuboids inindoor scenes using RGBD images from Kinect. Beyonddepth maps, these cuboids reveal important structures of ascene. Instead of directly fitting cuboids to 3D data, wefirst construct cuboid candidates using superpixel pairs ona RGBD image, and then we optimize the configurationof the cuboids to satisfy the global structure constraints.The optimal configuration has low local matching costs,small object intersection and occlusion, and the cuboidstend to project to a large region in the image; the numberof cuboids is optimized simultaneously. We formulate themultiple cuboid matching problem as a mixed integer linearprogram and solve the optimization efficiently with a branchand bound method. The optimization guarantees the globaloptimal solution. Our experiments on the Kinect RGBD im-ages of a variety of indoor scenes show that our proposedmethod is efficient, accurate and robust against object ap-pearance variations, occlusions and strong clutter.

1. Introduction

Finding three-dimensional structures and shapes fromimages is a key task in computer vision. Nowadays, we canobtain reliable depth map using low cost RGBD camerasfrom the digital consumer market, e.g. Microsoft Kinect,Asus Xtion and Primesense. Just like digital cameras thatcapture raw RGB data, these devices capture raw depthmaps along with RGB color images. RGBD images pro-vide a pointwise representation of a 3D space. We wouldlike to extract structures from such data.

Recently, there are a few heroic efforts in extractingstructures in RGBD images, e.g. [13, 12]. However, mostof these approaches group pixels into surface segments, i.e.the counterpart of image segmentation for RGB images. Al-though some noteworthy studies [8] infer support relationsin scenes, there is still very little volumetric reasoning usedin the 3D space, which should be even more important as thedepth is available, than pure image information with whichvolumetric reasoning is well studied [3, 9]. In this paper,

(a) (b) (c) (d)Figure 1. Given a color image and depth map we match cuboid-shaped objects in the scene. (a) and (b): The color image andaligned depth map from Kinect. (c): The cuboids detected bythe proposed method and projected onto the color image. (d):The cuboids in the scene viewed from another perspective. Thesecuboids reveal important structures of the scene.

we design an efficient algorithm to match cuboid structuresin an indoor scene using the RGBD images, as illustrated inFig. 1.

A cuboid detector has many important applications. Itis a key technique to enable a robot to manipulate box ob-jects [1]. Cuboids also often appear in man-made struc-tures [3, 6, 9]. A cuboid detector thus facilitates findingthese structures. Detecting cuboids from RGBD imagesis challenging due to heavy object occlusion, missing dataand strong clutter. We propose an efficient and reliable lin-ear method to make a first step towards solving this prob-lem. Our contribution is a novel linear approach that effi-ciently optimizes multiple cuboid matching in RGBD im-ages. The proposed method works on cluttered scenes inunconstrained settings. Our experiments on thousands ofimages in the NYU Kinect dataset [8] and other imagesshow that the proposed method is efficient and reliable.

2. Method

We optimize the matching of multiple cuboids in aRGBD image from Kinect. Our goal is to find a set ofcuboids that match the RGBD image and at the same timesatisfy the spatial interaction constraint. We construct a setof cuboid candidates and select the optimal subset. Thecuboid configuration is denoted by x. We formulate cuboidmatching into the following optimization problem,

minx

{U(x) + λP (x) + μN(x)− γA(x) + ξO(x)} (1)

s.t. Cuboid configuration x satisfies global constraints.

Page 2: A Linear Approach to Matching Cuboids in RGBD Imagessunw.csail.mit.edu/2013/papers/Jiang_44_SUNw.pdf · A Linear Approach to Matching Cuboids in RGBD Images ... fies the covered

−1.5−1−0.500.511.5

−0.5

0

0.5

1

1.5

x

y

1

1.2

1.4

00.1

0.2

−2.7

−2.6

−2.5

−2.4

−2.3

xy

z

1.2

1.4

1.6

0.20.3

0.40.5

−3

−2.9

−2.8

−2.7

−2.6

xy

z

0.20.3

0.40.1

0.20.3

0.40.5

−2

−1.9

−1.8

−1.7

−1.6

xy

z

−0.2

0

0.2

−0.050

0.050.1

−2

−1.9

−1.8

−1.7

−1.6

xy

z

−0.8−0.7

−0.6

−0.4

−0.2

0

−2.4

−2.3

−2.2

−2.1

−2

−1.9

−1.8

xy

z

−0.8

−0.6

−0.4

00.1

0.2

−2.2

−2.1

−2

−1.9

−1.8

xy

z

−0.7−0.6

−0.5−0.4

−0.3

0.40.5

0.60.7

0.8

−2.8

−2.7

−2.6

−2.5

−2.4

xy

z

−2.9

−2.8

−2.7

−2.6

−2.5

−0.2

−0.1

0

0.1

0.55

0.6

xy

z

0.60.7

0.8−0.2

−0.1

0

0.1

−3.2

−3.1

−3

−2.9

xy

z

−0.7

−0.6

−0.5

−0.4

1.3

1.4

1.5

1.6

−1.08−1.06−1.04

xy

z

−1.6

−1.5

−1.4

−1.3

0.7

0.8

0.9

1

−0.54−0.52−0.5−0.48−0.46

xy

z

0.4

0.6

0.750.8

0.850.9

0.95

−2.2

−2.1

−2

−1.9

−1.8

−1.7

xy

z

1.2

1.3

1.4

1.5

1.6

−0.4

−0.3

1.45

1.5

1.55

1.6

1.65

1.7

xy

z

0.60.7

0.80.9

−0.2

−0.1

0

0.1

2.5

2.55

2.6

2.65

2.7

xy

z

Figure 2. Row 1: From left to right are the color image, normalimage with the three channels containing the x, y and z compo-nents, the superpixels by using both the color and normal images,and the superpixels by using the normal image only. The two su-perpixel maps are extracted with fixed parameters in this paper.Row 2: Left shows the cuboids constructed using neighboring pla-nar patches and projected on the image; right shows the top 200cuboid candidates. Row 3: 3D view of the cuboids in the colorimage in row 2. The red and blue dots are the points from theneighboring surface patches. Row 4: The normalized poses ofthese cuboids with three edges parallel to the xyz axises.

−1−0.500.51−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

x

y

−0.500.511.522.5−1

−0.5

0

0.5

1

1.5

2

x

y

Figure 3. Top 200 cuboid candidates and the cuboids matched bythe proposed method and projected on the color images.

Here U(x) is the unary term that quantifies the local match-ing costs of the cuboids, P (x) is a pairwise term that quan-tifies the intersection between pairs of cuboids, N(x) isthe number of matched cuboids in the scene, A(x) quanti-fies the covered area of the projected cuboids on the imageplane, and O penalizes the occlusions among the cuboids.λ, μ, γ and ξ control the weight among different terms. Byminimizing the objective function, we prefer to find themultiple cuboid matching that has low local matching cost,small object intersection and occlusion, and covers a largearea in the image with a small number of cuboids. Besidesthe soft constraints specified by the objective function, wefurther enforce that the optimal cuboid configuration x sat-isfies hard constraints on cuboid intersection and occlusion.This optimization is a combinatorial search problem, andwe propose an efficient linear solution.

Figure 4. Sample cuboid matching results with the proposedmethod on our captured data.

Figure 5. Comparison with 2D cuboid detection method [3]. Theproposed method’s result (Row 1) is more reliable than the resultfrom 2D images only (Row 2).

3. ConclusionWe describe a novel method to match cuboids in RGBD

images. The proposed linear method guarantees the globaloptimization of cuboid matching. For more details, pleaserefer to [14].

References[1] R. Truax, R. Platt, J. Leonard, “Using Prioritized Relaxations to Locate Ob-

jects in Points Clouds for Manipulation”, ICRA’11.[2] R. Schnabel, R. Wessel, R. Wahl, and R. Klein, “Shape Recognition in 3D

Point-Clouds”, WSCG’08.[3] J. Xiao, B.C. Russell, and A. Torralba, “Localizing 3D Cuboids in Single-

view Images”, NIPS’12.[4] D.C. Lee, A. Gupta, M. Hebert, and T. Kanade. “Estimating Spatial Layout of

Rooms using Volumetric Reasoning about Objects and Surfaces”, NIPS’10.[5] S.J. Dickinson, D. Metaxas, A. Pentland, “The Role of Model-Based Seg-

mentation in the Recovery of Volumetric Parts From Range Data”, TPAMI,vol.19, no.3, 1997.

[6] J. Xiao and Y. Furukawa, “Reconstructing the world’s museums”, ECCV’12.[7] M. Bleyer, C. Rhemann, C. Rother, “Extracting 3D Scene-Consistent Object

Proposals and Depth from Stereo Images”, ECCV’12.[8] N. Silberman, D. Hoiem, P. Kohli and R. Fergus, “Indoor Segmentation and

Support Inference from RGBD Images”, ECCV’12.[9] J. Xiao, B. C. Russell, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba,

“Basic level scene understanding: from labels to structure and beyond”, SIG-GRAPH Asia’12.

[10] P.F. Felzenszwalb and D.P. Huttenlocher, “Efficient Graph-Based Image Seg-mentation”, IJCV, vol.59, no.2, 2004.

[11] A. Gupta and A.A. Efros and M. Hebert, “Blocks World Revisited: ImageUnderstanding Using Qualitative Geometry and Mechanics”, ECCV’10.

[12] H.S. Koppula, A. Anand, T. Joachims, A. Saxena, “Semantic Labeling of 3DPoint Clouds for Indoor Scenes”, NIPS’11.

[13] N. Silberman and R. Fergus, “Indoor Scene Segmentation using a Struc-tured Light Sensor”, ICCV Workshop on 3D Representation and Recognition,2011.

[14] H. Jiang, and J. Xiao, “A Linear Approach to Matching Cuboids in RGBDImages”, CVPR’13.