
Page 1: A Search-Classify Framework for Cluttered Scene Understandingsunw.csail.mit.edu/2013/papers/Nan_21_SUNw.pdf · 2017. 5. 24. · Andrei Sharf Ben Gurion University, Israel asharf@gmail.com

A Search-Classify Framework for Cluttered Scene Understanding

Liangliang Nan
SIAT, China
[email protected]

Ke Xie
SIAT, China
[email protected]

Andrei Sharf
Ben Gurion University, Israel
[email protected]

Abstract

We present a search-classify framework which interleaves segmentation and classification in an iterative manner. Using a robust classifier, we traverse the scene and gradually propagate classification information. We reinforce classification by a template fitting step which yields a scene reconstruction. We deform-to-fit templates to classified objects to resolve classification ambiguities. The resulting reconstruction is an approximation which captures the general scene arrangement. We demonstrate the effectiveness of the framework for cluttered indoor scenes.

1. Introduction

3D scans of large-scale environments are relatively new and were made possible by recent progress in scanning technology. Many algorithms have been proposed for processing scanned scenes [1, 2, 3, 4, 5, 6, 7], yet understanding scanned scenes remains a challenge.

We propose a framework that is capable of understanding and modeling raw scans of cluttered scenes (see Fig. 1, 2). We argue that object classification cannot be directly applied to the scene, since object segmentation is unavailable. Moreover, the segmentation of the scene into objects is as challenging as the classification, since spatial relationships between points and patches are neither complete nor reliable. Our key idea is to interleave the computations of segmentation and classification of the scene into meaningful parts. We denote this approach search-classify, since we search for meaningful segments using a classifier that estimates the probability of a segment to be part of an object.

2. Search-Classify Framework

The key idea underlying our search-classify framework is a controlled region growing process which searches for meaningful objects in the scene by accumulating surface patches with high classification likelihood. In each step, we query the accumulated parts with our classifier and obtain a set of likelihood probabilities for the different classes. We proceed by growing the regions with the highest likelihood probability. We further reinforce classification by template fitting, where templates are deformed-to-fit classified objects in order to resolve ambiguous cases. Using the fitting error, we can detect outliers and misclassified parts and re-iterate the search-classify process. An immediate outcome of this step is an approximate scene reconstruction by deformed templates which captures the general arrangement of objects.
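The interleaved loop described above can be summarized as a single search-classify pass. The sketch below is illustrative, not the authors' implementation: `grow`, `fit`, and `err_threshold` are hypothetical stand-ins for the region growing (Sec. 2.2) and template fitting (Sec. 2.3) components.

```python
def search_classify_pass(seeds, grow, fit, err_threshold):
    """One pass of the feedback loop: grow a candidate object from each
    seed, verify it by its template-fitting error, and keep only the
    segments whose deformed template fits well."""
    objects = []
    for seed in seeds:
        segment, label = grow(seed)      # search + classify (Sec. 2.2)
        error = fit(segment, label)      # deform-to-fit template (Sec. 2.3)
        if error <= err_threshold:
            objects.append((segment, label))
        # otherwise the segment is treated as misclassified and would be
        # re-queued for another search-classify iteration
    return objects
```

With toy callables in place of the learned components, only well-fitting candidates survive the pass.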

These two algorithmic components perform in a feedback loop, where the initial classification is refined by template fitting, which in turn is reevaluated by classification (see Fig. 1).

Figure 1. Block-diagram overview of our framework.

2.1. Preprocessing

In the off-line learning stage, we train a classifier on a large set of both clean 3D digital models and manually segmented scans, using our designed point cloud feature. For more details of the point cloud feature, please refer to [7].

Given a raw scan of an indoor scene, we initially over-segment the scene into smooth patches and compute an adjacency graph between the parts (see Fig. 3, left).
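As an illustration of this preprocessing step, an adjacency graph over patches might be built by connecting any two patches whose points come within a small distance of one another. The patch representation (lists of 3D points) and the `eps` threshold below are illustrative assumptions, not details from the paper.

```python
from itertools import combinations
import math

def patches_adjacent(a, b, eps=0.05):
    """Two patches touch if some pair of their points is closer than eps."""
    return any(math.dist(p, q) < eps for p in a for q in b)

def build_adjacency(patches, eps=0.05):
    """Return adjacency sets over patch indices: graph[i] holds the
    indices of all patches adjacent to patch i."""
    graph = {i: set() for i in range(len(patches))}
    for i, j in combinations(range(len(patches)), 2):
        if patches_adjacent(patches[i], patches[j], eps):
            graph[i].add(j)
            graph[j].add(i)
    return graph
```

The brute-force point comparison is quadratic; a real pipeline would use a spatial index, but the resulting graph is the same.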



Figure 2. A zoom of a cluttered scene reveals that accurate segmentation and classification are challenging, even for human perception. We over-segment the scene (mid-left) and search-classify meaningful objects in the scene (mid-right), which are reconstructed by templates (right), overcoming the high clutter.

Figure 3. Visualization of graph traversal and classification. Left to right: from a graph defined on the initial patches, we select an initial object seed with above-threshold classification confidence (mid-left). We traverse the graph in directions where the classification confidence increases (number value, also blue color intensity). In the rightmost figure, we show a neighboring patch (table side) causing a steep decrease in classification confidence; hence we do not accumulate it.

2.2. Controlled Region Growing

We start from a set of random seeds defined by patch triplets. The region growing proceeds from the initial seeds by traversing their adjacent segments and accumulating segments into significant objects. For each set of segments (representing a potential object), we attempt to accumulate adjacent segments by querying our classifier with the new set for its likelihood probability value. We grow a set if its likelihood value is non-decreasing (see Fig. 3).
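A runnable toy sketch of this controlled growing rule is given below. Here `classify` is a stand-in callable that scores a set of patches (the real system queries the learned classifier), and `graph` is the patch adjacency graph from the preprocessing step.

```python
def grow_region(seed, graph, classify):
    """Grow from a seed patch set, accumulating adjacent patches only
    while the classification likelihood is non-decreasing."""
    region = set(seed)
    score = classify(region)
    grew = True
    while grew:
        grew = False
        # frontier: patches adjacent to the current region
        frontier = {n for p in region for n in graph[p]} - region
        for cand in sorted(frontier):
            new_score = classify(region | {cand})
            if new_score >= score:   # accept: likelihood non-decreasing
                region.add(cand)
                score = new_score
                grew = True
    return region, score
```

In the toy test below, the classifier scores a set by the fraction of its patches belonging to a "true" object, so growth stops exactly at the object boundary, mirroring the confidence drop shown in Fig. 3.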

2.3. Template Fitting

The above process is not perfect, since objects may still overlap due to ambiguities in cluttered regions. We fit a deformable template to the classified point cloud, aiming at minimizing their one-sided Hausdorff distance (points to template). Thus, incorrectly segmented parts (outliers) will have a low fitting score to the template (see red loop in Fig. 1). More details about template fitting can be found in [8].
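The one-sided Hausdorff distance used as the fitting score can be computed as below. Representing the deformed template by a set of sampled surface points is an assumption made for this sketch; a large distance corresponds to a poor fit.

```python
import math

def one_sided_hausdorff(points, template_samples):
    """Maximum, over the scan points, of the distance to the nearest
    template sample (points-to-template direction only)."""
    return max(min(math.dist(p, q) for q in template_samples)
               for p in points)
```

Because the distance is one-sided, outlier points far from the template dominate the score, while unmatched template regions (e.g. occluded parts) do not penalize it.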

3. Conclusions and Future Work

We have presented a framework for cluttered scene understanding. Although we tested our algorithm on indoor scenes, the framework can be extended to more general scenes, such as outdoor environments. In future work, we plan to extend this model to incorporate contextual information between different objects.

References

[1] D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, and A. Ng. Discriminative learning of Markov random fields for segmentation of 3D scan data. In CVPR'05, volume 2, pages 169-176, 2005.

[2] A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik. Recognizing objects in range data using regional point descriptors. In ECCV'04, May 2004.

[3] A. Golovinskiy, V. G. Kim, and T. Funkhouser. Shape-based recognition of 3D point clouds in urban environments. In ICCV'09, Sep. 2009.

[4] Y. M. Kim, N. J. Mitra, D. Yan, and L. Guibas. Acquiring 3D indoor environments with variability and repetition. In SIGGRAPH Asia'12, 2012.

[5] Y. Livny, F. Yan, M. Olson, B. Chen, H. Zhang, and J. El-Sana. Automatic reconstruction of tree skeletal structures from point clouds. ACM Trans. Graph., 29:151:1-151:8, 2010.

[6] L. Nan, A. Sharf, H. Zhang, D. Cohen-Or, and B. Chen. Smartboxes for interactive urban reconstruction.

[7] L. Nan, K. Xie, and A. Sharf. A search-classify approach for cluttered indoor scene understanding. SIGGRAPH Asia'12, 31(6), 2012.

[8] Y. Zheng, H. Fu, D. Cohen-Or, O. K.-C. Au, and C.-L. Tai. Component-wise controllers for structure-preserving shape manipulation. In Computer Graphics Forum, volume 30, pages 563-572. Wiley Online Library, 2011.