
Source: robots.stanford.edu/papers/thrun.3d-planar-mapping.pdf

IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION 1

A Real-Time Expectation Maximization Algorithm for Acquiring Multi-Planar Maps of Indoor Environments with Mobile Robots

Sebastian Thrun, Christian Martin, Yufeng Liu, Dirk Hähnel, Rosemary Emery-Montemerlo, Deepayan Chakrabarti, Wolfram Burgard

Abstract— This paper presents a real-time algorithm for acquiring compact 3D maps of indoor environments, using a mobile robot equipped with range and imaging sensors. Building on previous work on real-time pose estimation during mapping [1], our approach extends the popular expectation maximization algorithm [2] to multi-surface models, and makes it amenable to real-time execution. Maps acquired by our algorithm consist of compact sets of textured polygons that can be visualized interactively. Experimental results obtained in corridor-type environments illustrate that compact and accurate maps can be acquired in real-time and in a fully automated fashion.

Index Terms— Robotic mapping, mobile robots, perception, statistical techniques

I. INTRODUCTION

THIS paper presents a real-time algorithm for generating three-dimensional maps of indoor environments, from range and camera measurements acquired by a mobile robot. A large number of indoor mobile robots rely on environment maps for navigation [3]. Almost all existing algorithms for acquiring such maps operate in 2D. 2D maps may appear sufficient for navigation, given that most indoor mobile robots are confined to two-dimensional planes. However, modeling an environment in 3D has two important advantages: first, 3D maps facilitate the disambiguation of different places, since 3D models are richer than 2D models; second, 3D maps are better suited for users interested in the interior of a building, such as architects or human rescue workers who would like to familiarize themselves with an environment before entering it. For these and other reasons, modeling buildings in 3D has been a long-standing goal of researchers in computer vision [4], [5], [6], [7]. Generating such maps with robots would make it possible to acquire maps of environments inaccessible to people [8], [9], such as abandoned mines that have recently been mapped in 3D by mobile robots [10].

In robotic mapping, moving from 2D to 3D is not just a trivial extension. The most popular paradigm in 2D mapping to date is the occupancy map [11], [12], which represents environments by fine-grained grids. While this is feasible in 2D, in 3D the complexity of these representations poses serious scaling limitations [12]. Other popular representations in 2D involve point clouds [13], [14] or line segments [15]. Line representations have been generalized to 3D by representing

S. Thrun is with Stanford University; Y. Liu, R. Emery-Montemerlo, and D. Chakrabarti are with Carnegie Mellon University; C. Martin is with the University of Ilmenau; and W. Burgard and D. Hähnel are with the University of Freiburg.

maps through sets of fine-grained polygons [1], [9]. The resulting maps are often quite complex, and off-the-shelf computer graphics algorithms for mesh simplification [16] tend to generate maps that are visually inaccurate [1].

This paper presents an algorithm for recovering low-complexity 3D models from range and camera data that specifically exploits prior knowledge on the shape of basic building elements. In particular, our approach fits a probabilistic model that consists of large rectangular, flat surfaces to the data collected by a robot. Areas in the map that are not explained well by flat surfaces are modeled by small polygons (as in [9]), enabling our approach to accommodate non-flat areas in the environment. The resulting maps are less complex than those generated by the previous approaches discussed above. Moreover, by moving to a low-complexity model, the noise in the resulting maps is reduced, a side-effect of the variance reduction obtained by fitting low-complexity models.

To identify low-complexity models, the approach presented here uses a real-time variant of the expectation maximization (EM) algorithm [2], [17]. Our algorithm simultaneously estimates the number of surfaces and their locations. Measurements not explained by any surface are mapped onto fine-grained polygonal representations, enabling our approach to model non-planar artifacts in the environment. The resulting map is represented in VRML format (virtual reality modeling language), with texture superimposed from a panoramic camera.

Our approach rests on two key assumptions. First, it assumes that a good estimate of the robot pose is available. The issue of pose estimation (localization) in mapping has been studied extensively in the robotics literature [18]. In all our experiments, we use a real-time algorithm described in [1] to estimate pose; thus, our assumption is not unrealistic at all, but it lets us focus on the 3D mapping aspects of our work. Second, we assume that the environment is largely composed of flat surfaces. The flat surface assumption leads to a convenient closed-form solution of the essential steps of our EM algorithm. Flat surfaces are commonly found in indoor environments, specifically in corridors. We also note that our algorithm retains measurements that cannot be mapped onto any flat surface and maps them into finer-grained polygonal approximations. Hence, the final map may contain non-flat regions in areas that are not sufficiently flat in the physical world.

Our approach has been fully implemented using the mobile robot shown in Figure 1a. This robot is equipped with


Fig. 1. Mobile robot, equipped with two 2D laser range finders and a panoramic camera. The camera uses a panoramic mirror mounted only a few centimeters away from the optical axis of the laser range finder.

a forward-pointed laser range finder for localization during mapping, an upward-pointed laser range finder for structural mapping, and a panoramic camera for recording the texture of the environment (see Figure 1b). The system has been tested in several different buildings. Our results illustrate that the algorithm is effective in generating compact and accurate 3D maps in real-time.

II. GENERATIVE PROBABILISTIC MODEL

A. World Model

In our approach, 3D maps are composed of rectangular flat surfaces, representing doors, walls, and ceilings, plus sets of small polygons representing non-flat surfaces. We will denote the set of rectangular flat surfaces by θ, where

\theta = \{\theta_1, \theta_2, \ldots, \theta_J\}    (1)

Here J is the total number of rectangular surfaces θ_j. Each θ_j is described by a total of nine parameters, arranged in three groups:

\theta_j = \langle \alpha_j, \beta_j, \gamma_j \rangle    (2)

The vector α_j is the three-dimensional surface normal of the surface; the value β_j is the one-dimensional offset between the surface and the origin of the coordinate system; and γ_j comprises five parameters specifying the size and orientation of the rectangular area within the (infinite) planar surface represented by α_j and β_j.

B. Measurements

Measurements are obtained using a laser range finder. Eachrange measurement is projected into 3D space, exploiting thefact that the robot pose is known. The 3D coordinate of the�-th range measurement will be denoted

���! "$# (3)

We denote the set of all measurements by% � � � � � (4)

The Euclidean distance of any coordinate ��� in 3D space toany surface

���will be denoted&('

� � ������) (5)

In our implementation, we distinguish two cases: The casewhere the orthogonal projection of � � falls into the rectangle,and the case where it does not. In the former case,

&('�*� �� � )

Fig. 2. Illustration of the parameters in the planar surface model, shownhere for one surface.

is given by���,+ � �.- ���

; in the latter case,

&/'� � ������) is the

Euclidean distance between the bounding box of the rectangleand � � , which is either a point-to-line distance or a point-to-point distance.
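The two-case distance computation can be sketched in a few lines. The rectangle encoding below (center point, two orthonormal in-plane axes, and half-extents) is our own illustrative choice; the paper does not spell out how the five parameters in γ_j are arranged:

```python
import numpy as np

def point_rect_distance(z, center, u, v, hu, hv):
    """Distance d(z, theta_j) from a 3D point z to a bounded rectangle,
    given its center, orthonormal in-plane axes u, v and half-extents
    hu, hv (a hypothetical encoding of the nine surface parameters)."""
    r = z - center
    # coordinates of the orthogonal projection in the plane of the rectangle
    pu, pv = np.dot(r, u), np.dot(r, v)
    # clamp to the rectangle; if the projection falls inside, the clamped
    # point equals the projection and d reduces to |alpha^T z - beta|
    cu = np.clip(pu, -hu, hu)
    cv = np.clip(pv, -hv, hv)
    closest = center + cu * u + cv * v
    return np.linalg.norm(z - closest)
```

When the clamp is inactive, this reproduces the point-to-plane case; otherwise the closest point lies on the bounding box, giving the point-to-line or point-to-point distance.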

C. Correspondences

In devising an efficient algorithm for environment mapping, it will prove convenient to make explicit the relation between individual measurements z_i and the different components θ_j of the model. This is achieved through correspondence variables. For each measurement z_i, we define there to be J+1 binary correspondence variables, collectively referred to as c_i:

c_i = \{c_{i0}, c_{i1}, c_{i2}, \ldots, c_{iJ}\}    (6)

The vector c_i specifies which part of the model θ “causes” the measurement z_i. Each of the variables in c_i is binary. The variable c_{ij} (for 1 ≤ j ≤ J) is 1 if and only if the i-th measurement z_i corresponds to the j-th surface in the map, θ_j. If the measurement does not correspond to any of the surfaces in the map, the “special” correspondence variable c_{i0} is 1. This might be the case because of random measurement noise, or due to the presence of non-planar objects in the world.

Naturally, each measurement is caused by exactly one of those J+1 possible causes. This implies that the correspondences in c_i sum to 1:

c_{i0} + \sum_{j=1}^{J} c_{ij} = 1    (7)

Our algorithm below involves a step in which probabilities over correspondences are calculated from the data.

D. Measurement Model

The measurement model ties together the volumetric map and the measurements Z. The measurement model is a probabilistic generative model of the measurements given the world:

p(z_i \mid c_i, \theta)    (8)

where c_i is the correspondence vector of the i-th measurement, and θ is the set of planar surfaces. Our approach assumes Gaussian measurement noise. Suppose c_{ij} = 1, that is, the measurement z_i corresponds to the surface θ_j in the model.


The error distribution is then given by the following normal distribution with variance parameter σ²:

p(z_i \mid c_{ij} = 1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{d^2(z_i, \theta_j)}{2\sigma^2} \right)    (9)

Notice that the log likelihood of this normal distribution is proportional to the negative squared Euclidean distance d²(z_i, θ_j) between the measurement z_i and the surface θ_j.

Normally distributed noise is a good model if a range finder succeeds in detecting a flat surface. Sometimes, however, the object detected by a range finder does not correspond to a flat surface, that is, c_{i0} = 1. In our approach, we model such events using a uniform distribution over the entire measurement range:

p(z_i \mid c_{i0} = 1, \theta) = \begin{cases} 1 / z_{\max} & \text{if } 0 \le z_i \le z_{\max} \\ 0 & \text{otherwise} \end{cases}    (10)

The interval [0, z_max] denotes the measurement range of the range finder. The uniform noise model is clearly just a crude approximation, as real measurement noise is not uniform; however, uniform distributions are mathematically convenient and were found to work well in our experiments.

For reasons that shall become apparent below, we note that the uniform density in Equation (10) can be rewritten as follows:

\frac{1}{z_{\max}} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \ln \frac{z_{\max}^2}{2\pi\sigma^2} \right)    (11)

III. THE LOG-LIKELIHOOD FUNCTION

To devise a likelihood function suitable for optimization, it shall prove useful to express the sensor model as the following exponential mixture:

p(z_i \mid c_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left[ c_{i0} \ln \frac{z_{\max}^2}{2\pi\sigma^2} + \sum_j c_{ij} \frac{d^2(z_i, \theta_j)}{\sigma^2} \right] \right)    (12)

This form follows directly from Equations (9) and (11) and the assumption that exactly one variable in c_i is 1, whereas all others are zero. This form of the measurement model enables us to devise a compact expression for the joint probability of a measurement z_i along with its correspondence variables c_i, under a uniform prior over the J+1 correspondences:

p(z_i, c_i \mid \theta) = \frac{1}{(J+1)\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left[ c_{i0} \ln \frac{z_{\max}^2}{2\pi\sigma^2} + \sum_j c_{ij} \frac{d^2(z_i, \theta_j)}{\sigma^2} \right] \right)    (13)

Assuming independence in measurement noise, the likelihood of all measurements Z and their correspondences c = \{c_i\} is then given by

p(Z, c \mid \theta) = \prod_i \frac{1}{(J+1)\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left[ c_{i0} \ln \frac{z_{\max}^2}{2\pi\sigma^2} + \sum_j c_{ij} \frac{d^2(z_i, \theta_j)}{\sigma^2} \right] \right)    (14)

This equation is simply the product of (13) over all measurements z_i.

In EM, it is common practice to maximize the log-likelihood instead of the likelihood (14), exploiting the fact that the logarithm is monotonic in its argument:

\ln p(Z, c \mid \theta) = \sum_i \left[ \ln \frac{1}{(J+1)\sqrt{2\pi\sigma^2}} - \frac{c_{i0}}{2} \ln \frac{z_{\max}^2}{2\pi\sigma^2} - \frac{1}{2\sigma^2} \sum_j c_{ij}\, d^2(z_i, \theta_j) \right]    (15)

Finally, while the formulas above all compute a joint over map parameters and correspondences, all we are actually interested in are the map parameters. The correspondences are only interesting to the extent that they determine the most likely map θ. Therefore, the goal of estimation is to maximize the expectation of the log likelihood (15), where the expectation is taken over all correspondences c. This value, denoted E_c[\ln p(Z, c \mid \theta) \mid Z, \theta], is the expected log likelihood of the data given the map with the correspondences integrated out. It is obtained directly from Equation (15):

E_c[\ln p(Z, c \mid \theta) \mid Z, \theta] = \sum_i \left( \ln \frac{1}{(J+1)\sqrt{2\pi\sigma^2}} - \frac{1}{2} E[c_{i0} \mid z_i, \theta] \ln \frac{z_{\max}^2}{2\pi\sigma^2} - \frac{1}{2\sigma^2} \sum_j E[c_{ij} \mid z_i, \theta]\, d^2(z_i, \theta_j) \right)    (16)

In [19], it is shown that maximizing this expectation indeed maximizes the log likelihood of the data.
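As a concrete reading of Equation (16), the expected log likelihood can be evaluated directly from a table of point-to-surface distances and the correspondence expectations. This is a sketch with our own variable names (D holds the distances d(z_i, θ_j), e0 and E hold the expectations):

```python
import numpy as np

def expected_log_likelihood(D, e0, E, sigma, z_max):
    """Evaluate Eq. (16).  D[i, j] = d(z_i, theta_j); e0[i] = E[c_i0 | z_i];
    E[i, j] = E[c_ij | z_i].  Returns the expected log likelihood of the
    data given the map, with correspondences integrated out."""
    n, J = D.shape
    # constant term: n * ln( 1 / ((J+1) sqrt(2 pi sigma^2)) )
    const = -n * np.log((J + 1) * np.sqrt(2.0 * np.pi * sigma**2))
    # uniform ("no surface") term, weighted by E[c_i0]
    uniform = -0.5 * np.log(z_max**2 / (2.0 * np.pi * sigma**2)) * e0.sum()
    # Gaussian term, weighted by E[c_ij]
    gauss = -(E * D**2).sum() / (2.0 * sigma**2)
    return const + uniform + gauss
```

Holding the expectations fixed, the value grows as points move closer to their assigned surfaces, which is exactly what the M-step exploits.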

IV. LIKELIHOOD MAXIMIZATION VIA EM

The expected log-likelihood (16) is maximized using EM, a popular method for hill climbing in likelihood space for problems with latent variables [2]. EM generates a sequence of maps, θ^[0], θ^[1], θ^[2], .... Each map improves the log-likelihood of the data over the previous map until convergence. More specifically, EM starts with a random map θ^[0]. Each new map is obtained by executing two steps: an E-step, where the expectations of the unknown correspondences E[c_{ij} | θ^[k], z_i] and E[c_{i0} | θ^[k], z_i] are calculated for the k-th map θ^[k], and an M-step, where a new maximum likelihood map θ^[k+1] is computed under these expectations.

A. The E-Step

In the E-step, we are given a map θ^[k] for which we seek to determine the expectations E[c_{ij} | θ^[k], z_i] and E[c_{i0} | θ^[k], z_i] for all i, j. Bayes rule, applied to the sensor model, gives us a way to calculate the desired expectations (assuming a uniform prior over correspondences for mathematical convenience):

e_{ij}^{[k]} := E[c_{ij} \mid \theta^{[k]}, z_i] = p(c_{ij} \mid \theta^{[k]}, z_i)
= \frac{p(z_i \mid \theta^{[k]}, c_{ij})\, p(c_{ij} \mid \theta^{[k]})}{p(z_i \mid \theta^{[k]})}
= \frac{\exp\left( -\frac{d^2(z_i, \theta_j^{[k]})}{2\sigma^2} \right)}{\exp\left( -\frac{1}{2} \ln \frac{z_{\max}^2}{2\pi\sigma^2} \right) + \sum_l \exp\left( -\frac{d^2(z_i, \theta_l^{[k]})}{2\sigma^2} \right)}    (17)


and similarly

e_{i0}^{[k]} := E[c_{i0} \mid \theta^{[k]}, z_i]
= \frac{\exp\left( -\frac{1}{2} \ln \frac{z_{\max}^2}{2\pi\sigma^2} \right)}{\exp\left( -\frac{1}{2} \ln \frac{z_{\max}^2}{2\pi\sigma^2} \right) + \sum_l \exp\left( -\frac{d^2(z_i, \theta_l^{[k]})}{2\sigma^2} \right)}    (18)

As pointed out in [17], [19], substituting these expectations into the log-likelihood (16) lower-bounds the log-likelihood by a function tangent to it at θ^[k].
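Equations (17) and (18) amount to a softmax-style normalization over the J surfaces plus the uniform "no surface" cause. A minimal sketch, with our own variable names (D holds the current distances d(z_i, θ_j^[k])):

```python
import numpy as np

def e_step(D, sigma, z_max):
    """E-step of Eqs. (17)-(18).  D is an (n, J) matrix of distances under
    the current map theta^[k].  Returns (e0, E) with e0[i] = E[c_i0 | z_i]
    and E[i, j] = E[c_ij | z_i]."""
    # Gaussian likelihood kernel for each point/surface pair
    g = np.exp(-D**2 / (2.0 * sigma**2))           # (n, J)
    # uniform term: exp(-0.5 ln(z_max^2 / (2 pi sigma^2))) = sqrt(2 pi sigma^2)/z_max
    u = np.sqrt(2.0 * np.pi * sigma**2) / z_max    # scalar
    denom = u + g.sum(axis=1)                      # shared denominator, (n,)
    E = g / denom[:, None]
    e0 = u / denom
    return e0, E
```

By construction, e0[i] plus the row sum of E equals 1 for every measurement, matching the constraint of Equation (7) in expectation.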

B. The M-Step

In the M-step, this lower bound is optimized. More specifically, we are given the expectations e_{ij}^[k] and e_{i0}^[k] and seek to calculate a map θ^[k+1] that maximizes the expected log-likelihood of the measurements, as given by Equation (16). In other words, we seek surface parameters ⟨α_j^[k+1], β_j^[k+1], γ_j^[k+1]⟩ that maximize the expected log-likelihood of the map under fixed expectations e_{ij}^[k] and e_{i0}^[k].

Obviously, many of the terms in (16) do not depend on the map parameters θ. This allows us to simplify (16) and instead carry out the following minimization:

\theta^{[k+1]} = \mathop{\mathrm{argmin}}_{\theta} \sum_i \sum_j e_{ij}^{[k]}\, d^2(z_i, \theta_j)    (19)

The actual M-step proceeds in two steps. First, our approach determines the parameters α_j^[k+1] and β_j^[k+1], which specify the principal orientation and location of the rectangular surface without the surface boundary. If walls are assumed to be boundless, the minimization (19) is equivalent to the minimization

\langle \alpha^{[k+1]}, \beta^{[k+1]} \rangle = \mathop{\mathrm{argmin}}_{\alpha, \beta} \sum_j \sum_i e_{ij}^{[k]} \left( \alpha_j^T z_i - \beta_j \right)^2    (20)

subject to the normality constraints \alpha_j^T \alpha_j = 1, for all j. This quadratic optimization problem is commonly solved via Lagrange multipliers λ_j for j = 1, ..., J [20], [21]:

F = \sum_j \sum_i e_{ij}^{[k]} \left( \alpha_j^T z_i - \beta_j \right)^2 + \sum_j \lambda_j \left( 1 - \alpha_j^T \alpha_j \right)    (21)

Obviously, for each minimum of F, it must be the case that ∂F/∂α_j = 0 and ∂F/∂β_j = 0. Setting the derivatives of F to zero leads to the following system of equalities:

\sum_i e_{ij}^{[k]} \left( \alpha_j^{[k+1]T} z_i - \beta_j^{[k+1]} \right) z_i - \lambda_j\, \alpha_j^{[k+1]} = 0    (22)

\sum_i e_{ij}^{[k]} \left( \alpha_j^{[k+1]T} z_i - \beta_j^{[k+1]} \right) = 0    (23)

\alpha_j^{[k+1]T} \alpha_j^{[k+1]} = 1    (24)

The value of β_j^[k+1] is obtained from Equation (23):

\beta_j^{[k+1]} = \frac{\sum_i e_{ij}^{[k]}\, \alpha_j^{[k+1]T} z_i}{\sum_i e_{ij}^{[k]}}    (25)

Substituting this back into (22) gives us

\sum_i e_{ij}^{[k]} \left( \alpha_j^{[k+1]T} z_i - \frac{\sum_{i'} e_{i'j}^{[k]}\, \alpha_j^{[k+1]T} z_{i'}}{\sum_{i'} e_{i'j}^{[k]}} \right) z_i = \lambda_j\, \alpha_j^{[k+1]}    (26)

This is a set of linear equations of the type

A_j^{[k]}\, \alpha_j^{[k+1]} = \lambda_j\, \alpha_j^{[k+1]}    (27)

where each A_j^[k] is a 3 × 3 matrix whose elements are as follows:

A_{j,uv}^{[k]} = \sum_i e_{ij}^{[k]}\, z_{iu} z_{iv} - \frac{\left( \sum_i e_{ij}^{[k]}\, z_{iu} \right) \left( \sum_i e_{ij}^{[k]}\, z_{iv} \right)}{\sum_i e_{ij}^{[k]}}    (28)

for u, v = 1, 2, 3, subject to (24). It is now easy to see that each solution of (27) must be an eigenvector of A_j^[k]. The two eigenvectors with the largest eigenvalues describe the principal orientation of the surface. The third eigenvector, which corresponds to the smallest eigenvalue, is the normal vector of this surface, hence our desired solution for α_j^[k+1].

Finally, the M-step calculates new bounding boxes γ_j^[k+1]. It does so by determining the minimum rectangular box on the surface which includes all points whose maximum likelihood assignment is the j-th surface θ_j:

\left\{ z_i : j = \mathop{\mathrm{argmax}}_{l = 0, \ldots, J} e_{il}^{[k]} \right\}    (29)

This optimization problem does not possess an easy closed-form solution. Our approach probes the orientation in 1-degree intervals, then calculates the tightest bounding box for each orientation. The bounding box with the smallest enclosed area is finally selected. This step leads to a near-optimal rectangular surface that contains all measurements that most likely correspond to the surface at hand.
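For a single surface, the closed-form part of the M-step (Equations (25)-(28)) reduces to an eigendecomposition of a weighted scatter matrix. The sketch below uses our own names and takes the expectations e_{ij}^[k] for one surface as a weight vector:

```python
import numpy as np

def m_step_plane(Z, w):
    """M-step plane update of Eqs. (25)-(28) for one surface: given points
    Z (n, 3) and weights w[i] = e_ij^[k], take the eigenvector of the
    smallest eigenvalue of the weighted scatter matrix as the normal
    alpha_j; beta_j then follows from Eq. (25)."""
    W = w.sum()
    s = w @ Z                                 # weighted coordinate sums, (3,)
    # A_uv = sum_i w_i z_iu z_iv - (sum_i w_i z_iu)(sum_i w_i z_iv) / W
    A = (Z * w[:, None]).T @ Z - np.outer(s, s) / W
    evals, evecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    alpha = evecs[:, 0]                       # smallest eigenvalue -> normal
    beta = np.dot(alpha, s) / W               # Eq. (25)
    return alpha, beta
```

With equal weights this is the familiar total-least-squares plane fit; the weights let points shared with other surfaces contribute only fractionally.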

C. Determining the Number of Surfaces

Parallel to computing the surface parameters, our approach determines the number of surfaces J. Our approach is based on a straightforward Bayesian prior that penalizes complex maps using an exponential prior, written here in log-likelihood form:

\ln p(\theta \mid Z) \propto \ln p(Z \mid \theta) - \eta J    (30)

Here η is a constant factor. The final map estimator is, thus, a maximum a posteriori probability (MAP) estimator, which combines the complexity-penalizing prior with the data likelihood calculated by EM. In practice, this approach implies that surfaces not supported by sufficiently many measurements (weighted by their expectations) are discarded. This makes it possible to choose the number of rectangular surfaces J concurrently with the execution of the EM algorithm.

Fig. 3. Polygonal map generated from raw data, not using EM. The display without texture shows the level of noise involved. In particular, it illustrates the difficulty of separating the door from the nearby wall (as achieved by EM).

In our implementation, the search for the best J is interleaved with running the EM algorithm. The search involves a step for creating new surfaces, and another one for terminating surfaces, both executed at regular intervals (every 20 iterations in our offline implementation). In the surface creation step, new surfaces are created based on measurements z_i that are poorly explained by the existing model. A measurement z_i is considered poorly explained if its value e_{i0} exceeds a threshold, indicating that none of the planar surfaces in the model explains the measurement well. A new surface is started if three adjacent measurements are poorly explained; the initial parameters of this new surface are then uniquely determined through the coordinates of these three measurements. The number of new surfaces is kept limited at each iteration, using random selection among all candidates if the number of candidate surfaces exceeds the total limit. The new surfaces are then added to the model and treated identically to all other surfaces in subsequent iterations of EM.

In the termination step, each surface undergoes a posterior evaluation using the criterion set forth in Equation (30). If removing a surface, or fusing it with a nearby surface, increases the posterior in Equation (30), the corresponding action is taken. Otherwise, the surface is retained in the model. In this way, only surfaces supported by sufficiently many data points survive the selection process, avoiding the overfitting that inevitably would occur without a complexity penalty term.
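The termination test implied by Equation (30) is just a comparison of penalized log posteriors. A minimal sketch, where `eta` stands for the complexity penalty constant (the paper's symbol is not recoverable from our copy) and the two log-likelihoods are assumed to come from the EM machinery above:

```python
def should_remove_surface(loglik_with, loglik_without, eta):
    """Termination criterion from Eq. (30): removing a surface shrinks J
    by one, so it pays off whenever the data log-likelihood lost is no
    larger than the complexity penalty eta saved."""
    return loglik_without >= loglik_with - eta
```

The same comparison applies to fusing two nearby surfaces, with the fused model's log-likelihood in place of `loglik_without`.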

D. Texture Mapping and Visualization

Textures are extracted from the panoramic camera, along a stripe shown in Figure 1b that corresponds to the range measurement taken by the vertical laser range finder. These stripes are collected at frame rate and pasted together into raw texture maps. These maps are then mapped onto the planar surfaces in real-time, using a technique analogous to the one described in [22]. Textures of the same feature in the environment recorded at different points in time are presently not merged due to tight computational constraints, which is a clear shortcoming of our present implementation.

The final model contains all surfaces in θ; however, it lacks information on non-flat objects in the environment. The final 3D visualization of the environment is therefore enriched by fine-grained polygonal models of such non-flat objects. In particular, our approach analyzes all measurements whose most likely correspondence is the “no surface” class c_{i0}. For any occurrence of three nearby such measurements, a small triangle is introduced into the final 3D visualization. In this way, the visualization contains not only large flat surfaces, but also fine-grained polygonal models of non-flat objects in the environment. The final output of our program is a VRML file, which makes it possible to render the 3D model using off-the-shelf software.

V. ONLINE EM

Our approach has been extended into an online algorithm that enables robots to acquire compact 3D models in real-time (see [23]). EM, as presented so far, is inherently offline: it requires multiple passes through the data set. As the data set grows, so does the computation per iteration of EM. This limitation is shared by the vanilla EM algorithm in general [19].

The key insight that leads to an online implementation is that during each fixed time interval, only a constant number of new range measurements arrive (collectively referred to as Z_t). If we begin our estimation with the model acquired from the data up to time t−1, only constantly many new measurement values have to be incorporated into the model in response to observing Z_t. Such a routine would be strictly incremental; however, it would fail to adjust correspondence variables based on data acquired at a later point in time. Our approach is slightly more sophisticated in that it adjusts past expectations, but it does so in a way that still conforms to a constant update time.

A. Online E-Step

The online E-step considers only a finite subset of all measurements, and only calculates expectations for the correspondence variables of those measurements. In particular, it includes all new range measurements, of which there are only finitely many. Additionally, it includes past measurements that meet the following condition: they lie at the boundary of two surfaces (judging from their maximum likelihood assignments), or they are entirely unexplained by any existing surface. These conditions alone are insufficient to assure constant time computation. Thus, we also attach a counter to each measurement that keeps track of the total number of times it was considered in an E-step calculation. If this counter exceeds a threshold, the expected correspondence value is frozen and never recalculated again. It is easy to see that the last condition ensures constant time update (in expectation). The former condition identifies “interesting” measurements, which greatly reduces the constant factor in the computational complexity.

B. Online M-Step

The online M-step is slightly more complex. This is because as the data set grows, so do two things: the number of model components J, and the number of measurements associated with each model component. Each of these expansions has to be controlled to ensure that the M-step can be executed in real-time.

In the online M-step, only a small number of “active” surfaces are re-estimated. We say that a surface θ_j is active at time t if the E-step executed at that time changed, for any of the updated measurements, the maximum likelihood correspondence to or away from the surface θ_j. Clearly, this is an approximation, in that any adjustment of the expected correspondences affects all model parameters, at least in principle. However, it is a good approximation that limits the number of surfaces considered in the M-step to a constant number.

Finally, our approach addresses the fact that the numberof points involved in each maximization grows over time.In conventional EM, every measurement participates in thecalculation of all surface parameters, though most make only


a negligible contribution. To reduce the number of measurements used in the M-step, our online approach considers only those whose maximum likelihood data association corresponds to the model component in question. This is analogous to the well-known k-means algorithm [24] for clustering, which approximates the EM algorithm using hard assignments. Furthermore, if the number of maximum likelihood measurements exceeds a threshold, our approach randomly subsamples those measurements. By doing so, the amount of computation when recalculating the parameters of a model component is bounded by a constant, as is the overall computational complexity of the M-step.
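A hard-assignment M-step for one planar surface might look as follows. This is a sketch under assumptions: the subsampling threshold is invented, and the plane fit shown is a standard total-least-squares fit (centroid plus the smallest-eigenvalue eigenvector of the scatter matrix), which the paper does not necessarily use in this exact form.

```python
import random

import numpy as np

MAX_POINTS_PER_SURFACE = 200  # subsampling threshold (illustrative)


def m_step_plane(points, rng=random.Random(0)):
    """Refit one planar surface to the measurements whose maximum-
    likelihood association is this surface, subsampling if there are
    too many (k-means-style hard assignment)."""
    if len(points) > MAX_POINTS_PER_SURFACE:
        points = rng.sample(points, MAX_POINTS_PER_SURFACE)
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # Plane normal = eigenvector of the scatter matrix with the
    # smallest eigenvalue (total least squares plane fit).
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]  # eigh returns eigenvalues in ascending order
    return centroid, normal
```

Because the input is capped at a constant number of points, each surface update costs constant time regardless of how long the robot has been mapping.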

C. Online Model Selection

Finally, our approach also implements Bayesian model selection in real time. New surfaces are introduced for new measurements that are not "explained" by any of the existing surfaces in the map. A measurement is explained by an existing surface if its likelihood of having been generated from that surface is above a threshold; otherwise it is used to seed a new surface. Surfaces are removed from the map if, after a fixed number of iterations, they are not supported by sufficiently many measurements, in accordance with the map complexity penalty factor set forth in Equation (30). As a result, the number of surfaces J increases with the complexity of the environment, while all computation can still be carried out online, irrespective of the total map size.
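The seed-and-prune cycle can be sketched as below. All three constants and all names are illustrative assumptions, not values from the paper; the likelihood and seeding functions are passed in as callbacks.

```python
LIKELIHOOD_THRESHOLD = 1e-3  # below this, a measurement is "unexplained"
MIN_SUPPORT = 50             # prune surfaces with fewer supporting points
GRACE_ITERATIONS = 10        # iterations a new surface has to gather support
# (all three constants are illustrative, not from the paper)


def model_selection_step(surfaces, measurements, likelihood, seed_surface):
    """One online model-selection step: seed a new surface from any
    measurement no existing surface explains; prune surfaces that are
    still under-supported after their grace period."""
    for m in measurements:
        if all(likelihood(m, s) < LIKELIHOOD_THRESHOLD for s in surfaces):
            surfaces.append(seed_surface(m))  # unexplained -> new surface
    survivors = []
    for s in surfaces:
        s['age'] += 1
        if s['age'] <= GRACE_ITERATIONS or s['support'] >= MIN_SUPPORT:
            survivors.append(s)
    return survivors
```

The grace period plays the role of the fixed number of iterations in the text: a freshly seeded surface is kept long enough to accumulate support before the complexity penalty can remove it.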

The resulting algorithm is strictly incremental. It has been implemented on a low-end laptop PC, where it is able to construct compact 3D maps in real time.

VI. EXPERIMENTAL RESULTS

Our experiments were carried out in two stages: first, we implemented and evaluated the offline EM algorithm. This evaluation served to establish the basic capability of extracting large planar surfaces from complex data sets using EM. Our second set of experiments was carried out in real time using the online version of our approach.

All computation in these experiments was carried out on board the moving robot platform shown in Figure 1. As noted above, the robot is equipped with two SICK PLS laser range finders, one pointed forward and one pointed upward, perpendicular to the robot's motion direction. The SICK sensor covers an area of 180 degrees with a one-degree angular resolution. The range accuracy is described by the manufacturer as ±5 centimeters. The sampling rate in our experiments is approximately 5 Hertz (entire scans).

A. Offline EM

The offline version of EM was tested using a data set acquired inside a university building. The data set consists of 168,120 range measurements and 3,270 camera images, collected in less than two minutes. In a pre-alignment phase, our software extracted 3,220,950 pixels, all of which directly correspond to range measurements. Nearby scans were converted into polygons. Figure 3 shows a detail of the resulting map without the texture superimposed. This figure clearly illustrates the high level of noise in the raw data. The left

(a) Fine-grained polygonal maps generated from raw data

(b) Low-complexity maps generated using EM

Fig. 4. 3D map generated (a) from raw sensor data, and (b) using the offline version of EM. In this map, 94.6% of all measurements are explained by 7 surfaces. Notice that the map in (b) is smoother and appears to be visually more accurate than the one in (a).

column of Figure 4 shows three views of the map with texture superimposed.

The result of our EM algorithm is shown in the right column of Figure 4. This specific map contains J = 7 flat surfaces, which together account for 94.6% of all measurements. Clearly, the new map is smoother and visually more accurate. It also models non-planar regions well, as manifested by a trash bin in the top panel of that figure, which is not modeled by a planar surface. The corresponding measurements are not mapped to any of the surfaces θ_j. Figure 5 shows the corresponding map segment, illustrating the importance of merging flat surfaces with fine-grained polygons for modeling building interiors. Notice also that the wall surface is different from the door surface. A small number of measurements in the door surface are erroneously assigned to the wall surface. The poster board shown in various panels of Figure 4 highlights once again the benefits of the EM approach over the raw data maps: its visual accuracy is higher thanks to the planar model onto which the texture is projected.

Figure 6 shows the number of surfaces J and the number of measurements explained by these surfaces, as a function of the iteration. Every 20 steps (in expectation), surfaces are terminated and restarted. After only 500 iterations, the number of surfaces settles around a mean (and variance) of 8.76 ± 1.46, which explains a steady 95.5 ± 0.006% of all measurements.


Fig. 5. Model of a trash bin in the final map, with a large planar rectangular surface patch in the background. Our algorithm recognizes that this object cannot be explained by planar surfaces with sufficient likelihood, hence it retains the fine-grained polygonal representation.

2,000 iterations require approximately 20 minutes of computation on a low-end PC.

B. Real-Time Implementation

The real-time implementation using online EM was evaluated more thoroughly and in several different buildings. Overall, we found our approach to be highly reliable in generating accurate maps. Figure 7 illustrates the online creation of compact maps. Shown in the left column are maps generated directly from the raw measurement data by creating polygons for any set of nearby measurements. The right column shows a sequence of maps built in real time, using our incremental EM algorithm. We note that this specific illustration shows only vertical surfaces; however, our implementation handles surfaces in arbitrary orientations.

As is easily seen, the new map is more compact than the raw data map. A small number of surfaces suffices to model the vast amounts of data. Close inspection of Figure 7 illustrates EM at work. For example, a small region in Figure 7i has not been identified as belonging to a flat wall, simply because the amount of noise in the rectangular surface below this patch is too large. Further optimization of this past data leads to a more compact map, as illustrated in Figure 7j.

Figure 8 shows views of a compact 3D map acquired in real time, for the same environment as reported previously. The entire data collection requires less than 2 minutes, during which all processing occurs.

As is easily seen, the identification of rectangular surfaces in the environment has a positive effect on the visual accuracy of the map. Figure 9 shows inside projections of a 3D map built by EM and compares it to a map built without EM, using fine-grained polygonal maps, and using data sets obtained in two different university buildings. Obviously, the visual accuracy of the texture projected onto flat surfaces is higher than in the renderings obtained from the fine-grained polygonal map. This illustrates once again that the resulting map is not only more compact but also provides more visual detail than the map created without our approach.

In an attempt to quantitatively evaluate our approach, we mapped three different corridor environments in different buildings. The complexity of those environments was comparable to the maps shown here. The number of initial polygons

Fig. 6. Offline EM: (a) number of surfaces J and (b) percentage of points explained by those surfaces as a function of the iteration. A good map is available after 500 iterations.

varied by building. The final maps contained on average only 0.60% as many polygons (0.69%, 0.80%, and 0.32%), which corresponds to a compression of more than two orders of magnitude. Overall, we found that the online version generated an order of magnitude more flat surfaces than the offline version. This increase was partially due to a lower penalty term. An additional cause was the fact that the online version considers splitting and fusion decisions only for a brief time period. Overall, the increased number of surfaces J led to a decrease in the overall complexity of the map, thanks to the fact that significantly more measurements were explained by surfaces θ_j found by EM.

We finally note that all computation in our experiment was carried out on a single laptop on board the robot. The views in Figures 7 through 9 were rendered using a standard software package for rendering VRML models.

VII. RELATED WORK

As argued in the introduction to this article, the vast majority of robot modeling research has focused on building maps in 2D. Our approach is reminiscent of an early paper by Chatila and Laumond [15] and a related paper by Crowley [25], who proposed to reconstruct low-dimensional line models in 2D from sensor measurements. Our work is also related to work on line extraction from laser range scans [26]. However, these methods address the two-dimensional case, where lines can be extracted from a single scan.


(a) Raw data map (15 scans) (f) Map with online EM (15 scans)

(b) Raw data map (43 scans) (g) Map with online EM (43 scans)

(c) Raw data map (62 scans) (h) Map with online EM (62 scans)

(d) Raw data map (172 scans) (i) Map with online EM (172 scans)

(e) Raw data map (274 scans)

(j) Map with online EM (274 scans)

Fig. 7. Raw data (left column) and maps generated using online EM (right column). The red lines visualize the most recent range scan. Some of the intermediate maps exhibit suboptimal structures, which are resolved in later iterations of EM. Despite this backwards correction of past estimates, the algorithm presented in this paper still runs in real time, due to careful selection of the measurements that are considered in the EM estimation.

Fig. 8. Views of a compact 3D texture map built in real time with an autonomously exploring robot.

In the area of computer vision, object recognition under geometric constraints (such as the planar surface assumption) is a well-researched field [27], [28], [29]. Multiple researchers have studied the topic of 3D scene reconstruction from data. Approaches for 3D modeling can roughly be divided into two categories: methods that assume knowledge of the pose of the sensors [4], [5], [6], [30], [31], and methods that do not [32]. Of great relevance is a paper by Roth and Wibowo [33], who have also proposed the use of a mobile platform to acquire textured 3D models of the environment. Their system includes a 3D range sensor and eight cameras. Just like our system, theirs uses sensor information to compensate for odometric errors. Their approach, however, does not consider the uncertainty of individual measurements when generating the 3D model.

Our approach is somewhat related to [34], which reconstructs planar models of indoor environments using stereo vision, with some manual guidance in the reconstruction process to account for the lack of visible structure in typical indoor environments. Like ours, their environment model is composed of flat surfaces; however, due to the use of stereo vision, the range data are too incomplete to allow for a fully automated modeling process. Consequently, the technique in [34] relies on manual guidance in the process of identifying planar surfaces. Related work on outdoor terrain modeling can be found in [7], [35].


(a) Raw data map (using a high-accuracy range finder)

(b) Planes, extracted from the map using EM

(c) Plane boundaries

(d) Full texture map (planes only)

Fig. 9. Maps generated in real time, of office environments at Carnegie Mellon University's Wean Hall (left column) and Stanford University's Gates Hall (right column).

VIII. CONCLUSION

We have presented an online algorithm for building compact maps of building interiors. This approach utilizes the expectation maximization algorithm for finding rectangular surface patches in 3D data, acquired by a moving robot equipped with laser range finders and a panoramic camera. While EM is traditionally an offline algorithm, a modified version of EM was presented, which is capable of generating such maps online, while the robot is in motion. This approach retains the key advantage of EM, namely the ability to revise past assignments and map components based on future data, while simultaneously restricting the computation in ways that make it possible to run the algorithm in real time, regardless of the size of the map. Experimental results illustrate that this approach enables mobile robots to acquire compact and accurate maps of corridor-style indoor environments in real time.

All results shown in this article were obtained using accurate 2D laser range finders. The mathematical approach can accommodate a wider array of range finders, such as sonars and range cameras, assuming that the sensor model is adjusted appropriately. However, we suspect that the high spatial and angular resolution of our laser range finders plays an important role in the success of our approach.

Although stated here in the context of finding rectangular flat surfaces, the EM algorithm is more general in that it can easily handle a richer variety of geometric shapes. The extension of our approach to richer classes of objects is subject to future research. Another topic of future research is to augment the EM algorithm to estimate the robot's location during mapping, as described in [36].

Finally, it would be worthwhile to include both lasers' data in the mapping process. Unfortunately, the upward-pointed laser may never observe the same aspect of the environment twice; hence, further assumptions (such as smoothness) would be necessary to utilize its data. It would also be interesting to add the robot pose variables as latent variables in the EM algorithm, so that localization and modeling can be interleaved, as in [36]. Unfortunately, the added computational complexity incurred by such extensions would make it difficult to execute the resulting algorithm online and in real time.

ACKNOWLEDGMENT

This research is sponsored by DARPA's MARS Program (Contracts N66001-01-C-6018 and NBCH1020014) and the National Science Foundation (CAREER grant IIS-9876136 and regular grant IIS-9877033), all of which is gratefully acknowledged. We also thank three anonymous reviewers for insightful comments on an earlier version of this manuscript.

REFERENCES

[1] S. Thrun, "A probabilistic online mapping algorithm for teams of mobile robots," International Journal of Robotics Research, vol. 20, no. 5, pp. 335–363, 2001.

[2] A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.

[3] D. Kortenkamp, R. Bonasso, and R. Murphy, Eds., AI-based Mobile Robots: Case Studies of Successful Robot Systems. Cambridge, MA: MIT Press, 1998.

[4] R. Bajcsy, G. Kamberova, and L. Nocera, "3D reconstruction of environments for virtual reconstruction," in Proc. of the 4th IEEE Workshop on Applications of Computer Vision, 2000.

[5] P. Debevec, C. Taylor, and J. Malik, "Modeling and rendering architecture from photographs," in Proc. of the 23rd International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 1996.

[6] H. Shum, M. Han, and R. Szeliski, "Interactive construction of 3D models from panoramic mosaics," in Proc. of the International Conference on Computer Vision and Pattern Recognition (CVPR), 1998.

[7] S. Teller, M. Antone, Z. Bodnar, M. Bosse, S. Coorg, M. Jethwa, and N. Master, "Calibrated, registered images of an extended urban area," in Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2001.

[8] J. Casper, "Human-robot interactions during the robot-assisted urban search and rescue response at the World Trade Center," Master's thesis, Computer Science and Engineering, University of South Florida, Tampa, FL, 2002.

[9] M. Montemerlo, D. Hähnel, D. Ferguson, R. Triebel, W. Burgard, S. Thayer, W. Whittaker, and S. Thrun, "A system for three-dimensional robotic mapping of underground mines," Carnegie Mellon University, Computer Science Department, Pittsburgh, PA, Tech. Rep. CMU-CS-02-185, 2002.


[10] S. Thrun, D. Hähnel, D. Ferguson, M. Montemerlo, R. Triebel, W. Burgard, C. Baker, Z. Omohundro, S. Thayer, and W. Whittaker, "A system for volumetric robotic mapping of abandoned mines," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2003.

[11] A. Elfes, "Occupancy grids: A probabilistic framework for robot perception and navigation," Ph.D. dissertation, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1989.

[12] H. P. Moravec, "Sensor fusion in certainty grids for mobile robots," AI Magazine, vol. 9, no. 2, pp. 61–74, 1988.

[13] J.-S. Gutmann and K. Konolige, "Incremental mapping of large cyclic environments," in Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), 2000.

[14] F. Lu and E. Milios, "Globally consistent range scan alignment for environment mapping," Autonomous Robots, vol. 4, pp. 333–349, 1997.

[15] R. Chatila and J.-P. Laumond, "Position referencing and consistent world modeling for mobile robots," in Proceedings of the 1985 IEEE International Conference on Robotics and Automation, 1985.

[16] M. Garland and P. Heckbert, "Simplifying surfaces with color and texture using quadric error metrics," in Proceedings IEEE Visualization, 1998.

[17] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley Series in Probability and Statistics, 1997.

[18] J. Leonard, J. Tardós, S. Thrun, and H. Choset, Eds., Workshop Notes of the ICRA Workshop on Concurrent Mapping and Localization for Autonomous Mobile Robots (W4). Washington, DC: ICRA Conference, 2002.

[19] R. Neal and G. Hinton, "A view of the EM algorithm that justifies incremental, sparse, and other variants," in Learning in Graphical Models, M. Jordan, Ed. Kluwer Academic Press, 1998.

[20] D. Eberly, 3D Game Engine Design: A Practical Approach to Real-Time Computer Graphics. Morgan Kaufmann/Academic Press, 2001.

[21] Y. Liu, R. Emery, D. Chakrabarti, W. Burgard, and S. Thrun, "Using EM to learn 3D models with mobile robots," in Proceedings of the International Conference on Machine Learning (ICML), 2001.

[22] S. Nayar, "Omnidirectional vision," in Proceedings of the Ninth International Symposium on Robotics Research, Shonan, Japan, 1997.

[23] C. Martin and S. Thrun, "Online acquisition of compact volumetric maps with mobile robots," in IEEE International Conference on Robotics and Automation (ICRA), Washington, DC, 2002.

[24] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[25] J. Crowley, "World modeling and position estimation for a mobile robot using ultrasonic ranging," in Proceedings of the 1989 IEEE International Conference on Robotics and Automation, Scottsdale, AZ, May 1989, pp. 674–680.

[26] F. Lu and E. Milios, "Robot pose estimation in unknown environments by matching 2D range scans," Journal of Intelligent and Robotic Systems, vol. 18, pp. 249–275, 1998.

[27] W. Grimson, Object Recognition by Computer: The Role of Geometric Constraints. Boston, MA: MIT Press, 1991.

[28] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Boston, MA: MIT Press, 1993.

[29] A. R. Pope, "Model-based object recognition: A survey of recent research," University of British Columbia, Department of Computer Science, Vancouver, Canada, Tech. Rep. TR-94-04, 1994.

[30] P. Allen and I. Stamos, "Integration of range and image sensing for photorealistic 3D modeling," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2000, pp. 1435–1440.

[31] S. Becker and M. Bove, "Semiautomatic 3-D model extraction from uncalibrated 2-D camera views," in Proc. of the SPIE Symposium on Electronic Imaging, San Jose, 1995.

[32] S. El-Hakim, P. Boulanger, F. Blais, and J.-A. Beraldin, "Sensor based creation of indoor virtual environment models," in Proc. of the 4th International Conference on Virtual Systems and Multimedia (VSMM), Geneva, Switzerland, 1997.

[33] G. Roth and R. Wibowo, "A fast algorithm for making mesh-models from multiple-view range data," in Proceedings of the DND/CSA Robotics and Knowledge Based Systems Workshop, 1995.

[34] L. Iocchi, K. Konolige, and M. Bajracharya, "Visually realistic mapping of a planar environment with stereo," in Proceedings of the 2000 International Symposium on Experimental Robotics, Waikiki, Hawaii, 2000.

[35] Y.-Q. Cheng, E. Riseman, X. Wang, R. Collins, and A. Hanson, "Three-dimensional reconstruction of points and lines with unknown correspondence across images," International Journal of Computer Vision, 2000.

[36] S. Thrun, D. Fox, and W. Burgard, "A probabilistic approach to concurrent mapping and localization for mobile robots," Machine Learning, vol. 31, pp. 29–53, 1998; also appeared in Autonomous Robots 5, 253–271 (joint issue).