

Published in Proc. British Machine Vision Conference, 2010 (bmvc10.dcs.aber.ac.uk/proc/conference/paper43/abstract43.pdf).

Unifying Planar and Point Mapping in Monocular SLAM

José Martínez-Carranza http://www.cs.bris.ac.uk/~carranza

Andrew Calway http://www.cs.bris.ac.uk/~andrew

Department of Computer Science, University of Bristol, UK

Planar features in filter-based visual SLAM systems require an initialisation stage that delays their use within the estimation. In this stage, surface and pose are initialised either by using an already generated map of point features [2, 3] or by using visual cues from frames [4]. This delay is unsatisfactory, especially in scenarios where the camera moves rapidly and visual features are observed for only a very limited period.

In this paper we present a unified approach to mapping in which points and planes are initialised alongside each other within the same framework. The best structure emerges according to what the camera observes, thus avoiding delayed initialisation of planar features. To do this we use a parameterisation similar to the one used for planar features in [3, 4]. The Inverse Depth Planar Parameterisation (IDPP), as we call it, is used to represent both planes and points. The IDPP is combined with a point-based measurement model into which the planar constraint is introduced. The latter allows us to estimate and grow a planar structure when suitable, or to estimate a 3-D point if the visual measurements do not support the constraint. The IDPP has three main components: (1) a reference camera (RC); (2) the depth, w.r.t. the RC, of a seed 3-D point on the plane; and (3) the normal of the plane.
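The three IDPP components can be sketched in code. The following is a minimal illustration, not the paper's actual state layout: the function names and the choice of a pinhole model with intrinsics `K` are assumptions, but it shows how an RC pose, a seed pixel, and an inverse depth together pin down the seed 3-D point.

```python
import numpy as np

def idpp_to_point(rc_position, rc_rotation, pixel, K, inv_depth):
    """Recover the seed 3-D point from IDPP components.

    rc_position, rc_rotation: pose of the reference camera (RC) in the world
    pixel: image coordinates of the seed point in the RC's key frame
    K: camera intrinsics (pinhole model)
    inv_depth: inverse depth of the seed point w.r.t. the RC
    """
    # Back-project the pixel to a ray in the RC's camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    # Rotate the ray into the world frame and scale by depth = 1 / inv_depth.
    return rc_position + (rc_rotation @ ray_cam) / inv_depth
```

The plane estimate then consists of this seed point together with the normal; while the normal is still uncertain, the feature behaves exactly like a conventional inverse-depth point.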


Figure 1: Undelayed plane initialisation using the proposed framework.

Figure 1 shows the implicit initialisation process for a planar feature starting out as a single point, the seed point, which evolves into a planar structure. Initially, the depth of the seed point and the normal are quite uncertain, Fig. 1a. After the first correction, the plane contains a single point whose measurement contributes to reducing the depth uncertainty, Fig. 1b. In the next step the system grows the plane by randomly selecting two salient points on the key frame (KF) attached to the RC, Fig. 1c. The selected pixels produce rays to infinity that intersect the plane estimate, thus producing 3-D points and their corresponding uncertainties. The plane then contains three 3-D points, which can be measured to effectively correct the current camera pose, the RC, and the depth and normal of the plane, as shown in Fig. 1d. The same process is carried out again, as Fig. 1e-f illustrates, also showing a case where a salient point selected from the KF does not fulfil the planar constraint.
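The geometric core of the growing step is a ray-plane intersection. The sketch below shows one plausible form of it under assumed conventions (a plane written as n·x + d = 0 and a ray already expressed in world coordinates); the paper's filter additionally propagates the uncertainties of depth and normal onto the new point, which is omitted here.

```python
import numpy as np

def grow_point(rc_centre, ray_world, plane_normal, plane_d):
    """Intersect a salient-point ray from the RC with the current plane
    estimate n.x + d = 0, yielding a new 3-D point on the plane.

    Returns None when the ray is (near) parallel to the plane or the
    intersection lies behind the camera, i.e. the salient point cannot
    lie on the plane.
    """
    denom = plane_normal @ ray_world
    if abs(denom) < 1e-9:
        return None
    t = -(plane_d + plane_normal @ rc_centre) / denom
    if t <= 0:  # intersection behind the camera: reject
        return None
    return rc_centre + t * ray_world
```

A salient point whose subsequent measurements disagree with the predicted intersection is the case rejected in Fig. 1f.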


Figure 2: (a) State size and number of 3-D points obtained with the IDPP and a points-only method for an indoor sequence, Fig. 3. (b) Estimated map and uncertainties overlaid on a schematic plan for the same sequence.


Figure 3: Planar features initialised with the proposed framework. (a) 3-D view of the world. (b) Projection of the planar bounding box onto the camera view. (c) 2-D point measurements in the camera view; note that measurements sharing a common plane are linked with a line.

To carry out the observation, a template matching process is performed as for conventional point-based mapping. Individual 3-D points within the planar feature are represented by templates of 11×11 pixels extracted from the KF. Each template is warped according to the plane estimate, thus providing view invariance. Additionally, a subset of the 3-D points associated with the plane are characterised with spatial gradient descriptors, as in [1], in order to recover from tracking failure. Hence, this hybrid representation allows fast camera tracking and robust camera relocalisation.
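Warping a KF template according to the plane estimate can be done with the standard plane-induced homography; the paper does not spell out its warp, so the following is a sketch of that textbook construction under assumed conventions (plane n·x = d in the KF frame, (R, t) the relative pose of the current camera w.r.t. the KF, shared intrinsics K).

```python
import numpy as np

def plane_induced_homography(K, R, t, n, d):
    """Homography mapping KF pixels to current-view pixels for the
    plane n.x = d expressed in the KF (reference camera) frame.

    (R, t): relative pose of the current camera w.r.t. the KF.
    """
    H = K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]  # normalise the projective scale
```

Each 11×11 template is then resampled through H before matching, which is what gives the measurement its view invariance as the camera moves around the plane.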

The proposed framework has been tested in several indoor and outdoor scenarios in which the camera performs fast motion and always points ahead. The experiments demonstrate the benefits of the approach in terms of undelayed initialisation of planar features alongside conventional point initialisation, and an efficient state representation. In fact, the latter allowed mapping of larger areas with a single filter under real-time operation.

Figure 2a compares the number of points and the state size generated for the indoor sequence depicted in Fig. 3 by our approach and by standard point-based mapping. A schematic plan of the corridor is shown in Fig. 2b, where the final map and camera trajectory (in green) are overlaid.

Figure 4 shows an outdoor experiment in which the camera moves approximately 230 metres. This experiment illustrates the significant saving in state size: the planarity of the scene is exploited to produce a dense scene representation within a more compact state, allowing real-time operation even though a single filter was used. The final map and camera trajectory are overlaid on an aerial view in the same figure.

Figure 4: Outdoor example where the camera moved approximately 230 metres. The system operated with a single filter.

[1] D. Chekhlov, W. Mayol-Cuevas, and A. Calway. Appearance based indexing for relocalisation in real-time visual SLAM. In Proc. British Machine Vision Conference, 2008.

[2] A. Gee, D. Chekhlov, W. W. Mayol, and A. Calway. Discovering planes and collapsing the state space in visual SLAM. In Proc. British Machine Vision Conference, 2007.

[3] J. Martinez-Carranza and A. Calway. Efficiently increasing map density in visual SLAM using planar features with adaptive measurement. In Proc. British Machine Vision Conference, September 2009.

[4] T. Pietzsch. Planar features for visual SLAM. In Proc. German Conference on Artificial Intelligence (KI 2008). Springer, September 2008.