
POU-SLAM: Scan-to-Model Matching Based on 3D Voxels

Jianwen Jiang 1, Jikai Wang 1, Peng Wang 2 and Zonghai Chen 1

1 Department of Automation, University of Science and Technology of China, Hefei 230027, PR China

2 Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield S1 3JD, United Kingdom

Abstract—Purpose: Localization and mapping with LiDAR data is a fundamental building block for autonomous vehicles. Though LiDAR point clouds often encode scene depth more accurately and more stably than visual information, laser-based Simultaneous Localization And Mapping (SLAM) remains challenging because the data is usually sparse, variable in density and less discriminative. The purpose of this paper is to propose an accurate and reliable laser-based SLAM solution.

Design/methodology/approach: The method starts by constructing voxel grids from the 3D input point cloud. These voxels are then classified into three types that indicate different physical objects, according to the spatial distribution of the points contained in each voxel. During the mapping process, a global environment model with a Partition of Unity (POU) implicit surface is maintained, and the voxels are merged into the model stage by stage, which is implemented with the Levenberg-Marquardt algorithm.

Findings: We develop a laser-based SLAM method. The method uses a POU implicit surface representation to build the model and is evaluated on the KITTI odometry benchmark without loop closure. Experimental results indicate that the method achieves accuracy comparable to the state-of-the-art methods.

Originality/value: We propose a novel, low-drift SLAM method that falls into the scan-to-model matching paradigm. The method, which operates on point clouds obtained from a Velodyne HDL-64, is of value to researchers developing SLAM systems for autonomous vehicles.

Keywords—Simultaneous Localization And Mapping; voxel grids; scan-to-model; Partition of Unity

I. INTRODUCTION

Reliable localization of autonomous vehicles is an attractive research topic since it is a basic requirement for navigation and other tasks [1,2,3,4,5]. Fully autonomous vehicles should reliably locate themselves, ideally by using only their own sensors on board without relying on external information sources such as GPS. In scenarios where such external information is absolutely unavailable, Simultaneous Localization And Mapping (SLAM) can always be an alternative option.

SLAM has been intensively studied during the past few decades, and various solutions based on different sensors have been proposed for both indoor and outdoor environments. In outdoor scenes, the advantage of LiDAR with respect to cameras is that the noise associated with each measurement is independent of the distance itself and usually more robust to illumination variation. Therefore, laser-based SLAM is becoming one of the mainstream solutions in outdoor scenes. Most laser-based SLAM methods tend to extract stable features such as planes and edges from points, and then do SLAM

in a ‘feature space’. They are therefore categorized as feature-based SLAM, whose performance is mainly determined by two factors. The first factor is the way features are designed. Ji Zhang et al. [6,7] propose a planar and edge feature design method based on curvature. Michael Bosse et al. [8] incorporate shape information, calculated from geometric statistics of the point cloud, into the Iterative Closest Point (ICP) correspondence step. The second factor is the scan matching method. Scan-to-scan and scan-to-model matching are the two main scan matching frameworks in SLAM. Sen Wang et al. [9] propose a scan-to-scan odometry trajectory estimation method that uses convolutional neural networks to process LiDAR point clouds. KinectFusion [10] implements scan-to-model matching by using an implicit surface as the model. The state-of-the-art laser-based SLAM method, Lidar Odometry And Mapping (LOAM) [6], extracts distinct features corresponding to physical surfaces and corners. To enable on-line implementation, LOAM switches between scan-to-scan operation at 10 Hz and scan-to-model operation at 1 Hz update frequency. Although LOAM has achieved good performance, there are still challenges in outdoor applications using 3D laser sensors: (1) inherent point matching error due to the sparsity of the point clouds, which means that errors always exist when the sparse point cloud is used directly to fit the surface of the environment; furthermore, it is nearly impossible to obtain rigid point correspondences between scans; (2) it is hard to fix the matching parameters due to the density variation of the point clouds; during the feature-based matching process, enough context information must be extracted around a 3D point to make it discriminative, while the scope of the dominating context is greatly affected by the point density.

In this paper, to tackle the above challenges, we introduce a POU implicit surface representation to regress the environment surface from sparse points, which can effectively encode the detailed geometrical information. We discretize the point cloud into 3D voxels. These voxels are classified into planar and edge feature voxels according to the geometrical character of the points contained in each voxel. Finally, we build a feature voxel map for further scan-to-model matching, during which the POU implicit surface representation is used to adapt to the voxel map. The key contributions of our work are as follows:

(1) We propose a novel feature voxel map which stores voxels with salient planar or edge features.

(2) We propose a scan-to-model matching framework using the POU implicit surface representation.


The rest of this paper is organized as follows. In Sect. II, we discuss related work and explain how our work differs from the state of the art. The feature extraction algorithm is presented in detail in Sect. III, and the scan-to-model matching algorithm is given in Sect. IV. Experimental results are shown in Sect. V. Finally, Sect. VI concludes the paper.

II. RELATED WORK

SLAM with cameras and LiDAR has attracted wide attention from both the robotics and computer vision communities. We acknowledge the large body of work in this field, but concentrate here only on approaches based on LiDAR. Since feature extraction, laser-based mapping and the scan-matching framework are the three most important steps for laser-based SLAM, we look into related work on these three aspects and provide general ideas on how they work as follows.

Feature extraction: LOAM focuses on edge and planar features in the LiDAR sweep and keeps them in a map for edge-line and planar-surface matching. Tixiao Shan et al. [11] also use edge and planar features to solve different components of the 6-DOF transformation across consecutive scans. While these works extract features based on curvature, there are still differences; for example, the method from [11] is ground-optimized, as it leverages the presence of a ground plane in its segmentation and optimization steps. Using point clouds obtained from a high-resolution, high-density Zoller-Frohlich Z+F laser, J. Lalonde and Yungeun Choe et al. [12,13] exploit point cloud geometrical statistics to classify natural terrain into scatter-ness, linearity and surface-ness. Using point clouds from a SICK LMS291 laser range sensor, Michael Bosse et al. [8] implement scan matching based on point cloud geometrical statistics features. They use ICP to align voxel centroids, which reduces the number of points for scan matching as well as the size of the map. Different from [8], we consider the geometric features of the points more subtly and implement scan matching with the original points, which avoids information loss.

Laser-based mapping: SLAM systems often build and maintain environment maps such as feature maps [6,7,11,14,15], dense maps [16,17], subsampled clouds [18,19] and Normal Distributions Transform (NDT) based maps [20,21]. In laser-based SLAM systems, a feature map is usually a structured collection of surfaces and corners extracted from the original point clouds. In this paper, we build a feature map that is distinguished from the literature in that we maintain more types of features in the form of voxels.

Scan matching framework: Recently, some laser-based SLAM systems have used the scan-to-model matching framework and achieved state-of-the-art performance on the KITTI odometry benchmark. Jens Behley et al. [16] propose a dense approach to laser-based SLAM that operates on point clouds obtained from a Velodyne HDL-64. They construct a surfel-based map and use point-to-plane ICP [22,23] to estimate the pose transformation of the vehicle. Our method essentially belongs to the scan-to-model matching framework. The distinction is that we build a feature voxel map and use the POU implicit surface representation to adapt to the scan-to-model matching process based on the feature voxel map.

Partition of unity approach: The POU approach has been widely used in surface reconstruction. Ireneusz Tobor et al. [24] show how to reconstruct multi-scale implicit surfaces with attributes, given discrete point sets with attributes. Tung-Ying Lee et al. [25] propose a new 3D non-rigid registration algorithm to register two multi-level partition of unity implicit surfaces with a variational formulation.

III. FEATURE EXTRACTION

Similar to [26], the scan-to-model matching is triggered when there are salient geometrical features in local regions, so it is important to find these regions first. In contrast to 3D object classification [27,28] and place recognition [29,30], which use complex point cloud descriptors to segment and classify objects, we use a simpler shape parameterization because our task focuses on the incremental transformation, for which we have a stronger prior on the relative poses.

A. Computing Shape Parameters

A local region occupied by the input point cloud is repeatedly divided until the number of points falling into each voxel reaches the threshold Np. The threshold Np is related to the horizontal and vertical resolution of the LiDAR point cloud. In the process of feature extraction, the densities of features in the corresponding voxels vary, and we consider the density while searching for the correspondences of the features: the lower the density of the features, the lower the probability of finding enough correspondences within a certain radius. Our experimental results are achieved with Np set to 25. Each voxel stores the points that fall into it and the voxel centroid. The points that fall into a voxel are denoted as {Xi} = {(xi, yi, zi)^T}. The first- and second-order moments µ and S describe the spatial distribution of the points {Xi}:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i \quad (1)$$

$$S = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)(X_i - \mu)^T \quad (2)$$

Inspired by [8], the matrix S is decomposed into principal components ordered by decreasing eigenvalues: e0, e1, e2 are the eigenvectors corresponding to the eigenvalues λ0, λ1, λ2, respectively, where λ0 ≥ λ1 ≥ λ2. In the case that the points in a voxel form a linear structure, the principal direction is the tangent of the curve, with λ0 ≫ λ1 ≈ λ2. In the case that the points in a voxel form a solid surface, the eigenvector corresponding to the smallest eigenvalue is aligned with the surface normal, with λ0 ≈ λ1 ≫ λ2. The two saliency features, named the linear feature and the surface feature, are linear combinations of the eigenvalues. Figure 1 illustrates the two features. The quantity

$$c = \frac{\lambda_0 - \lambda_1}{\lambda_0 + \lambda_1 + \lambda_2} \quad (3)$$


is the linear feature of the points in a voxel, which ranges from 0 to 1. Similarly, the quantity

$$p = \frac{\lambda_1 - \lambda_2}{\lambda_0 + \lambda_1 + \lambda_2} \quad (4)$$

is the surface feature of the points in a voxel, which ranges from 0 to 0.5.
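As an illustration of the shape parameterization above, the following Python/NumPy sketch (an illustrative reimplementation, not the code used in this paper) computes the moments of (1)-(2) and the quantities c and p of (3)-(4) for the points falling into one voxel.

import numpy as np

def voxel_shape_parameters(points):
    # points: (N, 3) array holding the points {Xi} that fall into one voxel.
    mu = points.mean(axis=0)                        # Eq. (1)
    centered = points - mu
    S = centered.T @ centered / len(points)         # Eq. (2)
    # Eigenvalues of the symmetric matrix S, sorted so that l0 >= l1 >= l2.
    l0, l1, l2 = np.linalg.eigvalsh(S)[::-1]
    c = (l0 - l1) / (l0 + l1 + l2)                  # Eq. (3): linear feature, in [0, 1]
    p = (l1 - l2) / (l0 + l1 + l2)                  # Eq. (4): surface feature, in [0, 0.5]
    return mu, S, c, p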

Fig. 1. Illustration of the surface feature and linear feature.

B. Voxel Sampling Strategy

Essentially, the voxel-based scan matching process is to construct the associations between voxels. It is believed that voxels with a high quantity of linearity or planarity tend to be more stable than others. So, for each input point cloud, we classify its 3D grids according to the proposed quantities into three types: edge voxels, planar voxels and others. Furthermore, it should be noticed that the ground points usually contain more planar features, while non-ground points usually contain both planar and edge features. Bearing this in mind, we first divide the whole point cloud into ground and non-ground segments using the method proposed in [31]. Figure 2(a) is an example of the ground segmentation result. Then, for non-ground points, we extract planar and edge features, while only planar features are extracted for ground points.

In each scan, which is now represented by the voxel centroids, we sort the voxels based on their linearity values c and surface-ness values p and obtain two sorted lists. Two thresholds cth and pth are then employed to distinguish different types of features. We call the voxels with c larger than cth edge features, and the voxels with p larger than pth planar features. Then the nFe edge features with the maximum c are selected from each scan. Non-ground and ground planar features with the maximum p are also selected, and the numbers of the two types of features are nFp and nFgp, respectively. Edge and planar features extracted from all the scans are denoted as Fe, Fp and Fgp thereafter. Visualization examples of these features are given in Figure 2(b)-(d).
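The selection step described above can be sketched as follows. The Voxel container, the value of pth and the per-scan feature counts are illustrative placeholders (only cth = 0.85 is reported in Sect. V); they are not taken from the implementation.

from dataclasses import dataclass
import numpy as np

@dataclass
class Voxel:                 # hypothetical container for one voxel of the current scan
    centroid: np.ndarray     # (3,) voxel centroid
    c: float                 # linear feature, Eq. (3)
    p: float                 # surface feature, Eq. (4)
    is_ground: bool          # label from the ground segmentation of [31]

def select_feature_voxels(voxels, c_th=0.85, p_th=0.1, n_Fe=100, n_Fp=100, n_Fgp=100):
    non_ground = [v for v in voxels if not v.is_ground]
    ground = [v for v in voxels if v.is_ground]
    # Edge features: non-ground voxels with c above c_th, keeping the largest c values.
    Fe = sorted((v for v in non_ground if v.c > c_th), key=lambda v: v.c, reverse=True)[:n_Fe]
    # Planar features: non-ground and ground voxels with p above p_th, keeping the largest p values.
    Fp = sorted((v for v in non_ground if v.p > p_th), key=lambda v: v.p, reverse=True)[:n_Fp]
    Fgp = sorted((v for v in ground if v.p > p_th), key=lambda v: v.p, reverse=True)[:n_Fgp]
    return Fe, Fp, Fgp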

IV. SCAN-TO-MODEL MATCHING WITH POU IMPLICIT SURFACE REPRESENTATION

Once the current laser scan is transformed into a set of stable voxels, we implement the scan-to-model matching process. In our method, the map consists of the last n located feature sets Fe, Fp and Fgp. Let

$$M^k = \left\{ \{M^k_e\}, \{M^k_p\}, \{M^k_{gp}\} \right\} \quad (5)$$

be the map that contains planar features and edge features at time k. To better fit the observed surface with these points and adapt to the feature voxel map, we adopt the POU implicit surface representation to construct the model.

A. Finding Feature Point Correspondence

In the current scan, each point is labeled according to the type of its corresponding voxel. During the matching process, we construct the correspondences between voxels in the map and the current feature points with the guidance of the labels and Euclidean distance. Since such a correspondence-generating mechanism narrows down the potential candidates for correspondences, it helps to improve the matching accuracy and efficiency. The detailed correspondence construction process is given below.

For each current edge point in feature set Fe, we search for ne voxels of

$$M^k_e = \left\{ F^{k-n_e}_e, \ldots, F^{k-2}_e, F^{k-1}_e \right\} \quad (6)$$

inside a sphere whose radius is re. For each corresponding voxel, we find an edge line as the correspondence of the current edge point. After finding all the corresponding edge lines, we obtain the projection points Se of the current edge point on the corresponding edge lines. Then we use the linear quantity c to determine whether Se can be represented as an edge line. If Se can be represented as an edge line, it is regarded as the correspondence of the current edge point. After the edge feature correspondences are obtained, we compute the distance from an edge feature point to its correspondence. The procedure of finding an edge line and the distance computation can be found in [6].

For each current planar point in feature sets Fp and Fgp, we respectively find np and ngp voxels of

$$M^k_p = \left\{ F^{k-n_p}_p, \ldots, F^{k-2}_p, F^{k-1}_p \right\} \quad (7)$$

and

$$M^k_{gp} = \left\{ F^{k-n_{gp}}_{gp}, \ldots, F^{k-2}_{gp}, F^{k-1}_{gp} \right\} \quad (8)$$

inside a sphere whose radius is rp as the corresponding voxels. Then, we use the corresponding voxels to construct the POU implicit surface representation of the model, and compute the distance from a planar feature point to this implicit surface representation of the model. The details of building the POU implicit surface representation and computing the distance from a planar feature point to it are discussed below.
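A label-guided radius search of this kind can be sketched as below. The KD-tree lookup over voxel centroids is an illustrative choice, not necessarily how the search is implemented in this work.

import numpy as np
from scipy.spatial import cKDTree

def find_corresponding_voxels(feature_points, map_voxel_centroids, radius, k):
    # Return, for each current feature point, the indices of up to k map voxels
    # (already filtered to the matching feature type) whose centroids lie within
    # the search radius (r_e for edge features, r_p for planar features).
    tree = cKDTree(map_voxel_centroids)
    correspondences = []
    for x in feature_points:
        dists, idx = tree.query(x, k=k)
        dists, idx = np.atleast_1d(dists), np.atleast_1d(idx)
        correspondences.append(idx[dists <= radius])   # keep only candidates inside the sphere
    return correspondences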

B. POU Implicit Surface Representation For Planar Features

1) POU Approach: The basic idea of the POU approach is to break the data domain into several pieces, approximate the data in each subdomain separately, and then blend the local solutions together using smooth, local weights that sum up to one everywhere in the domain. More specifically, we assign a specific planar patch as the correspondence of a planar feature point in each correspondence voxel. The procedure for finding the specific points that form the planar patches Pc can be found in [7]. We can calculate the distances d between the planar feature point and the corresponding planar patches Pc. Then, we project the planar feature point x onto the surfaces of the planar patches, obtaining the projected points psi = x − di ni, where ni is the normal of the closest point to x in each planar patch and is a good approximation of the surface normal at the projected point psi. Each planar patch can be regarded as a subdomain. Then, we use the weight function to blend the subdomains together.

Fig. 2. (a) The result of ground segmentation: the red points are labeled as ground points and the green points are labeled as non-ground points. (b)-(d) The ground planar features, non-ground edge features and non-ground planar features, respectively.

2) POU Implicit Surface Representation: The implicit surface is defined as the set of zeros of a function I(x), which also behaves as a distance function from x to the implicit surface. In this work, we use the POU implicit surface representation.

Similar to the POU implicit surface framework in [32], we define the POU implicit function I^{M^k}(x) as an approximate distance of a planar feature point x ∈ R³ at time k to the POU implicit surface defined by the map M^k:

$$I^{M^k}(x) = \frac{\sum_{s_i \in P_c} W_{s_i}(x)\,(x - p_{s_i}) \cdot \vec{n}_{s_i}}{\sum_{s_j \in P_c} W_{s_j}(x)} \quad (9)$$

where p_{s_i} is the projected point on the corresponding planar patch s_i and n_{s_i} is the normal at the point p_{s_i}. For the approximation, we use the quadratic B-spline b(t) to generate the weight functions of the planar patches:

$$W_{s_i}(x) = b\!\left(\frac{3\,|x - c_{s_i}|}{2R_{s_i}}\right) \quad (10)$$

where c_{s_i} is the geometric center of planar patch s_i and R_{s_i} = max_{x∈s_i} ‖x − c_{s_i}‖ is the spherical support radius of the planar patch s_i. The argument of b is proportional to the distance between x and c_{s_i} and inversely proportional to R_{s_i}; since b decreases with its argument, the closer c_{s_i} is to x, the higher the weight allocated. Moreover, the more scattered the points that form a planar patch are, the higher its weight. In addition to computing the distance from the feature point to the model surface, the surface normal at the projected point of the feature point x should also be computed. In this work, we use the normal of the closest point p_{s_closest} to x as the approximation of the surface normal at the projected point. Compared with methods that use only one plane to represent a surface (which could be a cylinder, sphere, etc.), we adopt multiple local planes with various planarities to represent a surface, which achieves higher accuracy and more accurate distances from feature points to their correspondences.

Fig. 3. Illustration of the POU implicit surface representation. Two cells, A and B, are associated with their support radii RA and RB, respectively. The value at a point x in the slashed region can be evaluated as I^{M^k}(x) = (WA(x)QA + WB(x)QB)/(WA(x) + WB(x)), with WA(x) = b(3d1/(2RA)) and WB(x) = b(3d2/(2RB)), where d1 and d2 are the distances from the point x to the centers of cells A and B, respectively.
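A direct evaluation of (9)-(10) can be sketched as follows. The symmetric quadratic B-spline used for b(t) and the dictionary-based patch description are assumptions made for illustration; they are not taken from the implementation described here.

import numpy as np

def quadratic_bspline(t):
    # Assumed form of b(t): the symmetric quadratic B-spline with support |t| <= 1.5,
    # so that the weight is largest when x coincides with the patch center.
    t = abs(t)
    if t < 0.5:
        return 0.75 - t * t
    if t < 1.5:
        return 0.5 * (1.5 - t) ** 2
    return 0.0

def pou_implicit_distance(x, patches):
    # Eq. (9): POU-blended distance from the planar feature point x to the model.
    # Each patch is a dict with keys 'p' (projection p_si of x onto the patch),
    # 'normal' (unit normal n_si at p_si), 'center' (c_si) and 'R' (support radius R_si).
    num, den = 0.0, 0.0
    for s in patches:
        w = quadratic_bspline(3.0 * np.linalg.norm(x - s['center']) / (2.0 * s['R']))  # Eq. (10)
        num += w * np.dot(x - s['p'], s['normal'])
        den += w
    return num / den if den > 0.0 else None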

3) Motion Estimation: With the distances from feature points to their correspondences in hand, we assign a bisquare weight [7] to each feature point. The rules are twofold. In general, when the distance between a feature point and its correspondence is below a certain threshold, the weight is assigned inversely proportional to the distance. However, when the distance is greater than the threshold, the feature point is regarded as an outlier. We proceed to recover the LiDAR motion by minimizing the overall distances.
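For reference, one common form of the bisquare (Tukey) weight is sketched below; the exact weighting rule of [7] may differ from this illustration.

def bisquare_weight(distance, threshold):
    # Residuals at or beyond the threshold get zero weight, i.e. the feature
    # point is treated as an outlier; closer correspondences get higher weight.
    if distance >= threshold:
        return 0.0
    r = distance / threshold
    return (1.0 - r * r) ** 2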


Algorithm 1 POU-SLAM scan-to-model matching framework
Input: current scan, model
Output: LiDAR pose transform T between the current scan and the model
1. Feature Extraction: extract non-ground planar features F^k_p, ground planar features F^k_gp and edge features F^k_e based on the shape parameters in each voxel.
2. Scan-to-Model Matching with the POU implicit surface representation: calculate the LiDAR pose transform T by minimizing the distances between features and the POU implicit surface.
for a number of iterations do
    for each non-ground planar feature in F^k_p do
        Find a POU implicit surface as the correspondence, then compute the point-to-implicit-surface distance based on (9) and stack the equation into (12)
    for each ground planar feature in F^k_gp do
        Find a POU implicit surface as the correspondence, then compute the point-to-implicit-surface distance based on (9) and stack the equation into (12)
    for each edge feature in F^k_e do
        Find an edge line as the correspondence, then compute the point-to-line distance and stack the equation into (12)
    Compute a bisquare weight for each row of (12)
    Update T for one nonlinear iteration based on the Levenberg-Marquardt method
    if the nonlinear optimization converges then
        break

First, the following equation is used to project a feature point x to the map, giving the projected point x̄:

$$\bar{x} = R\,x + \tau \quad (11)$$

where R and τ are the rotation matrix and translation vector corresponding to the pose transform T between the current scan and the map. By combining the distances and weights from feature points to their correspondences with equation (11), we derive a geometric relationship between the feature points and the corresponding model,

$$f(x, T) = d \quad (12)$$

where each row of f corresponds to a feature point and d contains the corresponding distances. Finally, we obtain the LiDAR motion with the Levenberg-Marquardt method [33]. We do scan matching between the current scan and the last n scans, and the final result is obtained by aggregating the successfully matched nm scans. By matching the current scan with the historical n scans, the error propagation problem can be suppressed. As for blunders, we limit the scale of the relative transformation between the current scan and the model. When a blunder occurs, we abandon the result of the current scan, and the subsequent scan matching is only slightly affected. After computing the transformation between the scan and the model, we add the feature voxels corresponding to the current scan to the map and remove the feature voxels corresponding to the oldest scan, so as to always keep n scans for scan matching.
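The optimization step built around (11)-(12) can be sketched as follows. The 6-DOF rotation-vector parameterization and the use of SciPy's Levenberg-Marquardt solver are illustrative choices; the distance callbacks stand in for the point-to-line and point-to-implicit-surface distances described above.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def estimate_motion(feature_points, distance_fns, weights, T0=None):
    # feature_points: (M, 3) feature points of the current scan.
    # distance_fns:   list of M callables; distance_fns[i](x_bar) returns the distance
    #                 from the projected point x_bar (Eq. 11) to its correspondence
    #                 (edge line, or POU implicit surface via Eq. 9).
    # weights:        (M,) bisquare weights for the rows of Eq. (12).
    # T0:             initial pose [rotation vector (3), translation (3)].
    if T0 is None:
        T0 = np.zeros(6)

    def residuals(T):
        R = Rotation.from_rotvec(T[:3]).as_matrix()
        tau = T[3:]
        x_bar = feature_points @ R.T + tau          # Eq. (11) applied to all points
        return np.array([w * d(xb) for w, d, xb in zip(weights, distance_fns, x_bar)])

    result = least_squares(residuals, T0, method='lm')   # Levenberg-Marquardt
    return result.x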

V. EXPERIMENTAL EVALUATION

A. Tests On The Public Dataset KITTI

We evaluate our method on the KITTI odometry benchmark [34], where we use point clouds from a Velodyne HDL-64E S2 mounted on the roof of a car, with a recording rate of 10 Hz. The dataset is composed of a wide variety of environments (urban city, rural road, highways, roads with a lot of vegetation, low or high traffic, etc.). The LiDAR measurements are de-skewed with an external odometry, so we do not apply an ego-motion algorithm to this dataset. Table I shows the results of our method for all sequences in detail. The relative error is evaluated by the development kit provided with the KITTI benchmark. The data sequences are split into subsequences of length 100, 200, . . . , 800 m. The error es of each subsequence is computed as

$$e_s = \frac{\| E_s - C_s \|_2}{l_s} \quad (13)$$

where Es is the expected position (from ground truth) and Cs is the estimated position of the LiDAR where the last frame of the subsequence was taken, both with respect to the initial position within the given subsequence. The difference is divided by the length ls of the traveled trajectory. The final error value is the average of the errors es across all the subsequences of all lengths.
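For clarity, a sketch of the per-subsequence error of (13) and its averaging, written for illustration, is:

import numpy as np

def subsequence_error(E_s, C_s, l_s):
    # Eq. (13): Euclidean gap between the ground-truth end position E_s and the
    # estimated end position C_s, divided by the subsequence trajectory length l_s.
    return np.linalg.norm(np.asarray(E_s) - np.asarray(C_s)) / l_s

def average_drift(errors):
    # Final score: mean of e_s over all subsequences of all lengths.
    return float(np.mean(errors))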

We include for comparison the reported results of LOAM, a laser-based odometry approach that switches between the scan-to-scan and scan-to-model frameworks, and the reported results of SUMA [16], a laser-based odometry approach using the scan-to-model framework. The proposed method yields results generally on par with the state-of-the-art in laser-based odometry and often achieves better results in terms of translational error. Overall, we achieve an average translational error of 0.61% compared to a 0.84% translational error of LOAM on the KITTI odometry benchmark.

B. Discussion Of The Parameters And Performance

In our experiment, we define the thresholds cth and pth based on KITTI odometry sequence 01. The laser scans contained in this sequence are collected on a highway, a scenario where only very few distinct structural features can be used for scan matching.


Fig. 4. Sample results on the KITTI odometry benchmark. The datasets are chosen from three types of environments: highway, country and urban, from left to right, corresponding to sequences 01, 03 and 07. In (a)-(c), we compare the estimated trajectories of the vehicle to the GPS/INS ground truth. The mapping results are shown in (d)-(f). An image from each dataset illustrating the environment is shown in (g)-(i).

To extract as many features as possible and accomplish the SLAM task in such a scenario, the thresholds should be set to a fairly low level. This is because, the smaller the thresholds, the more likely features on spheres or cylinders are to be extracted. These sphere and cylinder features are not planar themselves, and representing them directly by planes introduces errors that would further reduce the accuracy of the downstream algorithms. In this paper, however, we blend each extracted planar patch with the weight function under the implicit surface representation mechanism, which explains why we achieve an obvious improvement on this sequence. Therefore, if the thresholds perform well in such scenarios lacking structural features, they should transfer easily to scenarios with richer structural features. For more structured environments such as the urban case, the thresholds can be further increased to compromise between feature richness and planarity, which promises an accuracy improvement for the whole SLAM algorithm. Table IV shows the influence of cth on the result. When cth is set too large, we cannot extract enough features and the result gets worse. Figure 5 shows the variation of the number of non-ground features across the frames of sequence 01 when cth is set to 0.85.

The average number of non-ground features is about 7200. The quantity c defined in (3) is generally large for ground features, and we do not discuss it here. The parameters ne, np and ngp are set to 3 for the KITTI odometry benchmark. Table III shows the influence of the parameter np; we compare our POU model to the plane model fitted as in LOAM. The result of our model is better than that of the plane model, and the best result is reached when np is set to 3. When np is set too large, the correspondences are far away from the current scan features and do not contribute to the local surface reconstruction. The parameter n is set to 40 for the KITTI odometry benchmark. Table II shows the influence of the parameter n.

C. Discussion Of The Processing Time

The operating platform in our experiment is an Intel [email protected] GHz with 16 GB RAM. Our method is evaluated by processing the KITTI datasets rather than being deployed on an autonomous vehicle.


With regard to real-time implementation, which heavily depends on the hardware configuration, we cannot assert whether our method runs in real time. However, we can still provide some timing figures so that the runtime performance of our method is known. First, we split the point cloud into voxels that contain certain numbers of points and calculate the linearity and surface-ness values for these voxels; this takes 0.5 s. Second, we do scan matching for each feature; the time cost depends on the number of features. For KITTI odometry sequence 01, this takes about 1.5 s. Taking both factors into consideration, our SLAM runs at 2 s per scan with one thread. For comparison, LOAM runs at 1 s per scan on the same KITTI dataset. The reason is that, compared to LOAM, which extracts two-dimensional curvature features, we construct voxel grids and extract features according to the spatial distribution of the points contained in each voxel. This results in better accuracy (refer to Table I) at a slight cost in efficiency.

TABLE I. RESULTS ON THE KITTI ODOMETRY BENCHMARK

Sequence  Environment     LOAM    SUMA   Our SLAM
0         Urban           0.78%   0.7%   0.64%
1         Highway         1.43%   1.7%   0.90%
2         Urban+Country   0.92%   1.2%   0.74%
3         Country         0.86%   0.7%   0.59%
4         Country         0.71%   0.4%   0.49%
5         Urban           0.57%   0.4%   0.43%
6         Urban           0.65%   0.5%   0.36%
7         Urban           0.63%   0.7%   0.35%
8         Urban+Country   1.12%   1.2%   0.84%
9         Urban+Country   0.77%   0.6%   0.53%
10        Urban+Country   0.79%   0.7%   0.83%

TABLE II. IMPORTANCE OF THE PARAMETER n

Parameter n   Drift on KITTI training dataset
n = 1         1.34%
n = 5         0.98%
n = 10        0.92%
n = 40        0.90%

TABLE III. COMPARISON OF THE POU MODEL TO THE PLANE MODEL

Parameter np   Plane Model   POU Model
np = 3         0.93%         0.90%
np = 4         0.94%         0.92%
np = 5         0.94%         0.92%

TABLE IV. THE INFLUENCE OF cth ON THE RESULT

Parameter cth   Drift on KITTI training dataset
cth = 0.95      1.03%
cth = 0.90      0.92%
cth = 0.85      0.90%

Fig. 5. Variation of the number of non-ground features in different frames on sequence 01.

VI. CONCLUSION

We present a new 3D LiDAR SLAM method that is composed of a new feature voxel map and a new scan-to-model matching framework. We build a novel feature voxel map whose voxels encode salient shape characteristics. In order to adapt to the proposed map, we implement scan-to-model matching using the POU implicit surface representation to blend the corresponding voxels in the map together. As the experimental results illustrate, the proposed method yields accurate results that are on par with the state-of-the-art. Future work will proceed in two directions. From the research perspective, a specific and efficient octree will be designed to generate the 3D grid. Meanwhile, we will deploy the method to fulfill real-time applications with the aid of multiple threads or a GPU to accelerate data processing.

ACKNOWLEDGEMENTS

This work is supported by the National Natural Science Foundation of China (Grant Nos. 91848111 and 61703387).

REFERENCES

[1] Jikai Wang and Zonghai Chen. A novel hybrid map based global path planning method. In 2018 3rd Asia-Pacific Conference on Intelligent Robot Systems (ACIRS), pages 66–70, 2018.
[2] Jikai Wang, Peng Wang, and Zonghai Chen. A novel qualitative motion model based probabilistic indoor global localization method. Information Sciences, 429:284–295, 2018.
[3] Peng Wang, Qi-Bin Zhang, and Zong-Hai Chen. A grey probability measure set based mobile robot position estimation algorithm. International Journal of Control, Automation and Systems, 13(4):978–985, 2015.
[4] Qibin Zhang, Peng Wang, and Zonghai Chen. Mobile robot pose estimation by qualitative scan matching with 2D range scans. Journal of Intelligent and Fuzzy Systems, 36(4):3235–3247, 2019.
[5] Hui Xiong, Youping Chen, Xiaoping Li, Bing Chen, and Jun Zhang. A scan matching simultaneous localization and mapping algorithm based on particle filter. Industrial Robot: An International Journal, 43(6):607–616, 2016.
[6] Ji Zhang and Sanjiv Singh. LOAM: Lidar odometry and mapping in real-time. In Robotics: Science and Systems, volume 10, 2014.
[7] Ji Zhang and Sanjiv Singh. Low-drift and real-time lidar odometry and mapping. Autonomous Robots, 41(2):401–416, 2017.
[8] Michael Bosse and Robert Zlot. Continuous 3D scan-matching with a spinning 2D laser. In 2009 IEEE International Conference on Robotics and Automation, pages 4244–4251, 2009.
[9] Sen Wang, Ronald Clark, Hongkai Wen, and Niki Trigoni. DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 2043–2050, 2017.
[10] Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pages 127–136, 2011.
[11] Tixiao Shan and Brendan J. Englot. LeGO-LOAM: Lightweight and ground-optimized lidar odometry and mapping on variable terrain. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4758–4765, 2018.
[12] Jean-Francois Lalonde, Nicolas Vandapel, Daniel F. Huber, and Martial Hebert. Natural terrain classification using three-dimensional ladar data for ground robot mobility. Journal of Field Robotics, 23(10):839–861, 2006.
[13] Yungeun Choe, Inwook Shim, and Myung Jin Chung. Urban structure classification using the 3D normal distribution transform for practical robot applications. Advanced Robotics, 27(5):351–371, 2013.
[14] Haoyang Ye, Yuying Chen, and Ming Liu. Tightly coupled 3D lidar inertial odometry and mapping. arXiv preprint arXiv:1904.06993, 2019.
[15] Michał R. Nowicki, Dominik Belter, Aleksander Kostusiak, Petr Cížek, Jan Faigl, and Piotr Skrzypczyński. An experimental study on feature-based SLAM for multi-legged robots with RGB-D sensors. Industrial Robot: An International Journal, 44(4):428–441, 2017.
[16] Jens Behley and Cyrill Stachniss. Efficient surfel-based SLAM using 3D laser range data in urban environments. In Robotics: Science and Systems XIV, volume 14, 2018.
[17] Kejie Qiu and Shaojie Shen. Model-aided monocular visual-inertial state estimation and dense mapping. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1783–1789, 2017.
[18] Frank Moosmann and Christoph Stiller. Velodyne SLAM. In 2011 IEEE Intelligent Vehicles Symposium (IV), 2011.
[19] Martin Velas, Michal Spanel, and Adam Herout. Collar line segments for fast odometry estimation from Velodyne point clouds. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4486–4495, 2016.
[20] Todor Dimitrov Stoyanov, Martin Magnusson, Henrik Andreasson, and Achim Lilienthal. Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations. The International Journal of Robotics Research, 31(12):1377–1393, 2012.
[21] Jari Saarinen, Todor Stoyanov, Henrik Andreasson, and Achim J. Lilienthal. Fast 3D mapping in highly dynamic environments using normal distributions transform occupancy maps. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4694–4701, 2013.
[22] Francois Pomerleau, Francis Colas, and Roland Siegwart. A review of point cloud registration algorithms for mobile robotics. Foundations and Trends in Robotics, 4(1):1–104, 2015.
[23] Szymon Rusinkiewicz and Marc Levoy. Efficient variants of the ICP algorithm. In Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, pages 145–152, 2001.
[24] Ireneusz Tobor, Patrick Reuter, and Christophe Schlick. Reconstructing multi-scale variational partition of unity implicit surfaces with attributes. International Conference on Shape Modeling and Applications, 68(1):25–41, 2006.
[25] Tung-Ying Lee and Shang-Hong Lai. 3D non-rigid registration for MPU implicit surfaces. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 1–8, 2008.
[26] Jakob Engel, Jorg Stuckler, and Daniel Cremers. Large-scale direct SLAM with stereo cameras. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1935–1942, 2015.
[27] R. Qi Charles, Hao Su, Mo Kaichun, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 77–85, 2017.
[28] Peiliang Li, Xiaozhi Chen, and Shaojie Shen. Stereo R-CNN based 3D object detection for autonomous driving. arXiv preprint arXiv:1902.09738, 2019.
[29] Renaud Dube, Daniel Dugas, Elena Stumm, Juan I. Nieto, Roland Siegwart, and Cesar Cadena. SegMatch: Segment based place recognition in 3D point clouds. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 5266–5272, 2017.
[30] Ting Sun, Ming Liu, Haoyang Ye, and Dit-Yan Yeung. Point-cloud-based place recognition using CNN feature extraction. arXiv preprint arXiv:1810.09631, 2018.
[31] Tongtong Chen, Bin Dai, Ruili Wang, and Daxue Liu. Gaussian-process-based real-time ground segmentation for autonomous land vehicles. Journal of Intelligent and Robotic Systems, 76(3):563–582, 2014.
[32] Yutaka Ohtake, Alexander G. Belyaev, Marc Alexa, Greg Turk, and Hans-Peter Seidel. Multi-level partition of unity implicits. In ACM Transactions on Graphics (TOG), volume 22, pages 463–470, 2003.
[33] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[34] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361, 2012.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 10 September 2019 | doi:10.20944/preprints201909.0105.v1
© 2019 by the author(s). Distributed under a Creative Commons CC BY license.
Peer-reviewed version available at Appl. Sci. 2019, 9, 4147; doi:10.3390/app9194147