

• Review •

Urban 3D modeling with mobile laser scanning: a review

Cheng WANG*, Chenglu WEN, Yudi DAI, Shangshu YU, Minghao LIU

Fujian Key Lab on Sensing and Computing for Smart City, School of Informatics, Xiamen University

* Corresponding author, [email protected]

Abstract Mobile laser scanning (MLS) systems mainly comprise laser scanners and mobile mapping platforms. Typical MLS systems can acquire three-dimensional point clouds with 1-10 cm point spacing at normal driving or walking speed in street or indoor environments. The efficiency and stability of MLS make it a practical tool for three-dimensional urban modeling. This paper reviews the latest advances in 3D modeling from LiDAR-based mobile mapping system (MMS) point clouds, including LiDAR Simultaneous Localization and Mapping (SLAM), point cloud registration, feature extraction, object extraction, semantic segmentation, and deep learning processing. Typical urban modeling applications based on MMS are then discussed.

Keywords 3D modeling; MMS; LiDAR; urban

1 Introduction

Urban 3D modeling establishes a 2.5D or 3D digital representation of the earth's surface and related objects, such as buildings, roads, vegetation, and other man-made structures, in an urban area. There are three major categories of urban 3D modeling approaches: (1) conventional geodetic mapping techniques, (2) approaches based on 2D image photogrammetry, and (3) approaches based on 3D measurements, such as laser scanning. Although the data are dense and the precision is high, conventional geodetic mapping techniques are time-consuming and offer poor mobility; they are therefore unsuitable for large-scale mobile mapping tasks. The 2D image photogrammetry methods are easy to set up and low in cost, and it is convenient to apply visual deep learning methods to extract semantic information. However, they are very sensitive to environmental changes, such as ambient light, weather, and dark sites. Moreover, a 3D model built from images cannot be applied directly to navigation. Modeling methods based on LiDAR offer high precision and high reliability and are not easily affected by the environment. Unlike 3D models built from images, 3D models built from LiDAR or other 3D measurement equipment can be used for autonomous driving. Therefore, the methods discussed in this paper are mainly based on LiDAR or other 3D measurement equipment.

The task of 3D modeling over large urban areas demands high efficiency in data acquisition. An MLS system is an MMS equipped with laser scanners, and MMS technology provides a solution for efficient 3D modeling.


Mobile mapping is a system technology that installs photogrammetric sensors on a mobile platform with high-precision, high-efficiency georeferencing capabilities. MMSs can efficiently collect georeferenced three-dimensional measurements of the environment while the platform moves. Successful MMSs include the VISAT[1] system from the University of Calgary, Canada, the GPSVan[2] developed by Ohio State University, and the LD2000[3] developed by Wuhan University. At present, a typical MLS can collect one million points per second, covering the road and surrounding surfaces with a point density of 2000 points per square meter and 1-10 cm point spacing at a moving speed of 10-110 km/h.

MLS point clouds are of large volume, heavy redundancy, and irregular distribution[4]. In addition, the quality of the point cloud is degraded by noise and occlusion. As a result, MLS point cloud processing is a challenging task in urban 3D modeling. Standard point cloud processing topics include feature point extraction, matching and registration, object detection, semantic segmentation, and SLAM.

This paper presents a review of MLS solutions for urban 3D modeling, as shown in figure 1. In the remainder of the article, section 2 reviews MLS technology, section 3 discusses the processing of MLS point clouds, and section 4 presents typical urban modeling applications based on MLS.

Figure 1 An overview of the relationships among the components of urban 3D modeling using MLS.

2 MLS system

In this review, we first introduce the system design and the important sensors of MLS systems. Among these sensors, the Global Navigation Satellite System (GNSS) receiver and the Inertial Measurement Unit (IMU) are the key components for navigation, but in GNSS-denied environments, LiDAR plays an important role.

2.1 System design


The MLS system is an MMS equipped with laser scanners. As shown in figure 2, MLS systems usually consist of GNSS receivers, laser scanners, digital cameras, IMUs, and other devices. Synchronization of the data from these sensors to a common time reference frame is achieved by precise timestamps[5]. The calculation of ground coordinates for objects from laser scanning observations is well documented in the literature[6]. Coordinates of ground objects are computed by combining the measurements from the integrated GNSS/INS navigation system, the laser scanner, and the sensor calibration parameters.
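As a concrete illustration of this computation, the following minimal sketch maps a single laser return into world coordinates from the GNSS/INS pose and the calibration parameters. It outlines the standard direct-georeferencing equation; the function and variable names are ours, not those of any specific system.

```python
import numpy as np

def georeference(p_scanner, R_boresight, lever_arm, R_body_to_world, t_world):
    """Map one laser return from the scanner frame to world coordinates.

    p_scanner       : (3,) point in the scanner frame (from range and scan angles)
    R_boresight     : (3,3) scanner-to-body rotation (calibration parameter)
    lever_arm       : (3,) scanner origin offset in the body frame (calibration)
    R_body_to_world : (3,3) body-to-world rotation from the INS attitude
    t_world         : (3,) platform position in the world frame from GNSS/INS
    """
    p_body = R_boresight @ p_scanner + lever_arm    # scanner frame -> body frame
    return R_body_to_world @ p_body + t_world       # body frame -> world frame
```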

Figure 2 Setup of MLS systems [5].

2.2 GNSS and IMU

MLS systems conduct surveys from ground vehicles. In an MLS system, the navigation system, which includes a global navigation satellite system (GNSS) and an inertial measurement unit (IMU), provides the vehicle's trajectory and attitude for generating georeferenced 3D point clouds (as shown in figure 3)[4]. The relative precision of the points can be at the subcentimeter level, and their absolute accuracy depends on the GNSS/IMU integrated navigation solution.

Figure 3 GNSS/IMU positioning for direct georeferencing of MLS point clouds[4].

2.2.1 GNSS/IMU integrated navigation

GNSS provides the geographical position and velocity of a GNSS receiver antenna by employing a constellation of orbiting satellites. The most popular GNSS systems include the Global Positioning System (GPS, United States), the Global Navigation Satellite System (GLONASS, Russia), the COMPASS/BeiDou Navigation Satellite System (BDS, China), and Galileo (European Union).


The position measurement is computed from satellite signals within a clear view of the receiver antenna. Generally, at least four satellites must be visible for a positional fix, as shown in figure 4, and the accuracy of GNSS improves with more available satellites. However, common error sources, such as receiver noise, atmospheric delays, multipath, and satellite clock timing, mean that GNSS receivers usually have a positioning accuracy of 1-2 m. Obstructions such as buildings or trees block the satellite signal, which results in unreliable navigation. Methods such as post-processing, precise point positioning (PPP), and real-time kinematic (RTK) positioning[7] have been proposed to improve the accuracy of GNSS.

An Inertial Navigation System (INS) computes a relative position over time using rotation and acceleration measurements from an IMU, which measures relative movement in 3D space. An IMU contains six complementary sensors arrayed on three orthogonal axes: on each axis, an accelerometer and a gyroscope measure linear acceleration and angular velocity, respectively. From these measurements, the INS calculates position and velocity. In addition, the IMU provides an angular solution about the three axes, which the INS translates into a local attitude (roll, pitch, and azimuth) solution[8].

When an IMU is used for navigation in 3D space, hundreds or thousands of samples are acquired per second, and errors accumulate at the same time. Thus, without an external reference, an uncorrected INS will quickly drift from the true position. The INS can estimate the error of the IMU measurements with a mathematical filter if an external reference is provided by GNSS. GNSS provides an absolute set of coordinates used as the initial starting point, together with continuous positions and velocities used to update the INS filter estimates. The integration of GNSS and INS lets the two systems complement each other and provides a more powerful navigation solution. For example, the INS can be used effectively for navigation over longer periods when GNSS is unreliable due to signal obstructions.
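The GNSS/INS integration described above is commonly realized with a Kalman filter. The following is a deliberately simplified 1D sketch of the loosely coupled scheme, not the filter of any particular product: the INS propagates the state at a high rate from accelerometer samples, and sparse GNSS fixes correct the accumulated drift. All noise values are illustrative.

```python
import numpy as np

dt = 0.01                                 # IMU sample period (100 Hz)
F = np.array([[1.0, dt], [0.0, 1.0]])     # constant-velocity state transition
B = np.array([[0.5 * dt**2], [dt]])       # how acceleration enters the state
H = np.array([[1.0, 0.0]])                # GNSS observes position only
Q = 1e-4 * np.eye(2)                      # process noise: IMU errors cause drift
Rm = np.array([[1.5 ** 2]])               # GNSS noise (~1-2 m std, as above)

def ins_predict(x, P, accel):
    """High-rate prediction: integrate one accelerometer sample."""
    x = F @ x + B * accel
    P = F @ P @ F.T + Q                   # uncertainty grows between GNSS fixes
    return x, P

def gnss_update(x, P, z_pos):
    """Low-rate correction: fuse one GNSS position fix."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Rm)     # Kalman gain
    x = x + K @ (np.array([[z_pos]]) - H @ x)         # correct the INS drift
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.zeros((2, 1)), np.eye(2)        # state: [position; velocity]
for _ in range(100):                      # 1 s of IMU dead reckoning
    x, P = ins_predict(x, P, accel=0.1)
x, P = gnss_update(x, P, z_pos=0.05)      # one GNSS fix reins in the drift
```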

Figure 4 GNSS/IMU integrated navigation[8].

2.3 Laser Scanner


In the MLS system, the point cloud mainly comes from laser scanners. A laser scanner, also known as LiDAR, senses distance by emitting laser pulses and measuring the time it takes the light to return to the sensor. LiDAR can be used for plane mapping, obstacle avoidance in navigation, and urban area modeling. 3D LiDAR is mainly used in outdoor environments, especially in geodesy, meteorology, geology, and military applications.

Usually, an optical pulse or wave can only measure the distance in a specific direction. LiDAR therefore normally includes oscillating mirrors that scan in multiple directions. Depending on the specific oscillation mechanism, LiDAR can scan the surrounding environment in 2D or 3D.

Rotating LiDARs have a 360° view. With each rotation, they scan points along a cone originating from the sensor, resulting in a single circular scan line. This cone angle is varied by a set amount after each full rotation, up to a maximum absolute angle, so the sensor cannot scan the area directly above or below it.

The Velodyne VLP-16 and HDL-32 are representative products at the most affordable end of commercial multi-beam sensors; their main specifications are provided in Table 1. The VLP-16 is more compact and lightweight, while the HDL-32 costs more and offers better scanning performance.

Table 1 Manufacturer specifications (VLP-16 and HDL-32 sensors)[9]

Specification            VLP-16                               HDL-32
Laser/detector pairs     16                                   32
Range                    1 m to 100 m                         1 m to 70 m
Accuracy                 ±3 cm                                ±2 cm
Data                     Distance/calibrated reflectivities   Distance/calibrated reflectivities
Data rate                300,000 points/s                     700,000 points/s
Vertical FOV             30°: [−15°, +15°]                    41.3°: [−30.67°, +10.67°]
Vertical resolution      2.0°                                 1.33°
Horizontal FOV           360°                                 360°
Horizontal resolution    0.1° to 0.4° (programmable)          0.08° to 0.35° (programmable)
Size                     103 mm × 72 mm                       85.3 mm × 149.9 mm
Weight                   0.83 kg                              1.3 kg

The main advantages of LiDAR are:

1) Different types of LiDAR provide different measuring ranges, from a few centimeters to more than 100 meters, so LiDAR can be used in both indoor and outdoor environments.

2) The horizontal aperture of LiDAR is usually between 90 and 360 degrees.

3) The angular resolution of LiDAR is usually less than 1 degree.

4) The measurement error of LiDAR is low, and it is usually constant (at short distances) or linear with distance.

5) LiDAR has a medium-to-high sampling rate, which is necessary for working in dynamic environments; the sampling rate is usually adjustable from 10 Hz to 20 Hz.

The main disadvantage of LiDAR is its relatively high price. In addition, LiDAR's power consumption is high (more than ten times that of a camera), and its scanning performance declines in fog, rainstorms, or dust.


2.3.1 SLAM-based navigation

The ground receiver's visibility to GNSS satellites is the main driver of accuracy in GNSS positioning. The GNSS signal is vulnerable to external interference and fails, or its accuracy degrades, when the platform is in a complex environment, such as among high buildings, on steep slopes, or indoors. For this reason, alternative techniques must be adopted. SLAM is arguably one of the most important algorithms in robotics and 3D vision, and it can work in GNSS-denied environments.

LiDAR SLAM LiDAR has been an important sensor in robot navigation for obstacle avoidance and path planning. Meanwhile, LiDAR-based SLAM methods, such as feature-based registration[10], Iterative Closest Point[11] (ICP), and the Normal Distributions Transform[12] (NDT), have been proposed for estimating the transformation between two sets of overlapping point clouds.

A feature-based registration method is commonly used for the initial transformation estimation between two point clouds. This kind of method first finds key features in the two point clouds, then computes descriptors for the key features for matching, and finally calculates the transformation matrix between the corresponding key features.

ICP converges to a local minimum by minimizing the squared error, and it can be categorized into point-to-point, point-to-plane, and plane-to-plane ICP. In point-to-point ICP, correspondence pairs are built by pairing each point in the first point cloud with the closest point in the second point cloud. The transformation between the two point clouds is then computed by minimizing the sum of the squared distances between the corresponding points.
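A minimal point-to-point ICP sketch in Python (NumPy/SciPy) makes these two alternating steps explicit; iteration counts and tolerances here are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (SVD/Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp_point_to_point(source, target, n_iters=30, tol=1e-6):
    """Alternate (1) closest-point matching and (2) least-squares alignment."""
    tree = cKDTree(target)                # accelerates the closest-point queries
    src = source.copy()
    R_tot, t_tot = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(n_iters):
        dists, idx = tree.query(src)                      # step 1: correspondences
        R, t = best_rigid_transform(src, target[idx])     # step 2: best transform
        src = src @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t           # accumulate the estimate
        err = dists.mean()
        if abs(prev_err - err) < tol:                     # converged (local minimum)
            break
        prev_err = err
    return R_tot, t_tot
```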

The NDT method employs statistical models of the points to estimate the possible alignment between the two point clouds.

LiDAR SLAM might fail due to the sparseness of LiDAR point clouds. Integrating a camera with LiDAR can improve performance[10,13]. Camera-based visual odometry can provide an initial estimate for ICP and correct the motion distortion of the point clouds caused by the different acquisition times of the points. Scherer et al.[14] estimated the ego-motion of the system by integrating images and IMU data and then refined the ego-motion estimate with LiDAR data. Droeschel et al.[15] developed a 3D multi-resolution map for robot navigation by fusing LiDAR data with a 3D map.

As the sensor scans its surroundings, the platform may move and rotate. Consider an extreme example: if the platform counter-rotates at the same angular velocity as its rotating scanner, all points will be located on the same vertical plane in the world frame. Naively mapping the full scan from the sensor frame to the world frame with a single affine transform will not accurately portray this effect, because each point is taken at a different moment in time and thus has its own frame relative to the world frame. Precise robot poses at various times during the scanning process allow this distortion to be corrected by associating a different affine mapping from the sensor frame to the world frame with each group of points acquired.
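A minimal sketch of this per-point correction, assuming the platform poses at a few known times are available (e.g., from odometry) and using SciPy's rotation interpolation; the function name and interpolation choice are ours.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew(points, timestamps, pose_times, pose_rots, pose_trans):
    """Undistort one sweep by giving each point its own sensor-to-world pose.

    points     : (N, 3) points in the sensor frame
    timestamps : (N,) acquisition time of each point within the sweep
    pose_times : (M,) times of known platform poses bracketing the sweep
    pose_rots  : scipy Rotation object holding the M orientations
    pose_trans : (M, 3) platform positions at pose_times
    """
    slerp = Slerp(pose_times, pose_rots)            # interpolate rotations
    R = slerp(timestamps)                           # one rotation per point
    t = np.array([np.interp(timestamps, pose_times, pose_trans[:, k])
                  for k in range(3)]).T             # one translation per point
    return R.apply(points) + t                      # per-point affine mapping
```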


Multi-sensor SLAM Monocular visual odometry (VO) has been well explored for years, and there are robust and mature solutions, such as MonoSLAM[16], ORB-SLAM[17], and SVO[18]. For example, graph-based optimization and loop closure have prevailed in visual SLAM, as in RTAB-Map[19,20]. However, visual SLAM methods are limited under dynamic weather and insufficient lighting conditions. To improve accuracy and robustness, some researchers add more cameras to the system. There are also visual solutions fused with an Inertial Measurement Unit (IMU), such as Geneva's work[21], AbolDeepIO[22], and VINS[23] and its advanced version, VINS-Fusion[24]. VINS-Fusion fuses local states (camera, IMU, LiDAR, etc.) with global sensors (GPS, magnetometer, barometer, etc.) and achieves globally drift-free and locally accurate pose estimation. The fusion of local estimations from existing VO/VIO approaches with global sensors is shown in figure 5.

Figure 5 An illustration of VINS-Fusion[24]

Due to the demand for high-accuracy maps in autonomous driving, for robustness in dynamic environments, and for dense point cloud data, LiDAR-based SLAM remains the main topic in this field. Meanwhile, the price of multi-beam LiDAR has dropped dramatically in recent years, and the fusion of LiDAR with other sensors (cameras, IMUs, etc.) has been studied by many researchers (figure 6).

Figure 6 Block diagram of the V-LOAM[13] system

A straightforward solution for fusing lasers and cameras is to use the VO result as an initial guess for the ICP or GICP[25] pipeline, as in Pandey's work[26] and Zhang's work[13]. Zhang combined visual odometry and LiDAR odometry for mapping tasks. Some methods treat color information as the fourth channel of a 3D point for a subsequent ICP pipeline[27,28,29]. Another way to fuse camera and laser information is to use LiDAR information to enhance visual features. Graeter et al.[30] proposed LIMO, which tracks camera features and estimates camera motion from LiDAR point clouds.

Some studies focus on LiDAR-IMU fusion, a topic that remains to be sufficiently investigated[10]. Ye et al.[31] introduced a tightly coupled LiDAR-IMU fusion method that jointly minimizes a cost derived from LiDAR and IMU measurements. Geneva et al.[21] presented LIPS, a singularity-free plane factor leveraging the closest-point plane representation, fused with IMU data in a graph-based optimization framework. Kuindersma et al.[32] proposed an optimization-based method.

3 Processing of Point Cloud Data

How should such large point cloud data be processed? A large body of work focuses on point cloud processing. We divide these methods into five categories, completion, feature extraction, matching and registration, semantic segmentation, and object/instance extraction, and discuss them in this section.

3.1 Point Cloud Completion

With the increasing popularity of data acquisition devices such as laser scanners and RGB-D cameras, even complicated objects can be digitized with impressive accuracy. Nevertheless, several limitations pertaining to environmental conditions, inter-object occlusion, and sensor capabilities constrain fully effective capture of scene depth by a mobile laser scanner. Incomplete data hinder subsequent research, so for incomplete data we often want a corresponding complete version. In simple cases, we can re-scan to obtain complete data. However, it is sometimes impossible to complete the 3D data by re-scanning, due to occlusion by objects or the inaccessibility of the incomplete area to the scanning device, so the data must be completed manually or automatically.

This has created the research area of completing the missing 3D information of MLS data or other forms of 3D data. Existing methods for 3D data completion can be categorized into geometry-based, data-driven, and learning-based approaches.

3.1.1 Geometry-based approaches

Geometry-based approaches complete shapes via geometric cues from the input, where the missing regions are inferred from observed areas. These approaches perform well in completing small holes and regular shapes at a reasonable time cost.

Surface reconstruction approaches

Many prior works on surface reconstruction generate smooth interpolations to fill holes in locally incomplete scans. The performance of surface reconstruction always depends on the kind of environment the MLS data represent. The most common scenario is the traffic scene, whose surfaces are also relatively easy to reconstruct.

An early work[33] proposed a road surface reconstruction method to process the raw data and produce detail-preserving 3D models. Another method[34] recognizes curbs while reconstructing the missing information caused by occlusion; it reconstructs road surfaces and pavements with centimeter precision while recovering the missing information of curbs. For indoor mapping, an incremental surface-growing-based method[35] was proposed to create a triangular mesh and fill holes in sizeable, noisy LiDAR data from indoor environments.


Other methods reconstruct the surface in various ways: some construct operators for surface approximation[36,37], and some provide algorithms to fill holes on the surface[38,39,40]. Road surface reconstruction approaches fail when the surface of the object is seriously damaged by occlusion.

Symmetry-based approaches

Symmetry is a common characteristic of real-world objects, such as buildings, and is commonly exploited to analyze and process the computational representations of most 3D objects from the real world. Symmetry-based methods identify repeating structures and symmetry axes and duplicate parts into incomplete regions.

Some studies complete small objects, such as household objects[41,42,43], or very large objects, such as building structures[44]. Many objects are not symmetric as a whole, but some of their parts are symmetric. For such 3D objects, methods[45,46,47] were proposed to perform symmetry-based completion on each part with symmetric character. Thrun et al.[45] described a technique for segmenting objects into parts characterized by different symmetries and used them to extend the partial 3D shape model into the occluded space. A more general approach[46] was proposed to efficiently discover and extract a compact representation of an object's Euclidean symmetries, which captures essential high-level information about the structure and, in turn, enables further processing operations, including shape symmetrization and segmentation.

Regularity-based approaches

Regular geometric structures are ubiquitous in both natural and human-made objects; repeated structure is an essential mechanism in how we recognize and understand the world, for many objects are characterized by the presence of such patterns.

Regularity-based completion approaches are widely used to complete 3D building models[48,49,50], since buildings are among the most regular objects in the real world. These methods complete data with various regularity principles, such as performing Fourier analysis on each scanline to fill holes and generate meshes[49], or exploiting the large-scale repetitions found in building scans and using them to remodel the input[48].

3.1.2 Data-driven approaches

Considering that generating perfectly correct and complete data can be hard, data-driven approaches complete shapes by matching the incomplete object against template models from a template shape database. The main idea of these approaches is to retrieve the 3D model most similar to the input query, which works for single objects, such as vehicles and furniture, but not for large scenes such as buildings.

Retrieval-based approaches of replacement with a completed object

Most retrieval-based methods retrieve the complete shape from a database and directly replace the incomplete one with it[51,52,53].

Two methods[51,52], along with datasets of thousands of models, are provided for 3D shape retrieval; they replace the defective scanned data with the retrieved model. The idea of replacement has been implemented in 3D indoor reconstruction by classifying each object in the scene and replacing mutilated objects with complete ones from the dataset to finish the reconstruction[53].

Approaches of assembling parts to obtain the complete shape

Some research deems that simply replacing an incomplete 3D object with a complete one is a bit sloppy and advises completing the 3D shape by retrieving and assembling object parts[51,54,55,56].

3.2 Feature Extraction (line, plane, and supervoxel)

Processing massive and complex point cloud data efficiently is a challenge. There are two main types of methods. The first type projects high-density point cloud data into 2D images and then applies image processing techniques[57,58,59]. The second type processes point cloud data in feature space. Line and plane features contain abundant geometric information about point clouds, especially in artificial environments. These features are generally parallel, orthogonal, or coplanar, which can effectively reduce the complexity of point clouds without losing their main geometric information. Therefore, line and plane extraction are widely used in target recognition[60], point cloud registration[61], reconstruction[10,13], and so on.

Line extraction methods can be classified into two categories. The first projects the point cloud into 2D images, uses LSD[62] or EDLines[63] to extract lines from the images, and then back-projects these lines into 3D space to obtain 3D lines. Jain et al.[64] extracted straight lines of the same scene from multi-view images and mapped these lines back into 3D space according to the view information, finally obtaining the 3D straight lines. Lin et al.[58] proposed a Line-Half-Planes (LHP) model that extracts 2D lines by projecting 3D point clouds onto multi-view images and then obtains 3D lines by projecting the 2D lines back into 3D space. The advantage of projecting point clouds into images is that existing 2D line extraction algorithms can be fully utilized; the disadvantage is that processing large-scale point clouds takes a long time. Many other works extract line features from point clouds directly. Daniels et al.[65] used Robust Moving Least-Squares to fit the surface locally, then calculated a set of smooth curves aligned along the edges to identify line features in the point cloud, and finally returned a set of complete smooth feature curves. Kim et al.[66] used a moving least-squares approximation to estimate the local curvatures and their derivatives at a point by means of an approximating surface. Lin et al.[67] presented a facet segmentation-based line segmentation method that works directly on the point cloud and extracts more complete and correct line segments than the method in [58].

Several different algorithms have been proposed for plane extraction from 3D point clouds. Traditional plane extraction techniques can generally be categorized into region-growing[68,69,70], Hough transform[71,72], and model-fitting methods[73,74,75]. However, these methods do not take advantage of the geometric constraints of the point clouds. Lin et al.[76] proposed an energy-minimization-based method to reconstruct planes, leveraging a constraint model that requires minimal prior knowledge to implicitly establish relationships among planes. To balance high accuracy and high efficiency, El-Sayed et al.[77] proposed a plane detection method based on octree-balanced density down-sampling and adaptive plane extraction. Nguyen et al.[78] utilized scan profile patterns and the planarity values between neighboring scan profiles to detect and segment planar features in sparse and heterogeneous MLS point clouds. Kwon et al.[79] proposed a plane extraction algorithm consisting of decomposing, expanding, and merging steps, which is robust to low-density point clouds because it adds an expansion stage between the conventional decomposing and merging stages.
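As an example of the model-fitting family mentioned above, the following is a minimal RANSAC plane extraction sketch; the thresholds are illustrative, and production pipelines add refinement and multi-plane extraction on top.

```python
import numpy as np

def ransac_plane(points, n_iters=500, dist_thresh=0.05, rng=None):
    """Fit one dominant plane n.x + d = 0 by RANSAC over 3-point samples."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers, best_model = np.array([], dtype=int), None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)            # candidate plane normal
        norm = np.linalg.norm(n)
        if norm < 1e-9:                           # degenerate (collinear) sample
            continue
        n /= norm
        d = -n @ p0
        dist = np.abs(points @ n + d)             # point-to-plane distances
        inliers = np.nonzero(dist < dist_thresh)[0]
        if len(inliers) > len(best_inliers):      # keep the best hypothesis
            best_inliers, best_model = inliers, (n, d)
    return best_model, best_inliers
```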

Line and plane extraction are point-wise. To process point clouds faster, supervoxels have been proposed. Supervoxels, an analog of superpixels in the 3D domain, are a promising alternative by which redundancy in the information can be markedly reduced, enabling computational efficiency for fully automatic operation with minimal loss of information. Using supervoxels, a point cloud is first divided into a number of patches and then processed in a patch-wise rather than point-wise manner. Voxel cloud connectivity segmentation (VCCS) is a commonly used supervoxel generation method[80,81]. Lin et al.[82] formalized supervoxel segmentation as a subset selection problem optimized efficiently by a heuristic method that uses local information for each point. Zai et al.[83] proposed an improved supervoxel algorithm that generates supervoxels with adaptive sizes, inspired by the point cloud segmentation method in [67]. Wang et al.[84] proposed an efficient 3D object detection method that integrates supervoxels with a Hough forest framework.
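The patch-wise idea can be illustrated with a much simpler stand-in for supervoxels, a plain voxel-grid grouping; real supervoxel methods such as VCCS additionally enforce spatial connectivity and feature similarity when growing each patch.

```python
import numpy as np
from collections import defaultdict

def voxel_patches(points, voxel_size=0.3):
    """Group points into voxel-grid patches for patch-wise processing.

    Returns a mapping from voxel key to the indices of the points it
    contains; downstream steps then work per patch instead of per point.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    patches = defaultdict(list)
    for i, key in enumerate(map(tuple, keys)):
        patches[key].append(i)               # indices of points in this patch
    return {k: np.asarray(v) for k, v in patches.items()}
```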

3.3 Matching and Registration

3D point cloud registration, a key issue in 3D data processing and urban 3D reconstruction, is usually treated as rigid registration, which can be solved by estimating transformation parameters with six degrees of freedom (6DoF). Many methods have been proposed for different applications.

The ICP[11] algorithm alternates between estimating the point correspondences and the transformation matrix (see figure 7). Many variants of this method[85,86,87,88] have also been proposed. However, the drawbacks of ICP are (1) the explicit estimation of closest-point correspondences, which makes the complexity scale quadratically with the number of points, (2) sensitivity to initialization, and (3) difficulty integrating with deep learning frameworks because of differentiability issues. The above methods cannot guarantee the global optimality of their solutions, so many researchers focus on optimization algorithms to estimate the relative transformation[89,90,91].

Pioneering studies on handcrafted 3D feature descriptors were mostly inspired by their 2D counterparts. Many approaches, including SHOT[92], RoPS[93], TOLDI[94], FPFH[95], and ACOV[96], estimate a unique local reference frame (LRF), which is not robust to noise; it is therefore difficult to adapt them to large-scale MLS point clouds. With the development of deep learning methods for geometric representations of 3D data, learning-based 3D local feature descriptors have quickly been applied to point cloud registration. These works[97,98,99,100,101,102,103,104] focus on learning robust local features and then extracting matching correspondences with strategies such as RANSAC; finally, the extracted correspondences are used to estimate the transformation matrix. Other studies[105,106,107,108] focus on constructing end-to-end networks for local feature learning to achieve point cloud registration. Still other studies use learned global information to regress rotation matrices and translation vectors[109,110].


Figure 7 ICP overview scheme

Following the RANSAC paradigm, Aiger et al.[111,112] proposed a randomized alignment approach that uses planar congruent sets to compute the optimal global rigid transformation. However, these RANSAC-like methods are point-level operations, which may easily be sub-optimal when computing the transformation.
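To make the descriptor-plus-RANSAC pipeline from the preceding paragraphs concrete, the following sketch estimates a rigid transform from putative correspondences; it is a generic outline, not the algorithm of any cited work, and the threshold and iteration count are illustrative.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def ransac_registration(src_pts, dst_pts, n_iters=1000, thresh=0.1, rng=None):
    """src_pts[i] is a putative match of dst_pts[i] (e.g., from nearest-neighbor
    descriptor search); many matches may be wrong, so sample minimal 3-point
    subsets and keep the hypothesis with the most geometric inliers."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.arange(3)                # fall back to an arbitrary triple
    for _ in range(n_iters):
        idx = rng.choice(len(src_pts), 3, replace=False)
        R, t = kabsch(src_pts[idx], dst_pts[idx])
        residual = np.linalg.norm(src_pts @ R.T + t - dst_pts, axis=1)
        inliers = np.nonzero(residual < thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return kabsch(src_pts[best_inliers], dst_pts[best_inliers])   # refine on inliers
```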

3.4 Semantic Labelling and Segmentation

Semantic labeling and segmentation of point clouds is the task of understanding and recognizing the meaningful entities in a scene by assigning each point to an entity. Entities in an urban scene include the sky, buildings, facades, roads, windows, doors, poles, pedestrians, etc. In this section, we review classification and semantic segmentation methods that focus on terrestrial laser scanning (TLS) and MLS point clouds. Che's work[113] contains a comprehensive literature review of terrestrial and mobile laser scan processing, covering semantic segmentation, feature extraction, and object recognition.

3.4.1 Feature-based methods

Feature-based methods label each point in the point cloud by extracting and joining features to form a vector and performing labeling with a trained classifier. Hackel et al.[114] reduced computation time and addressed the challenge of strongly varying densities in TLS point clouds. TLS and MLS point clouds are made up of millions of points, so labeling each point is computationally intensive. Weinmann et al.[115] improved classification results by using five different definitions of the neighborhood for selecting optimized features in the feature extraction process. Hu et al.[116] used gridded segmentation to address the computational challenges, and their pipeline achieved good segmentation results without relying on computationally expensive representations of the scene. Segmentation and classification are conducted simultaneously in Zhao's work[117] by classifying each segment using its geometric properties and evaluating the homogeneity of each segment via its object class. Segmentation results can be improved by spatially smoothing neighboring elements; probabilistic models, for example Markov Random Fields (MRF) and Conditional Random Fields (CRF), are used for this purpose. Lu et al.[118] assigned semantic labels to each point by calculating node potentials and edge potentials from the distances between points, with contextual relationships between points given by an MRF. Another network[119] takes advantage of CRFs to propagate contextual information between neighboring entities; it performs discrete, multi-label classification by learning high-dimensional parameters of CRFs, and its higher-order models are robust in preserving salient labels.

Previously, handcrafted features were primarily used for vision tasks. Handcrafted features are designed to be invariant to certain transformations; however, they are usually geared towards a specific task and require considerable human intervention. The feature-based methods in this category rely heavily on handcrafted features, which have since been outperformed by learned features.
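As an illustration of such handcrafted features, the sketch below computes the widely used eigenvalue-based descriptors (linearity, planarity, sphericity) from each point's local covariance matrix, in the spirit of Weinmann et al.[115]; the neighborhood size and feature choice are illustrative, and a classifier such as a random forest would be trained on the resulting vectors.

```python
import numpy as np
from scipy.spatial import cKDTree

def covariance_features(points, k=20):
    """Per-point features from the eigenvalues of the local covariance.

    With sorted eigenvalues l1 >= l2 >= l3 of the k-nearest-neighbor
    covariance matrix, compute linearity, planarity, and sphericity.
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)            # k nearest neighbors per point
    feats = np.empty((len(points), 3))
    for i, nbrs in enumerate(idx):
        cov = np.cov(points[nbrs].T)            # 3x3 local covariance
        l = np.sort(np.linalg.eigvalsh(cov))[::-1]
        l = np.maximum(l, 1e-12)                # avoid division by zero
        feats[i] = [(l[0] - l[1]) / l[0],       # linearity  (edge-like)
                    (l[1] - l[2]) / l[0],       # planarity  (facade/road-like)
                    l[2] / l[0]]                # sphericity (scatter, vegetation)
    return feats
```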

3.4.2 Deep learning methods

Currently, many 3D deep learning techniques are being proposed. Deep learning techniques learn features that can be applied to multiple tasks and are learned in an end-to-end manner, requiring little human intervention. Convolutional neural networks (CNNs) have proven effective on data with a regular format, such as the grid-like structure of pixels in 2D images. However, deploying CNNs directly on point clouds is challenging, so this is an active and ongoing research area. Point clouds are irregular, and as such, the segmentation of points has taken the following directions.

In general, 3D deep learning methods can be categorized into volumetric CNNs, multi-view CNNs, and point-based methods, corresponding to the popular 3D data representations of volumetric grids, multi-view images, and point clouds, respectively.

Volumetric CNNs

Volumetric CNNs operate on volumetric data, often represented as a 3D binary voxel grid. 3D ShapeNets[120] represents a 3D shape as a probability distribution of binary variables on a 3D grid. The voxel grid makes it possible to apply 3D convolution operations. In Charles's work[121], one model predicts objects from partial sub-volumes, addressing overfitting with auxiliary training tasks, and another model convolves the 3D shapes with an anisotropic probing kernel. Besides, VoxelNet[122] uses 3D CNNs on voxels for real-time object recognition and incorporates normal vectors of the object surfaces into the voxels to improve discriminative capability. Although techniques based on volumetric CNNs perform well, they have high memory and computation costs due to the sparsity of the occupancy grid, and the voxelization methods introduce quantization artifacts.

Multi-view CNNs

Projecting a 3D point cloud onto a 2D grid leverages the high performance of 2D segmentation algorithms by rendering the 3D data in 2D. These techniques are based on traditional CNNs, which operate on 2D images. Given a 3D object, they map it into a collection of 2D images of the object taken from different angles. Compared with their volumetric counterparts, multi-view CNNs perform better, as the multi-view images contain richer information than 3D voxels. Su et al.[123] proposed the first multi-view CNN for object recognition and achieved state-of-the-art accuracy. Leng et al.[124] proposed a stacked local convolutional autoencoder (SLCAE) for the 3D object retrieval task. In Tosteberg's work[125], 3D point clouds are projected into a 2D image, and the image is semantically segmented using a 2D semantic classifier; this operation loses valuable information in the 3D-to-2D transformation because the former is richer in content (depth information). In Wu's work[126], a spherical projection was used in a pipeline containing 2D CNNs and CRFs to project point clouds into a 2D grid; the CNN part of the pipeline performs segmentation, and the CRF part refines it. "Auto-labelling" is an approach for transferring high-quality image-based semantic segmentation from reference cameras to point clouds[127]. A Fully Convolutional Network (FCN) was used for pixel-wise semantic segmentation of roads from top-view images of point clouds[128]. Lawin et al.[129] employed a similar approach and went further to investigate the significance of surface normals, depth, and color in the architecture. The main drawback of these methods is the information loss in the 3D-to-2D projection process.

Point-based

Direct processing[130,131] of 3D point clouds is also popular. Point-based methods were pioneered by PointNet[132]. Because point clouds are unstructured, irregular, and unordered, they are often converted into volumetric grids or multi-view images to be processed by volumetric CNNs and multi-view CNNs, respectively. However, many methods can operate directly on point clouds in an end-to-end manner using a combination of symmetric functions: a multilayer perceptron shared by all input points, with the global feature extracted by a max-pooling function, which is also symmetric (a minimal sketch of this design is given below). PointNet++[133] extended PointNet to include local dependency by applying PointNet hierarchically on local regions. Several other methods[134,135,136] were introduced to improve the local dependency computations. PointCNN[135] applies X-transformations on local regions before applying PointNet-like MLPs. VoxelNet[134] processes point clouds directly for object detection by dividing the input into voxels and using the points in each voxel to compute a feature vector for the voxel; this process is applied hierarchically in stacked Voxel Feature Encoding layers, and region proposals are used for object detection. DGCNN[136] represents the point cloud as a graph in which each point is a node connected by directed edges to its neighboring points and uses a convolution-like operation, EdgeConv, on neighboring pairs of points to exploit local geometry. Huang et al.[137] proposed a multi-scale feature extraction method that embeds local features into a low-dimensional and robust subspace. SEGCloud[138] transforms the point cloud into voxels because the latter have a regular structure on which CNNs can be deployed; the architecture combines a 3D FCN, trilinear interpolation, and a CRF to label 3D point clouds. The processing of urban-scale voxels is compute-intensive; Semantic3Dnet[139] is a large-scale benchmark of labeled TLS points that is essential for urban-scale classification and segmentation tasks. OctNet[140] trained a network on different resolutions of voxels to address resolution and computation challenges and segmented the 3D colored point clouds of the RueMonge2014 dataset[141] of Haussmannian-style facades into window, wall, balcony, door, roof, sky, and shop. Engelmann et al.[142] built a framework upon PointNet[132], enlarging its receptive field to cater for urban-scale scenes. Landrieu et al.[143] presented an architecture that directly addresses the challenge of semantic segmentation of urban-scale scenes by encoding contextual relationships between object parts in the 3D point cloud. The network first partitions the point cloud into simple shapes called "superpoints", which are then embedded using PointNet[132] for onward segmentation; the superpoints enable the segmentation of large-scale scenes. Xu et al.[144] presented a supervised classification method for LiDAR point cloud semantic labeling.
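The symmetric-function design referenced above can be sketched in a few lines of PyTorch; this toy network omits PointNet's input/feature transform networks (T-Nets) and is only meant to show the shared per-point MLP plus order-invariant max-pooling, with illustrative layer sizes.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style segmentation head (no T-Nets, for brevity).

    A shared per-point MLP lifts each point to a feature vector; max-pooling
    (a symmetric function, hence invariant to point order) yields a global
    feature, which is concatenated back onto each point for per-point labels.
    """
    def __init__(self, n_classes=8):
        super().__init__()
        self.point_mlp = nn.Sequential(          # shared by all input points
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU())
        self.head = nn.Sequential(               # per-point classifier
            nn.Linear(256 + 256, 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, pts):                      # pts: (B, N, 3)
        local_feat = self.point_mlp(pts)         # (B, N, 256)
        global_feat = local_feat.max(dim=1).values          # (B, 256), symmetric
        expanded = global_feat.unsqueeze(1).expand_as(local_feat)
        return self.head(torch.cat([local_feat, expanded], dim=-1))

logits = TinyPointNet()(torch.rand(2, 1024, 3))  # (2, 1024, 8) per-point scores
```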

There are few annotated large-scale datasets because manual point-wise labeling is time-consuming and laborious. This is the major challenge in large-scale classification and semantic segmentation of point clouds, because the tasks are mostly supervised. The task of labeling and segmenting urban scenes is an active and ongoing research area, especially with the advent of deep learning. Its major challenges are scaling existing algorithms or designing novel pipelines that cater to large-scale scenes, and the lack of detailed annotated datasets to serve as benchmarks for classification and segmentation tasks. Currently, deep learning techniques on point clouds are becoming increasingly popular, owing to the increased popularity of laser scanners and because they require less preprocessing than both multi-view and volumetric CNNs, as they operate directly on the point cloud. Point-based 3D deep learning methods, together with deep learning on other unstructured data such as social networks, are becoming increasingly popular under the term 'geometric deep learning' introduced in LeCun's work[145].

3.5 Object/Instance Extraction

3D object detection is crucial for many real-world applications, such as robotics, autonomous driving, and augmented/virtual reality. It locates and recognizes objects in 3D scenes by estimating oriented 3D bounding boxes and semantic labels of objects from point clouds.

Range scans include the spatial coordinates of the 3D point cloud by nature, so they have an advantage over camera images in locating detected objects. Point clouds are also robust to illumination changes. In addition, compared with detection in images, object detection in point clouds naturally locates an object in 3D and provides crucial information for subsequent tasks such as navigation. However, unlike images, 3D point clouds are sparse and have inconsistent point density because of non-uniform sampling in 3D space, sensor range, and occlusion. Thus, detecting objects in point clouds still faces huge challenges.

Existing object detection methods for point clouds are mainly divided into three categories: (1) projection-based methods, which project point clouds into multiple perspective views and apply image-based object detection methods; (2) voxelization-based methods, which rasterize point clouds into a 3D voxel grid and transform them into regular tensors; and (3) direct methods, which operate on point clouds and predict bounding boxes directly without other processing.

Projection-based methods

Projection-based methods project point clouds into perspective views and apply image-based techniques, which may sacrifice critical geometric details[146]. Alejandro et al.[147] developed a multi-cue, multimodal, and multi-view framework for pedestrian detection with handcrafted features and a random forest classifier, which boosts accuracy by a large margin. Bo et al.[148] represented 3D point clouds in a 2D point map and then used a fully convolutional network to simultaneously predict the object confidence and the bounding boxes. Chen et al.[149] formulated object detection as minimizing an energy function encoding an object size prior, the ground plane, and several depth-informed features that reason about free space, point cloud densities, and distance to the ground. Yang et al.[150] proposed a proposal-free, single-stage 3D object detector, called PIXOR, which estimates oriented 3D objects from pixel-wise neural network predictions on point clouds.
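A minimal sketch of the projection step these methods share: rasterizing a point cloud into a bird's-eye-view (BEV) pseudo-image on which a 2D detector can run. The two channels chosen here (occupancy and maximum height) are illustrative, not the exact input encoding of any cited detector.

```python
import numpy as np

def bev_pseudo_image(points, x_range=(0.0, 70.0), y_range=(-35.0, 35.0), cell=0.1):
    """Rasterize a point cloud into a BEV occupancy/height image.

    Returns a (2, nx, ny) tensor suitable for an image-based detector.
    Heights below 0 m are clamped by the zero initialization in this
    simple sketch.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)       # crop to the grid
    occ = np.zeros((nx, ny), dtype=np.float32)
    height = np.zeros((nx, ny), dtype=np.float32)
    occ[ix[ok], iy[ok]] = 1.0                                # cell is occupied
    np.maximum.at(height, (ix[ok], iy[ok]), points[ok, 2])   # tallest return
    return np.stack([occ, height])                           # (2, nx, ny)
```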

Voxelization-based methods

Voxelization-based methods voxelize the irregular point clouds into 3D grids and then apply 3D CNNs for object detection. These methods fail to leverage data sparsity and suffer from high time costs due to 3D convolution operations. Dominic et al.[151] proposed an efficient and effective framework for applying the sliding-window approach to a 3D point cloud for object detection; they demonstrated that exhaustive window searching in 3D can be efficient by fully exploiting sparsity, and they proved the mathematical equivalence between sparse convolution and voting. Martin et al.[152] detected 3D objects in point clouds using CNNs constructed from sparse convolutional layers. Chen et al.[153] proposed multi-view 3D networks (MV3D), using both LiDAR point clouds and images to predict oriented 3D bounding boxes. Li et al.[154] proposed a 3D fully convolutional network for object detection in point clouds. Zhou et al.[134] proposed a 3D detection network, VoxelNet, that integrates feature extraction and bounding box prediction into a single end-to-end deep network. Daniel et al.[155] presented a method for detecting small and potentially obscured obstacles in vegetated terrain; its novelty is the coupling of a volumetric occupancy map with a 3D CNN, which allows training an efficient and highly accurate framework for detection from raw occupancy data.

Direct methods

Recently, many approaches operate on raw point clouds and predict bounding boxes directly without other processing. Shi et al.[156] proposed PointRCNN for 3D object detection from point clouds, using bottom-up 3D proposal generation and proposal refinement in canonical coordinates. Charles et al.[157] introduced VoteNet, which votes for object centroids directly from point clouds and aggregates the votes to generate high-quality object proposals using local geometry. Alex et al.[158] proposed PointPillars, a method for 3D object detection that enables end-to-end learning with only 2D convolutional layers; PointPillars uses a novel encoder that learns features on vertical columns (pillars) of the point cloud to predict 3D oriented boxes for objects.

In summary, with the growth of deep learning architectures suited to point clouds, 3D object detection plays a key role in point cloud processing. However, how to directly detect 3D objects in raw point clouds remains a worthy topic for future work.

4 Typical Urban Modeling Applications based on MLS

MLS technology has greatly facilitated the acquisition of urban 3D models, and more and more applications based on MLS have been proposed. In this section, we introduce four major applications based on MLS: 1) building facet modeling, 2) High Definition (HD) maps, 3) Building Information Models, and 4) traffic visibility evaluation.


4.1 Building Facet Modeling

Recently, 3D modeling and reconstruction of indoor buildings and large-scale urban buildings have attracted increasing attention. Building models are usually composed of the primitives of the buildings, which can make them difficult to model, as shown in figure 8.

Figure 8 Example of urban building modeling with complex structures [159].

Rapid developments in LiDAR technology have greatly facilitated the acquisition of 3D model data of indoor and large-scale urban scenes. The captured point cloud is inherently capable of representing the physical geometry of real scenes, which greatly helps modeling. In city scenes, there is a huge number of urban objects with a great variety of shapes, so manual modeling of urban buildings from raw point clouds is difficult and time-consuming.

Therefore, the automatic reconstruction of refined 3D models of large-scale urban buildings from raw point clouds remains a big challenge for researchers. The main difficulty is the data quality of raw point clouds of urban buildings: LiDAR point clouds are often contaminated by noise and outliers and may be affected by point density, coverage, and occlusions.

Zhou et al.[160] proposed a novel building segmentation and damage detection method to realize automated component-level damage assessment for major building envelope elements, including walls, roofs, balconies, columns, and handrails. Goebbels et al.[161] used airborne LiDAR point clouds and true orthophotos to obtain better building model edges. Zhang et al.[162] constructed a Delaunay TIN model and an edge-length-ratio-based trace algorithm to refine a building's boundary and then used clusters from the same plane point set to determine the roof structures. Chen et al.[163] integrated LiDAR point clouds and large-scale vector maps to model buildings, obtaining the building models in three steps: preprocessing of the LiDAR point cloud and vector maps, roof analysis, and building reconstruction. Yi et al.[159] used a divide-and-conquer strategy to decompose the entire point cloud into a number of individual building subsets and then extracted the primitive elements through a novel algorithm named Spectral Residual Clustering (SRC); the final accurate 3D building models were generated by applying union Boolean operations over the block models.

Xiong et al.[164] analyzed the properties of topology graphs of building model surfaces and identified the three basic primitives of roof topology graphs. Wang et al.[165] combined the advantages of point clouds and optical images to describe highly accurate building facade features. Zhang et al.[166] proposed a novel framework for urban point cloud classification and reconstruction; they presented a rectified linear units neural network (ReLu-NN), adopting the rectified linear unit (ReLu) as the activation function to speed up convergence. Díaz et al.[167] analyzed the visibility of indoor environments and detected door candidates. Stambler et al.[168] introduced room-, floor-, and building-level reasoning and built more accurate models by performing modeling and recognition jointly over the entire building. Javanmardi et al.[169] proposed an automatic and accurate 3D building model reconstruction technique that integrates an airborne LiDAR point cloud with a 2D boundary map. Zhang et al.[170] proposed a deep neural network that integrates a 3D convolution, a deep Q-network, and a residual recurrent neural network to obtain semantic labels for large-scale point cloud data; they then used the classification results and an edge-aware resampling algorithm to generate urban building models. López et al.[171] utilized historical and bibliographical data to obtain graphic and semantic information about the point cloud and used BIM software to create a library of parametric elements. Ochmann et al.[172] developed a parametric building model that incorporates contextual information such as global wall connectivity. Xiong et al.[173] proposed a parameter-free algorithm to robustly and precisely construct roof structures and building models.

Hojebri et al. [174] proposed a method based on the fusion of point clouds and images, which yields more accurate modeling results. Hron et al. [175] presented a review of the automatic generation of 3D building models from point clouds. Based on the concept of data reuse, Chen et al. [176] proposed a building modeling method in which models with physical geometric shapes similar to a user-specified point cloud query can be retrieved and reused for building model data extraction and modeling. Zhang et al. [177] generated a regular-grid DSM, used the Canny operator and the Hough transform to extract building edges, and used the E3De3 software to establish the 3D building model. Chen et al. [178] introduced a novel encoding scheme based on low-frequency spherical harmonic basis functions for 3D building model retrieval.

In contrast to the previous work, Demir et al. [179] proposed an approach that operates directly on the raw point cloud; it consists of semi-automatic segmentation, a consensus-based voting schema, a pattern extraction algorithm, and an interactive editing tool. Teng et al. [180] proposed a fast and simple plane segmentation algorithm based on cross-line element growth (CLEG) for 3D building modeling. Wang et al. [181] proposed a novel semantic line-framework-based building modeling method for backpack laser scanning point clouds; the method effectively extracts line frameworks and outputs results for building modeling.


As listed in Table 2, [166] reports a quantitative evaluation of building models. BN is the building identifier; #Pts is the number of ALS points in the building; #Model vertices is the number of vertices in the model; #Model faces is the number of triangles in the model; %Completeness and Σ denote, respectively, the completeness of the model (in percent) and the standard deviation of the distances of ALS points to their corresponding faces (in meters). The numbers in parentheses are the results obtained by [182].

Table 2 Quantitative evaluation of building models [166]

Dataset  BN    #Pts     #Model vertices   #Model faces   %Completeness   Σ
[166]    a     16559    1972              3631           99.5            0.02
               (16559)  (2102)            (3875)         (98.2)          (0.06)
         b     1819     198               318            95.8            0.03
               (1819)   (217)             (391)          (94.1)          (0.04)
         c     10194    1179              2199           97.3            0.02
               (10194)  (1294)            (2236)         (96.2)          (0.03)
         d     9198     2097              3938           95.9            0.02
               (9198)   (2097)            (4182)         (93.8)          (0.03)
         e     8111     1782              3298           100             0.06
               (8111)   (1882)            (3421)         (98.4)          (0.08)
         f     33537    4356              7794           94.2            0.15
               (33537)  (4492)            (7925)         (93.5)          (0.17)
         Mean                                            97.1            0.05
                                                         (95.7)          (0.07)
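For concreteness, the two metrics can be computed as follows. This is a minimal sketch, assuming that each ALS point has already been associated with its corresponding model face (the association step is method-specific and not reproduced here); the 0.1 m distance threshold is an illustrative assumption, not a value taken from [166].

```python
import numpy as np

def model_quality(points, face_origins, face_normals, threshold=0.1):
    """Completeness (%) and sigma (m) of ALS points against planar model faces.

    points:       (N, 3) ALS points of one building
    face_origins: (N, 3) a point on the face associated with each ALS point
    face_normals: (N, 3) unit normal of that face
    threshold:    max point-to-face distance (m) for a point to count as
                  explained by the model (illustrative value)
    """
    # absolute point-to-plane distance of every point to its face
    d = np.abs(np.einsum('ij,ij->i', points - face_origins, face_normals))
    explained = d <= threshold
    completeness = 100.0 * explained.mean()
    sigma = d[explained].std() if explained.any() else float('nan')
    return completeness, sigma
```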

4.2 High Definition (HD) Map

The HD map is a crucial technology for autonomous driving [183], especially for ego-vehicle localization [184] and motion planning [185].

There are plenty of related works on constructing HD maps. Zhang et al. [186] built an HD map system and described the components of an HD map according to an embodiment. Siam et al. [187] proposed a semantic segmentation method to construct HD maps from images. Barsi et al. [188] created HD maps using a TLS system.

HD maps can be divided into two types: dense semantic point cloud maps and landmark-based maps. The former is constructed from laser scanning point clouds with semantics; such an HD map consists of the road surface, road markings, road boundaries, traffic signs, etc. Many high-tech companies adopt this type of map because of its highly accurate and integrated road information.

However, HD maps are hard to build directly from the collected LiDAR point cloud. Generally, the collected point cloud contains buildings, roads, parking lots, vegetation, and other uninteresting points. Therefore, much of the literature has studied how to extract each component of HD maps separately.

Road surface

The road surface is one of the primary components of the HD map and one of the essential parts of the road structure. In general, the raw data collected by a laser scanning system contain many irrelevant points and much noise, so separating on-road points from off-road points in the raw data is a key step for extracting the other HD map components. Many methods have been proposed for road surface detection


and extraction from point clouds; these works are mainly categorized into (1) 3D-based methods and (2) georeferenced feature (GRF)-based methods [189].

To decrease computational complexity, trajectory information has been used in road surface extraction. Wu et al. [190] vertically partitioned raw point clouds using the trajectory and then applied the Random Sample Consensus (RANSAC) method to extract ground points by calculating their average height. Based on point cloud features, Hata et al. [191] extracted ground surfaces by applying different filters, including a differential filter and a regression filter, to the point cloud. There are also curb-based road surface extraction methods: Guan et al. [192] assumed that road curbstones represent the boundaries of the pavement and extracted the road surface by separating pavement surfaces from roadsides.
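A minimal sketch of such a trajectory-sliced RANSAC scheme is given below, assuming a roughly straight trajectory segment; the block length and inlier tolerance are illustrative values, not parameters taken from [190].

```python
import numpy as np

def ransac_plane(pts, n_iter=200, tol=0.05, rng=None):
    """Fit a plane to pts with RANSAC; returns the boolean inlier mask."""
    rng = rng or np.random.default_rng(0)
    best_mask = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        p = pts[rng.choice(len(pts), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        if np.linalg.norm(n) < 1e-9:          # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        mask = np.abs((pts - p[0]) @ n) < tol  # point-to-plane distance test
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask

def extract_ground(points, trajectory, block_len=10.0):
    """Partition points into blocks along a roughly straight trajectory
    segment and keep the RANSAC plane inliers of each block as ground."""
    axis = trajectory[-1, :2] - trajectory[0, :2]
    axis = axis / np.linalg.norm(axis)
    s = (points[:, :2] - trajectory[0, :2]) @ axis   # along-track coordinate
    block = np.floor(s / block_len).astype(int)
    ground = np.zeros(len(points), dtype=bool)
    for b in np.unique(block):
        idx = np.where(block == b)[0]
        if len(idx) >= 3:
            ground[idx[ransac_plane(points[idx])]] = True
    return ground
```

Partitioning first keeps each plane fit local, so the flat-ground assumption only has to hold within one block rather than over the whole scene.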

Many methods convert the point cloud into 2D georeferenced feature (GRF) maps, so that road surfaces can be detected and extracted efficiently with existing computer vision techniques. To minimize computational complexity, Riveiro et al. [193] projected the point cloud onto a 2D space and then detected the road using principal component analysis (PCA). Yang et al. [194] extracted road surfaces by generating GRF images to filter out off-ground objects.
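The following sketch illustrates the GRF idea under simple assumptions (a fixed cell size and a fixed height-above-minimum threshold, both illustrative): the cloud is rasterized into per-cell feature grids, and points far above the local minimum height are discarded as off-ground.

```python
import numpy as np

def grf_grids(points, cell=0.2):
    """Rasterize a point cloud into 2D georeferenced feature (GRF) grids:
    per-cell minimum height and per-cell point count."""
    xy = np.floor((points[:, :2] - points[:, :2].min(axis=0)) / cell).astype(int)
    shape = xy.max(axis=0) + 1
    zmin = np.full(shape, np.inf)
    count = np.zeros(shape, dtype=int)
    np.minimum.at(zmin, (xy[:, 0], xy[:, 1]), points[:, 2])
    np.add.at(count, (xy[:, 0], xy[:, 1]), 1)
    return zmin, count, xy

def ground_mask(points, cell=0.2, max_above_ground=0.15):
    """Keep points close to the per-cell minimum height: a simple
    off-ground filter on the GRF grid (threshold is an assumed value)."""
    zmin, _, xy = grf_grids(points, cell)
    return points[:, 2] - zmin[xy[:, 0], xy[:, 1]] < max_above_ground
```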

Road boundary

The road boundary, also called the road edge, road curb, or curbstone, is an essential part of HD maps. The road boundary is mostly extracted from the height jump between sidewalks and driveways. To reduce computational complexity, trajectory-based methods have been proposed. Wang et al. [195] first divided the point cloud into several parts along the trajectory and then extracted and refined the road boundary from each part. Wang et al. [196] extracted road boundaries from the point cloud by exploiting the fact that the altitude of road boundary points varies sharply. Zai et al. [83] detected rough road boundaries via supervoxels and the alpha-shape algorithm and then extracted curbs by applying graph cuts to the trajectory and the rough boundary. Since road edge extraction can be regarded as a classification problem, Rachmadi et al. [197] detected road edges from 3D point clouds using an encoder-decoder convolutional network. Based on 3D local features, Yang et al. [194] proposed a new binary kernel descriptor (BKD) to detect road curbs and markings.
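The height-jump principle shared by these methods can be sketched as follows for a single cross-section slice; the bin width and jump threshold are assumed values, not parameters of any specific paper.

```python
import numpy as np

def detect_curbs(slice_pts, bin_width=0.1, jump=0.08):
    """Flag curb candidates in one road cross-section.

    slice_pts: (N, 2) array of (lateral offset, height) for the points of a
               thin slice taken perpendicular to the driving trajectory.
    A curb candidate is reported where the mean height of neighboring
    lateral bins jumps by more than `jump` metres (typical curb heights
    are roughly 0.05-0.2 m; both parameters are illustrative).
    """
    off, z = slice_pts[:, 0], slice_pts[:, 1]
    bins = np.floor((off - off.min()) / bin_width).astype(int)
    uniq = np.unique(bins)
    mean_z = np.array([z[bins == b].mean() for b in uniq])
    # height step between consecutive occupied bins
    steps = np.abs(np.diff(mean_z))
    return uniq[np.where(steps > jump)[0]]
```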

Road markings

Road marking is an important part of the HD map for self-driving and a key component for achieving accurate navigation, so abundant related works extract road markings from laser scanning point clouds. Road markings comprise different marking types, such as lane lines, zebra crossings, arrows, and texts; therefore, much research has also studied the classification of road markings. The related studies can be divided mainly into two categories: 3D-based methods and GRF-projection-based methods.

3D-based methods extract road markings directly from the road surface based on the distinct intensity difference between markings and other points. The trajectory can be used to locate the positions of road markings: Chen et al. [198] proposed a profile-based intensity analysis that partitions the point cloud into slices along the vehicle trajectory and then extracts road markings by analyzing


the peak intensity values in each scan line. Yu et al. [199] extracted road markings using a multi-segment thresholding strategy and spatial density filtering from the point cloud and then extracted and classified small-sized road markings via Deep Boltzmann Machine (DBM)-based neural networks.

To enable image processing methods, Jung et al. [200] rasterized the point cloud onto the x-y plane, and lane markings were then extracted by intensity contrast. Since road marking extraction can be treated as semantic segmentation, many neural networks are applicable; with the emergence of a large number of image classification networks, road markings can be classified efficiently by such networks. Wen et al. [201] proposed a deep learning framework to extract, classify, and complete road markings: a modified U-net first extracts road markings from the projected intensity image, a multi-scale clustering algorithm and a CNN classifier then classify them, and finally a conditional generative adversarial network and a context-based method complete the classified markings.
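The per-segment intensity thresholding common to the 3D-based methods can be sketched as follows; the number of segments and the percentile are illustrative assumptions rather than values from [198] or [199].

```python
import numpy as np

def extract_markings(axis_coord, intensity, n_segments=20, pct=95):
    """Road-marking candidates by per-segment intensity thresholding.

    axis_coord: (N,) coordinate of each road point along the driving direction
    intensity:  (N,) LiDAR return intensity of the same points
    Thresholding within segments rather than globally compensates for the
    range-dependent intensity variation along the trajectory.
    """
    edges = np.linspace(axis_coord.min(), axis_coord.max(), n_segments + 1)[1:-1]
    seg = np.digitize(axis_coord, edges)
    keep = np.zeros(len(intensity), dtype=bool)
    for s in np.unique(seg):
        m = seg == s
        # keep the brightest points within this segment (pct is assumed)
        keep[m] = intensity[m] > np.percentile(intensity[m], pct)
    return keep
```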

Traffic signs

The traffic sign is also an important part of HD maps, providing critical road information for traffic safety in autonomous driving navigation. Generally, traffic signs are part of the pole-like objects in a raw point cloud; therefore, pole-like object extraction is performed first in the majority of related work, and the pole-like objects are then classified into different categories that contain different types of traffic signs. Most research proceeds by analyzing the position, continuity, verticality, shape, size, and intensity of the pole-like objects [113].

Based on size and intensity differences, Wen et al. [201] set a minimum threshold on clusters to remove small objects and filter out non-sign objects. Using traffic sign attributes, Arcos-García et al. [202] developed height and planarity filters to eliminate small parts and non-planar parts. In Huang's work [203], traffic signs were first detected from the point cloud based on high intensity and position, and an occlusion detection step was then performed to analyze traffic sign occlusion by observing the relationship between the viewpoint and the traffic sign.
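A hedged sketch of such a filter chain for one segmented cluster is shown below; the thresholds are illustrative, and the planarity test uses the smallest-eigenvalue ratio of the cluster covariance (a common choice, not the exact criterion of [201]-[203]).

```python
import numpy as np

def is_sign_candidate(cluster_pts, cluster_intensity,
                      min_pts=50, min_height=1.5, min_intensity=0.6,
                      planarity_tol=0.05):
    """Heuristic traffic-sign test for one segmented cluster.

    Checks, in order: enough points, mounted above min_height (m),
    retro-reflective (high mean normalized intensity), and planar
    (smallest PCA eigenvalue is a tiny fraction of the total variance).
    All thresholds are illustrative assumptions.
    """
    if len(cluster_pts) < min_pts:
        return False
    if cluster_pts[:, 2].mean() < min_height:
        return False
    if cluster_intensity.mean() < min_intensity:
        return False
    cov = np.cov((cluster_pts - cluster_pts.mean(axis=0)).T)
    w = np.sort(np.linalg.eigvalsh(cov))   # ascending eigenvalues
    return w[0] / w.sum() < planarity_tol
```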

Detected traffic signs are usually classified into different types by analyzing the features of point clouds and images. Wen et al. [201] extracted integral features consisting of a Histogram of Oriented Gradients (HOG) and a color descriptor and used a support vector machine (SVM) to train a classification model. There are also deep learning-based methods for traffic sign recognition [202]: Yu et al. [204] projected the point cloud into a 2D image and applied a Gaussian-Bernoulli deep Boltzmann machine model for traffic sign recognition (TSR).

The key point in building an HD map from a LiDAR point cloud is how to accurately extract each component of the HD map from the raw data. However, no existing method extracts all the HD map components at once; with the development of point cloud semantic segmentation, this goal may be achieved in the future.

4.3 Building Information Models (BIM)

The indoor building model is the data source of BIM, which plays a vital role in building maintenance, disaster rescue, and building renewal planning. However, generating indoor three-dimensional models manually is a time-consuming and labor-intensive


process. To generate three-dimensional models more efficiently, many studies derive indoor models from raw point clouds automatically. For example, Previtali et al. [205] proposed an optimization-based method to detect the indoor characteristics of buildings (i.e., the same shape, the same alignment, and the same spacing); Wang et al. [181] proposed a new semantic indoor modeling method based on line frameworks; Tran et al. [206] proposed a novel shape grammar method that can effectively generate three-dimensional models. Shi et al. [207] presented a method capable of automatically reconstructing 3D building models with semantic information from unstructured 3D point clouds of indoor scenes. Xiao et al. [208] presented a framework that recovers missing points and estimates connectivity relations between planar and non-planar surfaces to obtain complete, high-quality 3D models.

Existing approaches for indoor modeling can be classified into linear-primitive, planar-primitive, and volumetric-primitive types [209].

4.3.1 Linear-primitive

The line-primitive indoor modeling methods assume that walls are planar and vertical to the ground and then build the indoor model based on a planar map. Oesau et al. [210] presented a graph-cut-based indoor reconstruction method that solves an inside/outside labeling of a space partitioning derived from the raw point cloud. Ochmann et al. [172] proposed a parametric modeling method for reconstructing parametric three-dimensional building models from indoor point clouds, automatically reconstructing structural models containing multiple indoor scenes. Ochmann et al. [211] also presented a novel method that tackles the indoor building reconstruction problem from point clouds using integer linear programming. Li et al. [212] presented a segmentation method for the reconstruction of 3D indoor interiors that overcomes the over-segmentation of graph-cut operations in long corridors and removes shared surfaces to reconstruct connected areas across multiple floors. Line-primitive indoor modeling methods deal with indoor point clouds from a two-dimensional perspective, which is usually applicable only when the floors are independent and free of clutter.

4.3.2 Planar-primitive

The planar-primitive methods mainly involve two steps: planes are first extracted by classification, and the planar model is then built based on the classification result. Sanchez et al. [213] used Random Sample Consensus (RANSAC) for plane fitting and alpha shapes to calculate their extents, extracting large-scale planar structures such as the ground, ceilings, and walls from indoor point cloud data. Similarly, Budroni et al. [214] used plane sweeping to extract ceilings, floors, and walls. These methods can extract planes very well but do not consider occlusion, and they usually rely on "context-based" reasoning to distinguish building elements before plane fitting and intersection; hence, they are not suitable for complex indoor scenes or for data with serious gaps. Wang et al. [181] proposed a method that first semantically labels the 3D point clouds into different categories and then extracts the line structures from the labeled points separately.
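Once planes have been extracted (e.g., with a RANSAC loop like the one sketched in Section 4.2), a typical labeling step assigns them to building elements from their normals and heights. The sketch below is illustrative; the angular and height tolerances are assumptions, not values from [213] or [214].

```python
import numpy as np

def classify_plane(normal, center_z, floor_z, ceil_z,
                   vert_tol=0.15, z_tol=0.2):
    """Label one extracted plane as floor / ceiling / wall / other from its
    unit normal and mean height (all tolerances are assumed values).

    floor_z, ceil_z: estimated floor and ceiling elevations of the room,
    e.g. taken from the lowest and highest large horizontal planes.
    """
    nz = abs(normal[2])
    if nz > 1.0 - vert_tol:                 # near-horizontal plane
        if abs(center_z - floor_z) < z_tol:
            return 'floor'
        if abs(center_z - ceil_z) < z_tol:
            return 'ceiling'
        return 'other'                      # e.g. table tops, shelves
    if nz < vert_tol:                       # near-vertical plane
        return 'wall'
    return 'other'                          # slanted surfaces
```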


4.3.3 Volumetric-primitive

The volumetric-primitive methods have a stronger regularity. These methods generally rely on the Manhattan-world hypothesis, which admits only vertical and horizontal surfaces. Furukawa et al. [215] proposed an inverse constructive solid geometry algorithm that detects walls in 2D and then combines them into cubes. Khoshelham et al. [216] proposed a grammar-based approach that reconstructs indoor spaces satisfying the Manhattan-world hypothesis by iteratively placing, connecting, and merging cubes. Previtali et al. [217] transformed the indoor reconstruction problem into a labeling problem over result units in a two-dimensional plane under the Manhattan-world hypothesis. Kim et al. [218] presented a geometry and camera pose reconstruction algorithm from image sequences for indoor Manhattan scenes.

4.3.4 Door and Window Detection

Indoor building models generally include the main structures of buildings, such as ceilings, floors, walls, doors, windows, and other immovable objects, while excluding furniture and other movable objects. The detection of doors and windows is therefore a necessary part of indoor building modeling. Michailidis et al. [219] focused on the wall and extracted the structures of doors and windows by detecting holes in the wall; however, this method can only operate on a single wall and cannot be applied directly to all indoor point cloud data. Wang et al. [181] determined the outermost boundary line of the wall from the point clouds on the ground and ceiling and then retained only the internal line structure when extracting the wall line structure, which is used to detect the locations of doors and windows. Jung et al. [220] first divided the point cloud into several separate rooms, then modeled the walls, and finally projected the points on each wall onto a binary image to detect the doors and windows. Quintana et al. [221] proposed a method for detecting doors and windows in three-dimensional colored point clouds: open doors are detected from rectangular openings in the wall, whereas closed doors are identified in subsequent processing as rectangular areas that do not correspond to the actual wall surface. Previtali et al. [222] proposed a voxel-based marking method based on visibility analysis. Díaz-Vilariño et al. [223] applied the generalized Hough transform to wall orthoimages generated from colored point clouds to detect closed doors. Previtali et al. [222] detected doors and windows under occlusion by implementing a ray-tracing algorithm after extracting the walls. Doors and windows have also been modeled by fitting parametrized rectangular shapes in images using the generalized Hough transform (Díaz-Vilariño et al. [224]). Nikoohemat et al. [225] presented several algorithms for the interpretation of interior spaces using MLS point clouds in combination with the trajectory of the acquisition system.
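The projection-based detection shared by several of these methods can be sketched as follows: the points of a single wall are rasterized into a binary occupancy image, and largely empty rectangular regions become door or window candidates. The cell size is an assumed value, and, as noted above, occlusions produce false openings unless visibility reasoning such as ray tracing [222] is added.

```python
import numpy as np

def wall_occupancy(wall_pts, u_axis, v_axis, cell=0.05):
    """Rasterize the points of one wall into a binary occupancy image.

    u_axis: unit vector along the wall; v_axis: unit up vector.
    Each occupied cell means the wall surface was actually scanned there.
    """
    u = wall_pts @ u_axis
    v = wall_pts @ v_axis
    iu = np.floor((u - u.min()) / cell).astype(int)
    iv = np.floor((v - v.min()) / cell).astype(int)
    img = np.zeros((iu.max() + 1, iv.max() + 1), dtype=bool)
    img[iu, iv] = True
    return img

def hole_ratio(img, u0, u1, v0, v1):
    """Fraction of empty cells inside a candidate rectangle; values near 1
    suggest a door or window opening (or an occlusion, hence the need for
    visibility analysis in later work)."""
    patch = img[u0:u1, v0:v1]
    return 1.0 - patch.mean()
```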

4.4 Traffic Visibility Evaluation

Maintaining high visibility of traffic signs is crucial for traffic safety. Research on the visibility of traffic signs can be divided into the following categories: simulation-based methods, image-based methods, naturalistic driving experimentation-based methods, and point-cloud-based methods.


Simulation-based methods gather statistics on the visual or cognitive information collected from volunteers and output evaluations by simulation. Some researchers use simulation platforms to investigate cognition time [226], driver behavior associated with visual distractions [227], and cognitive workload [228]. Motamedi et al. [229] analyzed traffic sign visibility in BIM-enabled virtual reality (VR) environments. Eye-tracking equipment [230,231] has also been used to determine the visual cognition of traffic signs under simulated driving conditions. However, simulation-based methods cannot provide a quantitative evaluation of visibility or of visual and cognitive information for real roads.

Image-based methods compute the visibility of a traffic sign based on contrast ratios and the number of pixels in the occluded area of an image [232,233,234,235]. These methods cannot continuously evaluate visibility over an entire road surface because of viewpoint position limitations. Moreover, image-based methods are not robust to lighting conditions and do not consider the actual geometric properties of the road and the traffic signs.

Naturalistic driving experimentation-based methods recognize driving modes through long periods of observation of drivers' behavior in natural conditions [236,237,238]. Since human cognition takes time, a driver has to stop to assess visibility from a given viewpoint in naturalistic driving; this drawback makes it difficult to obtain the visibility distribution of traffic signs.

Point-cloud-based methods study visibility based on point clouds [203,239,240]. Mobile laser scanning (MLS) systems provide efficient 3D measurements over large-scale traffic environments. Zhang et al. [241] proposed the concept of the visibility field of a traffic sign and took the geometric factor, the occlusion factor, and the sightline deviation factor into account to build a model that evaluates the visibility distribution of traffic signs. Their algorithm is by far the only automated algorithm that can compute visibility fields on real roads at a large scale. Experimental results are shown in Figure 9.

Figure 9 The large-scale application of visibility field calculation on a real road [241]. Detected traffic signs are shown in yellow, occluding point clouds in red, and visibility field results as mesh planes. A box marked with a cross ("X") represents a traffic sign that is occluded when observed from that area. The color change of the mesh planes from green to red indicates visibility values changing from high to low.
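To make the combination of factors concrete, the following is an illustrative score that multiplies a geometric term, a sightline-deviation term, and an occlusion term; it is a sketch in the spirit of [241], not the exact model proposed there, and the occlusion radius is an assumed value.

```python
import numpy as np

def visibility(viewpoint, sign_center, sign_normal, sign_area,
               occluder_pts, occl_radius=0.3):
    """Illustrative visibility score of one traffic sign from one viewpoint.

    Multiplies: apparent size (geometric factor), how directly the sign
    faces the viewer (sightline deviation factor), and the fraction of the
    sight line left unblocked by occluding points (occlusion factor).
    """
    ray = sign_center - viewpoint
    dist = np.linalg.norm(ray)
    ray = ray / dist
    geometric = sign_area / dist**2            # apparent solid angle
    deviation = max(0.0, float(-sign_normal @ ray))  # > 0 if facing viewer
    if len(occluder_pts):
        # distance of each occluder point to the viewpoint-sign segment
        t = (occluder_pts - viewpoint) @ ray
        closest = viewpoint + np.outer(np.clip(t, 0.0, dist), ray)
        blocked = np.linalg.norm(occluder_pts - closest, axis=1) < occl_radius
        occlusion = 1.0 - blocked.mean()
    else:
        occlusion = 1.0
    return geometric * deviation * occlusion
```

Evaluating such a score on a grid of viewpoints over the road surface yields the mesh-plane visibility fields visualized in Figure 9.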

5 Future Work


Despite the achieved success, 3D modeling based on MLS still encounters many challenges. First of all, next-generation MLS systems should adopt new FOG-based IMUs and multi-GNSS constellation receivers to enable more reliable positioning, and they should integrate smaller laser scan heads than those currently used to achieve higher scanning frequencies and easier operation. At the same time, owing to the rapid development of hardware technology, the expected cost of MLS systems will continue to fall, promoting their adoption. More research and applications are needed to explore the full potential of MLS in the future, for example by combining LiDAR data with UAV images. Automatic algorithms operating on the collected data, such as terrain extraction, urban 3D modeling, and vegetation analysis, need further development, as does semi-automatic change detection mapping.

At present, deep learning on point clouds is still in its early stages. Current work should not only focus on improving accuracy and performance on benchmark datasets but also ensure the robustness and portability of the methods. More sophisticated deep learning architectures need to be developed to handle the challenges of uneven distribution and possibly insufficient sampling in real-world point clouds. Few datasets capture the complexity of real-world urban scenes, and comprehensive semantic understanding of complex urban streets is still challenging for artificial intelligence. Meanwhile, the rapidly growing volume of urban MLS point cloud data will bring about a new category of geo-big data and provide fertile ground for better AI on point clouds. Finally, a city is never static: the changes in downtown buildings, roads, and vegetation never stop, accompanied by the dynamic scenery of traffic and pedestrians. Most methods focus only on building an accurate 3D city model from a single scan, which lacks the abundant dynamic information of the real world. Dynamic 3D modeling that combines point clouds with other sensors, such as cameras, is a more challenging and worthwhile direction of study.

In the future, MLS systems will play an important role in various detection and modeling tasks in civil fields such as transportation, civil engineering, and forestry and agriculture, as well as in the in-process monitoring and analysis of natural sciences such as archaeology and the geosciences.

6 Conclusion

Large-area urban 3D modeling has evolved rapidly in the past few years. The current development of MLS-based urban 3D modeling comprises two parts: the MLS hardware system and the processing of point clouds, including LiDAR SLAM, point cloud registration, feature extraction, object extraction, semantic segmentation, and deep point cloud processing. The current development of MLS brings together various levels of innovation, from deep point cloud processing to high-level applications such as BIM, HD maps, and traffic monitoring. Research on 3D modeling from mobile mapping with laser scanning has been reviewed and discussed in this paper.

References

1 El-Sheimy N. The development of VISAT: a mobile survey system for GIS applications. University of Calgary,

1996


2 Thompson J W, Sorvig K. Sustainable landscape construction: a guide to green building outdoors. Island Press, 2007
3 Deren L I. Mobile mapping technology and its applications. Geospatial Information, 2006, 4(4): 1-5.
4 Kukko A, Kaartinen H, Hyyppä J, et al. Multiplatform mobile laser scanning: usability and performance. Sensors, 2012, 12(9): 11712-11733.
5 Olsen M J. Guidelines for the use of mobile LiDAR in transportation applications. Transportation Research Board, 2013
6 Glennie C. Rigorous 3D error analysis of kinematic scanning LiDAR systems. Journal of Applied Geodesy, 2007, 1(3): 147-157.
7 Feng Y, Gu S, Shi C, et al. A reference station-based GNSS computing mode to support unified precise point positioning and real-time kinematic services. Journal of Geodesy, 2013, 87(10-12): 945-960.
8 Jeffrey C. An introduction to GNSS: GPS, GLONASS, Galileo and other global navigation satellite systems. NovAtel, 2010
9 Martinsanz G P. State-of-the-art sensors technology in Spain 2017. MDPI, 2018
10 Zhang J, Singh S. Laser-visual-inertial odometry and mapping with high robustness and low drift. Journal of Field Robotics, 2018, 35(8): 1242-1264.
11 Besl P J, McKay N D. Method for registration of 3-D shapes. Sensor Fusion IV: Control Paradigms and Data Structures. International Society for Optics and Photonics, 1992, 1611: 586-606.
12 Biber P, Straßer W. The normal distributions transform: a new approach to laser scan matching. Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003). IEEE, 2003, 3: 2743-2748.
13 Zhang J, Singh S. Visual-LiDAR odometry and mapping: low-drift, robust, and fast. 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015: 2174-2181.
14 Fang Z, Scherer S. Experimental study of odometry estimation methods using RGB-D cameras. 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014
15 Maurelli F, Droeschel D, Wisspeintner T, et al. A 3D laser scanner system for autonomous vehicle navigation. 2009 International Conference on Advanced Robotics (ICAR). IEEE, 2009
16 Davison A J, Reid I D, Molton N D, et al. MonoSLAM: real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(6): 1052-1067.
17 Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163.
18 Forster C, Pizzoli M, Scaramuzza D. SVO: fast semi-direct monocular visual odometry. 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014: 15-22.
19 Labbe M, Michaud F. Appearance-based loop closure detection for online large-scale and long-term operation. IEEE Transactions on Robotics, 2013, 29(3): 734-745.
20 Labbe M, Michaud F. Online global loop closure detection for large-scale multi-session graph-based SLAM. 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014: 2661-2666.
21 Geneva P, Eckenhoff K, Yang Y, et al. LIPS: LiDAR-inertial 3D plane SLAM. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 123-130.
22 Esfahani M A, Wang H, Wu K, et al. AbolDeepIO: a novel deep inertial odometry network for autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems, 2019
23 Qin T, Li P, Shen S. VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 2018, 34(4): 1004-1020.
24 Qin T, Cao S, Pan J, et al. A general optimization-based framework for global pose estimation with multiple sensors. arXiv preprint arXiv:1901.03642, 2019.
25 Segal A, Haehnel D, Thrun S. Generalized-ICP. Robotics: Science and Systems. 2009, 2(4): 435
26 Pandey G, Savarese S, McBride J R, et al. Visually bootstrapped generalized ICP. 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011: 2660-2667.
27 Andreasson H, Triebel R, Burgard W. Improving plane extraction from 3D data by fusing laser data and vision. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2005: 2656-2661.
28 Joung J H, An K H, Kang J W, et al. 3D environment reconstruction using modified color ICP algorithm by fusion of a camera and a 3D laser range finder. 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2009: 3082-3088.
29 Men H, Gebre B, Pochiraju K. Color point cloud registration with 4D ICP algorithm. 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011: 1511-1516.
30 Graeter J, Wilczynski A, Lauer M. LIMO: LiDAR-monocular visual odometry. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 7872-7879.
31 Ye H, Chen Y, Liu M. Tightly coupled 3D LiDAR inertial odometry and mapping. 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019: 3144-3150.
32 Kuindersma S, Deits R, Fallon M, et al. Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot. Autonomous Robots, 2016, 40(3): 429-455.
33 Yu S J, Sukumar S R, Koschan A F, et al. 3D reconstruction of road surfaces using an integrated multi-sensory approach. Optics and Lasers in Engineering, 2007, 45(7): 808-818.
34 Hervieu A, Soheilian B. Semi-automatic road/pavement modeling using mobile laser scanning. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2013, 2: 31-36.
35 Marton Z C, Rusu R B, Beetz M. On fast surface reconstruction methods for large and noisy point clouds. 2009 IEEE International Conference on Robotics and Automation. IEEE, 2009: 3218-3223.


36 Lipman Y, Cohen-Or D, Levin D, et al. Parameterization-free projection for geometry reconstruction. ACM Transactions on Graphics (TOG), 2007, 26(3): 22-es.
37 Nealen A, Igarashi T, Sorkine O, et al. Laplacian mesh optimization. Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia. 2006: 381-389.
38 Sarkar K, Varanasi K, Stricker D. Learning quadrangulated patches for 3D shape parameterization and completion. 2017 International Conference on 3D Vision (3DV). IEEE, 2017: 383-392.
39 Zhao W, Gao S, Lin H. A robust hole-filling algorithm for triangular mesh. The Visual Computer, 2007, 23(12): 987-997.
40 Davis J, Marschner S R, Garr M, et al. Filling holes in complex surfaces using volumetric diffusion. Proceedings. First International Symposium on 3D Data Processing Visualization and Transmission. IEEE, 2002: 428-441.
41 Kroemer O, Amor H B, Ewerton M, et al. Point cloud completion using extrusions. 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012). IEEE, 2012: 680-685.
42 Figueiredo R, Moreno P, Bernardino A. Automatic object shape completion from 3D point clouds for object manipulation. International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. 2017, 4: 565-570.
43 Sipiran I, Gregor R, Schreck T. Approximate symmetry detection in partial 3D meshes. Computer Graphics Forum. 2014, 33(7): 131-140.
44 Wolf D, Howard A, Sukhatme G S. Towards geometric 3D mapping of outdoor environments using mobile robots. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2005: 1507-1512.
45 Thrun S, Wegbreit B. Shape from symmetry. Tenth IEEE International Conference on Computer Vision (ICCV'05). IEEE, 2005, 2: 1824-1831.
46 Mitra N J, Guibas L J, Pauly M. Partial and approximate symmetry detection for 3D geometry. ACM Transactions on Graphics (TOG), 2006, 25(3): 560-568.
47 Xu K, Zhang H, Tagliasacchi A, et al. Partial intrinsic reflectional symmetry of 3D shapes. ACM Transactions on Graphics (TOG), 2009, 28(5): 1-10.
48 Zheng Q, Sharf A, Wan G, et al. Non-local scan consolidation for 3D urban scenes. ACM Transactions on Graphics (TOG), 2010, 29(4): 94:1-94:9.
49 Friedman S, Stamos I. Online facade reconstruction from dominant frequencies in structured point clouds. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2012: 1-8.
50 Pauly M, Mitra N J, Wallner J, et al. Discovering structural regularity in 3D geometry. ACM SIGGRAPH 2008 Papers. 2008: 1-11.
51 Li Y, Dai A, Guibas L, et al. Database-assisted object retrieval for real-time 3D reconstruction. Computer Graphics Forum. 2015, 34(2): 435-446.
52 Pauly M, Mitra N J, Giesen J, et al. Example-based 3D scan completion. Symposium on Geometry Processing. 2005: 23-32.
53 Nan L, Xie K, Sharf A. A search-classify approach for cluttered indoor scene understanding. ACM Transactions on Graphics (TOG), 2012, 31(6): 1-10.
54 Kalogerakis E, Chaudhuri S, Koller D, et al. A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics (TOG), 2012, 31(4): 1-11.
55 Girdhar R, Fouhey D F, Rodriguez M, Gupta A. Learning a predictable and generative vector representation for objects. European Conference on Computer Vision. 2016
56 Wu J, Zhang C, Xue T, Freeman B, Tenenbaum J. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. Advances in Neural Information Processing Systems. 2016: 82-90.
57 Guan H, Li J, Yu Y, et al. Iterative tensor voting for pavement crack extraction using mobile laser scanning data. IEEE Transactions on Geoscience and Remote Sensing, 2014, 53(3): 1527-1537.
58 Lin Y, Wang C, Cheng J, et al. Line segment extraction for large scale unorganized point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2015, 102: 172-183.
59 Zheng G, Moskal L M, Kim S H. Retrieval of effective leaf area index in heterogeneous forests with terrestrial laser scanning. IEEE Transactions on Geoscience and Remote Sensing, 2012, 51(2): 777-786.
60 Wang Z, Zhang L, et al. A multiscale and hierarchical feature extraction method for terrestrial laser scanning point cloud classification. IEEE Transactions on Geoscience and Remote Sensing, 53(5): 2409-2425.
61 Pathak K, Birk A, Vaskevicius N, Poppinga J. Fast registration based on noisy planes with unknown correspondences for 3-D mapping. IEEE Transactions on Robotics, 26(3): 424-441.
62 Von Gioi R G, Jakubowicz J, Morel J M, et al. LSD: a fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 32(4): 722-732.
63 Akinlar C, Topal C. EDLines: a real-time line segment detector with a false detection control. Pattern Recognition Letters, 2011, 32(13): 1633-1642.
64 Jain A, Kurz C, Thormählen T, et al. Exploiting global connectivity constraints for reconstruction of 3D line segments from images. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010: 1586-1593.
65 Daniels J I I, Ha L K, Ochotta T, et al. Robust smooth feature extraction from point clouds. IEEE International Conference on Shape Modeling and Applications 2007 (SMI'07). IEEE, 2007: 123-136.


66 Kim S K. Extraction of ridge and valley lines from unorganized points. Multimedia Tools and Applications, 2013, 63(1): 265-279.
67 Lin Y, Wang C, Chen B, et al. Facet segmentation-based line segment extraction for large-scale point clouds. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(9): 4839-4854.
68 Besl P J, Jain R C. Segmentation through variable-order surface fitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988, 10(2): 167-192.
69 Pu S, Vosselman G. Automatic extraction of building features from terrestrial laser scanning. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2006, 36(5): 25-27.
70 Masuta H, Makino S, Lim H. 3D plane detection for robot perception applying particle swarm optimization. 2014 World Automation Congress (WAC). IEEE, 2014: 549-554.
71 Duda R O, Hart P E. Use of the Hough transformation to detect lines and curves in pictures. SRI International, Menlo Park, CA, Artificial Intelligence Center, 1971
72 Xu L, Oja E, Kultanen P. A new curve detection method: randomized Hough transform (RHT). Pattern Recognition Letters, 1990, 11(5): 331-338.
73 Fischler M A, Bolles R C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981, 24(6): 381-395.
74 Awwad T M, Zhu Q, Du Z, et al. An improved segmentation approach for planar surfaces from unstructured 3D point clouds. The Photogrammetric Record, 2010, 25(129): 5-23.
75 Schnabel R, Wahl R, Klein R. Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum. Oxford, UK: Blackwell Publishing Ltd, 2007, 26(2): 214-226.
76 Lin Y, Li J, Wang C, et al. Fast regularity-constrained plane reconstruction. arXiv preprint arXiv:1905.07922, 2019
77 El-Sayed E, Abdel-Kader R F, Nashaat H, et al. Plane detection in 3D point cloud using octree-balanced density down-sampling and iterative adaptive plane extraction. IET Image Processing, 2018, 12(9): 1595-1605.
78 Nguyen H L, Belton D, Helmholz P. Planar surface detection for sparse and heterogeneous mobile laser scanning point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 151: 141-161.
79 Kwon H, Kim M, Lee J, et al. Robust plane extraction using supplementary expansion for low-density point cloud data. 2018 15th International Conference on Ubiquitous Robots (UR). IEEE, 2018: 501-505.
80 Papon J, Abramov A, Schoeler M, et al. Voxel cloud connectivity segmentation - supervoxels for point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013: 2027-2034.
81 Babahajiani P, Fan L, Kamarainen J, et al. Automated super-voxel based features classification of urban environments by integrating 3D point cloud and image content. 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). IEEE, 2015: 372-377.
82 Lin Y, Wang C, Zhai D, et al. Toward better boundary preserved supervoxel segmentation for 3D point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 143: 39-47.
83 Zai D, Li J, Guo Y, et al. 3-D road boundary extraction from mobile laser scanning data via supervoxels and graph cuts. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(3): 802-813.
84 Wang H, Wang C, Luo H, et al. 3-D point cloud object detection based on supervoxel neighborhood with Hough forest framework. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(4): 1570-1581.
85 Besl P J, McKay N D. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992, 14(2): 239-256.
86 Bae K H, Lichti D D. A method for automated registration of unorganised point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2008, 63(1): 36-54.
87 Gressin A, Mallet C, Demantké J, David N. Towards 3D LiDAR point cloud registration improvement using optimal neighborhood knowledge. ISPRS Journal of Photogrammetry and Remote Sensing, 2013, 79: 240-251.
88 Stechschulte J, Heckman C. Hidden Markov random field iterative closest point. CoRR, abs/1711.05864, 2017. arXiv:1711.05864.
89 Yang J, Li H, Campbell D, Jia Y. Go-ICP: a globally optimal solution to 3D ICP point-set registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(11): 2241-2254.
90 Campbell D, Petersson L. GOGMA: globally-optimal Gaussian mixture alignment. IEEE Conference on Computer Vision and Pattern Recognition. 2016: 5685-5694.
91 Straub J, Campbell T, How J P, Fisher J W. Efficient global point cloud alignment using Bayesian nonparametric mixtures. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017: 2403-2412.
92 Tombari F, Salti S, Di Stefano L. Unique signatures of histograms for local surface description. European Conference on Computer Vision. 2010
93 Guo Y, Sohel F, Bennamoun M, Lu M, Wan J. Rotational projection statistics for 3D local surface description and object recognition. International Journal of Computer Vision, 2013, 105(1): 63-86.
94 Yang J, Zhang Q, Xiao Y, Cao Z. TOLDI: an effective and robust approach for 3D local shape description. Pattern Recognition, 2017, 65: 175-187. doi:10.1016/j.patcog.2016.11.019.
95 Rusu R B, Blodow N, Marton Z C, Beetz M. Aligning point cloud views using persistent feature histograms. IEEE/RSJ International Conference on Intelligent Robots and Systems. 2008: 3384-3391.


96 Zai D, Li J, Guo Y, Cheng M, Huang P, Cao X, Wang C. Pair-wise registration of TLS point clouds using covariance descriptors and a non-cooperative game. ISPRS Journal of Photogrammetry and Remote Sensing, 2017, 134: 15-29.
97 Zeng A, Song S, Nießner M, Fisher M, Xiao J, Funkhouser T. 3DMatch: learning local geometric descriptors from RGB-D reconstructions. 2017: 199-208.
98 Huang H, Kalogerakis E, Chaudhuri S, Ceylan D, Kim V G, Yumer E. Learning local shape descriptors from part correspondences with multi-view convolutional networks. ACM Transactions on Graphics, 2017, 37(1): 1-14.
99 Elbaz G, Avraham T, Fischer A. 3D point cloud registration for localization using a deep neural network auto-encoder. IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2472-2481.
100 Khoury M. Learning compact geometric features. 2017: 153-161.
101 Deng H, Birdal T, Ilic S. PPF-FoldNet: unsupervised learning of rotation invariant 3D local descriptors. European Conference on Computer Vision. 2018
102 Gojcic Z, Zhou C, Wegner J D, et al. The perfect match: 3D point cloud matching with smoothed densities. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 5545-5554.
103 Xu Y, Boerner R, Yao W, Hoegner L, Stilla U. Pairwise coarse registration of point clouds in urban scenes using voxel-based 4-planes congruent sets. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 151: 106-123.
104 Shi X, Liu T, Han X. Improved iterative closest point (ICP) 3D point cloud registration algorithm based on point cloud filtering and adaptive fireworks for coarse registration. International Journal of Remote Sensing, 2020, 41(8): 3197-3220.
105 Deng H, Birdal T, Ilic S. PPFNet: global context aware local features for robust 3D point matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 195-205.
106 Georgakis G, Karanam S, Wu Z, et al. End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1965-1973.
107 Yew Z J, Lee G H. 3DFeat-Net: weakly supervised local 3D features for point cloud registration. European Conference on Computer Vision. Springer, Cham, 2018: 630-646.
108 Deng H, Birdal T, Ilic S. 3D local features for direct pairwise registration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 3244-3253.
109 Aoki Y, Goforth H, Srivatsan R A, et al. PointNetLK: robust & efficient point cloud registration using PointNet. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 7163-7172.
110 Sarode V, Li X, Goforth H, et al. PCRNet: point cloud registration network using PointNet encoding. arXiv preprint arXiv:1908.07906, 2019
111 Aiger D, Mitra N J, Cohen-Or D. 4-points congruent sets for robust pairwise surface registration. ACM SIGGRAPH 2008 Papers. 2008: 1-10.
112 Mellado N, Aiger D, Mitra N J. Super 4PCS: fast global pointcloud registration via smart indexing. Computer Graphics Forum. 2014, 33(5): 205-215.
113 Che E, Jung J, Olsen M J. Object recognition, segmentation, and classification of mobile laser scanning point clouds: a state of the art review. Sensors, 2019, 19(4): 810
114 Hackel T, Wegner J D, Schindler K. Fast semantic segmentation of 3D point clouds with strongly varying density. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2016, 3: 177-184.
115 Weinmann M, Jutzi B, Mallet C. Semantic 3D scene interpretation: a framework combining optimal neighborhood size selection with relevant features. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2014, 2(3): 181
116 Hu H, Munoz D, Bagnell J A, et al. Efficient 3-D scene analysis from streaming data. 2013 IEEE International Conference on Robotics and Automation. IEEE, 2013: 2297-2304.
117 Zhao H, Liu Y, Zhu X, et al. Scene understanding in a large dynamic environment through a laser-based sensing. 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010: 127-133.
118 Lu Y, Rasmussen C. Simplified Markov random fields for efficient semantic labeling of 3D point clouds. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012: 2690-2697.
119 Munoz D, Bagnell J A, Vandapel N, et al. Contextual classification with functional max-margin Markov networks. 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009: 975-982.
120 Wu Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shapes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1912-1920.
121 Qi C R, Su H, Nießner M, et al. Volumetric and multi-view CNNs for object classification on 3D data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 5648-5656.
122 Maturana D, Scherer S. VoxNet: a 3D convolutional neural network for real-time object recognition. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015: 922-928.
123 Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition. Proceedings of the IEEE International Conference on Computer Vision. 2015: 945-953.
124 Leng B, Guo S, Zhang X, et al. 3D object retrieval with stacked local convolutional autoencoder. Signal Processing, 2015, 112: 119-128.
125 Tosteberg P. Semantic segmentation of point clouds using deep learning. Linköping, 2017


126 Wu B, Wan A, Yue X, et al. SqueezeSeg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud. 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018: 1887-1893.
127 Piewak F, Pinggera P, Schafer M, et al. Boosting LiDAR-based semantic labeling by cross-modal training data generation. Proceedings of the European Conference on Computer Vision (ECCV). 2018
128 Caltagirone L, Scheidegger S, Svensson L, et al. Fast LiDAR-based road detection using fully convolutional neural networks. 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017: 1019-1024.
129 Lawin F J, Danelljan M, Tosteberg P, et al. Deep projective 3D semantic segmentation. International Conference on Computer Analysis of Images and Patterns. Springer, Cham, 2017: 95-107.
130 Wang X, et al. A robust segmentation framework for closely packed buildings from airborne LiDAR point clouds. International Journal of Remote Sensing, 2020, 41(14): 5147-5165.
131 Guo Z, Feng C C. Using multi-scale and hierarchical deep convolutional features for 3D semantic classification of TLS point clouds. International Journal of Geographical Information Science, 2020, 34(4): 661-680.
132 Qi C R, Su H, Mo K, et al. PointNet: deep learning on point sets for 3D classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 652-660.
133 Qi C R, Yi L, Su H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems. 2017: 5099-5108.
134 Zhou Y, Tuzel O. VoxelNet: end-to-end learning for point cloud based 3D object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4490-4499.
135 Li Y, Bu R, Sun M, et al. PointCNN: convolution on X-transformed points. Advances in Neural Information Processing Systems. 2018: 820-830.
136 Wang Y, Sun Y, Liu Z, et al. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 2019, 38(5): 1-12.
137 Huang R, Hong D, Xu Y, Yao W, Stilla U. Multi-scale local context embedding for LiDAR point cloud classification. IEEE Geoscience and Remote Sensing Letters, 2019
138 Tchapmi L, Choy C, Armeni I, et al. SEGCloud: semantic segmentation of 3D point clouds. 2017 International Conference on 3D Vision (3DV). IEEE, 2017: 537-547.
139 Hackel T, Savinov N, Ladicky L, Wegner J D, Schindler K, Pollefeys M. Semantic3D.net: a new large-scale point cloud classification benchmark. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017
140 Riegler G, Osman Ulusoy A, Geiger A. OctNet: learning deep 3D representations at high resolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3577-3586.
141 Riemenschneider H, Bódis-Szomorú A, Weissenberg J, et al. Learning where to classify in multi-view semantic segmentation. European Conference on Computer Vision. Springer, Cham, 2014: 516-532.
142 Engelmann F, Kontogianni T, Hermans A, et al. Exploring spatial context for 3D semantic segmentation of point clouds. Proceedings of the IEEE International Conference on Computer Vision Workshops. 2017: 716-724.
143 Landrieu L, Simonovsky M. Large-scale point cloud semantic segmentation with superpoint graphs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4558-4567.
144 Xu Y, Ye Z, Yao W, Huang R, Tong X, Hoegner L, Stilla U. Classification of LiDAR point clouds using supervoxel-based detrended feature and perception-weighted graphical model. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019
145 Bronstein M M, Bruna J, LeCun Y, et al. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 2017, 34(4): 18-42.
146 Premebida C, Carreira J, Batista J, et al. Pedestrian detection combining RGB and dense LiDAR data. 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014: 4112-4117.
147 González A, Villalonga G, Xu J, et al. Multiview random forest of local experts combining RGB and LiDAR data for pedestrian detection. 2015 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2015: 356-361.
148 Li B, Zhang T, Xia T. Vehicle detection from 3D LiDAR using fully convolutional network. arXiv preprint arXiv:1608.07916, 2016
149 Chen X, Kundu K, Zhu Y, et al. 3D object proposals for accurate object class detection. Advances in Neural Information Processing Systems. 2015: 424-432.
150 Yang B, Luo W, Urtasun R. PIXOR: real-time 3D object detection from point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7652-7660.
151 Wang D Z, Posner I. Voting for voting in online point cloud object detection. Robotics: Science and Systems. 2015, 1(3): 10-15607.
152 Engelcke M, Rao D, Wang D Z, et al. Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks. 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017: 1355-1361.
153 Chen X, Ma H, Wan J, et al. Multi-view 3D object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1907-1915.
154 Li B. 3D fully convolutional network for vehicle detection in point cloud. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017: 1513-1518.
155 Maturana D, Scherer S. 3D convolutional neural networks for landing zone detection from LiDAR. 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015: 3471-3478.


156 Shi S, Wang X, Li H. Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings

of the IEEE Conference on Computer Vision and Pattern Recognition. 2019:00:00 770-779.

157 Qi C R, Litany O, He K, et al. Deep hough voting for 3d object detection in point clouds. Proceedings of

the IEEE International Conference on Computer Vision. 2019:00:00 9277-9286.

158 Lang A H, Vora S, Caesar H, et al. Pointpillars: fast encoders for object detection from point clouds.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019:00:00 12697-12705.

159 Yi C , Zhang Y , Wu Q , et al. Urban building reconstruction from raw LiDAR point data. Computer-Aided

Design, 2017:S0010448517301331.

160 Zhou Z, Gong J. Automated analysis of mobile LiDAR data for component‐level damage assessment of

building structures during large coastal storm events. Computer‐Aided Civil and Infrastructure Engineering,

2018, 33(5): 373-392.

161 Goebbels S, Pohle-Fröhlich R. Quality enhancement techniques for building models derived from sparse point

clouds. VISIGRAPP (1: GRAPP). 2017:00:00 93-104.

162 Zhang D, Du P. 3D building reconstruction from LiDAR data based on Delaunay TIN approach. International

Symposium on LiDAR and Radar Mapping 2011:00:00 Technologies and Applications. International Society for

Optics and Photonics, 2011, 8286:00:00 828612

163 Chen L C , Teo T A , Kuo C Y , et al. Shaping polyhedral buildings by the fusion of vector maps and

LiDAR point clouds. Photogrammetric Engineering & Remote Sensing, 2007, 73(9):1147-1157.

164 Xiong B , Jancosek M , Oude Elberink S , et al. Flexible building primitives for 3D building modeling.

ISPRS Journal of Photogrammetry and Remote Sensing, 2015, 101:275-290.

165 Wang Y, Ma Y, Zhu A X, et al. Accurate facade feature extraction method for buildings from three-dimensional point cloud data considering structural information. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 139: 146-153.

166 Zhang L, Li Z, Li A, et al. Large-scale urban point cloud labeling and reconstruction. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 138: 86-100.

167 Díaz-Vilariño L, Khoshelham K, Martínez-Sánchez J, et al. 3D modeling of building indoor spaces and closed doors from imagery and point clouds. Sensors, 2015, 15(2): 3491-3512.

168 Stambler A, Huber D. Building modeling through enclosure reasoning. 2014 2nd International Conference on

3D Vision. IEEE, 2014, 2: 118-125.

169 Javanmardi M, Gu Y, Javanmardi E, et al. 3D building map reconstruction in dense urban areas by integrating

airborne laser point cloud with 2D boundary map. 2015 IEEE International Conference on Vehicular

Electronics and Safety (ICVES). IEEE, 2015: 126-131.

170 Zhang L, Zhang L. Deep learning-based classification and reconstruction of residential scenes from large-scale point clouds. IEEE Transactions on Geoscience and Remote Sensing, 2017: 1-11.

171 López F J, Lerones P M, Llamas J, et al. A framework for using point cloud data of heritage buildings towards geometry modeling in a BIM context: a case study on Santa Maria la Real de Mave Church. International Journal of Architectural Heritage, 2017: 15583058.2017.1325541.

172 Ochmann S, Vock R, Wessel R, et al. Automatic reconstruction of parametric building models from indoor

point clouds. Computers & Graphics, 2016, 54: 94-103.

173 Xiong B, Elberink S O, Vosselman G. Building modeling from noisy photogrammetric point clouds. ISPRS

Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2014, 2(3): 197

174 Hojebri B, Samadzadegan F, Arefi H. Building reconstruction based on the data fusion of LiDAR point cloud and aerial imagery. 2014: 103-121.

175 Hron V, Halounová L. Automatic Generation of 3d building models from point clouds. Geoinformatics for

Intelligent Transportation. Springer, Cham, 2015: 109-119.

176 Chen Y C, Lin B Y, Lin C H. Consistent roof geometry encoding for 3d building model retrieval using airborne LiDAR point clouds. ISPRS International Journal of Geo-Information, 2017, 6(9): 269.

177 Zhang Y, Li X, Wang Q, et al. LiDAR point cloud data extraction and establishment of 3d modeling of buildings. IOP Conference Series: Materials Science and Engineering. IOP Publishing, 2018, 301(1): 012037.

178 Chen J Y, Lin C H, Hsu P C, et al. Point cloud encoding for 3d building model retrieval. IEEE Transactions on Multimedia, 2014, 16(2): 337-345.

179 Demir I, Aliaga D G, Benes B. Procedural editing of 3d building point clouds. Proceedings of the IEEE

International Conference on Computer Vision. 2015: 2147-2155.

180 Teng W, Xiangyun H, Lizhi Y. Fast and accurate plane segmentation of airborne LiDAR point cloud using cross-line elements. Remote Sensing, 2016, 8(5): 383.

181 Wang C, Hou S, Wen C, et al. Semantic line framework-based indoor building modeling using backpacked

laser scanning point cloud. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 143: 150-166.

182 Chen D, Zhang L, Mathiopoulos P, Huang X. A methodology for automated segmentation and reconstruction of urban 3-D buildings from ALS point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014, 7(10): 4199-4217.

183 Seif H G, Hu X. Autonomous driving in the iCity—HD maps as a key challenge of the automotive industry.

Engineering, 2016, 2(2): 159-162.

184 Bauer S, Alkhorshid Y, Wanielik G. Using high-definition maps for precise urban vehicle localization. 2016

IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2016: 492-497.

185 Zeng W, Luo W, Suo S, et al. End-to-end Interpretable Neural Motion Planner. Proceedings of the IEEE

Conference on Computer Vision and Pattern Recognition. 2019: 8660-8669.

186 Zhang R, Chen C, Di Z, et al. Visual odometry and pairwise alignment for high definition map creation: U.S.

Patent Application 10/309,777. 2019-6-4.

187 Siam M, Elkerdawy S, Jagersand M, et al. Deep semantic segmentation for automated driving: Taxonomy,

roadmap, and challenges. 2017 IEEE 20th International Conference on Intelligent Transportation Systems

(ITSC). IEEE, 2017: 1-8.

188 Barsi A, Poto V, Somogyi A, et al. Supporting autonomous vehicles by creating HD maps. Production

Engineering Archives, 2017, 16

189 Ma L, Li Y, Li J, et al. Mobile laser scanned point-clouds for road object detection and extraction: A review.

Remote Sensing, 2018, 10(10): 1531

190 Wu F, Wen C, Guo Y, et al. Rapid localization and extraction of street light poles in mobile LiDAR point

clouds: A supervoxel-based approach. IEEE Transactions on Intelligent Transportation Systems, 2016, 18(2):

292-305.

191 Hata A Y, Osorio F S, Wolf D F. Robust curb detection and vehicle localization in urban environments. 2014

IEEE Intelligent Vehicles Symposium Proceedings. IEEE, 2014: 1257-1262.

192 Guan H, Li J, Yu Y, et al. Using mobile laser scanning data for automated extraction of road markings.

ISPRS Journal of Photogrammetry and Remote Sensing, 2014, 87: 93-107.

193 Riveiro B, González-Jorge H, Martínez-Sánchez J, et al. Automatic detection of zebra crossings from mobile

LiDAR data. Optics & Laser Technology, 2015, 70: 63-70.

194 Yang B, Dong Z, Liu Y, et al. Computing multiple aggregation levels and contextual features for road

facilities recognition using mobile laser scanning data. ISPRS Journal of Photogrammetry and Remote Sensing,

2017, 126: 180-194.

195 Wang H, Luo H, Wen C, et al. Road boundaries detection based on local normal saliency from mobile laser

scanning data. IEEE Geoscience and Remote Sensing Letters, 2015, 12(10): 2085-2089.

196 Wang H, Cai Z, Luo H, et al. Automatic road extraction from mobile laser scanning data. 2012 International

Conference on Computer Vision in Remote Sensing. IEEE, 2012: 136-139.

197 Rachmadi R F, Uchimura K, Koutaki G, et al. Road edge detection on 3d point cloud data using encoder-

decoder convolutional network. 2017 International Electronics Symposium on Knowledge Creation and

Intelligent Computing (IES-KCIC). IEEE, 2017: 95-100.

198 Chen X, Kohlmeyer B, Stroila M, et al. Next generation map making: georeferenced ground-level LIDAR

point clouds for automatic retro-reflective road feature extraction. Proceedings of the 17th ACM SIGSPATIAL

International Conference on Advances in Geographic Information Systems. ACM, 2009: 488-491.

199 Yu Y, Li J, Guan H, et al. Learning hierarchical features for automated extraction of road markings from 3-d

mobile LiDAR point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote

Sensing, 2014, 8(2): 709-726.

200 Jung J, Che E, Olsen M J, et al. Efficient and robust lane marking extraction from mobile LiDAR point

clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 147: 1-18.

201 Wen C, Li J, Luo H, et al. Spatial-related traffic sign inspection for inventory purposes using mobile laser

scanning data. IEEE Transactions on Intelligent Transportation Systems, 2015, 17(1): 27-37.

202 Arcos-García Á, Soilán M, Álvarez-García J A, et al. Exploiting synergies of mobile mapping sensors and deep learning for traffic sign recognition systems. Expert Systems with Applications, 2017, 89: 286-295.

203 Huang P, Cheng M, Chen Y, et al. Traffic sign occlusion detection using mobile laser scanning point clouds.

IEEE Transactions on Intelligent Transportation Systems, 2017, 18(9): 2364-2376.

204 Yu Y, Li J, Wen C, et al. Bag-of-visual-phrases and hierarchical deep models for traffic sign detection and recognition in mobile laser scanning data. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 113: 106-123.

205 Previtali M, Díaz-Vilariño L, Scaioni M. Towards automatic reconstruction of indoor scenes from incomplete point clouds: Door and window detection and regularization. ISPRS TC-4 Mid-term Symposium 2018. 2018, 42(4): 507-514.

206 Tran H, Khoshelham K, Kealy A, et al. Shape grammar approach to 3d modeling of indoor environments

using point clouds. Journal of Computing in Civil Engineering, 2018, 33(1): 4018055

207 Shi W, Ahmed W, Li N, et al. Semantic geometric modeling of unstructured indoor point cloud. ISPRS

International Journal of Geo-Information, 2019, 8(1): 9

208 Xiao Y, Taguchi Y, Kamat V R. Coupling point cloud completion and surface connectivity relation inference

for 3d modeling of indoor building environments. Journal of Computing in Civil Engineering, 2018, 32(5):

4018033

209 Díaz-Vilariño L, Verbree E, Zlatanova S, et al. Indoor modelling from SLAM-based laser scanner: door detection to envelope reconstruction. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, 42: 345-352.

210 Oesau S, Lafarge F, Alliez P. Indoor scene reconstruction using feature sensitive primitive extraction and

graph-cut. ISPRS Journal of Photogrammetry and Remote Sensing, 2014, 90: 68-82.

211 Ochmann S, Vock R, Klein R. Automatic reconstruction of fully volumetric 3d building models from oriented point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 151: 251-262.

212 Li L, Su F, Yang F, et al. Reconstruction of three-dimensional (3d) indoor interiors with multiple stories via

comprehensive segmentation. Remote Sensing, 2018, 10(8): 1281

213 Sanchez V, Zakhor A. Planar 3d modeling of building interiors from point cloud data. 2012 19th IEEE

International Conference on Image Processing. IEEE, 2012: 1777-1780.

214 Budroni A, Boehm J. Automated 3d reconstruction of interiors from point clouds. International Journal of

Architectural Computing, 2010, 8(1): 55-73.

215 Furukawa Y, Curless B, Seitz S M, et al. Reconstructing building interiors from images. 2009 IEEE 12th

International Conference on Computer Vision. IEEE, 2009: 80-87.

216 Khoshelham K, Díaz-Vilariño L. 3D modeling of interior spaces: Learning the language of indoor architecture.

The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2014, 40(5):

321

217 Previtali M, Díaz-Vilariño L, Scaioni M. Indoor building reconstruction from occluded point clouds using

graph-cut and ray-tracing. Applied Sciences, 2018, 8(9): 1529

218 Kim S, Manduchi R. Multi-planar monocular reconstruction of Manhattan indoor scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019: 30-33.

219 Michailidis G T, Pajarola R. Bayesian graph-cut optimization for wall surfaces reconstruction in indoor

environments. The Visual Computer, 2017, 33(10): 1347-1355.

220 Jung J, Stachniss C, Ju S, et al. Automated 3d volumetric reconstruction of multiple-room building interiors

for as-built BIM. Advanced Engineering Informatics, 2018, 38: 811-825.

221 Quintana B, Prieto S A, Adán A, et al. Door detection in 3d coloured point clouds of indoor environments.

Automation in Construction, 2018, 85: 146-166.

222 Previtali M, Barazzetti L, Brumana R, et al. Towards automatic indoor reconstruction of cluttered building

rooms from point clouds. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences,

2014, 2(5).

223 Díaz-Vilariño L, Khoshelham K, Martínez-Sánchez J, et al. 3D modeling of building indoor spaces and closed

doors from imagery and point clouds. Sensors, 2015, 15(2): 3491-3512.

224 Díaz-Vilariño L, Boguslawski P, Khoshelham K, et al. Obstacle-aware indoor pathfinding using point clouds.

ISPRS International Journal of Geo-Information, 2019, 8(5): 233

225 Nikoohemat S, Peter M, Elberink S O, et al. Exploiting indoor mobile laser scanner trajectories for semantic

interpretation of point clouds. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information

Sciences, 2017, 4

226 Sun L, Yao L, Rong J, et al. Simulation analysis on driving behavior during traffic sign recognition.

International Journal of Computational Intelligence Systems, 2011, 4(3): 353-360.

227 Li N, Busso C. Predicting perceived visual and cognitive distractions of drivers with multimodal features.

IEEE Transactions on Intelligent Transportation Systems, 2014, 16(1): 51-65.

228 Lyu N, Xie L, Wu C, et al. Driver's cognitive workload and driving performance under traffic sign

information exposure in complex environments: A case study of the highways in China. International Journal of Environmental Research and Public Health, 2017, 14(2): 203

229 Motamedi A, Wang Z, Yabuki N, et al. Signage visibility analysis and optimization system using BIM-enabled

virtual reality (VR) environments. Advanced Engineering Informatics, 2017, 32: 248-262.

230 Li L, Zhang Q. Research on Visual cognition about sharp turn sign based on driver's eye movement

characteristic. International Journal of Pattern Recognition and Artificial Intelligence, 2017, 31(07): 1759012

231 Liu B, Sun L, Rong J. Driver's visual cognition behaviors of traffic signs based on eye movement parameters. Journal of Transportation Systems Engineering and Information Technology, 2011, 11(4): 22-27.

232 Belaroussi R, Gruyer D. Impact of reduced visibility from fog on traffic sign detection. 2014 IEEE Intelligent Vehicles Symposium Proceedings. IEEE, 2014: 1302-1306.

233 Doman K, Deguchi D, Takahashi T, et al. Estimation of traffic sign visibility considering local and global features in a driving environment. 2014 IEEE Intelligent Vehicles Symposium Proceedings. IEEE, 2014: 202-207.

234 Doman K, Deguchi D, Takahashi T, et al. Estimation of traffic sign visibility toward smart driver assistance. 2010 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2010: 45-50.

235 Doman K, Deguchi D, Takahashi T, et al. Estimation of traffic sign visibility considering temporal environmental changes for smart driver assistance. 2011 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2011: 667-672.

236 Balsa-Barreiro J, Valero-Mora P M, Berné-Valero J L, et al. GIS mapping of driving behavior based on

naturalistic driving data. ISPRS International Journal of Geo-Information, 2019, 8(5): 226

237 Balsa-Barreiro J, Valero-Mora P M, Montoro I P, et al. Georeferencing naturalistic driving data using a novel

method based on vehicle speed. IET Intelligent Transport Systems, 2013, 7(2): 190-197

238 Lee J, Yang J H. Analysis of driver's EEG given take-over alarm in SAE level 3 automated driving in a simulated environment. International Journal of Automotive Technology, 2020, 21(3): 719-728.

239 Katz S, Tal A, Basri R. Direct visibility of point sets. ACM SIGGRAPH 2007 papers. 2007: 24-es.

240 Katz S, Tal A. Improving the visual comprehension of point sets. Proceedings of the IEEE Conference on

Computer Vision and Pattern Recognition. 2013: 121-128.

241 Zhang S, Wang C, Lin L, et al. Automated visual recognizability evaluation of traffic sign based on 3d

LiDAR point clouds. Remote Sensing, 2019, 11(12): 1453
