ARTICLE IN PRESS

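0098-3004/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

doi:10.1016/j.cageo.2004.07.009

Paper presented at the GIS Research Conference, Sheffield, April 2002.

Editorial handling by Steve Wise.

E-mail address: [email protected] (S. Filin).

1 Present address: Faculty of Civil and Environmental Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel.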

Computers & Geosciences 30 (2004) 1033–1041

www.elsevier.com/locate/cageo

Surface classification from airborne laser scanning data

Sagi Filin1

Faculty of Aerospace Engineering, Photogrammetry and Remote Sensing Section, Delft University of Technology, Delft, The Netherlands

Accepted 18 July 2004

Abstract

The high point density of airborne laser mapping systems enables achieving a detailed description of geographic

objects and of the terrain. Growing experience shows, however, that extracting information directly from the data is

practically impossible. This applies to basic tasks like Digital Elevation Model (DEM) generation and to more involved

ones like the extraction of objects or generation of 3D city models. This paper presents an algorithm for surface

clustering and for identifying structure in the laser data. The proposed approach concerns analyzing the surface texture,

and via unsupervised classification identifying segments that exhibit homogeneous behavior. Clustering involves

analysis of several key issues in relation to processing laser data such as different point densities, processing an

irregularly distributed data set, analysis of attributes that can be derived from the data set, and ways to extract

attributes. This paper provides a detailed discussion of these issues as well.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: LiDAR; Classification; Segmentation; Data processing

1. Introduction

Laser altimetry has emerged in recent years as a

leading technology for the extraction of information of

physical surfaces. The ever increasing point density of

current airborne systems enables achieving a detailed

description of the surveyed surfaces and provides a

wealth of information on physical objects and on the

terrain. As a result an increasing number of GIS

applications use LiDAR data, usually for generation

of Digital Surface Models (DSM) or related applications

like detection of urban changes or monitoring environmental

phenomena (Murakami et al., 1999; Thomas et

al., 2000; Thoma et al., 2001). For urban mapping, the

detailed surface description can be used for generation

of 3D city models and thus serve a variety of

applications, ranging from urban planning and tele-

communication to real estate management and taxation.

The growing experience with processing the data shows,

however, that deriving products directly from the data is

practically impossible and that the extraction of

geographic information must identify the structure in

the data as a prerequisite step. As an example, consider

the generation of a surface or a terrain model. Terrain

model generation requires identifying and removing

points that were reflected from non-terrain objects, and

with surface models, reflections from non-surface points

(e.g., cars, tree branches or trunks) should be identified

and treated further. While one application concerns

removing objects and others identifying them, a

common thread in all is the need to introduce a level

of interpretation prior to extracting information or

products from the data. Structure in the data can be

defined in several levels but the fundamental one is

grouping the laser points into segments with consistent

attributes. Grouping the data into consistent structure is

beneficial for other reasons as well; summarizing the

data reduces its volume significantly and enables

encoding the information it carries in a very compact

way; grouping also offers a very natural structure to

organize the data, and by emphasizing consistent

attributes, it reduces the overall effect of the noise

compared to the one of the signal.

An indication of the importance associated with the

process is the number of algorithms that were developed

for segmentation of range data. However, the majority

were proposed for close-range applications, e.g., Besl

(1988); Girod et al. (2000); Koster and Spann (2000) or

others referred to in the review by Hoover et al. (1996).

Close-range applications are usually used for reverse

engineering of objects that have a well-defined smooth

shape, so the focus is on finding a partition of the data

into disjoint smooth segments. Airborne laser scanning

offers more complex data consisting of a variety of

phenomena. Yet, the majority of the reported algo-

rithms focus on the specific class of planar surfaces (see,

e.g., Lee and Schenk, 2001; Roggero, 2001; Vosselman,

2001) mostly in relation with the extraction of roof

facets for building extraction. By identifying only one

type of objects in the data set these algorithms fail in

coping with complex building shapes, or mixture of

vegetation and buildings; they also lack the generality

required in associating the laser points in the data set

with a segment (the essence of data segmentation). Maas

(1999) and Oude Elberink and Maas (2000) propose

segmentation algorithms for a rasterized and quantized

version of the range data (i.e., range images). The

authors use height texture measures to identify classes in

the data. Their algorithms classify the data by attaching

a label to each pixel but practically do not provide

surface segments. So, while an association of the data

with classes is achieved, identifying structure in the data

is not guaranteed at all.

This paper presents an algorithm for identifying

structure in LiDAR data by point clustering and surface

classification. The goal is to identify point clusters with

homogeneous attributes. Homogeneity refers here to the

consistency of some data properties such as elevation,

trend or texture. Clustering, a label for procedures

aimed at grouping data into homogeneous patterns,

usually without an explicit a priori definition for the

patterns, offers flexibility in accommodating spatial

relation and data attributes together. As a classification

methodology it enables incorporating different cues into

the process in a very natural way. Clustering can be seen

as a combination of two processes—identifying patterns

in the data based on attributes and grouping the data

into clusters. Attributes capture the information em-

bedded in the data and should ensure sufficient

separation among classes. The structure that is identified

by the attributes should be coherent and spatially

meaningful; it should avoid an algorithmic tendency

for oversegmentation that forms small groups that are

very homogeneous but have little spatial context.

Processing LiDAR data offers some complexities. The

data consist, by nature, of a set of irregularly

distributed points (sometimes referred to as scattered

surface data) that carry only a limited amount of

information, namely, their x, y, and z coordinates. The

spatial point distribution may vary from one system to

another and the same applies to the point density that

cannot be assumed fixed. The algorithm that is

presented here copes with varying point density and

operates on the laser points directly without raster-

ization or other preliminary processing that introduces

unnecessary distortions. So, in addition to the algorith-

mic concerns, an adaptation of image-processing con-

cepts to the irregular pattern becomes necessary. With

the aim of identifying structure in the data in mind, the

proposed algorithm is general and can be applied in a

variety of applications.

The paper is organized as follows. Section 2 presents

the clustering algorithm. First the surface categories are

defined and in Section 2.1 the selected features to classify

the data are discussed. Following in Section 2.2 the

proposed method for identifying surface classes is

presented. Section 3 presents the clustering algorithm

as a whole, and Section 4 provides discussion and

results. As the applicability of LiDAR data for the

extraction of semantic information is perhaps best

manifested with the extraction of objects in cluttered

areas such as urban scenes, where image interpretation

systems usually fail, results are, therefore, presented in

relation to urban data.

2. Surface clustering

Data clustering can be defined as an exhaustive

partition of the data into disjoint regions, each with a

homogeneous property. More formally, let $R = \{p_i \mid i = 1, 2, \ldots, N\}$

be the range data, with $p_i$ the laser points and $N$ the number of points in

the data set, and let $S = \{S_1, S_2, \ldots, S_K\}$ be the data clusters, where

$S_i = \{p_{i1}, p_{i2}, \ldots, p_{iN_i}\}$ are the segments, each defined by a

collection of laser points, and $K$ is the number of segments. Then, the clusters should

fulfill the following properties:

$$R = \bigcup_{i=1}^{K} S_i \quad \text{and} \quad S_i \cap S_j = \emptyset \quad \forall\, i \neq j. \qquad (1)$$
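The two conditions in Eq. (1) can be checked mechanically. The following is a minimal sketch, assuming points are referred to by integer indices 0..N-1 and clusters are given as lists of indices; the function name `is_valid_partition` is hypothetical and not part of the paper.

```python
def is_valid_partition(num_points, clusters):
    """Check Eq. (1): clusters are pairwise disjoint and cover all points.

    num_points -- N, the number of laser points (indices 0..N-1, an assumption)
    clusters   -- iterable of index collections, one per segment S_i
    """
    seen = set()
    for cluster in clusters:
        for idx in cluster:
            if idx in seen:
                # The point already belongs to another segment,
                # so the segments are not disjoint.
                return False
            seen.add(idx)
    # Exhaustive: the union of all segments must equal R.
    return seen == set(range(num_points))
```

A clustering that leaves points unassigned, or assigns a point twice, fails the check.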

Table 1

Surface categories vs. attributes

Category          Height difference   Surface trend      Curvature
High vegetation   Large               Rapidly varying    High
Low vegetation    Medium              Rapidly varying    High
Smooth surface    Small               Locally constant   Varying
Planar surface    Small               Fixed              Zero

Fig. 1. A plane in 3D space.

A cluster $S_i$ is assumed to be an instantiation of a

more generic process, which in the current case is defined

as a surface category. One surface category can

instantiate several surface segments. The current implementation

specifies four different surface categories,

(i) forested/wooded areas, (ii) low vegetation areas and

rough surfaces, (iii) smoothly varying surfaces, and (iv)

planar surfaces. Surfaces refer here to the interpretation

of the data obtained by a laser scanning system. The

categories present one interpretation of the laser surface

and are not aimed at providing a topographic structure

of the terrain, mainly since the acquired data is not the

terrain itself. The first two categories refer, in general, to

two different types of vegetation, but the distinction

between the two is also made here for another reason.

Objects in these categories may require different

treatment when being filtered out or edited further.

The ‘‘low vegetation’’ category may include some other

objects like vehicles, or rough terrain that cannot be

distinguished from low vegetation by ranging at them;

they are therefore considered as one group. Planar

surfaces, a sub-class of smooth surfaces, are separated

here because of the tendency of man-made objects to

have planar facets; such information is valuable for

other applications. The following section discusses the

attributes that are selected to distinguish among the

surface categories and surfaces within each category.

2.1. Surface attributes

Analysis of LiDAR data relies by nature on attributes

that measure surface texture; these measures should be

sufficient to differentiate among surface categories and

among surfaces within each category. A literature review

shows a variety of measures that were proposed for

range data segmentation. Among them is the analysis of

the height histogram within a window, analysis of

variation in the surface normal direction (Flynn and

Jain, 1991), or the analysis of the second derivatives

(Axelsson, 1999). Besl (1988) and others propose the

segmentation of range data by the analysis of the mean

and Gaussian curvature. Maas (1999) uses a feature

vector consisting of the Laplacian, maximum slope and

the pixel height for classifying the data. Surface texture

can be measured, in general, in terms of height variation,

variations in the surface trend, and variation in surface

curvature. Table 1 lists the expected behavior of these

attributes for the proposed categories. The attributes in

Table 1 indicate that texture measures that are based on

height differences and surface trend are sufficient to

define the four categories and to distinguish them from

one another. These attributes are qualitative and not

strict but they indicate that a separation of these four

categories is possible. Their modeling is an essential part

of the algorithm.

Based on the analysis of the surface categories the

following measures are being used for the clustering

algorithm—the point position, the parameters of the

tangent plane to the point, and the relative height

difference between the point and its neighbors. Together

they form a 7-tuple attribute vector

$v_i := \{x_i, y_i, z_i, \phi_i, \theta_i, \rho_i, d_i\}$ for each laser point, with

$x_i, y_i, z_i$ the laser point coordinates, $d_i$ the height difference of the point

to its neighbors, and $\phi_i, \theta_i, \rho_i$ the surface parameters

in a polar representation. A polar notation enables

describing all planes (including vertical ones) with

only three parameters and avoiding the use of tangents,

which do not behave linearly as the surface slope

increases. In polar representation a surface can be

written as

$$\cos\theta \cos\phi\, x + \sin\theta \cos\phi\, y + \sin\phi\, z + \rho = 0. \qquad (2)$$

The surface normal direction is given by (see also Fig. 1)

$$\vec{n} = R_z(\phi)\, R_y(\theta) \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta \cos\phi \\ \sin\theta \cos\phi \\ \sin\phi \end{bmatrix}. \qquad (3)$$
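To make the polar representation concrete, the sketch below evaluates the normal of Eq. (3) and the left-hand side of Eq. (2) for a given point. It assumes angles in radians; the function names are illustrative only.

```python
import math


def polar_normal(phi, theta):
    """Unit normal (cos(theta)cos(phi), sin(theta)cos(phi), sin(phi)) of Eq. (3)."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            math.sin(phi))


def plane_residual(point, phi, theta, rho):
    """Left-hand side of Eq. (2) for a point (x, y, z).

    Because the normal has unit length, the value is the signed distance
    of the point from the plane; it is zero for points on the plane.
    """
    nx, ny, nz = polar_normal(phi, theta)
    x, y, z = point
    return nx * x + ny * y + nz * z + rho


# A horizontal plane z = 10 corresponds to phi = pi/2, rho = -10.
on_roof = plane_residual((1.0, 2.0, 10.0), math.pi / 2, 0.0, -10.0)
# A vertical plane x = 3 corresponds to phi = 0, theta = 0, rho = -3.
on_wall = plane_residual((3.0, 7.0, -2.0), 0.0, 0.0, -3.0)
```

Both horizontal and vertical planes are described by the same three parameters, which is the property the polar notation is chosen for.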

The inclusion of the point position as an attribute in

the feature vector may seem odd, but as the goal is to

find homogeneous clusters in the data, proximity of

points with similar texture measures is a significant

attribute in clustering the data. The feature-space can be

viewed as an attachment of a 4D feature vector,

consisting of the height difference and the tangent plane

parameters, to each laser point.

Fig. 2. Potential inseparability of surfaces based on height differences.

The attributes' contribution to the separation of

clusters can be interpreted as follows. Height differences

measure local variation and are expected to be reliable

up to the level of noise in the data. They capture the

existence of step edges and emulate in a way the effect of

an edge operator in raster data. Consequently, height

differences enhance the separation of clusters and

provide an adequate indication for the existence of high

vegetation as well. However, height differences are

insufficient for separating vegetation from buildings if

both are in close proximity, or for separating smooth

surfaces as the example in Fig. 2 demonstrates. The

tangent plane parameters measure the surface normal

direction and the plane position in space. Surface

normals capture first-order discontinuities and model

the existence of crease edges. They enhance the

separation among surfaces with different slopes, see

Fig. 2. The surface constant enables separating surfaces

with similar slopes, for example separating the ground

and flat horizontal rooftops. It shares some correlation

with the height difference, but a plane constant refers to

an infinite plane and is a rather global measure whereas

the latter measures the difference between neighboring

points, and is rather local.
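As an illustration of how the four non-positional attributes could be derived for one point, the sketch below fits a plane to the point and its neighbors and converts it to the polar parameters of Eq. (2). It rests on several assumptions that are not taken from the paper: the plane is fitted as z = a·x + b·y + c by ordinary least squares (so vertical planes are not handled), the normal is oriented upward, and the height difference is taken as the largest absolute height difference to any neighbor.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane_polar(points):
    """Least-squares fit of z = a*x + b*y + c, returned as (phi, theta, rho)."""
    n = float(len(points))
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    normal_matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal_matrix)
    if abs(d) < 1e-12:
        raise ValueError("degenerate neighborhood (e.g., collinear points)")
    coeffs = []
    for k in range(3):  # Cramer's rule, one column at a time
        replaced = [row[:] for row in normal_matrix]
        for r in range(3):
            replaced[r][k] = rhs[r]
        coeffs.append(det3(replaced) / d)
    a, b, c = coeffs
    norm = math.sqrt(a * a + b * b + 1.0)
    nx, ny, nz = -a / norm, -b / norm, 1.0 / norm  # upward unit normal
    phi = math.asin(nz)
    theta = math.atan2(ny, nx)
    rho = -nz * c  # the plane passes through (0, 0, c)
    return phi, theta, rho


def height_difference(point, neighbors):
    """Largest absolute height difference to a neighbor (assumed definition)."""
    return max(abs(point[2] - q[2]) for q in neighbors)


def attribute_vector(point, neighbors):
    """Return (phi, theta, rho, d) for one laser point."""
    phi, theta, rho = fit_plane_polar([point] + list(neighbors))
    return phi, theta, rho, height_difference(point, neighbors)
```

Using the point together with its neighbors in the fit gives the smoothing effect that the text attributes to neighborhood-based computation.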

2.2. Surface texture analysis

Based on the feature vector that has been defined, the

goal is to identify consistent structure in the data. It is

common to measure surface texture by analyzing the

attribute variation within a given neighborhood (usually

a window) around each point and to classify the point

based on this analysis. However, this approach is in

essence a point classification method and is rather

restrictive in generalizing into a clustering algorithm. As

a window based method, this approach also tends to fail

in classifying data in inhomogeneous situations, for

example around edges and building corners where

different processes are covered by the window; it also

depends on the window size. As a result, misclassifica-

tion of data is inevitable and structure is not recon-

structed completely.

To avoid these effects the approach taken here uses

direct classification of the attributes in a feature-space.

In the feature-space each point is represented by its

feature vector, where the values of the feature vector

determine the coordinates of the laser point in that

space. Clusters are then identified according to proxi-

mity of points in the feature-space. It is possible to

extract clusters directly from the 7D space by methods

like the K-means algorithm or others (see e.g., Flynn

and Jain, 1991), but the dimensionality of the problem

and the iterative fashion in which this algorithm works,

slow the process. Reduction of the dimensionality and

simplification of the search for classes is achieved by the

separation of the attributes from the 7D feature vector,

thereby removing the positional component, and creat-

ing a 4D attribute-space consisting of the surface

parameters and the height differences. The clustering

process is now separated into two parts. First, surface

classes are detected in the 4D attribute-space, and then

laser points that are part of a class are grouped in object-

space according to a neighborhood definition. Neigh-

borhood is defined here by the topology of the Delaunay

triangulation of the laser points; alternative neighbor-

hood definitions can be used as well (see e.g., Chaud-

huri, 1996). The relation between a surface class in

the attribute-space and a cluster in object-space is

not necessarily one-to-one since the attribute-space does

not contain positional information. As clusters are

defined by their attributes and by the neighborhood

relation in object-space a unique definition of a cluster

in object-space is achieved. One characteristic feature

of the attribute-space is that smooth surfaces tend

to cluster in this space but ‘‘vegetation’’ surfaces

(categories (i) and (ii) in Table 1) do not. ‘‘Vegetation’’

surfaces are defined by their lack of consistency, and

are identified by analyzing the unclustered points. The

surface attributes that are used here, in particular

the surface normals, enhance the tendency of vegetation

not to cluster. One consequence is that vegetation

and structured surfaces are unlikely to be grouped

together. Table 1 indicates that the dominant attribute

that separates high vegetation from low vegetation is

the height difference. Clustering points of these two

classes is therefore conducted by analyzing the unclus-

tered points (large slope variation) according to

their height distance and graph connectivity,

although in mixed areas such separation may not be

possible.
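The second stage described above, grouping the points of a class in object-space, can be sketched as a connected-components search over the neighborhood graph. The inputs are assumptions for illustration: `labels` holds one attribute-space class per point (`None` for unclustered points) and `edges` lists neighboring point pairs, for example the edges of a Delaunay triangulation computed elsewhere.

```python
from collections import deque


def group_in_object_space(labels, edges):
    """Split each attribute-space class into spatially connected clusters.

    labels -- class id per point, or None for unclustered points
    edges  -- iterable of (i, j) index pairs of neighboring points
    Returns a list of clusters, each a list of point indices.
    """
    adjacency = {i: [] for i in range(len(labels))}
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)

    cluster_of = [None] * len(labels)
    clusters = []
    for start in range(len(labels)):
        if labels[start] is None or cluster_of[start] is not None:
            continue
        cluster_id = len(clusters)
        cluster_of[start] = cluster_id
        members = []
        queue = deque([start])
        while queue:  # breadth-first growth over same-class neighbors
            i = queue.popleft()
            members.append(i)
            for j in adjacency[i]:
                if cluster_of[j] is None and labels[j] == labels[start]:
                    cluster_of[j] = cluster_id
                    queue.append(j)
        clusters.append(members)
    return clusters
```

Two roof faces with the same attributes that are separated by ground points end up as two clusters, which is why the relation between a surface class and a cluster is not one-to-one.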

The extraction of point clusters of surface classes is

not the end result of the process. A validation and

refinement phase follows to ensure that the clusters refer

to actual physical objects. Then, an evaluation of the

clusters with respect to their neighborhood should take

place before they wear their final shape. The extracted

clusters are, therefore, considered surface proposals. The

following section presents the clustering algorithm as a

whole and elaborates on key issues that refer to the data

processing and analysis.

3. The clustering algorithm

Based on the extracted features and the formation of

the attribute-space the clustering can be described by the

algorithm listed in Table 2.

The algorithm is data-driven and region based. The

first phase that involves the formation of the clusters

combines two parts—generation of cluster proposals

from the attribute-space and validation of the proposed

clusters. Surface classes are detected by a mode seeking

algorithm (Haralick and Shapiro, 1992). The attribute-

space is discretized as a function of the expected

uncertainty in feature estimation due to noise in the

data. Laser points that contribute to the mode and

the surrounding region are collected and analyzed. The

analysis consists of linking the laser points into clusters

by region growing and then validating the consistency of

the clusters. Validation of the smooth clusters is

performed by local plane fitting and an analysis of

the results; extension to general smooth surfaces follows

in a later phase. The surface extraction phase terminates

when no significant modes are detected.
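A simple way to picture the mode-seeking step is a histogram over the discretized attribute-space. The sketch below is an assumption-laden illustration, not the cited mode-seeking algorithm: it bins each attribute vector with a per-dimension cell size (standing in for the expected attribute uncertainty), picks the most populated cell, and collects the points in that cell and in the cells within `reach` of it. The names `cell_sizes` and `reach` are hypothetical.

```python
import math
from collections import Counter


def find_mode(attributes, cell_sizes, reach=1):
    """Return (mode_cell, member_indices) for a discretized attribute-space.

    attributes -- list of attribute vectors, e.g. (phi, theta, rho, d)
    cell_sizes -- bin width per dimension
    reach      -- how many cells around the mode are collected as well
    """
    cells = [tuple(int(math.floor(a / s)) for a, s in zip(vec, cell_sizes))
             for vec in attributes]
    mode_cell, _count = Counter(cells).most_common(1)[0]
    members = [i for i, cell in enumerate(cells)
               if all(abs(c - m) <= reach for c, m in zip(cell, mode_cell))]
    return mode_cell, members
```

Repeating the search after removing the accepted points corresponds to the loop over steps 3 to 10 in Table 2, which stops once no cell is populated enough to be considered a significant mode.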

3.1. Attribute computation

Noise or outliers may have a significant effect on the

values of the computed attributes and in turn affect the

analysis of the attribute-space and the clusters. The

noise effect on the attributes is influenced by the point

density: high point density provides a well-defined

surface but, on a local scale, a noisy one, whereas lower point

density attenuates the noise effect but leads to a less

detailed surface description. The computation of the

features and the cluster analysis are, therefore, governed

in large part by the existence of noise and outliers in the data.

Outliers are defined here as points that statistically do

not belong to their neighborhood. The term outlier may

be misleading since some of these points are in fact

reflected from physical objects (e.g., power lines or

poles). The test is performed here using the t-statistic,

where the point is compared to its neighborhood. Since

the topological neighborhood (defined by the triangulation

of the point) may be insufficient, the neighborhood

is extended. This analysis is unlikely to classify breaklines

or corners as outliers since the standard deviation

(std.) is initially high. The surface parameters are

computed based on the neighborhood of the point. In

considering the neighborhood, this computation becomes

a geometric implementation of a low-pass filter

integrated into the computation of the first derivatives.

With high point density, such as a few points per square

meter, the neighborhood for computing the attributes of

the points is extended.

Table 2

Clustering algorithm

1. Compute attributes $d_i$ and $\phi_i, \theta_i, \rho_i$ for all laser points
2. Generate an attribute-space
3. Propose a surface class and identify points associated with the class
4. Group points according to the neighborhood system, and compute attributes (see text), in particular the estimated std. $\sigma$
5. for each group
6.    if the group size $\leq$ 3 points then dismiss group else compute surface attributes for the group
7.    if $\sigma >$ a predefined threshold then
8.       Test for the existence of outliers
9.       Test for the existence of more than one class and split if needed
10. endfor
11. repeat steps 3–10 until no meaningful surfaces are proposed
12. Extend each cluster based on its attributes until no further points can be added, or another cluster has been reached
13. Merge clusters if they share similar attributes
14. Analyze and group unclassified points based on height variation

3.2. Cluster validation

Validation of the proposed clusters concerns testing

whether the cluster is homogeneous and indeed composed

of only one surface class, and if that is so,

validating that all points in the cluster belong to the

same class. It is possible (and happens indeed) that due

to smoothing, points that do not belong to the class

(or that are marginal) obtain attributes that are similar

to those of their neighbors, or that points that belong to two

neighboring surfaces with similar surface attributes are

grouped together. The algorithm handles the two cases

as follows. The null-hypothesis assumes that the cluster

represents only one class. Therefore, the existence of

outliers is tested first. Outliers are detected via a

normalized residuals analysis. Instead of the std., the

median deviation, a measure that is more robust to the

existence of outliers (Rousseeuw and Leroy, 1987), is

used. It is noted that robust methods for detecting up to

50% of outliers, like the least-median-of-squares

(Rousseeuw and Leroy, 1987), exist, but as they are essentially

greedy algorithms they are very slow. In general, serious

outliers are already filtered out in attribute-space, so the

situation is by nature more controlled and leads the

simplified algorithm to work well. Failure is an

indication that the cluster may be composed of more

than one surface and therefore should be split. Some

methods, such as refinement of the clustering to the given

set of points, were tested for resolving this ambiguity;

however, results showed some sensitivity to the point

density and the shape of the clusters in question. An

algorithm that has proven to be robust to such cases has

its foundations in the random sampling consensus

algorithm (RANSAC) (Fischler and Bolles, 1981), but

while the original RANSAC has a combinatorial

computational complexity, the one proposed here is

linear. The original RANSAC algorithm evaluates a

small subset of the data sufficient to establish a solution

and then adds points from the data set that agree with

this solution. The algorithm evaluates sufficient combi-

nations to ensure that there is a 95% chance that the

correct solution will be identified. Here, the solutions

(i.e., planar surfaces) that are being probed are the

tangent planes to a point, which are computed by the

point and its neighbors. The rest of the implementation

follows the original RANSAC algorithm. Points within

the cluster that meet a predefined accuracy criterion are

added to the proposed sub-cluster until no more

points can be added. The sub-cluster that has the

largest number of points is defined as a cluster, and the

remaining points are then analyzed in the same fashion

to identify remaining clusters.
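The splitting procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: each point contributes exactly one hypothesis (its precomputed tangent plane, given as a unit normal and offset), inliers are simply counted within a distance tolerance rather than grown, and the names `tol` and `min_size` are hypothetical parameters.

```python
def split_by_tangent_planes(points, planes, tol, min_size=4):
    """Split a cluster into planar sub-clusters, largest consensus first.

    points -- list of (x, y, z)
    planes -- per-point tangent plane (nx, ny, nz, rho) with unit normal
    tol    -- accepted absolute distance of a point from a plane
    Returns (sub_clusters, leftover_indices).
    """
    remaining = list(range(len(points)))
    sub_clusters = []
    while len(remaining) >= min_size:
        best = []
        for k in remaining:  # one hypothesis per remaining point
            nx, ny, nz, rho = planes[k]
            inliers = [i for i in remaining
                       if abs(nx * points[i][0] + ny * points[i][1]
                              + nz * points[i][2] + rho) <= tol]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_size:
            break  # no remaining hypothesis gathers enough support
        sub_clusters.append(best)
        chosen = set(best)
        remaining = [i for i in remaining if i not in chosen]
    return sub_clusters, remaining
```

The number of hypotheses grows linearly with the number of points in the cluster, in contrast to sampling minimal subsets as in the original RANSAC.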

3.3. Surface growing and merging

Extension of the cluster to neighboring points is only

a natural step, as usually the cluster constructed from

the attribute-space will not include border points.

Inclusion of points is carried out by testing whether

the point originates from the same distribution as the

cluster. Testing is performed using the t-statistic. Points

near crease edges may be associated with several

neighboring surfaces; they are, therefore, marked

ambiguous.
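A stripped-down version of the inclusion test might look like the sketch below. It assumes the candidate point's residual from the cluster's fitted surface and the cluster's estimated std. are already available, and it normalizes the residual by the std. alone, which ignores the uncertainty of the fitted parameters. The critical value `t_critical` is a placeholder; the 5 cm floor on the std. follows the minimum threshold reported in Section 4.

```python
def accept_point(residual, sigma, t_critical=2.0, sigma_floor=0.05):
    """Decide whether a border point may join a cluster.

    residual    -- distance of the candidate point from the fitted surface (m)
    sigma       -- estimated std. of the cluster's residuals (m)
    t_critical  -- acceptance limit for the normalized residual (assumed value)
    sigma_floor -- minimum std. used in the test, so that very tight initial
                   clusters can still be extended
    """
    effective_sigma = max(sigma, sigma_floor)
    return abs(residual) / effective_sigma <= t_critical
```

Without the floor, a small cluster with a std. of 1 cm would reject a point lying only 8 cm off the surface and would be hard to extend.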

Merging clusters involves testing whether the clusters

share similar statistics. The F-test is a direct way to test

the change in the variance of a merged cluster with

respect to the original ones. Using sequential least-

squares techniques this computation can be carried out

quite efficiently. As has been noted, under the assump-

tion that smooth surfaces can be approximated locally

by planar surfaces the extracted clusters are validated by

local plane fitting. The merging phase enables testing

whether the clusters are part of a more global smooth

surface, which is approximated by a quadratic surface. If

the null-hypothesis that the two clusters are part of one

planar surface fails, an alternative test is formulated in

which the null-hypothesis is that the clusters are part of

one smooth surface. The approximation into local

planar surfaces and testing for smoothness in object–

space enables circumventing two major problems that

are usually associated with segmentation algorithms.

One is identifying smooth surfaces directly from the

attribute-space, and the other is over-parameterization

of the estimated surface.

As a global test the F-test is less sensitive to annexing

small clusters to larger ones. Cases where not all points

in the annexed cluster are indeed part of the newly

formed surface are possible. A decision has to be made

then whether to reject the merging, accept the merging

and remove the ‘‘outlier’’ points, or to split the cluster. If

the result of the global test has indicated that both

clusters are part of one surface, a subsequent test is

carried out to check whether the ratio of the number of

‘‘inliers’’ to the original number of points exceeds a

discriminating value. More formally, if the null-hypothesis

that the clusters $S_i$ and $S_j$ are part of the same

surface has been accepted, then the following test is

performed:

$$T_j = \frac{\#\{p \in S_{i \cap j} \mid p \in S_j\}}{\#\{p \in S_j\}} > q, \qquad (4)$$

where $p$ denotes the laser points, $S_{i \cap j}$ the joint cluster, $\#\{\cdot\}$ the number of elements in the set, and $q$ the discriminant

value. The discriminant value was set to 60% of the

points for clusters of more than 10 points and a more liberal

value for clusters that have fewer than 10 points. The

motivation for selecting a high value for larger clusters

(practically implying that the majority of the points in a

cluster should belong to the merged cluster) is to prevent

the formation of unnecessary partitions of well-defined

clusters. With smaller clusters the motivation is either to

eliminate them or to reduce them as much as possible. If

the number of ‘‘outliers’’ is too small to form a cluster

(and experience shows that this is the common case) the

leftover points are tested for merging with respect to

other neighboring clusters.
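The ratio test of Eq. (4) is straightforward to express in code. In the sketch below, `merged_inliers` stands for the points retained in the joint cluster after merging and `cluster_j` for the original points of the annexed cluster, both as index collections. The 60% value for larger clusters comes from the text; the value used for small clusters (`q_small`) is not stated in the paper and is a placeholder, as is the choice to treat a cluster of exactly 10 points as small.

```python
def passes_inlier_ratio(merged_inliers, cluster_j,
                        q_large=0.6, q_small=0.4, size_limit=10):
    """Eq. (4): share of S_j's points kept in the joint cluster must exceed q.

    merged_inliers -- indices of points accepted in the merged surface
    cluster_j      -- indices of the points originally in S_j
    """
    original = set(cluster_j)
    if not original:
        raise ValueError("cluster_j must contain at least one point")
    kept = len(original & set(merged_inliers))
    q = q_large if len(original) > size_limit else q_small
    return kept / len(original) > q
```

When the test fails for a well-populated cluster, the merge is rejected rather than carving the cluster into fragments.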

4. Discussion and results

The proposed method offers a very natural way to

model variation in the data and to identify homoge-

neous groups and structure. The chosen attributes detect

the existence of step and crease edges, but by searching

for homogeneous groups they also model the within

group consistency without the need to grow a region

from seed points first as is the case with region-growing

algorithms. Structure is obtained directly from the

parameter space and is validated in object–space. Notice

that in contrast to segmentation algorithms that are

based on surface fitting, by modeling essentially the

surface tangents and fitting surfaces locally the algo-

rithm can identify more complex structures. Another

favorable property of the algorithm is that the use of the

attribute-space to identify structure makes the algorithm

independent of measuring surface texture within a

window. In fact, the algorithm analyzes all windows

simultaneously.

Fig. 3. Clustering results for Stuttgart data (1.5 m resolution). (a) Original range data (intensities are a function of height), (b, c) clustered data. Bright points are ones that were classified as part of a smooth or planar surface; gray points are vegetation points or ones with high elevation variation.

Results for testing the algorithm are presented for

data sets with medium to relatively low resolutions,

which are less detailed and considered more difficult to

process. The data sets consist of last return data. In the

implementation of the algorithm the value of the std.

threshold (see the algorithm in Table 2) was set to 15 cm.

To prevent cases of undersegmentation due to an initial

small cluster with a very small std. (which then becomes

difficult to extend), a minimum std. threshold was

defined as well and was set to 5 cm. The lower threshold

was used when testing for inclusion of points in the

cluster in the cluster-growing phase.

The first data set was acquired in the suburban part of

Stuttgart. The spacing is about 1.5 m between points.

The first data set describes a scene that consists of

several buildings, smooth ground surface and vegetation

that is close to the buildings. The data set is presented in

Fig. 3a and the results of applying the clustering

algorithm are in Figs. 3b and c. The bright points are

the ones that were classified as part of smooth surface

clusters and the gray ones are points that were classified

as part of vegetation or unclassified points with high

elevation variation. As can be seen, the algorithm

separated successfully the smooth objects, like rooftops

or smooth parts on the ground, from the vegetation,

even in cases where both were close to one another.

Since the vegetation is rather sparse it is difficult to

distinguish between high and low vegetation. Therefore,

they are classified as one structure.

The second data set is also taken from the Stuttgart
data. The structure of the building roofs here is more
complex; see, for example, the rightmost building in
Fig. 4a. Comparing Fig. 4a to c, one can see the removal
of several points that were identified as outliers from the
data set. It can also be seen that the algorithm succeeded
in identifying and isolating the planar surfaces of the low
structure at the front from the surrounding vegetation. In
Fig. 4c one can see the identification of what seems to be
low vegetation (or alternatively parked cars) along the
road that separates the two building blocks. Identifying
these objects is largely due to the surface trend
attributes.

The third data set has a coarser ground spacing of
about 2.5 m between points. The data set was acquired
over the Vaihingen area in Germany (Fig. 5). Buildings
here are smaller in size and lower in height; therefore,
finding structures like planar surfaces is more difficult.
The results show that the algorithm successfully
identified the facets of the building at the center of the
scene and also the one at the far right. Considering
the complexity of the shape of the central building and
the point spacing, the results indicate that the algorithm
is capable of identifying fine structures without any
preliminary knowledge of their location. The fact that
the vegetation is generally correctly classified also
suggests that the algorithm does not fall into the trap
of identifying structures when none exist.

The quality of the clusters is analyzed by the standard
deviation of the laser points from the fitted surface. The
minimal size of clusters was set to seven points, which
offers a redundancy of four points in plane fitting, and
also reflects the point density and the size of objects in
the Vaihingen data set (in particular roof faces). Results
are summarized in Table 3. The quality of the results is
an indication of the potential quality of information that
can be obtained from LiDAR data. As can be seen from


Fig. 4. Clustering results for Stuttgart data (1.5m resolution).

(a) Original range data (intensities are a function of height),

(b,c) clustered data. Bright points are ones that were classified

as part of a smooth or planar surface, gray points are

vegetation points or ones with high elevation variation.

Fig. 5. Clustering results for Vaihingen data (2.5 m resolution).
(a) Original range data (intensities are a function of height), (b)
clustered data. Bright points are ones that were classified as part
of a smooth or planar surface, gray points are vegetation points
or ones with high elevation variation.

Table 3

Accuracy estimate of surface clusters

Data set       std. range [m]       Number of clusters (%)
Stuttgart 1    0 < σ < 0.05         61
               0.05 < σ < 0.10      38
               0.10 < σ < 0.12       1
Stuttgart 2    0 < σ < 0.05         66
               0.05 < σ < 0.10      24
               0.10 < σ < 0.13      10
Vaihingen      0 < σ < 0.05         60
               0.05 < σ < 0.10      34
               0.10 < σ < 0.12       6


Table 3, in all cases the majority of the clusters had a
std. smaller than 5 cm, which was the minimum threshold
that was set. In all cases a small fraction of the
clusters had a std. larger than 10 cm, but none exceeded
13 cm even though the upper limit was set at 15 cm. The
results indicate that the cluster proposals correspond to
natural clusters. The surface fitting accuracy of
the large clusters within all three data sets was below
5 cm. The size of the large clusters was of the order of
several hundred points per cluster, and in the second
data set the largest one exceeded 1000 points. The
majority of the clusters in the high-accuracy category
had a relatively large number of points per cluster. There
is a direct correlation between the number of points per
cluster and surface quality, so in addition to the data
density the number of points has an effect on the ability
to determine the surface parameters accurately. This
was very evident in the Vaihingen data set,
where a few of the roof-face clusters had their fitting
accuracy in the third category (10 cm < std. < 12 cm),
without much room for improvement by removing
points. It was evident that these points represent a
structure, as they were all part of one roof face, so
dismissing them seemed a wrong decision. As these
objects are very likely to represent a structure in the
data that, due to low point density, cannot be
defined more precisely, these points are considered a
coarse representation of these objects. The std. value
that is attached to these clusters serves as an indication
of that.
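The quality measure used in this section, the standard deviation of the laser points from the fitted surface, can be sketched for the planar case as follows. This is an illustrative sketch only: it assumes a plane of the form z = a*x + b*y + c, residuals measured along z, and an ordinary least-squares fit, none of which is stated explicitly in the text. The function names are hypothetical. The minimal cluster size of seven points and the resulting redundancy of four (seven observations minus three plane parameters) are taken from the text.

```python
import math

MIN_CLUSTER_SIZE = 7  # minimal cluster size quoted in the text
PLANE_PARAMS = 3      # a, b, c of z = a*x + b*y + c (assumed plane model)


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to a list of (x, y, z) tuples."""
    n = len(points)
    if n < PLANE_PARAMS:
        raise ValueError("at least three points are needed to fit a plane")
    # Centring the coordinates reduces the 3x3 normal equations to a 2x2 system.
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxz = sum((p[0] - mx) * (p[2] - mz) for p in points)
    syz = sum((p[1] - my) * (p[2] - mz) for p in points)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("points are collinear in x-y; the plane is undetermined")
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det
    c = mz - a * mx - b * my
    return a, b, c


def cluster_std(points):
    """Std. of the vertical residuals, using n - 3 degrees of freedom."""
    n = len(points)
    if n < MIN_CLUSTER_SIZE:
        raise ValueError("cluster is smaller than the minimal cluster size")
    a, b, c = fit_plane(points)
    ss = sum((z - (a * x + b * y + c)) ** 2 for x, y, z in points)
    return math.sqrt(ss / (n - PLANE_PARAMS))
```

Dividing by n - 3 rather than n matters most for the small clusters discussed above: with only seven points the redundancy is four, so a few noisy returns on a roof face can noticeably raise the estimated std., which is consistent with the small Vaihingen clusters falling into the third accuracy category.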

5. Conclusions

The paper presented a methodology for clustering
laser data surfaces. The approach taken is
hierarchical in nature, as it defines surface categories
as processes that instantiate the surfaces. Identifying
features that make it possible to distinguish the categories,
and the surfaces within each category, is perhaps the key to the
successful identification of structure in the data. It was


shown that height variation and variation in surface
trend are sufficient to cluster the data. Their incorporation
into the algorithm served both as an edge operator
and as a measure of similarity. Instead of using
standard texture evaluation techniques it was shown
that more natural ways to identify structure and texture
in the data exist. Using the properties of the data and of
the attributes, the analysis of the attribute-space did not
fix the number of clusters as the standard K-means
algorithm does, but implemented an efficient variation
on the attribute-space scanning. Finally, it can be
noticed that the implementation of the clustering
algorithm is based on several "double checks" of
previous results. Surface classes extracted from the
attribute-space are considered only as proposals that
are tested and validated in object-space, so even a
"wrong" proposal has little effect on the robustness of
the algorithm. Then, the clusters themselves are evaluated
with respect to their neighbors and are
merged to form a more consistent segmentation if
certain criteria are met. The results show that
even with relatively sparse data sets, a structure can be
identified, alluding to the generality of the algorithm.

