ARTICLE IN PRESS
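0098-3004/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2004.07.009
Paper presented at the GIS Research Conference, Sheffield, April 2002.
Editorial handling by Steve Wise.
E-mail address: [email protected] (S. Filin).
Present address: Faculty of Civil and Environmental Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel.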
Computers & Geosciences 30 (2004) 1033–1041
www.elsevier.com/locate/cageo
Surface classification from airborne laser scanning data
Sagi Filin
Faculty of Aerospace Engineering, Photogrammetry and Remote Sensing Section, Delft University of Technology, Delft, The Netherlands
Accepted 18 July 2004
Abstract
The high point density of airborne laser mapping systems enables achieving a detailed description of geographic
objects and of the terrain. Growing experience shows, however, that extracting information directly from the data is
practically impossible. This applies to basic tasks like Digital Elevation Model (DEM) generation and to more involved
ones like the extraction of objects or generation of 3D city models. This paper presents an algorithm for surface
clustering and for identifying structure in the laser data. The proposed approach concerns analyzing the surface texture,
and via unsupervised classification identifying segments that exhibit homogeneous behavior. Clustering involves
analysis of several key issues in relation to processing laser data such as different point densities, processing an
irregularly distributed data set, analysis of attributes that can be derived from the data set, and ways to extract
attributes. This paper provides a detailed discussion of these issues as well.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: LiDAR; Classification; Segmentation; Data processing
1. Introduction
Laser altimetry has emerged in recent years as a
leading technology for extracting information about
physical surfaces. The ever increasing point density of
current airborne systems enables achieving a detailed
description of the surveyed surfaces and provides a
wealth of information on physical objects and on the
terrain. As a result an increasing number of GIS
applications use LiDAR data, usually for generation
of Digital Surface Models (DSM) or related applications
like detection of urban changes or monitoring environmental
phenomena (Murakami et al., 1999; Thomas et
al., 2000; Thoma et al., 2001). For urban mapping, the
detailed surface description can be used for generation
of 3D city models and thus serve a variety of
applications, ranging from urban planning and tele-
communication to real estate management and taxation.
The growing experience with processing the data shows,
however, that deriving products directly from the data is
practically impossible and that the extraction of
geographic information must identify the structure in
the data as a prerequisite step. As an example, consider
the generation of a surface or a terrain model. Terrain
model generation requires identifying and removing
points that were reflected from non-terrain objects, and
with surface models, reflections from non-surface points
(e.g., cars, tree branches, or trunks) should be identified
and treated further. While one application concerns
removing objects and others identifying them, a
common thread in all is the need to introduce a level
of interpretation prior to extracting information or
products from the data. Structure in the data can be
defined in several levels but the fundamental one is
grouping the laser points into segments with consistent
attributes. Grouping the data into consistent structure is
beneficial for other reasons as well; summarizing the
data reduces its volume significantly and enables
encoding the information it carries in a very compact
way; grouping also offers a very natural structure to
organize the data, and by emphasizing consistent
attributes, it reduces the overall effect of the noise
compared to the one of the signal.
An indication of the importance associated with the
process is the number of algorithms that were developed
for segmentation of range data. However, the majority
were proposed for close-range applications, e.g., Besl
(1988); Girod et al. (2000); Koster and Spann (2000) or
others referred to in the review by Hoover et al. (1996).
Close-range applications are usually used for reverse
engineering of objects that have a well-defined smooth
shape, so the focus is on finding a partition of the data
into disjoint smooth segments. Airborne laser scanning
offers more complex data consisting of a variety of
phenomena. Yet, the majority of the reported algo-
rithms focus on the specific class of planar surfaces (see,
e.g., Lee and Schenk, 2001; Roggero, 2001; Vosselman,
2001) mostly in relation with the extraction of roof
facets for building extraction. By identifying only one
type of objects in the data set these algorithms fail in
coping with complex building shapes, or mixture of
vegetation and buildings; they also lack the generality
required in associating the laser points in the data set
with a segment (the essence of data segmentation). Maas
(1999) and Oude Elberink and Maas (2000) propose
segmentation algorithms for a rasterized and quantized
version of the range data (i.e., range images). The
authors use height texture measures to identify classes in
the data. Their algorithms classify the data by attaching
a label to each pixel but practically do not provide
surface segments. So, while an association of the data
with classes is achieved, identifying structure in the data
is not guaranteed at all.
This paper presents an algorithm for identifying
structure in LiDAR data by point clustering and surface
classification. The goal is to identify point clusters with
homogeneous attributes. Homogeneity refers here to the
consistency of some data properties such as elevation,
trend or texture. Clustering, a label for procedures
aimed at grouping data into homogeneous patterns,
usually without an explicit a priori definition for the
patterns, offers flexibility in accommodating spatial
relation and data attributes together. As a classification
methodology it enables incorporating different cues into
the process in a very natural way. Clustering can be seen
as a combination of two processes—identifying patterns
in the data based on attributes and grouping the data
into clusters. Attributes capture the information em-
bedded in the data and should ensure sufficient
separation among classes. The structure that is identified
by the attributes should be coherent and spatially
meaningful; it should avoid an algorithmic tendency
for oversegmentation that forms small groups that are
very homogeneous but have little spatial context.
Processing LiDAR data poses some complexities. The
data consist, by nature, of a set of irregularly
distributed points (sometimes referred to as scattered
surface data) that carry only a limited amount of
information, namely, their x, y, and z coordinates. The
spatial point distribution may vary from one system to
another and the same applies to the point density that
cannot be assumed fixed. The algorithm that is
presented here copes with varying point density and
operates on the laser points directly without raster-
ization or other preliminary processing that introduces
unnecessary distortions. So, in addition to the algorith-
mic concerns, an adaptation of image-processing con-
cepts to the irregular pattern becomes necessary. With
the aim of identifying structure in the data in mind, the
proposed algorithm is general and can be applied in a
variety of applications.
The paper is organized as follows. Section 2 presents
the clustering algorithm. First the surface categories are
defined and in Section 2.1 the selected features to classify
the data are discussed. Following in Section 2.2 the
proposed method for identifying surface classes is
presented. Section 3 presents the clustering algorithm
as a whole, and Section 4 provides discussion and
results. As the applicability of LiDAR data for the
extraction of semantic information is perhaps best
manifested with the extraction of objects in cluttered
areas such as urban scenes, where image interpretation
systems usually fail, results are, therefore, presented in
relation to urban data.
2. Surface clustering
Data clustering can be defined as an exhaustive
partition of the data into disjoint regions, each with a
homogeneous property. More formally, let
$R = \{p_i \mid i = 1, 2, \ldots, N\}$ be the range data, with $p_i$ the laser
points and $N$ the number of points in the data set, and let
$S = \{S_1, S_2, \ldots, S_K\}$ be the data clusters, where
$S_i = \{p_{i1}, p_{i2}, \ldots, p_{iN_i}\}$ are the segments defined by a
collection of laser points and $K$ is the number of segments. Then, the
clusters should fulfill the following properties:
$$R = \bigcup_{i=1}^{K} S_i \quad \text{and} \quad S_i \cap S_j = \emptyset \quad \forall i \neq j. \tag{1}$$
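The two conditions of Eq. (1) can be checked mechanically. The following is a minimal sketch, assuming that laser points are identified by integer indices 0 to N-1 and that each cluster is given as a list of such indices; the function name `is_valid_clustering` is hypothetical and not part of the original paper.

```python
def is_valid_clustering(num_points, clusters):
    """Check Eq. (1): the clusters are pairwise disjoint and cover all points.

    num_points: N, the number of laser points (indexed 0..N-1, an assumption).
    clusters:   iterable of clusters, each an iterable of point indices.
    """
    seen = set()
    for cluster in clusters:
        members = set(cluster)
        if seen & members:          # S_i and S_j must not share a point
            return False
        seen |= members
    # The union of all S_i must equal the full data set R.
    return seen == set(range(num_points))
```

For example, two clusters `[0, 1]` and `[2, 3, 4]` form a valid clustering of five points, whereas clusters that share a point or leave a point unassigned do not.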
Table 1
Surface categories vs. attributes

Category          Height difference   Surface trend      Curvature
High vegetation   Large               Rapidly varying    High
Low vegetation    Medium              Rapidly varying    High
Smooth surface    Small               Locally constant   Varying
Planar surface    Small               Fixed              Zero

Fig. 1. A plane in 3D space.

A cluster $S_i$ is assumed to be an instantiation of a
more generic process, which in the current case is defined
as a surface category. One surface category can
instantiate several surface segments. The current
implementation specifies four different surface categories,
(i) forested/wooded areas, (ii) low vegetation areas and
rough surfaces, (iii) smoothly varying surfaces, and (iv)
planar surfaces. Surfaces refer here to the interpretation
of the data obtained by a laser scanning system. The
categories present one interpretation of the laser surface
and are not aimed at providing a topographic structure
of the terrain, mainly since the acquired data is not the
terrain itself. The first two categories refer, in general, to
two different types of vegetation, but the distinction
between the two is also made here for another reason.
Objects in these categories may require different
treatment when being filtered out or edited further.
The ‘‘low vegetation’’ category may include some other
objects like vehicles, or rough terrain that cannot be
distinguished from low vegetation by ranging at them;
they are therefore considered as one group. Planar
surfaces, a sub-class of smooth surfaces, are separated
here because of the tendency of man-made objects to
have planar facets; such information is valuable for
other applications. The following section discusses the
attributes that are selected to distinguish among the
surface categories and surfaces within each category.
2.1. Surface attributes
Analysis of LiDAR data relies by nature on attributes
that measure surface texture; these measures should be
sufficient to differentiate among surface categories and
among surfaces within each category. A literature review
shows a variety of measures that were proposed for
range data segmentation. Among them is the analysis of
the height histogram within a window, analysis of
variation in the surface normal direction (Flynn and
Jain, 1991), or the analysis of the second derivatives
(Axelsson, 1999). Besl (1988) and others propose the
segmentation of range data by the analysis of the mean
and Gaussian curvature. Maas (1999) uses a feature
vector consisting of the Laplacian, maximum slope and
the pixel height for classifying the data. Surface texture
can be measured, in general, in terms of height variation,
variations in the surface trend, and variation in surface
curvature. Table 1 lists the expected behavior of these
attributes for the proposed categories. The attributes in
Table 1 indicate that texture measures that are based on
height differences and surface trend are sufficient to
define the four categories and to distinguish them from
one another. These attributes are qualitative and not
strict but they indicate that a separation of these four
categories is possible. Their modeling is an essential part
of the algorithm.
Based on the analysis of the surface categories the
following measures are being used for the clustering
algorithm—the point position, the parameters of the
tangent plane to the point, and the relative height
difference between the point and its neighbors. Together
they form a 7-tuple attribute vector
$v_i := \{x_i, y_i, z_i, \phi_i, \theta_i, \rho_i, d_i\}$ for each laser point,
with $x_i, y_i, z_i$ the laser point coordinates, $d_i$ the height difference
of the point to its neighbors, and $\phi_i, \theta_i, \rho_i$ the surface
parameters in a polar representation. A polar notation enables
describing all planes (including vertical ones) with
only three parameters and avoiding the use of tangents,
which do not behave linearly as the surface slope
increases. In polar representation a surface can be
written as
$$\cos\theta \cos\phi \, x + \sin\theta \cos\phi \, y + \sin\phi \, z + \rho = 0, \tag{2}$$
with the surface normal direction given in Eq. (3)
(see also Fig. 1):
$$\vec{n} = R_z(\phi) R_y(\theta) \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\theta \cos\phi \\ \sin\theta \cos\phi \\ \sin\phi \end{bmatrix}. \tag{3}$$
The inclusion of the point position as an attribute in
the feature vector may seem odd, but as the goal is to
find homogeneous clusters in the data, proximity of
points with similar texture measures is a significant
attribute in clustering the data. The feature-space can be
viewed as an attachment of a 4D feature vector
consisting of the height difference and the tangent plane
parameters for each laser point.

Fig. 2. Potential inseparability of surfaces based on height differences.
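To make the polar plane representation of Eqs. (2) and (3) concrete, the following minimal sketch evaluates the unit normal from the two angles and the left-hand side of Eq. (2) for a given point. Angles are assumed to be in radians, and the function names are hypothetical. Because the normal has unit length, the value returned by `plane_residual` is the signed distance of the point from the plane.

```python
import math

def normal_from_angles(theta, phi):
    """Components of the surface normal as written in Eq. (3)."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            math.sin(phi))

def plane_residual(point, theta, phi, rho):
    """Left-hand side of Eq. (2); zero when the point lies on the plane."""
    nx, ny, nz = normal_from_angles(theta, phi)
    x, y, z = point
    return nx * x + ny * y + nz * z + rho

# A horizontal plane z = 10 corresponds to phi = pi/2 and rho = -10;
# a vertical plane x = 2 corresponds to theta = 0, phi = 0 and rho = -2.
```

The vertical plane in the closing comment illustrates the point made in the text: such a plane has no finite slope tangent, yet it is described by the same three parameters as any other plane.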
The attributes' contribution to the separation of
clusters can be interpreted as follows. Height differences
measure local variation and are expected to be reliable
up to the level of noise in the data. They capture the
existence of step edges and emulate in a way the effect of
an edge operator in raster data. Consequently, height
differences enhance the separation of clusters and
provide an adequate indication for the existence of high
vegetation as well. However, height differences are
insufficient for separating vegetation from buildings if
both are in close proximity, or for separating smooth
surfaces as the example in Fig. 2 demonstrates. The
tangent plane parameters measure the surface normal
direction and the plane position in space. Surface
normals capture first-order discontinuities and model
the existence of crease edges. They enhance the
separation among surfaces with different slopes, see
Fig. 2. The surface constant enables separating surfaces
with similar slopes, for example separating the ground
and flat horizontal rooftops. It shares some correlation
with the height difference, but a plane constant refers to
an infinite plane and is a rather global measure whereas
the latter measures the difference between neighboring
points, and is rather local.
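As an illustration of how the attribute vector could be assembled for one point, the sketch below fits a plane to the point and its neighbors and converts it to the polar parameters. Several choices here are assumptions and are not taken from the paper: the plane is fitted as z = a x + b y + c by least squares (which cannot represent vertical planes), the height difference is taken as the largest absolute height difference to any neighbor, and the function name is hypothetical.

```python
import math

def local_attributes(point, neighbors):
    """Return (x, y, z, phi, theta, rho, d) for one laser point.

    Assumption: plane fitted as z = a*x + b*y + c (no vertical planes);
    d is the maximum absolute height difference to the neighbors.
    """
    pts = [point] + list(neighbors)
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    cz = sum(p[2] for p in pts) / n
    # Centered normal equations of the least-squares fit.
    sxx = sum((p[0] - cx) ** 2 for p in pts)
    syy = sum((p[1] - cy) ** 2 for p in pts)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pts)
    sxz = sum((p[0] - cx) * (p[2] - cz) for p in pts)
    syz = sum((p[1] - cy) * (p[2] - cz) for p in pts)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("points are collinear in the x-y plane")
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det
    # Upward unit normal of z = a*x + b*y + c is (-a, -b, 1) / norm.
    norm = math.sqrt(a * a + b * b + 1.0)
    nx, ny, nz = -a / norm, -b / norm, 1.0 / norm
    phi = math.asin(nz)
    theta = math.atan2(ny, nx)       # arbitrary for horizontal planes
    # The least-squares plane passes through the centroid.
    rho = -(nx * cx + ny * cy + nz * cz)
    d = max(abs(point[2] - q[2]) for q in neighbors)
    return (point[0], point[1], point[2], phi, theta, rho, d)
```

For points on a flat roof at height 5 the sketch yields phi = pi/2, rho = -5, and d = 0, while the ground at height 0 would differ only in rho, which is the separation by the surface constant described above.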
2.2. Surface texture analysis
Based on the feature vector that has been defined, the
goal is to identify consistent structure in the data. It is
common to measure surface texture by analyzing the
attribute variation within a given neighborhood (usually
a window) around each point and to classify the point
based on this analysis. However, this approach is in
essence a point classification method and is rather
restrictive in generalizing into a clustering algorithm. As
a window based method, this approach also tends to fail
in classifying data in inhomogeneous situations, for
example around edges and building corners where
different processes are covered by the window; it also
depends on the window size. As a result, misclassifica-
tion of data is inevitable and structure is not recon-
structed completely.
To avoid these effects the approach taken here uses
direct classification of the attributes in a feature-space.
In the feature-space each point is represented by its
feature vector, where the values of the feature vector
determine the coordinates of the laser point in that
space. Clusters are then identified according to proxi-
mity of points in the feature-space. It is possible to
extract clusters directly from the 7D space by methods
like the K-means algorithm or others (see e.g., Flynn
and Jain, 1991), but the dimensionality of the problem
and the iterative fashion in which this algorithm works,
slow the process. Reduction of the dimensionality and
simplification of the search for classes is achieved by the
separation of the attributes from the 7D feature vector,
thereby removing the positional component, and creat-
ing a 4D attribute-space consisting of the surface
parameters and the height differences. The clustering
process is now separated into two parts. First, surface
classes are detected in the 4D attribute-space, and then
laser points that are part of a class are grouped in object-
space according to a neighborhood definition. Neigh-
borhood is defined here by the topology of the Delaunay
triangulation of the laser points; alternative neighbor-
hood definitions can be used as well (see e.g., Chaud-
huri, 1996). The relation between a surface class in
the attribute-space and a cluster in object-space is
not necessarily one-to-one since the attribute-space does
not contain positional information. As clusters are
defined by their attributes and by the neighborhood
relation in object-space a unique definition of a cluster
in object-space is achieved. One characteristic feature
of the attribute-space is that smooth surfaces tend
to cluster in this space but ‘‘vegetation’’ surfaces
(categories (i) and (ii) in Table 1) do not. ‘‘Vegetation’’
surfaces are defined by their lack of consistency, and
are identified by analyzing the unclustered points. The
surface attributes that are used here, in particular
the surface normals, enhance the tendency of vegetation
not to cluster. One consequence is that vegetation
and structured surfaces are unlikely to be grouped
together. Table 1 indicates that the dominant attribute
that separates high vegetation from low vegetation is
the height difference. Clustering points of these two
classes is therefore conducted by analyzing the unclus-
tered points (large slope variation) according to
their height distance and graph connectivity,
although in mixed areas such separation may not be
possible.
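The second part of the two-stage process, grouping the points of one surface class in object-space, amounts to finding connected components over the neighborhood graph. The following minimal sketch assumes that the neighborhood edges (for example, the edges of a Delaunay triangulation computed elsewhere) are supplied as pairs of point indices; the function name is hypothetical.

```python
from collections import defaultdict, deque

def group_by_neighborhood(class_points, edges):
    """Split the points of one surface class into spatially connected clusters.

    class_points: indices of points detected as one class in attribute-space.
    edges:        (i, j) index pairs defining the neighborhood (assumed given).
    """
    members = set(class_points)
    adjacency = defaultdict(set)
    for i, j in edges:
        if i in members and j in members:   # keep edges inside the class only
            adjacency[i].add(j)
            adjacency[j].add(i)
    clusters, visited = [], set()
    for start in sorted(members):
        if start in visited:
            continue
        visited.add(start)
        queue, component = deque([start]), []
        while queue:                         # breadth-first traversal
            current = queue.popleft()
            component.append(current)
            for nb in adjacency[current]:
                if nb not in visited:
                    visited.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters
```

This shows why one surface class can yield several clusters: two roofs with identical plane parameters fall into the same class, but they are separated in object-space when no chain of same-class neighbors connects them.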
The extraction of point clusters of surface classes is
not the end result of the process. A validation and
refinement phase follows to ensure that the clusters refer
to actual physical objects. Then, an evaluation of the
clusters with respect to their neighborhood should take
place before they take their final shape. The extracted
clusters are, therefore, considered surface proposals. The
following section presents the clustering algorithm as a
whole and elaborates on key issues that refer to the data
processing and analysis.
3. The clustering algorithm
Based on the extracted features and the formation of
the attribute-space the clustering can be described by the
algorithm listed in Table 2.
The algorithm is data-driven and region based. The
first phase that involves the formation of the clusters
combines two parts—generation of cluster proposals
from the attribute-space and validation of the proposed
clusters. Surface classes are detected by a mode seeking
algorithm (Haralick and Shapiro, 1992). The attribute-
space is discretized as a function of the expected
uncertainty in feature estimation due to noise in the
data. Laser points that contribute to the mode and
the surrounding region are collected and analyzed. The
analysis consists of linking the laser points into clusters
by region growing and then validating the consistency of
the clusters. Validation of the smooth clusters is
performed by a local plane fitting and an analysis of
the results; extension to general smooth surfaces follows
in a later phase. The surface extraction phase terminates
when no significant modes are detected.
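The mode-seeking step can be illustrated with a simple histogram over the discretized attribute-space. This is a minimal sketch and not the procedure of Haralick and Shapiro (1992): it assumes fixed bin widths per attribute (standing in for the expected estimation uncertainty), takes the most populated cell as the mode, treats the cells directly adjacent to it as the surrounding region, and ignores the wraparound of angular attributes. The function name is hypothetical.

```python
import math
from collections import Counter

def find_mode(attributes, bin_widths):
    """Return the most populated cell and the points in it or adjacent to it.

    attributes: list of (phi, theta, rho, d) tuples, one per laser point.
    bin_widths: cell size per attribute (an assumed stand-in for the
                expected uncertainty in feature estimation).
    """
    cells = [tuple(math.floor(a / w) for a, w in zip(attr, bin_widths))
             for attr in attributes]
    mode_cell, _ = Counter(cells).most_common(1)[0]
    # Collect points of the mode cell and of the surrounding cells.
    members = [i for i, cell in enumerate(cells)
               if all(abs(c - m) <= 1 for c, m in zip(cell, mode_cell))]
    return mode_cell, members
```

Removing the returned points and repeating the search until the largest cell falls below a significance threshold would mirror the termination condition stated above; that threshold is left to the caller here.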
3.1. Attribute computation
Noise or outliers may have a significant effect on the
values of the computed attributes and in turn affect the
analysis of the attribute-space and the clusters. The
noise effect on the attributes is influenced by the point
density—high point density provides a well-defined
surface but, on a local scale, a noisy one; lower point
density attenuates the noise effect but leads to a less
detailed surface description. The computation of the
features and the cluster analysis are, therefore, largely
governed by the existence of noise and outliers in the data.
Outliers are defined here as points that statistically do
not belong to their neighborhood. The term outlier may
be misleading since some of these points are in fact
reflected from physical objects (e.g., power lines or
poles).

Table 2
Clustering algorithm
1. Compute attributes $d_i$ and $\phi_i, \theta_i, \rho_i$ $\forall$ laser points
2. Generate an attribute-space
3. Propose a surface class and identify points associated with the class
4. Group points according to the neighborhood system, and compute attributes (see text), in particular the estimated std. $\sigma$
5. for each group
6.    if the group size $\leq$ 3 points then dismiss group else compute surface attributes for the group
7.    if $\sigma >$ a predefined threshold then
8.       Test for the existence of outliers
9.       Test for the existence of more than one class and split if needed
10. endfor
11. repeat steps 3–10 until no meaningful surfaces are proposed
12. Extend each cluster based on its attributes until no further points can be added, or another cluster has been reached
13. Merge clusters if they share similar attributes
14. Analyze and group unclassified points based on height variation

The test is performed here by using the t-statistics
where the point is compared to its neighborhood. Since
the topological neighborhood (defined by the triangula-
tion of the point) may be insufficient, the neighborhood
is extended. This analysis is unlikely to classify break-
lines or corners as outliers since the standard deviation
(std.) is initially high. The surface parameters are
computed based on the neighborhood of the point. In
considering the neighborhood, this computation be-
comes a geometric implementation of a low-pass filter
integrated into the computation of the first derivatives.
With high point density, such as a few points per square
meter, the neighborhood for computing the attributes of
the points is extended.
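A comparison of a point with its neighborhood by a t-statistic can be sketched as follows. The sketch assumes that the compared quantity is the point height, that the neighborhood heights are roughly normally distributed, and that the critical value is supplied by the caller (the default of 3.0 is an arbitrary placeholder, not a value from the paper). The function name is hypothetical.

```python
import math
import statistics

def is_outlier(z, neighbor_heights, t_critical=3.0):
    """Test whether height z is statistically inconsistent with its neighbors.

    Uses t = (z - mean) / (s * sqrt(1 + 1/n)), the statistic for comparing
    a single new observation with a sample of size n.
    """
    n = len(neighbor_heights)
    if n < 3:
        raise ValueError("at least three neighbors are needed")
    mean = statistics.mean(neighbor_heights)
    s = statistics.stdev(neighbor_heights)
    if s == 0.0:
        return z != mean            # degenerate, perfectly flat neighborhood
    t = (z - mean) / (s * math.sqrt(1.0 + 1.0 / n))
    return abs(t) > t_critical
```

The sketch also reflects the remark on breaklines: across a step edge the neighborhood standard deviation is large, so the statistic stays small and the point is not flagged.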
3.2. Cluster validation
Validation of the proposed clusters concerns testing
whether the cluster is homogeneous and indeed com-
posed of only one surface class, and if that is so,
validating that all points in the cluster belong to the
same class. It is possible (and happens indeed) that due
to smoothing, points that do not belong to the class
(or that are marginal) obtain attributes that are similar
to their neighbors, or that points that belong to two
neighboring surfaces with similar surface attributes are
grouped together. The algorithm handles the two cases
as follows. The null-hypothesis assumes that the cluster
represents only one class. Therefore, the existence of
outliers is tested first. Outliers are detected via a
normalized residuals analysis. Instead of the std., the
median deviation, a measure that is more robust to the
existence of outliers (Rousseeuw and Leroy, 1987), is
used. It is noted that robust methods for detecting up to
50% of outliers, like the least-median-of-squares (Rous-
seeuw and Leroy, 1987) exist, but as they are essentially
greedy algorithms they are very slow. In general serious
outliers are already filtered out in attribute-space, so the
situation is by nature more controlled and leads the
simplified algorithm to work well. Failure is an
indication that the cluster may be composed of more
than one surface and therefore should be split. Some
methods such as refinement of the clustering to the given
set of points were tested for resolving this ambiguity,
however, results showed some sensitivity to the point
density and the shape of the clusters in question. An
algorithm that has proven to be robust to such cases has
its foundations in the random sampling consensus
algorithm (RANSAC) (Fischler and Bolles, 1981), but
while the original RANSAC has a combinatorial
computational complexity, the one proposed here is
linear. The original RANSAC algorithm evaluates a
small subset of the data sufficient to establish a solution
and then adds points from the data set that agree with
this solution. The algorithm evaluates sufficient combi-
nations to ensure that there is a 95% chance that the
correct solution will be identified. Here, the solutions
(i.e., planar surfaces) that are being probed are the
tangent planes to a point, which are computed by the
point and its neighbors. The rest of the implementation
follows the original RANSAC algorithm. Points within
the cluster that meet a predefined accuracy criterion are
added to the proposed sub-cluster until no more
points can be added. The sub-cluster that has the
largest number of points is defined as a cluster, and the
remaining points are then analyzed in the same fashion
to identify remaining clusters.
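The splitting procedure can be sketched as a consensus search in which every point contributes exactly one hypothesis, its own tangent plane, so the number of hypotheses is linear in the cluster size. The sketch below is a simplification under stated assumptions: tangent planes are given as unit normals with a plane constant, inliers are counted by distance to the plane only (without the neighborhood-based growing), and the names `split_by_consensus`, `tolerance`, and `min_size` are hypothetical.

```python
def split_by_consensus(points, tangent_planes, tolerance, min_size=3):
    """Split a cluster into planar sub-clusters, largest consensus first.

    points:         list of (x, y, z) coordinates of the cluster points.
    tangent_planes: list of (nx, ny, nz, rho), one hypothesis per point.
    tolerance:      maximum point-to-plane distance for an inlier (assumed).
    """
    remaining = list(range(len(points)))
    sub_clusters = []
    while len(remaining) >= min_size:
        best = []
        for h in remaining:                       # one hypothesis per point
            nx, ny, nz, rho = tangent_planes[h]
            inliers = [i for i in remaining
                       if abs(nx * points[i][0] + ny * points[i][1]
                              + nz * points[i][2] + rho) <= tolerance]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_size:
            break
        sub_clusters.append(best)                 # largest sub-cluster wins
        taken = set(best)
        remaining = [i for i in remaining if i not in taken]
    return sub_clusters, remaining
```

With two flat roofs at different heights grouped into one proposal, the first pass returns the larger roof and the second pass returns the other; points that fit no hypothesis are returned unassigned.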
3.3. Surface growing and merging
Extension of the cluster to neighboring points is only
a natural step, as usually the cluster constructed from
the attribute-space will not include border points.
Inclusion of points is carried out by testing whether
the point originates from the same distribution as the
cluster. Testing is performed by the t-statistics. Points
near crease edges may be associated with several
neighboring surfaces; they are, therefore, marked
ambiguous.
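Cluster extension can be sketched as a traversal of the neighborhood graph that admits a neighboring point when its distance to the cluster plane is small relative to the cluster std. In this minimal sketch the statistical test is replaced by a fixed multiple `k` of the std. (an assumption), and the std. is bounded from below by a minimum value so that a very tight initial cluster can still grow. All names are hypothetical, and the marking of ambiguous points near crease edges is omitted.

```python
from collections import deque

def grow_cluster(seed_members, plane, sigma, points, adjacency, assigned,
                 sigma_min=0.05, k=3.0):
    """Extend a cluster to neighboring points that fit its plane.

    plane:     (nx, ny, nz, rho) with a unit normal.
    sigma:     estimated std. of the cluster; sigma_min bounds it from below.
    adjacency: dict mapping a point index to its neighbor indices.
    assigned:  indices already owned by other clusters (growth stops there).
    k:         multiple of the std. used as acceptance limit (assumed).
    """
    nx, ny, nz, rho = plane
    limit = k * max(sigma, sigma_min)
    members = set(seed_members)
    queue = deque(seed_members)
    while queue:
        current = queue.popleft()
        for nb in adjacency.get(current, ()):
            if nb in members or nb in assigned:
                continue
            x, y, z = points[nb]
            if abs(nx * x + ny * y + nz * z + rho) <= limit:
                members.add(nb)
                queue.append(nb)
    return members
```

Growth ends when no neighbor passes the test or when the frontier reaches points owned by another cluster, which corresponds to step 12 of the algorithm in Table 2.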
Merging clusters involves testing whether the clusters
share similar statistics. The F-test is a direct way to test
the change in the variance of a merged cluster with
respect to the original ones. Using sequential least-
squares techniques this computation can be carried out
quite efficiently. As has been noted, under the assump-
tion that smooth surfaces can be approximated locally
by planar surfaces the extracted clusters are validated by
local plane fitting. The merging phase enables testing
whether the clusters are part of a more global smooth
surface, which is approximated by a quadratic surface. If
the null-hypothesis that the two clusters are part of one
planar surface fails, an alternative test is formulated in
which the null-hypothesis is that the clusters are part of
one smooth surface. The approximation into local
planar surfaces and testing for smoothness in object-space
enables circumventing two major problems that
are usually associated with segmentation algorithms.
One is identifying smooth surfaces directly from the
attribute-space, and the other is over-parameterization
of the estimated surface.
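One way to express the merging test is as a ratio between the residual variance of a single surface fitted to both clusters and the pooled residual variance of the two separate fits. The following is a minimal sketch under stated assumptions: residual sums of squares are computed elsewhere, each surface has `n_params` parameters (3 for a plane), and the critical value of the F distribution is supplied by the caller. The function names and the exact form of the statistic are illustrative and not taken from the paper.

```python
def merge_statistic(sse_1, n_1, sse_2, n_2, sse_merged, n_params=3):
    """Variance ratio of the merged fit to the two separate fits."""
    dof_separate = n_1 + n_2 - 2 * n_params
    dof_merged = n_1 + n_2 - n_params
    if dof_separate <= 0:
        raise ValueError("not enough points for the separate fits")
    var_separate = (sse_1 + sse_2) / dof_separate
    var_merged = sse_merged / dof_merged
    if var_separate == 0.0:
        # Perfect separate fits: any merged residual is a degradation.
        return 1.0 if var_merged == 0.0 else float("inf")
    return var_merged / var_separate

def should_merge(sse_1, n_1, sse_2, n_2, sse_merged, f_critical, n_params=3):
    """Accept the merge when the variance does not grow significantly."""
    return merge_statistic(sse_1, n_1, sse_2, n_2, sse_merged,
                           n_params) <= f_critical
```

Under these assumptions, the two-level scheme described above could be mirrored by calling the test first with the planar fits and, if that fails, again with residuals of a quadratic surface and `n_params=6`.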
As a global test the F-test is less sensitive to annexing
small clusters to larger ones. Cases where not all points
in the annexed cluster are indeed part of the newly
formed surface are possible. A decision has to be made
then whether to reject the merging, accept the merging
and remove the ‘‘outlier’’ points, or to split the cluster. If
the result of the global test has indicated that both
clusters are part of one surface, a subsequent test is
carried out to check whether the ratio of the number of
‘‘inliers’’ to the original number of points exceeds a
discriminating value. More formally, if the null-hypothesis
in testing the clusters $S_i$ and $S_j$ to be part of the same
surface has been accepted, then the following test is
performed:
$$T_j = \frac{\#\{p \in S_{i \cap j} \mid p \in S_j\}}{\#\{p \in S_j\}} > q, \tag{4}$$
where $p$ denotes the laser points, $S_{i \cap j}$ the joint cluster,
$\#\{\cdot\}$ the number of elements in the set, and $q$ the discriminant
value. The discriminant value was set to 60% of the
points for clusters of more than 10 points and a more liberal
value for clusters that have fewer than 10 points. The
motivation for selecting high value for larger clusters
(practically implying that the majority of the points in a
cluster should belong to the merged cluster) is to prevent
the formation of unnecessary partitions of well-defined
clusters. With smaller clusters the motivation is either to
eliminate them or to reduce them as much as possible. If
the number of ‘‘outliers’’ is too small to form a cluster
(and experience shows that this is the common case) the
leftover points are tested for merging with respect to
other neighboring clusters.
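The ratio test of Eq. (4) reduces to a set operation. The following minimal sketch assumes that clusters are given as collections of point indices; the default of 0.6 corresponds to the 60% value quoted above for clusters of more than 10 points, and the function names are hypothetical.

```python
def inlier_ratio(joint_cluster, cluster_j):
    """T_j of Eq. (4): share of the points of S_j kept in the joint cluster."""
    members_j = set(cluster_j)
    if not members_j:
        raise ValueError("cluster_j is empty")
    return len(members_j & set(joint_cluster)) / len(members_j)

def accept_annexation(joint_cluster, cluster_j, q=0.6):
    """Accept the merge of S_j when its inlier ratio exceeds q."""
    return inlier_ratio(joint_cluster, cluster_j) > q
```

For a cluster of 20 points of which 15 remain in the joint cluster, the ratio is 0.75 and the merge is accepted; if only 10 remain, the ratio of 0.5 falls below the discriminant value.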
4. Discussion and results
The proposed method offers a very natural way to
model variation in the data and to identify homoge-
neous groups and structure. The chosen attributes detect
the existence of step and crease edges, but by searching
for homogeneous groups they also model the within
group consistency without the need to grow a region
from seed points first as is the case with region-growing
algorithms. Structure is obtained directly from the
parameter space and is validated in object-space. Notice
that in contrast to segmentation algorithms that are
based on surface fitting, by modeling essentially the
surface tangents and fitting surfaces locally the algo-
rithm can identify more complex structures. Another
favorable property of the algorithm is that the use of the
attribute-space to identify structure makes the algorithm
independent of measuring surface texture within a
window. In fact, the algorithm analyzes all windows
simultaneously.

Fig. 3. Clustering results for Stuttgart data (1.5 m resolution).
(a) Original range data (intensities are a function of height),
(b,c) clustered data. Bright points are ones that were classified
as part of a smooth or planar surface, gray points are
vegetation points or ones with high elevation variation.
Results for testing the algorithm are presented for
data sets with medium to relatively low resolutions,
which are less detailed and considered more difficult to
process. The data sets consist of last return data. In the
implementation of the algorithm the value of the std.
threshold (see the algorithm in Table 2) was set to 15 cm.
To prevent cases of undersegmentation due to an initial
small cluster with a very small std. (which then becomes
difficult to extend), a minimum std. threshold was
defined as well and was set to 5 cm. The lower threshold
was used when testing for inclusion of points in the
cluster in the cluster-growing phase.
The first data set was acquired in the suburban part of
Stuttgart. The spacing is about 1.5m between points.
The first data set describes a scene that consists of
several buildings, smooth ground surface and vegetation
that is close to the buildings. The data set is presented in
Fig. 3a and the results of applying the clustering
algorithm are in Figs. 3b and c. The bright points are
the ones that were classified as part of smooth surface
clusters and the gray ones are points that were classified
as part of vegetation or unclassified points with high
elevation variation. As can be seen, the algorithm
separated successfully the smooth objects, like rooftops
or smooth parts on the ground, from the vegetation,
even in cases where both were close to one another.
Since the vegetation is rather sparse it is difficult to
distinguish between high and low vegetation. Therefore,
they are classified as one structure.
The second data set is also taken from the Stuttgart
data set. The structure of the building roofs here is more
complex, see for example the rightmost building in
Fig. 4a. Comparing Fig. 4a to c, one can see the removal
of several points that were identified as outliers from the
data set. It can also be seen that the algorithm succeeded
in identifying and isolating the planar surfaces of the low
structure at front from the surrounding vegetation. In
Fig. 4c one can see the identification of what seems to be
low vegetation (or alternatively parked cars) along the
road that separates the two building blocks. Identifying
these objects is largely due to the surface trend
attributes.
The third data set has a lower point density, with a
ground spacing of about 2.5 m between points. It was
acquired over the Vaihingen area in Germany (Fig. 5).
Buildings here are smaller in size and lower in height;
therefore, finding structures like planar surfaces is more
difficult.
The results show that the algorithm successfully
identified the facets of the building at the center of the
scene and also the one at the far right. Considering
the complexity of the shape of the central building and
the point spacing, the results indicate that the algorithm
is capable of identifying fine structures without any
preliminary knowledge of their location. The fact that
the vegetation is generally correctly classified also
suggests that the algorithm does not fall into the trap
of identifying structures when none exist.
The quality of the clusters is assessed by the standard
deviation of the laser points from the fitted surface. The
minimal size of clusters was set to seven points, which
offers a redundancy of four in plane fitting (seven
observations minus three plane parameters), and
also reflects the point density and the size of objects in
the Vaihingen data set (in particular roof faces). Results
are summarized in Table 3. The quality of the results is
an indication of the potential quality of information that
can be achieved by LiDAR data.

Fig. 4. Clustering results for Stuttgart data (1.5 m resolution).
(a) Original range data (intensities are a function of height),
(b, c) clustered data. Bright points are ones that were classified
as part of a smooth or planar surface; gray points are
vegetation points or ones with high elevation variation.

Fig. 5. Clustering results for Vaihingen data (2.5 m resolution).
(a) Original range data (intensities are a function of height), (b)
clustered data. Bright points are ones that were classified as part
of a smooth or planar surface; gray points are vegetation points
or ones with high elevation variation.

Table 3
Accuracy estimate of surface clusters
Data set       std. range [m]       Number of clusters (%)
Stuttgart 1    0 < σ < 0.05         61
               0.05 < σ < 0.10      38
               0.10 < σ < 0.12      1
Stuttgart 2    0 < σ < 0.05         66
               0.05 < σ < 0.10      24
               0.10 < σ < 0.13      10
Vaihingen      0 < σ < 0.05         60
               0.05 < σ < 0.10      34
               0.10 < σ < 0.12      6

As can be seen from Table 3, in all cases the majority of
the clusters had a std. smaller than 5 cm, which was the
minimum threshold that was set. In all these cases a small
fraction of clusters had a std. larger than 10 cm but did not
exceed 13 cm, even though the upper limit was set at 15 cm. The
results indicate that the cluster proposals correspond
to natural clusters. The surface fitting accuracy of
the large clusters within all three data sets was below
5 cm. The size of the large clusters was of the order of
several hundreds of points per cluster and in the second
data set the largest one exceeded 1000 points. The
majority of the clusters in the high-accuracy category
had a relatively large number of points per cluster. There
is a direct correlation between the number of points per
cluster and surface quality, so in addition to the data
density the number of points has an effect on the ability
to determine the surface parameters accurately. This
effect was most evident in the Vaihingen data set,
where a few of the roof-face clusters had their fitting
accuracy in the third category (10 cm < std. < 12 cm),
without much room for improvement by removing
points. It was evident that these points represent a
structure, as they were all part of one roof face, so
dismissing them seemed a wrong decision. As these
objects very likely represent a structure in the
data that, due to low point density, cannot be
defined more precisely, these points are considered a
coarse representation of these objects. The std. value
attached to these clusters serves as an indication
of that.
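The quality measure used above can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes a plane of the form z = a·x + b·y + c, vertical residuals, and an ordinary least-squares fit; the function name `fit_plane` is hypothetical. With three plane parameters, a cluster of seven points leaves a redundancy of four.

```python
import math


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points.

    Returns ((a, b, c), std, redundancy), where std is the standard
    deviation of the vertical residuals and redundancy = n - 3.
    """
    n = len(points)
    if n < 4:
        raise ValueError("at least four points are needed for a redundant fit")

    # Centre the coordinates to keep the normal equations well conditioned.
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n

    sxx = sxy = syy = sxz = syz = 0.0
    for x, y, z in points:
        dx, dy, dz = x - mx, y - my, z - mz
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        sxz += dx * dz
        syz += dy * dz

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("points are collinear in plan view")

    # Solve the 2x2 normal equations for the two slopes.
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det
    c = mz - a * mx - b * my

    redundancy = n - 3
    ssr = sum((z - (a * x + b * y + c)) ** 2 for x, y, z in points)
    std = math.sqrt(ssr / redundancy)
    return (a, b, c), std, redundancy


# Seven synthetic points on a tilted roof face (values are made up).
plan = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2), (2, 2)]
roof = [(x, y, 0.1 * x + 0.2 * y + 5.0) for x, y in plan]
params, std, redundancy = fit_plane(roof)
print(redundancy)  # 4
```

Because the std. is divided by the redundancy rather than the point count, small clusters with few surplus observations give a less reliable estimate, which is consistent with the link between cluster size and surface quality noted above.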
5. Conclusions
The paper presented a methodology for clustering
laser data surfaces. The approach that is taken is
hierarchical in nature, as it defines surface categories
as processes that instantiate the surfaces. Identifying
features that make it possible to distinguish the categories,
and the surfaces within each category, is perhaps the key to the
successful identification of structure in the data. It was
shown that height variation and variation in surface
trend are sufficient to cluster the data. Their incorpora-
tion into the algorithm served both as an edge operator
and as a measure for similarity. Instead of using
standard texture evaluation techniques it was shown
that more natural ways to identify structure and texture
in the data exist. Using the properties of the data and of
the attributes, the analysis of the attribute-space did not
fix the number of clusters as the standard K-means
algorithm does, but implemented an efficient variation
on the attribute-space scanning. Finally, it can be
noticed that the implementation of the clustering
algorithm is based on several ‘‘double checks’’ of
previous results. Surface classes extracted from the
attribute-space are considered only as proposals that
are tested and validated in object-space, so even a
‘‘wrong’’ proposal has little effect on the robustness of
the algorithm. Then, the clusters themselves are
evaluated with respect to their neighbors and are
merged to form a more consistent segmentation if
certain criteria are met. The results show that
even with relatively sparse data sets, structure can be
identified, which suggests the generality of the algorithm.
References
Axelsson, P., 1999. Processing of laser scanner data—algo-
rithms and applications. Journal of Photogrammetry and
Remote Sensing 54 (2–3), 138–147.
Besl, P.J., 1988. Surfaces in Range Image Understanding.
Perception Engineering. Springer, New York, NY, 339 pp.
Chaudhuri, B.B., 1996. A new definition of neighborhood of a
point in multi-dimensional space. Pattern Recognition
Letters 17 (1), 11–17.
Fischler, M.A., Bolles, R.C., 1981. Random sample consensus:
a paradigm for model fitting with application to image
analysis and automated cartography. Communications of
the Association for Computing Machinery 24 (6), 381–395.
Flynn, P.J., Jain, A.K., 1991. BONSAI: 3D object recognition
using constrained search. IEEE Transactions on Pattern
Analysis and Machine Intelligence 13 (10), 1066–1075.
Girod, B., Greiner, G., Niemann, H. (Eds.), 2000. Principles of
3D Image Analysis and Synthesis. Kluwer Academic
Publishers, Dordrecht, The Netherlands, 466 pp.
Haralick, R.M., Shapiro, L.G., 1992. Computer and Robot
Vision, vol. 1. Addison-Wesley, Reading, MA, 672 pp.
Hoover, A., Jean-Baptiste, G., Jiang, X., Flynn, P.J., Bunke,
H., Goldgof, D., Bowyer, K., Eggert, D., Fitzgibbon, A.W.,
Fisher, R.B., 1996. An experimental comparison of range
image segmentation algorithms. IEEE Transactions on
Pattern Analysis and Machine Intelligence 18 (7), 673–689.
Koster, K., Spann, M., 2000. MIR: an approach to robust
clustering—application to range image segmentation. IEEE
Transactions on Pattern Analysis and Machine Intelligence
22 (5), 430–444.
Lee, I., Schenk, T., 2000. 3D perceptual organization of laser
altimetry data. International Archives of Photogrammetry
and Remote Sensing 34 (3/W4), 57–65.
Maas, H.G., 1999. The potential of height texture measures for
the segmentation of airborne laserscanner data. In: Proceed-
ings of the Fourth International Airborne Remote Sensing
Conference, Ottawa, Canada, pp. 154–161.
Murakami, H., Nakagawa, K., Hasegawa, H., Shibata, T.,
Iwanami, E., 1999. Change detection of buildings using an
airborne laser scanner. Journal of Photogrammetry and
Remote Sensing 54 (2–3), 148–152.
Oude Elberink, S., Maas, H.G., 2000. The use of anisotropic
height texture measures for the segmentation of airborne laser
scanner data. International Archives of Photogrammetry
and Remote Sensing 33 (B3/2), 678–684.
Roggero, M., 2001. Airborne laser scanning clustering in raw
data. International Archives of Photogrammetry and
Remote Sensing 34 (3/W4), 227–232.
Rousseeuw, P.J., Leroy, A.M., 1987. Robust Regression and
Outlier Detection. Wiley, New York, NY, 329 pp.
Thoma, D.P., Gupta, S.T., Bauer, M.E., 2001. Quantifying
river bank erosion with scanning laser altimetry. Interna-
tional Archives of Photogrammetry and Remote Sensing 34
(3/W4), 169–173.
Thomas, R.H., Abdalati, W., Akins, T., Csatho, B., Frederick,
E., Gogineni, S., Krabill, W.B., Manizade, S., Rignot, E.,
2000. Substantial thinning of a major East Greenland outlet
glacier. Geophysical Research Letters 27 (9), 1291–1294.
Vosselman, G., 2001. 3D building model reconstruction from
point clouds and ground plans. International Archives of
Photogrammetry and Remote Sensing 34 (3/W4), 37–43.