



ISPRS Journal of Photogrammetry and Remote Sensing 87 (2014) 152–165. doi:10.1016/j.isprsjprs.2013.11.001


Contextual classification of lidar data and building object detection in urban areas


Joachim Niemeyer a,*, Franz Rottensteiner a, Uwe Soergel b

a Institute of Photogrammetry and GeoInformation, Leibniz Universität Hannover, Nienburger Str. 1, D-30167 Hannover, Germany
b Institute of Geodesy, Remote Sensing and Image Analysis, TU Darmstadt, Franziska-Braun-Str. 7, D-64287 Darmstadt, Germany

* Corresponding author. Tel.: +49 511 762 19387; fax: +49 511 762 2483. E-mail addresses: [email protected] (J. Niemeyer), [email protected] (F. Rottensteiner), [email protected] (U. Soergel).

ARTICLE INFO

Article history:
Received 8 July 2013
Received in revised form 4 November 2013
Accepted 4 November 2013
Available online 7 December 2013

Keywords:
LiDAR
Point cloud
Classification
Urban
Contextual
Building
Detection

ABSTRACT

In this work we address the task of the contextual classification of an airborne LiDAR point cloud. For that purpose, we integrate a Random Forest classifier into a Conditional Random Field (CRF) framework. It is a flexible approach for obtaining a reliable classification result even in complex urban scenes. In this way, we benefit from the consideration of context on the one hand and from the opportunity to use a large number of features on the other hand. Considering the interactions in our experiments increases the overall accuracy by 2%, though a larger improvement becomes apparent in the completeness and correctness of some of the seven classes discerned in our experiments. We compare the Random Forest approach to linear models for the computation of the unary and pairwise potentials of the CRF, and investigate the relevance of different features for the LiDAR points as well as for the interaction of neighbouring points. In a second step, building objects are detected based on the classified point cloud. For that purpose, the CRF probabilities for the classes are plugged into a Markov Random Field as unary potentials, in which the pairwise potentials are based on a Potts model. The 2D binary building object masks are extracted and evaluated in the benchmark ISPRS Test Project on Urban Classification and 3D Building Reconstruction. The evaluation shows that the main buildings (larger than 50 m²) can be detected very reliably with a correctness larger than 96% and a completeness of 100%.

© 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

1. Introduction

Automated urban object extraction from remotely sensed data is a very challenging task due to the complexity of urban scenes. There are different types of objects, such as buildings, low vegetation, trees, fences, and cars, that can be found in a small local neighbourhood, which makes it difficult to extract them reliably. In order to handle this problem, research often focuses on the extraction of a single object type, e.g. buildings, roads, or trees; for overviews, cf. Mayer (2008) and Rottensteiner et al. (2012).

Airborne LiDAR (Light Detection And Ranging) is a particularly useful technology for the acquisition of elevation data, with applications such as the generation of digital terrain models (DTM) (Kraus and Pfeifer, 1998), data acquisition for forestry (Reitberger et al., 2009), or power line monitoring (McLaughlin, 2006). LiDAR data are also well-suited for automated object detection for the generation of 3D city models. Building extraction is a prominent application in this context; two recent examples are Huang et al. (2013) and Liu et al. (2013).

For many applications a basic step in LiDAR processing is a classification of the point cloud: each 3D point in the irregularly distributed point cloud is assigned to a semantic object class. Due to the complexity of urban scenes this task is also difficult. It is the goal of this paper to present an approach for the classification of a LiDAR point cloud in urban areas without the use of image data providing spectral information. The only radiometric signal feature we have access to is the so-called intensity, which is a function of the amount of photons collected by the scanning device. After the classification, 2D building outlines are derived from the labelled point cloud.

1.1. Related work

In recent years research has mainly focused on the use of supervised statistical methods for classification in remote sensing, because they are more flexible in handling variations in the appearance of the objects to be extracted than model-based approaches. Besides generative classifiers modelling the joint distribution of the data and labels (Bishop, 2006), modern discriminative methods such as AdaBoost (Chan and Paelinckx, 2008), Support Vector Machines (SVM) (Mountrakis et al., 2011), and Random Forests (RF) (Breiman, 2001; Gislason et al., 2006) are used. They usually lead to simpler models and need fewer training data than generative models. These classifiers are also applied to LiDAR processing tasks. For instance, Mallet (2010) used a point-based multi-class SVM for the classification of full-waveform (FW) LiDAR data, whereas Chehata et al. (2009) applied RF for that purpose. However, both approaches classify each point independently without considering the labels of its neighbourhood. This is a drawback leading to inhomogeneous results in complex scenes such as urban areas, as demonstrated for example in Niemeyer et al. (2011). The reason is the diversity of the objects' appearances even within a single scene. Especially in urban areas, roofs of different shapes and other challenging objects with many details occur, leading to overlapping distributions of features within each class. Shadows caused by other objects, missing data due to the objects' properties, and random errors in the sensor data aggravate this effect. As a consequence, purely local decisions become uncertain.

An improvement can be achieved by incorporating contextual information, which is an important cue for the classification of objects in complex scenes. Spatial dependencies between the object classes can be trained to improve the results, because some object classes are more likely to occur next to each other than others; for instance, it is more probable that cars are situated on a street than on grassland. A sound statistical model of context leads to undirected graphical models (Bishop, 2006) such as Markov Random Fields (MRF) (Geman and Geman, 1984). In an MRF, the class label of an object is statistically dependent on its neighbours, whereas the data of different objects are assumed to be conditionally independent (Li, 2009). Conditional Random Fields (CRF) (Kumar and Hebert, 2006) offer a more general model. They drop the assumption of conditional independence of the data of different objects, expressed in the model of the unary potentials linking the class labels to the observations, and the interaction between neighbouring objects is modelled to depend on both the labels and the data in the pairwise potentials. CRFs have become a standard technique for considering context in classification processes, in particular for image classification (Kumar and Hebert, 2006; Schindler, 2012). They are also becoming more and more popular in the fields of photogrammetry and remote sensing. Some exemplary applications are multi-temporal image classification (Hoberg et al., 2012), building detection in radar images (Wegner et al., 2011), and classification of façade images (Yang and Förstner, 2011).

Applications of the CRF framework differ in the way they model the potentials and in the definition of the graph structure. For the unary potentials, the probabilistic output of a discriminative classifier is frequently used. Examples include linear models (Kumar and Hebert, 2006) and RF (Schindler, 2012). For the pairwise potentials, most approaches use relatively simple models favouring identical labels at neighbouring sites by penalising label changes, such as the Potts model. The contrast-sensitive Potts model (Boykov and Jolly, 2001) has the same effect, but adapts the degree of penalisation according to the Euclidean distance of the feature vectors. Schindler (2012) carried out a comparison of these smoothing models applied to high resolution images. Although both methods perform rather well in that comparison, these simple models tend to over-smooth the results. Thus, a more complex model might improve the results at the cost of higher computational efforts in training and of having to provide fully labelled training images. In Niemeyer et al. (2011) this was shown for the classification of LiDAR data of urban areas. In this case, linear models were used for both the unary and the pairwise potentials. In the latter case they were based on a multi-class model for the joint probability of the class labels at neighbouring sites rather than on a binary model for the probability of the two labels being equal. Nowozin et al. (2011) use RF classifiers for both types of potentials, also using a multi-class model for the interactions. In their examples, the random field is constructed over a (radiometric or depth) image grid. The neighbourhood system on which the edges of the graphical model are defined may vary with the application, but the interactions are restricted to pairs of nodes. Lucchi et al. (2012) use a CRF based on structured SVM (SSVM), which includes an SVM model for the pairwise terms. In their case, the graphical model is built on segments (superpixels), which reduces the computational complexity compared to a pixel-based classification.

Lucchi et al. (2011) have doubted the contribution of CRF-like models for classification, showing that methods classifying superpixels and applying global features can achieve a performance similar to CRF-based models on standard data sets. Their discussion is limited to images and to CRF-based models involving neighbourhood terms that depend on the relative alignment of objects in an image. They also show the effects of global constraints based on the co-occurrence statistics of objects in a scene. We think that the type of geometrical pairwise model used in Lucchi et al. (2011) ("sky should appear above grass") is not applicable to remote sensing images, because it requires the definition of an absolute reference direction (e.g. the vertical in images having a horizontal viewing direction). Of course, height differences are important features in the context of point cloud classification, but the relative alignment in planimetry follows a similar structure as in aerial images. The benefits of using global energy terms such as those based on co-occurrence statistics, also proposed in Ladicky et al. (2013), would also seem to be doubtful for the classification of remotely sensed images. In the urban remote sensing case, we usually have a small set of objects which always occur in a scene together (e.g., roads, buildings, trees and cars), so that the global information about their co-occurrence would not seem to carry much discriminative power.

The first research on context-based point cloud labelling was carried out in the fields of robotics and mobile terrestrial laser scanning. Anguelov et al. (2005) proposed a classification of a terrestrial point cloud into four object classes with Associative Markov Networks (AMN), a subclass of MRF. Neighbouring points are assumed to belong to the same object class with high probability, which leads to an adaptive smoothing of the classification results. In order to reduce the number of graph nodes, ground points are eliminated based on thresholds before the actual classification. Munoz et al. (2008) also used point-based AMNs, but they extended the original isotropic model to an anisotropic one in order to emphasise certain orientations of edges. This directional information enables a more accurate classification of objects like power lines. Rusu et al. (2009) were interested in labelling an indoor robot environment described by point clouds. For object detection, points are classified using CRFs according to the geometric surface they belong to, such as cylinders or planes. They applied a point-wise classification method, representing every point as a node of the graphical model. Compared to our application they deal with few points (~80,000), and they even reduce this data set by about 70% before the classification, based on some restrictions concerning the objects' positions. Shapovalov et al. (2013) also classified point clouds in indoor scenes, building a graphical model on point cloud segments. They consider long-range dependencies by so-called structural links, also based on special directions such as the vertical, the direction to the sensor or the direction to the nearest wall. In an indoor scenario, walls can be detected using heuristics (Shapovalov et al., 2013). However, in the airborne case, the number of points on walls is usually relatively low; the classification of points on walls might be one of the problems one would like to solve by a CRF-based model, and the vertical and the direction to the sensor are nearly coincident. CRF were also used by Lim and Suter (2007) for the point-wise classification of terrestrial LiDAR data. They coped with the computational complexity by adaptive point reduction. The authors improved their approach (Lim and Suter, 2009) by segmenting the points in a first step and classifying the resulting superpixels. They also considered both a local and a regional neighbourhood. Introducing multiple scales into a CRF, represented by long-range links between superpixels, Lim and Suter (2009) improved the classification accuracy by 5–10%. This shows the importance of considering larger regions instead of only a very local neighbourhood of each 3D point for a correct classification. An alternative to long-range edges, which might lead to a huge computational burden if points are to be classified individually, is the computation of multi-scale features. They enable a better classification of points with locally similar features: although such points may belong to different objects, the variation of the regional neighbourhood can support the discrimination between the object types and hence lead to a correct labelling. An approach utilising CRFs for remotely sensed LiDAR point clouds is presented by Lu et al. (2009). A DTM is derived from a digital surface model (DSM) by applying a hybrid CRF classifying the points into ground and non-ground points; at the same time, terrain heights are estimated. The work of Shapovalov et al. (2010) focuses on the classification of airborne LiDAR points, discerning the five object classes ground, building, tree, low vegetation, and car. The authors addressed the drawbacks of AMN by applying a non-associative Markov Network, which is able to model all class relations instead of merely favouring identical labels at both linked nodes. First, the data are over-segmented, and then a segment-wise CRF classification is performed. Whereas this helps to cope with noise and computational complexity, the result heavily depends on the segmentation. Small objects with sub-segment size cannot be detected, and important object details might be lost, which is, of course, a drawback of all segment-based algorithms. Shapovalov et al. (2010) show that using a segmented point cloud will lead to a loss of 1–3% in overall accuracy due to segmentation errors and due to the fact that classes having few samples, such as cars, might be merged with the background. Whereas this does not seem to be much, it may become relevant if the classes of interest are the ones most affected by these problems. Lafarge and Mallet (2012) use an MRF for the classification of a point cloud. As their main interest is in buildings, they set up a simple heuristic model for the unary potentials that requires no training, whereas the Potts model is used for the pairwise potentials. The smoothing parameter of the Potts model is also tuned manually. This may be sufficient for their particular application, but it would seem to be more problematic if the number of classes to be discerned is increased. Xiong et al. (2011) show how point-based and region-based classification of LiDAR data can interact. They propose a hierarchical sequence of relatively simple classifiers applied to segments and points. Starting either with an independent classification of points or segments, in subsequent steps the output of the previous step is used to define context features that help to improve the classification results. In each classification stage, the results of the previous stage are taken for granted, and unlike with CRF, no global optimum of the posterior distribution of all labels is sought (Boykov and Jolly, 2001; Kumar and Hebert, 2006).

Point cloud labelling is only a first step for the extraction of objects such as buildings from a point cloud. The second step comprises the transition from the point cloud to contiguous objects, e.g. represented by boundary polygons. Sampath and Shan (2007) derived 2D boundary polygons by applying a modified convex hull algorithm directly to segments of points classified as buildings. However, such an approach requires parameters related to the mean point distance and may be sensitive to irregular point distributions. Dorninger and Pfeifer (2008) use alpha shapes (Edelsbrunner and Mücke, 1994), which also require a careful tuning of the parameter α. Lafarge and Mallet (2012) determine coarse building outlines by detecting 3D line segments in the subset of the point cloud classified as buildings. Again, this requires the selection of a threshold. These initial boundaries are improved after a planar segmentation of the point cloud, which is required because the final goal of Lafarge and Mallet (2012) is the 3D reconstruction of buildings. The planar segmentation also makes use of an MRF to obtain a geometrically consistent subdivision of the point cloud, but this MRF-based approach is only applied to the part of the point cloud classified as belonging to buildings. Poullis (2013) relies on the clustering of point cloud segments to define individual objects. This procedure results in a binary building mask in which the building boundaries are not represented very accurately. Holes in the data are removed by an initial interpolation (though no details are explained in the paper). The building outlines are improved using a graphical model which classifies boundary points according to the alignment of the local boundary with previously determined dominating orientations.

1.2. Contribution

It is the first goal of this paper to present a probabilistic approach for the contextual classification of point clouds in urban areas. For that purpose we apply a CRF framework with a complex interaction model that is also capable of modelling the local spatial structure of the data. The proposed supervised classifier is able to learn context in order to find the most likely label configuration. Following the discussion in Section 1.1 and in Niemeyer et al. (2011), we apply a point-based classification to preserve even small objects. Going beyond our previous work, the pairwise potentials are based on RF, but unlike in Nowozin et al. (2011), our graph is based on points and is thus irregular. We compare the new model with two variants of linear models in order to determine their effectiveness with respect to computation time and classification accuracy. Moreover, we analyse the influence of individual features. Whereas Chehata et al. (2009) used such an analysis to investigate the variable importance for each class, we also take into account the interaction classes between neighbouring points. In this context an experiment with only the most important features is carried out, and the results are compared to the classification based on all features.

The second goal of this paper is the detection of building objects based on the classified point cloud. In our previous work (Niemeyer et al., 2013) we tried to achieve this goal using a 2D raster-based analysis, filling the pixels of a 2D building mask using the results of the point-based classification. Gaps in the data were closed by performing a morphological closing. In this paper, we present a more sophisticated post-processing technique, using the posterior probabilities of the CRF-based classification in an MRF to generate a 2D multi-label image that is consistent with the labelled point cloud. MRF were also used by Lafarge and Mallet (2012) for this purpose, but in a more complex scenario in order to achieve the goal of 3D building reconstruction. This would seem to be a large overhead for the goal we want to achieve here. In addition, the MRF-based label propagation method proposed by Lafarge and Mallet (2012) is specific to buildings and could not be applied to other object classes.

The performance of the proposed method is demonstrated and evaluated on a benchmark data set with three complex urban scenes. For the 3D classification we discern seven classes and carry out a quantitative evaluation based on manually generated 3D reference data. The 2D building objects are evaluated in the context of the ISPRS Test Project on Urban Classification and 3D Building Reconstruction (Rottensteiner et al., 2012).

This paper is organised as follows. The next section presents our methodology, including a brief description of CRF and the generation of 2D building objects. After that, Section 3 comprises the evaluation of the point cloud classification and the extraction of objects. The paper concludes with Section 4.

2. Methodology

It is the goal of point cloud classification to assign an object class label y_i to each 3D point i. Common approaches such as RF and SVM usually consider solely the features of each point separately and thus classify it independently of its neighbourhood. We use a CRF, which is able to incorporate context in the label assignment step. As a consequence, all points are labelled simultaneously. Typical class relations are learned and improve the results. The following sections describe the CRF framework and the generation of 2D building objects based on the classification results.

2.1. Conditional Random Fields

CRF belong to the family of undirected graphical models with an underlying graph G(n, e) consisting of nodes n and edges e. In our case, each node n_i ∈ n corresponds to a 3D point. We assign class labels y_i to all points simultaneously based on observed data x. The vector y contains the labels y_i of all nodes and hence has the same number of elements as n. The graph edges e_ij are used to model the relations between pairs of adjacent nodes n_i and n_j, and thus enable representing contextual relations. For that purpose, each point n_i is linked to other points (n_j ∈ N_i) by edges. CRF are discriminative classifiers that model the posterior distribution p(y|x) directly (Kumar and Hebert, 2006):

$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{i \in n} \phi_i(\mathbf{x}, y_i) \prod_{i \in n} \prod_{j \in N_i} \psi_{ij}(\mathbf{x}, y_i, y_j) \qquad (1)$$

In Eq. (1), N_i is the neighbourhood of node n_i, corresponding to the edges linked to this particular node. The two terms φ_i(x, y_i) and ψ_ij(x, y_i, y_j) are called the unary and pairwise potentials, respectively; they are explained in the next sections. The partition function Z(x) acts as a normalisation constant, turning potentials into probabilities.

2.1.1. Definition of the graph

Compared to images, a point cloud is more complex because points are irregularly distributed in 3D space. For points there is no straightforward definition of the neighbourhood that can be used to define the edges of the graph. In contrast, images are arranged in a lattice, and each pixel has a defined number of neighbours (usually four or eight). In our case each point is linked by edges to its k nearest neighbours in 2D, which corresponds to a cylindrical neighbourhood. In contrast to a spherical neighbourhood, important edges with height differences, for instance from the canopy to the ground, are more likely to occur in such a graph. They might give valuable hints for the local configuration of classes (Niemeyer et al., 2011).

2.1.2. Unary potential

In the unary potentials the data are represented by node feature vectors h_i(x). For each node n_i such a vector is determined taking into account not only the data x_i observed at the corresponding point, but also the data at points in a certain neighbourhood. The particular definition of the node features depends on the data sets; the features we used in our experiments are described in Section 3.2.1. Using these node feature vectors h_i(x), the unary potential φ_i(x, y_i) linking the data to the class labels determines the most probable label for a single node given its site-wise features. It is modelled to be proportional to the probability for y_i given the data:

$$\phi_i(\mathbf{x}, y_i) \propto p(y_i \mid h_i(\mathbf{x})) \qquad (2)$$

This is a very general formulation which allows the use of any discriminative classifier with a probabilistic output for the unary potential (Kumar and Hebert, 2006). In this work we chose a linear model and an RF classifier for the computation of the unary potential to enable a comparison. Linear models were used for the definition of the unary potentials, for instance, by Kumar and Hebert (2006). RF have been shown to be well suited for the (non-contextual) classification of LiDAR data in Chehata et al. (2009).

In the case of the linear model, the unary potential is defined according to Eq. (3):

$$\phi_{i,LM}(\mathbf{x}, y_i = l) = \exp\left(\mathbf{w}_l^T \cdot h_i(\mathbf{x})\right) \qquad (3)$$

In Eq. (3), the feature vectors h_i(x), including an additional bias feature that always takes the value 1 (Bishop, 2006), are multiplied by a weight vector w_l. There is one such vector w_l for each class l, and these vectors are determined in the training stage.

As in Niemeyer et al. (2011) we restrict ourselves to the assumption that the classes are linearly separable. For many data sets this assumption is not valid, and a feature space mapping based on a quadratic feature expansion (Kumar and Hebert, 2006) leads to better results. Such a generalised linear model (GLM) improved the classification in Niemeyer et al. (2012), but it comes with significantly higher computational costs, as the number of parameters increases considerably. Thus, this method is only applicable for a small number of features and cannot handle as many features as our application requires. For this reason we focus on linear models and do not use a GLM in this study.

The other classification method used for the definition of the unary potentials in this paper is RF. By design, this classifier is directly appropriate for discerning multiple object classes, and it can handle many features (Gislason et al., 2006). RF do not require any assumptions about the distribution of the data. An RF is a bootstrap ensemble classifier based on decision trees. It consists of a number n_T of trees grown in a training step. Each internal node of any tree contains a test to find the best feature and a corresponding threshold splitting the data into two parts. The combination optimising a criterion, for example the Gini gain (Breiman, 2001), is chosen. We used an RF implementation for MATLAB (Abhishek, 2009). In this implementation, the depth of a tree depends on the separability by these features. A random subset of m features is evaluated at each node, and the thresholds are randomly tested. Each tree is grown until there is only one sample in each leaf node. The classification is performed by presenting the features of an unknown sample to all the trees. Each tree t_i of the n_T trees casts a vote for the most likely class. If the number of votes cast for a class l is N_l, the unary potential is defined by

$$\phi_{i,RF}(\mathbf{x}, y_i = l) = \exp(N_l / n_T) \qquad (4)$$

The exponential function is applied to avoid potentials of value 0 for unlikely classes.
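For illustration, a minimal sketch of Eq. (4) using scikit-learn's RandomForestClassifier as a stand-in for the MATLAB implementation used in the paper; with fully grown trees, predict_proba approximates the vote share N_l/n_T. All data here are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 35))        # synthetic node feature vectors h_i(x)
y_train = rng.integers(0, 7, size=2000)      # synthetic class labels
X_nodes = rng.normal(size=(500, 35))         # points to be classified

rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_train, y_train)

# With fully grown trees, predict_proba approximates the vote share
# N_l / n_T, so the unary potential of Eq. (4) becomes:
unary = np.exp(rf.predict_proba(X_nodes))    # shape (n_nodes, c)
```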

It is an advantage of RF that a value for the feature importance can easily be obtained. To measure the importance of a feature, its values are randomly permuted; in this way, the absence of this feature can be modelled. Then, the numbers of correctly classified points before and after permuting the feature are compared. In case of a large difference between both results, the importance of this feature is high for the classification task. The reader is referred to Breiman (2001) for more details. This method is used to determine the feature importance in our application in Section 3.2.4.
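A sketch of this permutation-based importance, here via scikit-learn's permutation_importance helper rather than the internal RF measure of Breiman (2001); the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 35))              # synthetic features
y = rng.integers(0, 7, size=2000)            # synthetic labels
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Permute one feature at a time and measure the drop in accuracy; a large
# drop indicates an important feature (cf. Breiman, 2001).
result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("most important feature indices:", ranking[:5])
```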

2.1.3. Pairwise potential

The second term in Eq. (1) represents the pairwise (or binary) potential ψ_ij(x, y_i, y_j) and incorporates the contextual relations explicitly in the classification. It models the dependency of a node n_i on its adjacent node n_j by comparing both node labels and considering the observed data x. In the pairwise potentials, the data are represented by interaction feature vectors μ_ij(x), which are computed for each edge e_ij. In our case, each point is linked to points in its direct neighbourhood. Due to the similarity of the features at neighbouring nodes, a difference of both node feature vectors, μ_ij(x) = h_i(x) − h_j(x), would lead to a vector with most elements close to zero. Experiments have shown that this may work when discerning only a few classes and utilising just a small number of features, but it is not useful for distinguishing several classes of interactions and for using a large number of features. Hence, we improve on our previous work (Niemeyer et al., 2013) and obtain μ_ij(x) by concatenating features of both node feature vectors h_i(x) and h_j(x) and computing some differences, as explained in Section 3.2.1.

In many CRF-based applications, the pairwise potentials are based on the probability of two neighbouring node labels y_i and y_j being identical given μ_ij(x) (Kumar and Hebert, 2006):

$$\psi_{ij}(\mathbf{x}, y_i, y_j) \propto p(y_i = y_j \mid \mu_{ij}(\mathbf{x})) \qquad (5)$$

The contrast-sensitive Potts model, which penalises a class change unless indicated by a change in the data (Schindler, 2012), belongs to this group of models. More complex models can be based on the joint posterior probability of two node labels y_i and y_j given μ_ij(x):

$$\psi_{ij}(\mathbf{x}, y_i, y_j) \propto p(y_i, y_j \mid \mu_{ij}(\mathbf{x})) \qquad (6)$$

These models allow learning that certain class relations may be more likely than others given the data. This information is used to improve the quality of the classification, with the drawback of more parameters having to be determined. Again, we apply a linear model and an RF classifier to obtain the probabilities for the interactions. Similarly to Eq. (3), the linear model for the pairwise potential is designed as

$$\psi_{ij,LM}(\mathbf{x}, y_i = l, y_j = k) = \exp\left(\mathbf{v}_{l,k}^T \cdot \mu_{ij}(\mathbf{x})\right) \qquad (7)$$

with one edge weight vector v_{l,k} for each label configuration l and k of adjacent nodes n_i and n_j. The pairwise potential based on RF is defined by

$$\psi_{ij,RF}(\mathbf{x}, y_i = l, y_j = k) = \exp(N_{l,k} / n_T) \qquad (8)$$

In Eq. (8), N_{l,k} is the number of votes per interaction for the class labels l and k. In both cases, if c classes have to be discerned for the nodes of the graph, there are c² local configurations of classes involving two neighbouring nodes. Thus, the models for the pairwise potentials correspond to probabilistic classifiers having to discern c² classes (each corresponding to a local configuration of classes).
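To make the c² interaction classes concrete, the sketch below (scikit-learn as a stand-in classifier, synthetic data) encodes each label pair (l, k) as a single class index l·c + k, trains one RF on the interaction features, and reshapes the predicted class probabilities into a c × c table from which Eq. (8) follows.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

c = 7                                          # node classes -> c*c = 49 interactions
rng = np.random.default_rng(2)

M_train = rng.normal(size=(3000, 72))          # synthetic interaction features mu_ij(x)
pairs = rng.integers(0, c, size=(3000, 2))     # synthetic label pairs (y_i, y_j)
pair_class = pairs[:, 0] * c + pairs[:, 1]     # encode (l, k) as one class index

rf_pair = RandomForestClassifier(n_estimators=300, random_state=0)
rf_pair.fit(M_train, pair_class)

# pairwise potential of Eq. (8) for one edge, as a c x c table psi[l, k]
proba = np.zeros(c * c)
proba[rf_pair.classes_] = rf_pair.predict_proba(rng.normal(size=(1, 72)))[0]
psi = np.exp(proba).reshape(c, c)
```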

To sum up, two types of CRFs are used in this work. In CRF_LM, the potentials φ_i,LM and ψ_ij,LM are based on linear models, whereas CRF_RF is modelled using RF for both potentials (φ_i,RF, ψ_ij,RF). A comparison of these approaches is carried out in Section 3.2.2. In each case the unary and pairwise potentials are weighted equally. A relative weighting factor could be trained in future work using cross validation, as proposed by Shotton et al. (2009).

2.1.4. Training and inference

In the context of graphical models, inference is the task of determining the optimal label configuration based on maximising p(y|x) for given parameters. For the large graph with cycles in our application, exact inference is computationally intractable, and approximate methods have to be applied. We use the standard message passing algorithm Loopy Belief Propagation (LBP) (Frey and MacKay, 1998) as implemented by Schmidt (2012). Although this technique does not ensure convergence to the global optimum, it has been shown to provide good results in Vishwanathan et al. (2006).

In the training of the linear models the weight vectors have to be determined. We compare two versions of training the linear models in our experiments. For the first one, CRF_LM,separate, the weights are concatenated in the two parameter vectors θ_unary = [w_1, ..., w_c]^T and θ_pairwise = [v_{1,1}, ..., v_{c,c}]^T with c classes. We assume a Gaussian prior for the parameters with zero mean, thus θ ~ N(0, σ·I), with standard deviation σ = 1. Using this prior, we perform a Bayesian estimation of the parameter vectors by minimising the objective functions in Eqs. (9) and (10) separately:

$$f_{unary} = -\log\left(p(\theta_{unary} \mid \mathbf{x}, \mathbf{y}) \cdot p(\theta_{unary})\right) \qquad (9)$$

$$f_{pairwise} = -\log\left(p(\theta_{pairwise} \mid \mathbf{x}, \mathbf{y}) \cdot p(\theta_{pairwise})\right) \qquad (10)$$

In order to minimise these functions, we use the L-BFGS (limited memory Broyden–Fletcher–Goldfarb–Shanno) optimisation method. It is a quasi-Newton approach that approximates the inverse of the Hessian matrix (Liu and Nocedal, 1989).

In this case we obtain the best weights for each single potential, but these do not necessarily match the best combination of both potentials. The advantage of this method is that samples for each class and class relation, respectively, can be drawn from the fully labelled training area, leading to a speed-up of the learning process.
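One plausible reading of this separate training scheme: with independently drawn samples, minimising Eq. (9) for the unary model of Eq. (3) amounts to maximum a posteriori estimation of a multinomial logit model with a Gaussian (L2) prior. The sketch below illustrates this interpretation with SciPy's L-BFGS-B on synthetic data; the names and the reduction itself are our assumption, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def f_unary(theta, H, y, c, sigma=1.0):
    """Negative log objective in the spirit of Eq. (9) for independent
    samples: softmax likelihood of the linear unary model plus a
    zero-mean Gaussian prior on the weights."""
    n, d = H.shape
    W = theta.reshape(c, d)                      # one weight vector w_l per class
    scores = H @ W.T                             # w_l^T h_i(x) for every class l
    log_norm = np.logaddexp.reduce(scores, axis=1)
    log_lik = scores[np.arange(n), y] - log_norm
    log_prior = -0.5 * np.sum(theta ** 2) / sigma ** 2
    return -(np.sum(log_lik) + log_prior)

rng = np.random.default_rng(0)
c = 7                                            # number of classes
H = np.hstack([rng.normal(size=(500, 5)),        # synthetic node features h_i(x)
               np.ones((500, 1))])               # bias feature fixed to 1
y = rng.integers(0, c, size=500)                 # synthetic labels
res = minimize(f_unary, np.zeros(c * H.shape[1]), args=(H, y, c),
               method="L-BFGS-B")
W_hat = res.x.reshape(c, -1)                     # trained weights for Eq. (3)
```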

The other version, denoted as CRF_LM,full, is also based on linear models, but it determines the parameters simultaneously by concatenating all weights w_l and v_{l,k} in a single parameter vector θ_full = [w_1, ..., w_c, v_{1,1}, ..., v_{c,c}]^T, which is determined by optimising

$$f_{full} = -\log\left(p(\theta_{full} \mid \mathbf{x}, \mathbf{y}) \cdot p(\theta_{full})\right) \qquad (11)$$

For each iteration, the value of f_full, its gradient, and an estimate of the partition function Z(x) are required. We used the method described in Vishwanathan et al. (2006), based on L-BFGS combined with LBP for inference. On the one hand, a better classification result might be expected in this case, because this approach delivers the optimal feature weights for the given combination of both potentials. However, in this case training requires inference on the graphical model (Vishwanathan et al., 2006) and thus a connected part of the training data representing all points and interactions. Usually more data have to be considered for parameter estimation in the training process compared to CRF_LM,separate.

In the case of CRF_RF, two independent RFs have to be trained for the unary and pairwise potentials due to the different numbers of classes. In this study the RF implementation considers the Gini gain for training the trees of CRF_RF. The size m of the random feature subset is set to the square root of the number of input features, following Gislason et al. (2006). As RF optimise the overall error rate, a class with many samples might bias the training step. Thus, the training set is balanced by randomly selecting the same number of samples for each class, applying downsampling or oversampling depending on the actual number of training samples available for each class (Chen et al., 2004).
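A short sketch of such a balancing step (a generic illustration with synthetic labels, not the implementation used here):

```python
import numpy as np

def balanced_sample(y, n_per_class, seed=0):
    """Draw the same number of training samples for every class:
    downsample large classes, oversample (with replacement) small ones."""
    rng = np.random.default_rng(seed)
    indices = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        replace = len(members) < n_per_class     # oversample only if needed
        indices.append(rng.choice(members, n_per_class, replace=replace))
    return np.concatenate(indices)

y = np.random.default_rng(3).integers(0, 7, size=10000)  # synthetic labels
idx = balanced_sample(y, n_per_class=2000)               # 2000 samples per class
```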

2.2. Generation of 2D objects

As pointed out in Section 1.2, one of the goals of our work is to detect buildings in the scene. The result of the previous step is a labelled point cloud. Here we want to derive a 2D representation in the form of a binary building mask, which can be used to derive polygons describing the building outlines. For that purpose, a 2D grid aligned with the XY plane of the object coordinate system is defined, and all points are projected to this grid. However, due to the irregular distribution of the points, some pixels remain empty. In order to deliver accurate object masks or boundaries, these holes in the image data must be closed. A simple morphological closing operation, as used in our previous work (Niemeyer et al., 2013), is not sufficient, because not only the holes were closed, but also some of the spaces between adjacent buildings, resulting in false positives. On the other hand, single wrongly classified points resulted in errors in the object mask, which we tried to remove by a morphological opening in Niemeyer et al. (2012). However, this also removed objects with a small spatial extent. Thus, it turned out to be difficult to find a post-processing step maintaining the correct information while at the same time eliminating outliers. A better approach is needed.

We build another graphical model to solve this problem. We use a grid-based solution, looking for a 2D building mask as the basis for deriving the building outlines. In this case, the pixels of the image grid correspond to the nodes of the graphical model. The edges link each pixel to its four direct neighbours on the grid. We use the normalised beliefs p_CRF for each node obtained by LBP in the original CRF-based classification to define the unary potentials in the second (the MRF-based) classification. To be precise, in each pixel i of the grid, the averages of the CRF beliefs of all points falling into this pixel (N_p) are computed for each class l, and these average beliefs are used to define the unary potentials for these pixels in the MRF. For pixels not containing a single LiDAR point, we assume all classes to be equally probable (Eq. (12)):

$$\log \phi_{i,MRF}(\mathbf{x}, y_i = l) = \begin{cases} \dfrac{1}{\|N_p\|} \displaystyle\sum_{p \in N_p} p_{CRF}(y_p = l \mid h_p(\mathbf{x})) & \text{if } \|N_p\| > 0 \\ 1/c & \text{if } \|N_p\| = 0 \end{cases} \qquad (12)$$

In Eq. (12), ‖N_p‖ is the number of points falling into a pixel, and c is the number of classes to be discerned. Note that, compared to the original classification of the point cloud, we might reduce the number of classes to be distinguished in the MRF. For instance, walls would always appear beneath building roofs and would thus disappear in a 2.5D analysis. Thus, some classes are not considered in the MRF. For the excluded classes, the beliefs from the original point cloud classification are simply not considered. As a consequence, the values used for the unary potentials, while still being consistent with the general requirements of potentials, are not probabilities, because they do not necessarily sum to 1. We choose a multi-class setting rather than a binary classification because in the future we want to expand this method to other objects, e.g. trees.
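A sketch of the rasterisation of Eq. (12), assuming the belief array passed in is already restricted to the classes kept in the MRF; the grid handling, names, and synthetic data are ours.

```python
import numpy as np

def grid_unary_log_potentials(points_xy, beliefs, cell=0.5):
    """Rasterise the CRF beliefs following Eq. (12): each pixel stores the
    average belief of the points falling into it; empty pixels get the
    uniform value 1/c so that the Potts smoothing can fill them in."""
    n, c = beliefs.shape
    cols = ((points_xy[:, 0] - points_xy[:, 0].min()) / cell).astype(int)
    rows = ((points_xy[:, 1] - points_xy[:, 1].min()) / cell).astype(int)
    h, w = rows.max() + 1, cols.max() + 1
    sums = np.zeros((h, w, c))
    counts = np.zeros((h, w))
    np.add.at(sums, (rows, cols), beliefs)       # accumulate beliefs per pixel
    np.add.at(counts, (rows, cols), 1)           # count points per pixel
    log_phi = np.full((h, w, c), 1.0 / c)        # empty-pixel case of Eq. (12)
    filled = counts > 0
    log_phi[filled] = sums[filled] / counts[filled, None]
    return log_phi

rng = np.random.default_rng(4)
pts = rng.uniform(0, 50, size=(5000, 2))         # projected x, y of the points
bel = rng.dirichlet(np.ones(7), size=5000)       # normalised CRF beliefs p_CRF
log_phi = grid_unary_log_potentials(pts, bel)
```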

The pairwise potential is represented by a Potts model (Eq. (13)), favouring identical labels at neighbouring pixels i and j.

$$\log \psi_{ij,MRF}(y_i = l, y_j = k) = \begin{cases} \lambda & \text{if } l = k \\ 0 & \text{if } l \neq k \end{cases} \qquad (13)$$

In Eq. (13), the parameter λ expresses the relative weighting of both potentials (and hence the degree of smoothing). It is set manually in our experiments. The smoothing effect of the MRF closes holes of pixels without corresponding LiDAR points: as the unary potentials of these 'empty' pixels are initialised on the assumption of an equal distribution of the class labels, the Potts model propagates the information of the neighbouring pixels to such pixels. As the parameter of the Potts model is chosen manually and because we use the outcomes of the first classification to define the unary potentials, we do not need an additional training step for the MRF. For a CRF, a more complex model considering the interaction features would be required, which usually must be trained. This is the reason why an MRF is applied for this task. Again, we use LBP to obtain the optimal configuration of class labels based on the definition of the potentials according to Eqs. (12) and (13). From the final multi-label image the binary building masks are derived by considering only the building class. In a post-processing step, only building pixels which are classified reliably are maintained in order to obtain reliable objects. For that purpose, we compute the difference between the maximum and the second largest belief and only keep building pixels for which this difference exceeds a user-defined threshold.
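A sketch of this reliability test on the final belief image; the margin of 0.2 stands in for the user-defined threshold, and the belief array is synthetic.

```python
import numpy as np

def reliable_building_mask(beliefs, building_class, margin=0.2):
    """Keep only building pixels whose belief exceeds that of the
    runner-up class by a user-defined margin."""
    labels = beliefs.argmax(axis=-1)
    top2 = np.sort(beliefs, axis=-1)[..., -2:]        # two largest beliefs
    confident = (top2[..., 1] - top2[..., 0]) > margin
    return (labels == building_class) & confident

bel = np.random.default_rng(6).dirichlet(np.ones(5), size=(100, 120))
mask = reliable_building_mask(bel, building_class=2, margin=0.2)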

3. Evaluation

This section presents the experiments carried out to evaluate the performance of our approach. In Section 3.1 we describe the data set used for the evaluation. Section 3.2 is dedicated to the evaluation of our CRF-based method for point cloud classification, whereas Section 3.3 presents the evaluation of our 2D building detection approach.

3.1. Study area

The performance of our method is evaluated on the LiDAR data set of Vaihingen, Germany (Cramer, 2010), in the context of the ISPRS Test Project on Urban Classification and 3D Building Reconstruction (Rottensteiner et al., 2012). It was acquired in August 2008 by a Leica ALS50 system with a mean flying height of 500 m above ground and a 45° field of view. The average strip overlap is 30%, and the point density in the test areas is approximately 8 points/m². Multiple echoes and intensities were recorded. However, only very few points (2.3%) are multiple returns, as the acquisition took place in summertime under leaf-on conditions. Hence, the vertical point distribution within trees is such that most points describe only the canopy.

For the benchmark, three test sites with different scenes are considered (Fig. 1). Area 1 is situated in the centre of the city of Vaihingen. Dense, complex buildings and some trees characterise this test site. Area 2 consists of a few high-rising residential buildings surrounded by trees. In contrast, Area 3 is a purely residential neighbourhood with small, detached houses.

As the benchmark only provides reference data for 2D objects, we manually labelled the point cloud of the three test areas to enable an evaluation of the 3D classification results. The combined point cloud consists of 780,879 points. We discern the following seven object classes: grassland (22.6%), road (27.6%), building with gable roof (15.3%), low vegetation (6.4%), façade (4.2%), building with flat roof (6.3%), and tree (17.6%), where the numbers in brackets give the distribution of the object classes in the combined reference point cloud. The class low vegetation also contains the cars. In order to train the CRFs, a training area consisting of 263,368 labelled points to the south east of Area 1 is used. All experiments are performed with MATLAB on a computer with an Intel Core i7 2.80 GHz CPU and 16 GB RAM.

3.2. 3D point cloud classification

In Section 3.2.1 the definition of the site-wise feature vectors is given. This is followed by a comparison of different versions of our CRF-based approach in Section 3.2.2. The results of the 3D point cloud classification are presented in Section 3.2.3. In Section 3.2.4 we analyse the importance of the features used for nodes and interactions.

3.2.1. Features

We adapted some of the LiDAR features presented in Chehata et al. (2009) and also used some additional ones which we consider to be well suited to this particular classification task. Note that only LiDAR features are utilised for the classification of the point cloud. We do not take into account features obtained from the optical images which are also available for this area. The following features are used:

Table 1. Comparison of three versions of our CRF-based classifier (CRF_LM,separate, CRF_LM,full, CRF_RF). For the seven classes, the completeness/correctness rates are presented. RF outperforms the other versions in nearly all values.

                        CRF_LM,separate   CRF_LM,full   CRF_RF
Overall accuracy        75.7%             76.5%         80.6%
Kappa index             0.70              0.71          0.76
Grassland               76.3/79.4%        75.6/77.0%    82.2/81.0%
Road                    88.3/86.6%        87.1/84.8%    88.1/91.1%
Building (gable)        83.1/85.6%        90.2/80.2%    91.1/91.2%
Low vegetation          69.3/47.1%        63.8/46.9%    77.2/49.6%
Façade                  47.3/43.5%        39.0/63.5%    52.9/52.8%
Building (flat)         91.5/52.0%        78.2/58.3%    90.3/63.4%
Tree                    52.0/90.0%        61.6/87.1%    61.7/91.3%
Training time           60.1 min          459.4 min     20.0 min
Classification time     81.3 min          75.3 min      3.4 min

Fig. 1. Test sites of the Vaihingen scene: 'Inner City' (Area 1, left), 'High-Riser' (Area 2, middle) and 'Residential' (Area 3, right) (Rottensteiner et al., 2012).

1. intensity;
2. ratio of the echo number per point and the number of echoes in the waveform;
3. height above DTM;
4. approximated plane (points in a spherical neighbourhood of radius r are considered): sum, mean and standard deviation of the residuals, direction and variance of the normal vector;
5. variance of the point elevations in a cylinder and in a sphere of radius r;
6. ratio of the point densities in a cylinder and in a sphere of radius r;
7. eigenvalue-based features in a sphere of radius r: the 3 eigenvalues (λ1, λ2, λ3), omnivariance, planarity, anisotropy, sphericity, eigenentropy, scatter (λ1/λ3) (Chehata et al., 2009);
8. point density in a sphere of radius r;
9. principal curvatures κ1 and κ2, mean and Gaussian curvature in a sphere of radius r;
10. variation of intensity, omnivariance, planarity, anisotropy, sphericity, point density, number of returns, κ1, κ2, mean curvature, and Gaussian curvature in a sphere of radius r.

The DTM for the feature height above DTM is generated using robust filtering (Kraus and Pfeifer, 1998) as implemented in the commercial software package SCOP++ (http://www.trimble.com/imaging/inpho/geo-modeling.aspx?dtID=SCOP). The features describing the local point distribution within a sphere or a cylinder are computed at multiple scales with radii r = 1, 2, 3, and 5 m. The number of scales was chosen empirically; using more scales did not improve the classification results. In total, the feature vector h_i(x) of node n_i used for the 3D classification consists of 131 entries. Note that for some experiments (Section 3.2.2) only a subset of these features is used. A sketch of the eigenvalue-based features of group 7 is given below.
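The sketch computes the eigenvalue-based features for one spherical neighbourhood, using the common definitions from Chehata et al. (2009); the normalisation convention for the eigenentropy (eigenvalues normalised to sum to one) is one possible choice, and the toy neighbourhood is synthetic.

```python
import numpy as np

def eigenvalue_features(neighbour_pts):
    """Shape features of feature group 7 from the eigenvalues of the
    3D covariance matrix of a spherical neighbourhood."""
    lam = np.maximum(np.linalg.eigvalsh(np.cov(neighbour_pts.T))[::-1], 1e-12)
    l1, l2, l3 = lam                           # sorted: l1 >= l2 >= l3
    e = lam / lam.sum()                        # normalised eigenvalues for entropy
    return {
        "lambda1": l1, "lambda2": l2, "lambda3": l3,
        "omnivariance": (l1 * l2 * l3) ** (1.0 / 3.0),
        "planarity": (l2 - l3) / l1,
        "anisotropy": (l1 - l3) / l1,
        "sphericity": l3 / l1,
        "eigenentropy": float(-np.sum(e * np.log(e))),
        "scatter": l1 / l3,
    }

# roughly planar neighbourhood -> high planarity, low sphericity
pts = np.random.default_rng(7).normal(size=(50, 3)) * np.array([5.0, 5.0, 0.1])
print(eigenvalue_features(pts)["planarity"])
```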

For the interactions, a feature vector μ_ij(x) is required for each edge e_ij. As discussed in Section 2.1.3, the difference of the two node feature vectors h_i(x) and h_j(x) is not promising in this case. The height difference is an important piece of information, but in addition the actual height above ground is needed, for instance, to distinguish whether a relation involves points on a roof or at road level. Hence we concatenate the original feature vectors of both nodes obtained at the scale r = 1 m, and additionally compute the differences of the elevation and intensity values. Both points have very similar local neighbourhoods, so considering the other scales in addition would not contribute significant information to support the classification. As a consequence, each interaction feature vector μ_ij(x) consists of 72 elements.
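Schematically, the 72-element interaction vector can be assembled as follows (35 node features at r = 1 m per node plus the two differences); the values are placeholders.

```python
import numpy as np

def interaction_features(h_i, h_j, z_i, z_j, int_i, int_j):
    """Interaction feature vector mu_ij(x): the r = 1 m node features of
    both nodes concatenated, plus the elevation and intensity differences."""
    return np.concatenate([h_i, h_j, [z_i - z_j, int_i - int_j]])

rng = np.random.default_rng(8)
h_i, h_j = rng.normal(size=35), rng.normal(size=35)   # 35 features at r = 1 m each
mu_ij = interaction_features(h_i, h_j, z_i=3.1, z_j=0.4, int_i=90.0, int_j=72.0)
print(mu_ij.shape)   # (72,) = 2 * 35 + 2, matching the 72 elements stated above
```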

3.2.2. Comparison of linear models and Random Forests

In this section we compare the three versions of our CRF-based classifier introduced in Section 2.1. We classified the three test areas independently using each version of our classifier. In all experiments, each point was linked to its three nearest neighbours in 2D in the graphical model. In all cases we used only the 35 features with the scale r = 1 m, because the versions based on the linear model for the potentials require very long computation times when the number of features is large. For the versions based on the linear model we normalised the features for all points by subtracting the mean values and dividing by the standard deviations.

The first CRF version based on a linear model, CRF_LM,separate, trains the unary and pairwise potentials independently of each other. This allows us to select only a subset of samples from the training set. For this test we consider 2000 randomly drawn samples per class and class relation, respectively. These samples do not have to be adjacent in the graph in this case. Training results in the best weights for each single potential, but these weights do not necessarily correspond to the best combination of both potentials in the graphical model.

In the second CRF version based on a linear model, CRF_LM,full, the weights for both potentials are determined simultaneously. In this case, a connected part of the training data must be used. We utilised a part of the entire training point cloud consisting of 156,667 points, which resulted in a longer training time than the one required for CRF_LM,separate. This approach was also applied in Niemeyer et al. (2011).

The third CRF version to be compared in this section is based on RF (CRF_RF). As described in Section 2.1.4, we make use of two independent RF classifiers for the unary and the pairwise potentials. They are trained separately, so that in this respect this approach is comparable to CRF_LM,separate. This is why we use the same 2000 randomly drawn samples per class and class interaction for the training of the RF classifiers as we use for training in the version CRF_LM,separate. We use RF consisting of 300 trees for both potentials, a value that was found empirically.

The results of the comparison for all three study areas can be seen in Table 1. The table shows the overall accuracies (OA), kappa indices, completeness and correctness rates for all classes, as well as the times required for training and classification. The experiment reveals that the approach CRF_LM,separate achieves only slightly worse results than CRF_LM,full, in which both potentials were optimised simultaneously. The difference in OA is only 0.8% (75.7% vs. 76.5%). CRF_RF outperforms both versions based on a linear model with an OA of 80.6% (+4.1% compared to CRF_LM,full). The variations of the kappa indices are in a similar range. In most cases the completeness and correctness values of the seven classes are best for RF. Especially buildings with gable roofs, grassland and low vegetation benefit from the RF approach. Concerning the time needed for training and classification, RF is much faster than both linear model methods (in total 23.4 min, compared to 141.4 min for CRF_LM,separate and 534.7 min for CRF_LM,full, respectively). In particular, the classification time is significantly longer for the variants based on a linear model for the potentials: both the computation of the potentials and LBP require more time. The latter aspect might indicate that the classes cannot be separated well by a linear decision surface in this case.

To sum up, CRF_RF is the most accurate and fastest method in our comparison. This is why we use this version in the experiments in all subsequent sections. As an RF can cope with a large number of features, from now on we will be able to incorporate multi-scale features in the way described in Section 3.2.1. Concerning the variants based on a linear model for the potentials, CRF_LM,separate seems to be a better choice than CRF_LM,full because it is faster, works with independent training samples, and leads to comparable results.

3.2.3. Classification results

The result of the CRF classification is the assignment of an object class label to each LiDAR point. For the experiments described in this section we use the CRF based on RF (CRF_RF). Again, the graph is built by linking each point to its three nearest neighbours in 2D. For both the unary and the pairwise potentials we use RFs consisting of 300 trees, a number that was found empirically. In contrast to the experiment in Section 3.2.2, a larger number of features and more samples are used. Following the guidelines outlined in Section 2.1.4, a random subset of 11 features is used for the tests in the tree nodes. Accordingly, this subset has size 8 for the pairwise potential, which has 72 features in total. For the training of the unary potentials we used 3000 training samples per class. Another set of 3000 samples per class relation was used to train the pairwise potential, which has to discriminate 49 different class interactions. The results of the 3D point cloud classification are depicted in Fig. 2.
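A minimal sketch of the graph construction (linking each point to its three nearest 2D neighbours), assuming SciPy is available; function and variable names are hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_edges(xy, k=3):
    """Link each point to its k nearest neighbours in 2D.

    `xy` is an (n, 2) array of planimetric coordinates. Returns the
    undirected edge set of the graphical model as pairs (i, j), i < j.
    """
    tree = cKDTree(xy)
    # Query k + 1 neighbours: the closest match is the point itself.
    _, idx = tree.query(xy, k=k + 1)
    edges = set()
    for i, neighbours in enumerate(idx):
        for j in neighbours[1:]:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges
```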

A quantitative evaluation of the results based on the reference generated by manual labelling shows that the method CRF_RF achieves a mean OA of 83.4% and a mean kappa index of 0.80 (0.75, 0.82, and 0.79 for Areas 1, 2, and 3, respectively) for the three test areas. Keeping in mind that we differentiate a larger number of classes than comparable studies, e.g. Chehata et al. (2009), we consider these results to be rather good, given the challenging environment and the fact that the LiDAR data were captured under leaf-on conditions. If we differentiate only the three classes building (merging gable roof, flat roof and façade), ground (merging grass and road) and vegetation (merging tree and low vegetation), we achieve an OA of 93.3%. Area 1 is the most challenging scene with 80.3% OA; the best result is obtained in Area 2 with 85.7%. The confusion matrix in Table 2 also presents the completeness and correctness values for each class. The classes with many points, such as grassland, roads, buildings and trees, are detected relatively well. Especially the class gable roof obtains very good completeness and correctness values of 93.7%. More difficult to differentiate are the classes low vegetation and façade, which are less dominant in the three scenes. Class low vegetation has a rather low correctness of 50.9% due to a relatively large number of tree and grassland points being incorrectly assigned to this class. However, in this case even the generation of the reference data was difficult for a human operator, so the error might be partly explained by errors in the reference. The low correctness of façades is a consequence of vertically distributed points belonging to the boundaries of trees being wrongly labelled as façades due to similar features. Another conspicuous challenge is the confusion between trees and gable roofs. Many points on trees are located on the canopy and show features similar to those of building roofs: the data set was acquired under leaf-on conditions, almost the complete laser energy was reflected from the canopy, and multiple pulses within trees were recorded only very rarely (about 2.3%). Most of our features consider the point distribution within a local neighbourhood. For tall trees with large diameters the points on the canopies define a very smooth surface, and the deviations from a local plane are not larger than those of a gable roof. An example of this effect is presented in Fig. 3. Some significant confusion of road and grassland is also observed; it explains about one third of all errors. We think that the intensity is important to distinguish these two classes, though this cannot be corroborated by the feature importance analysis (which, if carried out on a per-class level as in Chehata et al. (2009), can only show which features are most suitable to differentiate a class from all other classes). Intensity is sensitive to the incidence angle of the laser beam. We did not carry out any correction of the raw intensities delivered with the ISPRS benchmark data set; assessing the impact of such a correction could be a part of our future work. In addition, distinguishing these classes was not always clear even when digital orthophotos were used along with the LiDAR data for generating the reference. However, the results for both classes (≥80.9%) obtained from LiDAR data alone are quite good, taking into account the absence of multi-spectral information. The relatively large number of grass points classified as low vegetation may also be partly explained by inaccurate reference data. Nevertheless, we consider our results to be rather good, with most completeness and correctness values better than 80% or even 90%. Even small details, for instance some garages and pavilions in Areas 2 and 3, are detected accurately, and most of the car points are correctly labelled as low vegetation (corresponding to the class definition). This is an advantage of applying the point-based contextual classification approach. In this experiment, training took 66 min, and the computation time for inference was fast with approx. 0.8, 1.2 and 1.6 min for the three test areas, respectively.
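The coarse three-class evaluation mentioned above amounts to a simple label mapping; a minimal sketch (the integer label codes are hypothetical):

```python
import numpy as np

# Hypothetical integer codes for the seven classes.
GRASS, ROAD, GABLE, LOW_VEG, FACADE, FLAT, TREE = range(7)

# building = {gable roof, flat roof, facade}, ground = {grass, road},
# vegetation = {tree, low vegetation}.
MERGE = np.array([1, 1, 0, 2, 0, 0, 2])  # fine class -> coarse class

def merged_overall_accuracy(predicted, reference):
    """OA after merging the seven classes into three super-classes."""
    return float(np.mean(MERGE[predicted] == MERGE[reference]))
```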

We also assess the influence of integrating contextual relations into the classification process. For this purpose, we compare the results of the classification using our method CRF_RF with a classification using the unary potential alone. The latter corresponds to a standard RF classification of the point cloud in which each point is labelled independently of its neighbours. The interactions increase the OA of the three areas by 2.0% on average (1.8%, 2.7%, and 1.6% for Areas 1, 2, and 3, respectively). At first glance this may not seem much, which to a certain degree can be attributed to the facts that RF is per se a strong classifier and that context is implicitly considered in the RF-based classification by using multi-scale features. On closer inspection, however, the pairwise potentials do have a non-negligible effect, namely in improving the quality of the results for some of the classes. The differences in the completeness and correctness rates for all classes are presented in brackets in Table 2. Positive values indicate a better result of the CRF considering the pairwise terms. It can be seen that most rates are improved. Especially for the ground classes grassland and road as well as for low vegetation, completeness and correctness increase by up to 5.3%.


Fig. 2. 3D view of the classification results for version CRF_RF for the three study areas with seven classes: grassland (khaki), road (grey), building with gable roof (purple), low vegetation (light green), façade (dark purple), building with flat roof (orange), and tree (green).

Table 2
Confusion matrix obtained by 3D point cloud classification of the three areas with correctness and completeness values in (%), discerning the classes grassland (Grass), road (Road), building with gable roof (GR), low vegetation (LV), façade (Faç), building with flat roof (FR), and tree (Tree). The numbers in brackets show the changes obtained by considering the interactions, compared to a classification based solely on the unary potentials. Positive values represent improvements by context. Due to the interactions the OA increases from 81.4% to 83.4%. Compared to Table 1, more features and more training samples are considered in this experiment, which is the reason why the completeness, correctness and OA values do not match.


The best improvement, 10.3%, is obtained for the correctness of façade, which benefits significantly from incorporating the context represented by the interactions of points. To sum up, modelling interactions is useful to obtain a more reliable classification compared to labelling points individually.
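The per-class completeness and correctness values in Table 2 follow the usual definitions; a minimal sketch of their computation from a confusion matrix:

```python
import numpy as np

def completeness_correctness(confusion):
    """Per-class completeness (recall) and correctness (precision).

    `confusion[i, j]` counts reference-class-i points labelled as j.
    """
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    completeness = tp / confusion.sum(axis=1)  # TP / (TP + FN)
    correctness = tp / confusion.sum(axis=0)   # TP / (TP + FP)
    return completeness, correctness
```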

3.2.4. Feature importance

Based on a permutation importance measure (Breiman, 2001), the relevance of the features can easily be obtained by RF alongside the classification.


Fig. 3. Tree points are wrongly classified as gable roof.

Table 4
The 14 most important node features, ordered by their rank according to feature importance.

Rank  Feature
1     Height above DTM
2     Variance of point elevations (cylinder, r = 1 m)
3     Direction of normal vectors (sphere, r = 1 m)
4     Direction of normal vectors (sphere, r = 2 m)
5     Ratio of point density (sphere, r = 1 m)
6     Variance of intensity (sphere, r = 1 m)
7     Variance of point elevations (sphere, r = 1 m)
8     Variance of point density (sphere, r = 1 m)
9     Intensity
10    Variance of point elevations (sphere, r = 2 m)
11    Direction of normal vectors (sphere, r = 3 m)
12    Variance of omnivariance (sphere, r = 1 m)
13    Variance of principal curvature k2 (sphere, r = 1 m)
14    Variance of normal vectors (sphere, r = 1 m)


This kind of analysis has, for instance, been performed by Chehata et al. (2009), who investigated the relevance of features for the classification of a FW point cloud. In our case, we additionally learn interactions between neighbouring points, and thus we are able to analyse and compare the feature importance for both the nodes and the edges of our CRF. Feature importance can be given for each variable and for each class. However, the RF classifier that is the basis for the pairwise terms has to distinguish 49 different class relations, so that a presentation of feature importance values per class would become very confusing. For the sake of clarity, we mainly focus on the overall importance value per feature over the entire forest, which can be computed as the sum over all trees t_i of the differences between the accuracy achieved with the correct feature values and the accuracy obtained after permuting the values of that feature.
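A simplified sketch of such a permutation importance computation (the original formulation permutes the out-of-bag samples per tree; this version permutes a held-out set for brevity):

```python
import numpy as np

def permutation_importance(rf, X, y, seed=0):
    """Drop in accuracy after permuting each feature column in turn.

    `rf` is a trained classifier with a .score(X, y) method (e.g. a
    scikit-learn RandomForestClassifier); `X`, `y` are held-out data.
    """
    rng = np.random.default_rng(seed)
    base_accuracy = rf.score(X, y)
    importances = np.empty(X.shape[1])
    for f in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, f] = rng.permutation(X_perm[:, f])
        importances[f] = base_accuracy - rf.score(X_perm, y)
    return importances
```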

The 10 most relevant features based on the overall importance values (in percent) for the nodes and interactions are presented in Table 3. It can easily be seen that the height above the DTM is by far the most important one; it is the strongest and best discriminating feature for all classes and relations. All the other features are considerably less important. Note that only the scale r = 1 m was used for the computation of the interaction features. Both node feature vectors are concatenated, and each corresponding feature is found with a similar importance value. In addition to the absolute elevation values, the height difference is also important, but the intensity difference does not seem to contribute much information (rank 40 with an importance value of 1.2%).
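A minimal sketch of how such an interaction feature vector can be assembled per edge, following the description above (array names are hypothetical):

```python
import numpy as np

def edge_features(node_feats, heights, intensities, edges):
    """Interaction features: both endpoint feature vectors concatenated,
    complemented by the height difference and the intensity difference.
    """
    rows = []
    for i, j in edges:
        rows.append(np.concatenate([
            node_feats[i],
            node_feats[j],
            [heights[i] - heights[j], intensities[i] - intensities[j]],
        ]))
    return np.vstack(rows)
```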

Moreover, the relevant features are nearly the same for nodes and interactions: for both, the intensity values and their variation, the direction of the normal vectors, and the variation of the height values in a local neighbourhood are relevant. In contrast, the features derived from the echo number and the number of echoes, such as the variation of returns and the echo ratio, are hardly important for the classification of these three areas.

Table 3
Overview of the ten most important features for the classification of nodes and interactions, respectively, ordered by their overall importance value (mean decrease in Gini index) obtained by RF. SP corresponds to the start point and EP to the end point of an edge.

Rank  Nodes                                Edges
      Imp. (%)  Feature                    Imp. (%)  Feature
1     10.31     Height above DTM           5.07      Height above DTM (SP)
2      3.81     Normal (1 m)               5.06      Height above DTM (EP)
3      3.38     Normal (2 m)               4.83      Height difference
4      2.46     Var. Z in sphere (1 m)     2.74      Var. Z in cylinder (SP)
5      2.26     Point density ratio (1 m)  2.63      Var. Z in cylinder (EP)
6      2.15     Var. Z in sphere (2 m)     2.21      Point density ratio (SP)
7      2.08     Normal (3 m)               2.14      Point density ratio (EP)
8      1.97     Intensity                  2.14      Var. intensity (SP)
9      1.92     Var. Z in cylinder (1 m)   2.12      Var. intensity (EP)
10     1.70     Normal (5 m)               2.09      Var. normal vector (SP)

The reason might be that most of the points are single returns and only very few points are multiple returns.

As we have seen that the height is the most important feature in general (Table 3), we performed an experiment with all features except the height above the DTM. The OA is still surprisingly high at 74.4%, and thus only 9% worse than the classification including the height features. The main errors are flat building roofs (e.g. the high-rise building in Area 2) being erroneously assigned to road, and vice versa. Without considering the height, both classes have very similar features (for example, the same direction of the normal vector) and cannot be separated correctly any more. Flat roofs have a correctness of 49.7% and a completeness of 48.5%; the other classes are mostly identified correctly. This shows that the feature height above DTM can be neglected if no discrimination between the classes flat roof and road is required. The feature is relatively expensive to compute, as the raw point cloud has to be filtered first to derive a DTM. An alternative would be to approximate the DTM by the height of the lowest point within a cylinder of a large radius centred at a given point (Mallet, 2010). However, this is only suitable for flat terrain. In particular, the terrain of Areas 1 and 2 is characterised by several ground levels with different elevations, which is the reason why we used a standard filtering method to derive the DTM.
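A minimal sketch of that cylinder-based approximation (the radius here is a hypothetical value):

```python
import numpy as np
from scipy.spatial import cKDTree

def height_above_ground_approx(xyz, radius=10.0):
    """Approximate 'height above DTM' without explicit DTM filtering.

    For each point, the lowest elevation within a vertical cylinder of
    the given radius is taken as the local ground height (cf. Mallet,
    2010). Only suitable for flat terrain.
    """
    tree = cKDTree(xyz[:, :2])
    heights = np.empty(len(xyz))
    for i, nb in enumerate(tree.query_ball_point(xyz[:, :2], radius)):
        heights[i] = xyz[i, 2] - xyz[nb, 2].min()
    return heights
```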

In another experiment we performed a 3D classification using only the most important features. For that purpose, the node and interaction features were sorted by their importance values. Note that for the edge features, the importance values for the start and end points were summed up to determine the order in this case. After ordering the features, we repeatedly classified the point cloud, using the n_F most important features according to the sorted list and varying n_F from 1 to 29.

Fig. 4. Overall accuracy and computation time for training as a function of the number n_F of most important features used for classification.


Table 5
Difference Δ in completeness and correctness values for the classification with all features compared to the classification taking into account only the 14 best features. Positive values indicate improvements by using all features. By considering only the 14 most important features, the OA decreases by 0.5% (from 83.4% to 82.9%).

Classes          Δ Completeness (%)   Δ Correctness (%)
Grassland          5.2                  −2.6
Road              −3.0                   4.0
Gable roof         4.0                  −2.0
Low vegetation    −2.1                   2.1
Façade             4.8                 −14.1
Flat roof          0.9                   0.2
Tree              −3.3                   3.9


Fig. 4 shows the OA of the classification and the computation times as a function of n_F.
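A simplified sketch of this experiment, retraining only the unary RF per feature subset as a stand-in for the full CRF (all array names are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def accuracy_vs_feature_count(X_train, y_train, X_test, y_test,
                              importances, max_n=29):
    """OA when retraining with the n_F most important features."""
    order = np.argsort(importances)[::-1]  # most important first
    accuracies = []
    for n_f in range(1, max_n + 1):
        cols = order[:n_f]
        rf = RandomForestClassifier(n_estimators=300, n_jobs=-1)
        rf.fit(X_train[:, cols], y_train)
        accuracies.append(rf.score(X_test[:, cols], y_test))
    return accuracies
```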

Using only the most important feature (the height above the terrain) resulted in an OA of about 50%. Adding further features initially led to a sharp increase in OA, but after including about 10–14 features, a saturation effect can be observed in Fig. 4. We found that using the 14 most important features for the unary potential (cf. Table 4) was a good trade-off between accuracy and computation time. Correspondingly, the interaction feature vector l_ij(x) consists of 23 features (2 × 11 node features at scale r = 1 m and the height difference). In this case the OA is 82.9%. Compared to the classification exploiting all features, this is a slight decrease of 0.5%, and using additional features only results in a very slow increase in OA. An interesting observation is that no feature from scale r = 5 m is contained in the list in Table 4. A comparison of the completeness and correctness per object class to those that can be achieved using all 131 node features is given in Table 5. Positive values indicate a better performance when using all features. Eight of the 14 values benefit from utilising all features. Particularly the completeness and correctness of the class gable roof require information from the features neglected in this experiment; the values decrease by up to 0.9% with the smaller feature subset. On the other hand, the correctness of façades is improved by 14.1% with fewer features, whereas the corresponding completeness rate decreases by 4.8%. For most classes, the differences in Table 5 indicate a trade-off between completeness and correctness. The exceptions are grassland and gable roof with about 2% improvement and façades with about 2% decrease in quality (cf. Eq. (14)). Using a smaller number of features leads to a shorter training time: considering only 14 features, training took 15.4 min, which is less than a quarter of the time required for training if all features are used (cf. Section 3.2.3).

Fig. 5. Label images obtained by the MRF based on the 3D classification beliefs. Four classes are discerned in 2D: grassland (khaki), road (grey), building (purple), and vegetation (green).

Fig. 6. Pixel-wise result of class buildings (yellow = TP, red = FP, blue = FN).


Fig. 7. The point density on the right building roof is very low due to the reflectance of the roof material; hence, the building object is challenging to detect in 2D because of the many empty pixels.

Table 6
Evaluation results (%): completeness, correctness, quality.

Building         Object            Object ≥ 50 m²   Per area
Area             A1/A2/A3          A1/A2/A3         A1/A2/A3
Completeness     86.5/85.7/83.9    100/100/100      90.8/91.4/91.6
Correctness      89.2/63.2/94.1    96.6/100/100     94.5/96.4/96.7
Quality          78.3/57.1/79.7    96.6/100/100     86.3/88.4/88.9


Taking into account all features improves the result only slightly, and this gain comes along with a significantly higher computational cost.

3.3. 2D building objects

The results of the 3D classification with all features (Section 3.2.3) serve as input for the generation of 2D building masks. For that purpose, a grid with a pixel size of 0.5 m is defined. The LiDAR points are projected to the xy-plane in order to determine the pixel they correspond to. Using a relatively large pixel size compared to the point density of about 8 points/m² reduces the number of 'empty' pixels. Within each pixel, the averages of all beliefs per class are computed in the way described in Section 2.2. For the 2D representation, the class façade, which contains vertical objects, is neglected. Moreover, the class low vegetation is aggregated with trees, and gable roofs with flat roofs, by adding the corresponding beliefs. Thus, we distinguish between road, grassland, vegetation, and buildings; the beliefs of façades are not considered any more. Using a multi-class approach makes it possible to extract other object classes, such as trees, easily; however, in this investigation we focus on the buildings. A smoothing of the object borders and a filling of the holes without LiDAR points is performed using a Potts model (cf. Section 2.2). Both the unary and the pairwise potentials are equally weighted; hence, the weighting factor in Eq. (12) is manually set to k = 1. This makes it possible to compensate for the less meaningful unary potentials in data gaps by smoothing these areas. Inference by LBP takes only a few seconds.
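A minimal sketch of the projection and per-pixel belief averaging (grid handling and names are hypothetical; empty cells keep a uniform belief so that the Potts smoothing can fill them):

```python
import numpy as np

def rasterise_beliefs(xy, beliefs, cell=0.5):
    """Average per-point class beliefs within each grid cell.

    `xy` holds planimetric coordinates; `beliefs` is an
    (n_points, n_classes) array of CRF marginals.
    """
    col = ((xy[:, 0] - xy[:, 0].min()) / cell).astype(int)
    row = ((xy[:, 1] - xy[:, 1].min()) / cell).astype(int)
    n_classes = beliefs.shape[1]
    acc = np.zeros((row.max() + 1, col.max() + 1, n_classes))
    count = np.zeros(acc.shape[:2])
    np.add.at(acc, (row, col), beliefs)
    np.add.at(count, (row, col), 1)
    grid = np.full_like(acc, 1.0 / n_classes)  # uniform belief in gaps
    mask = count > 0
    grid[mask] = acc[mask] / count[mask, None]
    return grid, mask
```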

From the resulting label image consisting of four classes, we derive the binary object masks by considering only the building class. In a post-processing step, for each building candidate pixel, the difference between the building belief and the second largest belief is determined. Candidates with a difference smaller than 50% are eliminated in order to retain only reliable objects. Fig. 5 shows the multi-label images for the three test areas. These label images are evaluated in the context of the ISPRS Test Project based on a 2D reference, using the method described by Rutzinger et al. (2009). The evaluation results are depicted in Fig. 6.
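A minimal sketch of this post-processing step (the 0.5 margin mirrors the 50% threshold above; the building class index is hypothetical):

```python
import numpy as np

def reliable_building_mask(grid_beliefs, building_idx=2):
    """Binary building mask from the smoothed belief image.

    A pixel is kept only if it is labelled as building and the
    building belief exceeds the second largest belief by more than 0.5.
    """
    sorted_beliefs = np.sort(grid_beliefs, axis=-1)
    margin = sorted_beliefs[..., -1] - sorted_beliefs[..., -2]
    is_building = np.argmax(grid_beliefs, axis=-1) == building_idx
    return is_building & (margin > 0.5)
```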

In Fig. 6, yellow areas represent true positives (TP), blue areas correspond to false negatives (FN), and red areas to false positive detections (FP). It becomes evident that the majority of the buildings were detected correctly; thus, the proposed approach works well for the detection of buildings. Most of the FPs are caused by tree areas wrongly classified as building. This is due to the similar features mentioned before: the LiDAR points covering trees are mainly distributed on the canopy and not within the trees, which leads to a nearly horizontal and planar point distribution. The relatively large FN area of the building situated in the north of Area 3 is covered by only very few points due to the properties of the roof material, as can be seen in Fig. 7. As a consequence, it is challenging to recover the building outline based on the LiDAR point cloud alone. Most of the corresponding pixels of the binary image are empty, because no 3D point is projected to these pixels. This effect can partly be compensated by the MRF (Fig. 5(c)), but the beliefs for class building are too low for the threshold used in the post-processing step (difference >50%) to eliminate unlikely object pixels. However, our approach is nevertheless able to detect the largest part of this building. Looking at the quantitative evaluation results in Table 6, we see that the area-based completeness and correctness values for buildings are between 90.8% and 96.7%. The Quality, which is defined as (Rutzinger et al., 2009):

Quality = 1 / (Completeness^{-1} + Correctness^{-1} - 1),        (14)

takes values from 86.3% to 88.9%. The object-based metrics, counting a building as a TP if at least 50% of its area is contained in the reference, can also be seen in Table 6. The object-based evaluation reveals that the buildings in Areas 1 and 3 were detected reliably, with completeness and correctness values between 83.9% and 94.1%. The objects in Area 2 suffer from a relatively poor correctness value of 63.2%, whereas the completeness was 85.7%. As already mentioned, the FPs were caused by some small misclassifications of trees labelled as building. This leads to 5 FPs compared to only 14 reference building objects in the scene; with so few objects, a handful of FPs quickly degrades the correctness value. Considering only building objects with areas larger than 50 m², all objects in Areas 2 and 3 were detected correctly with 100% completeness and correctness. Only in Area 1 is there one larger FP area (again two neighbouring trees labelled as building), which results in 96.6% quality. We conclude that buildings, especially the larger ones, can be identified reliably by the proposed method.
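A one-line implementation of Eq. (14), with a worked check against the Area 2 per-area values:

```python
def quality(completeness, correctness):
    """Quality measure of Eq. (14) (Rutzinger et al., 2009)."""
    return 1.0 / (1.0 / completeness + 1.0 / correctness - 1.0)

# Area 2, per-area values: completeness 91.4%, correctness 96.4%.
print(round(quality(0.914, 0.964), 3))  # -> 0.884, matching Table 6
```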

4. Conclusions

In this paper, we have presented a context-based CRF classifier for urban LiDAR point clouds. The result of our classification is a labelled 3D point cloud; each point is assigned to one of seven object classes, and no segmentation is performed. The point cloud is represented by a graphical model, making use of a complex model for the interaction potentials in which prominent relations between object classes and the data are learned in a training step. These relations support the classification process and improve the results.



Our experiment revealed that the overall accuracy increased from 81.4% to 83.4% by considering these interactions, compared to an independent classification of single points. Even small objects such as garages and pavilions are detected correctly. A comparison of three different versions of a CRF-based classifier has shown that Random Forests are well suited for the computation of the unary and pairwise potentials needed for CRFs: they are faster, more accurate, and able to handle a larger number of features than the versions based on linear models. An analysis of the feature importance values delivered by RF was carried out both for the node features and for the interaction features. In both groups the relevant features are nearly the same; the most important one is the height above the DTM. As shown by an additional experiment, the use of a larger set of (multi-scale) features increases the accuracy only slightly, by 0.5%, compared to a classification based on the 14 most important features, while incurring a significantly higher computational effort. In summary, CRFs show a high potential for urban scene classification.

A second stage of the workflow uses the CRF beliefs for each point in a Markov Random Field to derive a 2D multi-label image, which is used to define building objects. Evaluation is performed in the context of the ISPRS Test Project on Urban Classification and 3D Building Reconstruction hosted by ISPRS WG III/4 (Rottensteiner et al., 2012). Very good per-area quality values (completeness and correctness >90%) are obtained. On a per-object level, especially the large buildings are detected very reliably. Considering all objects, some false positives have a negative impact on the correctness of buildings in Area 2; the buildings in the other two areas are reliably detected with completeness and correctness rates of >83.9%.

In future work we want to set up a hierarchical CRF for the 3D classification. The points should be aggregated to objects, on which a high-level CRF is applied to model the interactions between these objects. Both levels should interact and may influence the classification decision for a single point. Moreover, there are still some confusions of tree points wrongly classified as roof. To cope with this problem, better discriminating features as well as an optimisation of the graph structure will be investigated. One strategy to be pursued to achieve a better discrimination of grassland and road may be to apply a radiometric calibration of the intensity feature based on the incidence angle. In order to improve the building outlines in the 2D building object images, an incorporation of the 3D façade points as an additional hint might be helpful. Finally, as generating training data is a tedious process, we intend to carry out tests concerning the amount of training data required for a good classification performance, in order to see whether one could do with fewer training samples than those used in our experiments.

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments, which certainly helped to improve this paper. The Vaihingen data set was provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) (Cramer, 2010): http://www.ifp.uni-stuttgart.de/dgpf/DKEP-Allg.html.

References

Abhishek, J., 2009. Classification and Regression by RandomForest-Matlab. <http://code.google.com/p/randomforest-matlab>.

Anguelov, D., Taskar, B., Chatalbashev, V., Koller, D., Gupta, D., Heitz, G., Ng, A., 2005. Discriminative learning of Markov random fields for segmentation of 3D scan data. In: Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, San Diego, USA, pp. 169–176.

Bishop, C.M., 2006. Pattern Recognition and Machine Learning, vol. 1. Springer, New York.

Boykov, Y.Y., Jolly, M.P., 2001. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV), 2001. IEEE, Vancouver, Canada, pp. 105–112.

Breiman, L., 2001. Random forests. Machine Learning 45, 5–32.

Chan, J.C.W., Paelinckx, D., 2008. Evaluation of random forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sensing of Environment 112, 2999–3011.

Chehata, N., Guo, L., Mallet, C., 2009. Airborne lidar feature selection for urban classification using random forests. In: International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences. Paris, France, pp. 207–212.

Chen, C., Liaw, A., Breiman, L., 2004. Using Random Forest to Learn Imbalanced Data. Technical Report. University of California, Berkeley.

Cramer, M., 2010. The DGPF-test on digital airborne camera evaluation – overview and test design. Photogrammetrie-Fernerkundung-Geoinformation 2010, 73–82.

Dorninger, P., Pfeifer, N., 2008. A comprehensive automated 3D approach for building extraction, reconstruction, and regularization from airborne laser scanning point clouds. Sensors 8, 7323–7343.

Edelsbrunner, H., Mücke, E.P., 1994. Three-dimensional alpha shapes. ACM Transactions on Graphics 13, 43–72.

Frey, B., MacKay, D., 1998. A revolution: belief propagation in graphs with cycles. In: Advances in Neural Information Processing Systems, 1–6 December 1997, vol. 10. MIT Press, Denver, USA, pp. 479–485.

Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.

Gislason, P.O., Benediktsson, J.A., Sveinsson, J.R., 2006. Random forests for land cover classification. Pattern Recognition Letters 27, 294–300.

Hoberg, T., Rottensteiner, F., Heipke, C., 2012. Context models for CRF-based classification of multitemporal remote sensing data. In: ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 25 August–1 September. Melbourne, Australia, pp. 128–134.

Huang, H., Brenner, C., Sester, M., 2013. A generative statistical approach to automatic 3D building roof reconstruction from laser scanning data. ISPRS Journal of Photogrammetry and Remote Sensing 79, 29–43.

Kraus, K., Pfeifer, N., 1998. Determination of terrain models in wooded areas with airborne laser scanner data. ISPRS Journal of Photogrammetry and Remote Sensing 53, 193–203.

Kumar, S., Hebert, M., 2006. Discriminative random fields. International Journal of Computer Vision 68, 179–201.

Ladicky, L., Russell, C., Kohli, P., Torr, P.H., 2013. Inference methods for CRFs with co-occurrence statistics. International Journal of Computer Vision 103, 213–225.

Lafarge, F., Mallet, C., 2012. Creating large-scale city models from 3D-point clouds: a robust approach with hybrid representation. International Journal of Computer Vision 99, 69–85.

Li, S.Z., 2009. Markov Random Field Modeling in Image Analysis. Springer.

Lim, E., Suter, D., 2007. Conditional random field for 3D point clouds with adaptive data reduction. In: International Conference on Cyberworlds, 24–26 October. Hannover, Germany, pp. 404–408.

Lim, E., Suter, D., 2009. 3D terrestrial LIDAR classifications with super-voxels and multi-scale conditional random fields. Computer Aided Design 41, 701–710.

Liu, D., Nocedal, J., 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming 45, 503–528.

Liu, C., Shi, B., Yang, X., Li, N., Wu, H., 2013. Automatic buildings extraction from LiDAR data in urban area by neural oscillator network of visual cortex. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6, 2008–2019.

Lucchi, A., Li, Y., Boix, X., Smith, K., Fua, P., 2011. Are spatial and global constraints really necessary for segmentation? In: IEEE International Conference on Computer Vision (ICCV) 2011. IEEE, Barcelona, Spain, pp. 9–16.

Lucchi, A., Li, Y., Smith, K., Fua, P., 2012. Structured image segmentation using kernelized features. In: 12th European Conference on Computer Vision (ECCV 2012). Springer, Florence, Italy, pp. 400–413.

Lu, W., Murphy, K., Little, J., Sheffer, A., Hongbo, F., 2009. A hybrid conditional random field for estimating the underlying ground surface from airborne LiDAR data. IEEE Transactions on Geoscience and Remote Sensing 47, 2913–2922.

Mallet, C., 2010. Analysis of Full-Waveform Lidar Data for Urban Area Mapping. Ph.D. thesis. Télécom ParisTech.

Mayer, H., 2008. Object extraction in photogrammetric computer vision. ISPRS Journal of Photogrammetry and Remote Sensing 63, 213–222.

McLaughlin, R.A., 2006. Extracting transmission lines from airborne LIDAR data. IEEE Geoscience and Remote Sensing Letters 3, 222–226.

Mountrakis, G., Im, J., Ogole, C., 2011. Support vector machines in remote sensing: a review. ISPRS Journal of Photogrammetry and Remote Sensing 66, 247–259.

Munoz, D., Vandapel, N., Hebert, M., 2008. Directional associative Markov network for 3-D point cloud classification. In: International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), 18–20 June. Atlanta, USA, pp. 1–8.

Niemeyer, J., Wegner, J., Mallet, C., Rottensteiner, F., Soergel, U., 2011. Conditional random fields for urban scene classification with full waveform LiDAR data. In: Photogrammetric Image Analysis (PIA). Springer, Munich, Germany, pp. 233–244.

Niemeyer, J., Rottensteiner, F., Soergel, U., 2012. Conditional random fields for lidar point cloud classification in complex urban areas. In: ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, Proceedings XXII ISPRS Congress (TC III), 25 August–1 September. Melbourne, Australia, pp. 263–268.

Niemeyer, J., Rottensteiner, F., Soergel, U., 2013. Classification of urban LiDAR data using conditional random field and random forests. In: IEEE Proceedings of the Joint Urban Remote Sensing Event (JURSE), 21–23 April. São Paulo, Brazil, pp. 139–142.

Nowozin, S., Rother, C., Bagon, S., Sharp, T., Yao, B., Kohli, P., 2011. Decision tree fields. In: IEEE International Conference on Computer Vision (ICCV), 2011. IEEE, Barcelona, Spain, pp. 1668–1675.

Poullis, C., 2013. A framework for automatic modeling from point cloud data. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 2563–2574.

Reitberger, J., Schnoerr, C., Krzystek, P., Stilla, U., 2009. 3D segmentation of single trees exploiting full waveform LIDAR data. ISPRS Journal of Photogrammetry and Remote Sensing 64, 561–574.

Rottensteiner, F., Sohn, G., Jung, J., Gerke, M., Baillard, C., Benitez, S., Breitkopf, U., 2012. The ISPRS benchmark on urban object classification and 3D building reconstruction. In: ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 25 August–1 September. Melbourne, Australia, pp. 293–298.

Rusu, R., Holzbach, A., Blodow, N., Beetz, M., 2009. Fast geometric point labeling using conditional random fields. In: IEEE International Conference on Intelligent Robots and Systems, 11–15 October, 2009. St. Louis, USA, pp. 7–12.

Rutzinger, M., Rottensteiner, F., Pfeifer, N., 2009. A comparison of evaluation techniques for building extraction from airborne laser scanning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2, 11–20.

Sampath, A., Shan, J., 2007. Building boundary tracing and regularization from airborne lidar point clouds. Photogrammetric Engineering and Remote Sensing 73, 805–812.

Schindler, K., 2012. An overview and comparison of smooth labeling methods for land-cover classification. Transactions on Geoscience and Remote Sensing (TGRS) 50, 4534–4545.

Schmidt, M., 2012. UGM: a Matlab toolbox for probabilistic undirected graphical models. <http://www.di.ens.fr/~mschmidt/Software/code.html>.

Shapovalov, R., Velizhev, A., Barinova, O., 2010. Non-associative Markov networks for 3D point cloud classification. In: Proceedings of the ISPRS Commission III Symposium – PCV 2010. ISPRS, Saint-Mandé, France, pp. 103–108.

Shapovalov, R., Vetrov, D., Kohli, P., 2013. Spatial inference machines. In: IEEE Conference on Computer Vision and Pattern Recognition, 23–28 June. Portland, USA, pp. 1–8.

Shotton, J., Winn, J., Rother, C., Criminisi, A., 2009. TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision 81, 2–23.

Vishwanathan, S., Schraudolph, N., Schmidt, M., Murphy, K., 2006. Accelerated training of conditional random fields with stochastic gradient methods. In: 23rd International Conference on Machine Learning, 25–29 June, 2006. Pittsburgh, USA, pp. 969–976.

Wegner, J.D., Hansch, R., Thiele, A., Soergel, U., 2011. Building detection from one orthophoto and high-resolution InSAR data using conditional random fields. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 4, 83–91.

Xiong, X., Munoz, D., Bagnell, J.A., Hebert, M., 2011. 3-D scene analysis via sequenced predictions over points and regions. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA11), 9–13 May. Shanghai, China, pp. 2609–2616.

Yang, M.Y., Förstner, W., 2011. A hierarchical conditional random field model for labeling and classifying images of man-made scenes. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), IEEE, 6–13 November. Barcelona, Spain, pp. 196–203.