

Page 1: Learning to Assess Terrain from Human Demonstration Using ...mobile/Papers/2015ICRA...Learning to Assess Terrain from Human Demonstration Using an Introspective Gaussian-Process Classifier

Learning to Assess Terrain from Human Demonstration Using an Introspective Gaussian-Process Classifier

Laszlo-Peter Berczi1, Ingmar Posner2, and Timothy D. Barfoot1

Abstract— This paper presents an approach to learning robot terrain assessment from human demonstration. An operator drives a robot for a short period of time, supervising the gathering of traversable and untraversable terrain data. After this initial training period, the robot can then predict the traversability of new terrain based on its experiences. We improve on current methods in two ways: first, we maintain a richer (higher-dimensional) representation of the terrain that is better able to distinguish between different training examples. Second, we use a Gaussian-process classifier for terrain assessment due to its superior introspective abilities (leading to better uncertainty estimates) when compared to other classifier methods in the literature. Our method is tested on real data and shown to outperform current methods both in classification accuracy and uncertainty estimation.

I. INTRODUCTION

Mobile robots operating autonomously need the ability to safely navigate their environments. This requires the robot to be able to determine the traversability of the terrain upon which it wishes to drive, a process called terrain assessment. Often, remote sensing cannot produce data with enough resolution to be able to determine traversability from afar, and so the robot must use its on-board sensors to assess the terrain in situ.

Typically, terrain assessment algorithms use a set of heuristics to determine the traversability of terrain in the near vicinity of the robot. A common example in this category is GESTALT [1], a terrain assessment method developed for and initially flown on the Mars Exploration Rovers (MER). The method uses geometric information gathered from stereo cameras to estimate the slope, step height, and roughness of the terrain, and these three quantities are then thresholded to predict traversability.

Such heuristic methods often rely on the tuning of a large number of robot-specific parameters in order to make accurate predictions about the terrain, and therefore suffer from three major drawbacks. First, the terrain must be represented in such a way that a human can interpret the representation and create a set of rules about its traversability. This often results in representations that are oversimplified and do not capture all of the information about the terrain (Figure 3). Second, an expert is required to tune the parameters to a specific robot and environment in which the robot is traversing, a time-consuming process that often yields suboptimal results. Third, once a good set of parameter

1Laszlo-Peter Berczi and Timothy D. Barfoot are with the Institute for Aerospace Studies, University of Toronto, Canada <{peter.berczi, tim.barfoot}@utoronto.ca>

2Ingmar Posner is with the Mobile Robotics Group, University of Oxford, United Kingdom <[email protected]>

Fig. 1. Our robot platform (a Clearpath Grizzly) used for learning terrain assessment from demonstration. The robot is equipped with a mast-mounted Bumblebee XB3 stereo camera used for generating pointclouds via block matching, and also for navigation via visual odometry. A Microstrain IMU is used to gravity-align the pointclouds. A human operator uses a hand controller to drive the robot on traversable terrain and to occasionally indicate untraversable terrain in front of the robot. The robot then learns to assess the terrain on its own from these examples.

values are found, they apply only to the specific robot for which they were tuned. The tuning process must be repeated for every different robot and environment in which the robot is operating.

This paper aims to alleviate these issues by directly learning terrain traversability from human demonstration of traversable and untraversable terrain. Learning methods are able to handle rich, high-dimensional terrain representations that are too complex for hand-coded rules but much more powerful for representing terrain. By directly learning to predict traversability, we remove the need for an expert to hand-tune parameters for each robot; the same algorithm can be implemented on a variety of robot platforms without the need to manually retune a large number of parameters.

To this end, this paper makes two novel contributions: first, we present a more complex representation of the terrain than previously used for directly learning traversability. Our representation is specifically tailored to learning from human demonstration. Second, we show that Gaussian-process classifiers (GPCs) with a custom kernel function outperform the popular choice of support vector machines (SVMs) for terrain assessment both in classification accuracy and uncertainty estimation.

The remainder of the paper is organized as follows: Section II presents relevant previous work. Section III describes our terrain model and how we learn to assess terrain from human demonstration. Section IV describes the dataset on which the method was tested and Section V analyses the results and effectiveness of the method. Section VI presents our conclusions and details some ideas for future work.

II. RELATED WORK

In 2005, the Defense Advanced Research Projects Agency (DARPA) spurred research in learning methods for robot navigation with their Learning Applied to Ground Robots (LAGR) program [2]. Much of the work focussed on extending the range of geometric terrain assessment methods by learning associations between geometry and image data [3][4]. These approaches have the drawback that the long-range assessments can only be as good as the close-range geometric methods used.

Instead, this paper focusses on improving near-field terrain assessment predictions. Closely related to the aims of this paper, a few previous works have used learning methods to estimate one or more aspects of robot-terrain interactions from previous experience. Iagnemma et al. [5][6] and Angelova et al. [7][8][9] learn to predict wheel slip from experience. Wellington et al. [10][11][12] use a voxel-column representation of the terrain to learn the traversability of tall grass from experience. Specifically, the algorithm attempts to estimate the true surface normal (of the non-compliant ground underneath tall grass) from voxel densities. These methods improve on heuristic algorithms for estimating the robot's response to terrain, but do not address the problem of how to determine traversability from this response.

There has also been previous work in applying learning techniques to directly estimate traversability of the terrain from previous experience. Thrun et al. [13] learn to assess terrain at high speeds on a dirt road from previous experiences. Ollis et al. [14] learn to identify traversability from colour information using only previously driven locations as traversable training examples. The traversability estimates are used in combination with a heuristic geometric algorithm. Bajracharya et al. [3][4] use a simple geometric representation of the terrain (similar to the GESTALT representation) in combination with traversable and untraversable examples to learn traversability. Training data are gathered from human demonstration; traversable samples are drawn from where the robot has previously driven, while untraversable examples can be labelled offline by a human looking at logs, online by a human (indicator button), or autonomously using on-board sensors (e.g., a bumper). Kim et al. [15] use a combination of simple geometric height histograms and visual data to learn a model of traversability. Training examples are gathered autonomously using motor currents and a bumper sensor. Matthies et al. [16] use a voxel representation similar to [12] to directly learn traversability of tall grassy regions in front of the robot.

We improve on these methods by exploiting learning algorithms to enable the use of a higher-dimensional terrain representation. Richer terrain models are better able to distinguish between different types of terrain [17], and are therefore better able to discriminate between traversable and untraversable terrain. Ott and Ramos [18] use colour histograms and binary patterns per image to learn obstacle detection, again showing the improved power of a rich terrain representation.

We also improve the uncertainty estimation in the traversability predictions, a largely unaddressed issue in these previous works. In the literature, the main learning algorithms used for terrain assessment are SVMs, a popular binary classifier that is designed to maximize classification accuracy. SVMs perform accurately and efficiently with large datasets, making them a popular choice for classification problems. Grimmett et al. [19] and Triebel et al. [20] compare SVMs to GPCs for the task of roadsign classification in images and show that GPCs perform similarly well to SVMs but maintain a better uncertainty estimate in their predictions. In the field of terrain assessment, GPCs have been used to estimate terrain shape [21][22] and robot-terrain interactions [23][24][25] but not, to the best of our knowledge, to directly estimate traversability.

III. METHODOLOGY

Our learning terrain assessment algorithm consists of three distinct parts: terrain modelling, labelling, and learning (Figure 2). The terrain modelling step processes the raw sensor inputs into a usable terrain representation. The labelling step uses human demonstration to label terrain examples as traversable or untraversable. Finally, the learning step uses the labelled training examples to make traversability predictions for unlabelled data.

A. Terrain Modelling

We draw inspiration for our terrain representation from the commonly used GESTALT [1] representation. The GESTALT method represents terrain by fitting a plane to points in a robot-sized patch centred at the robot. The plane-fit statistics are then used to compute the slope, roughness, and largest step height in the patch, reducing the representation of the terrain to just three quantities. This simplification

[Figure 2 block diagram: Sensor Data and Human Input feed the Terrain Modelling, Labelling, and Learning Method blocks, which output Predictions.]

Fig. 2. The general architecture for learning terrain assessment algorithms. The inputs to the algorithm are the sensor data and human input. The outputs are predictions about the terrain traversability. The three major processes are terrain modelling, labelling, and the learning method (to make predictions about the traversability of the terrain).



Fig. 3. (a), (b) Two very different situations in which the GESTALT terrain statistics are almost the same, revealing that the underlying terrain representation is not rich enough to capture complex scenarios, leading to a conservative classification scheme. (c), (d) The same two situations, but with our cell-height terrain representation overlaid instead of the GESTALT one. Our method computes the median heights of each cell in a grid centred at the robot and concatenates them into a vector (25 dimensions in this case). This higher-dimensional representation is richer in information than the GESTALT method, and is thus better able to distinguish between different terrain patches.

enables a human to directly analyse the traversability of the terrain, but also limits the representational power of GESTALT. As a result, there are often cases where the GESTALT cost is similar for significantly different patches of terrain (Figure 3). This leads to the misclassification of safe terrain in an effort to safeguard the rover from unsafe terrain that shares a similar representation (i.e., the model ends up being conservative).

It is important to note that GESTALT was specifically designed for operation on Mars and was thus subject to several constraints. In particular, the algorithm was designed to operate with very limited computational resources. Furthermore, being an algorithm intended for use in space, the focus is on protecting the robot and not necessarily on getting the most accurate terrain assessment (false positives are favoured over false negatives).

We make use of the fact that learning algorithms can handle more complex inputs in choosing our terrain representation. Similarly to GESTALT, our representation bins a pointcloud acquired by the robot's sensors into a grid of cells centred at the robot. Instead of fitting a plane to these cells, however, we compute the median height of each cell and concatenate all of the cell heights into a vector (e.g., 25 cell heights in Figures 3(c) and 3(d)). The median was chosen over the mean because we found that it is more robust to outliers that can appear in the pointclouds generated by stereo image matching.

Fig. 4. The robot images the terrain in front of it with a stereo camera at time t, generating pointclouds of the terrain. As the robot is driven forwards to time t′, it uses stereo VO to determine the previously imaged terrain over which it is driving. The pointclouds are then binned into a patch of grid cells centred at the robot, and labelled as either traversable or untraversable depending on the input from the human via a hand controller.
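As an illustration, the cell-height computation can be sketched in a few lines. This is a minimal example, not the authors' implementation; it assumes a gravity-aligned pointcloud supplied as an N × 3 NumPy array in a robot-centred frame, with the paper's 5 × 5 grid of 0.25 m cells as defaults:

```python
import numpy as np

def cell_height_feature(points, grid_size=5, cell_len=0.25):
    """Bin a gravity-aligned pointcloud (N x 3: x, y, z in metres, robot-
    centred) into a grid_size x grid_size patch and return the per-cell
    median heights concatenated into one vector (25-D for a 5 x 5 grid)."""
    half = grid_size * cell_len / 2.0
    # Map x/y coordinates to integer cell indices in [0, grid_size).
    idx = np.floor((points[:, :2] + half) / cell_len).astype(int)
    in_patch = np.all((idx >= 0) & (idx < grid_size), axis=1)
    idx, z = idx[in_patch], points[in_patch, 2]
    feature = np.zeros(grid_size * grid_size)
    for i in range(grid_size):
        for j in range(grid_size):
            cell_z = z[(idx[:, 0] == i) & (idx[:, 1] == j)]
            if cell_z.size:  # median: robust to stereo-matching outliers
                feature[i * grid_size + j] = np.median(cell_z)
    return feature
```

Empty cells are left at zero height in this sketch; how sparse cells are handled (e.g., the pointcloud-density threshold mentioned in Section IV) is an implementation choice.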

Compared to GESTALT, our representation is better able to differentiate between different terrain types, as evidenced by comparing Figures 3(a) and 3(b) to Figures 3(c) and 3(d). For two very different terrain patches, the GESTALT representation is very similar, whereas our representation is distinct. In fact, by choosing a large number of patches, we can get arbitrarily close to representing the actual terrain shape (at the cost of increasing the number of required training examples and computation time).

B. Labelling Scheme

In our labelling scheme, we aim to take advantage of human expertise in an efficient way. Rather than having a human tediously label training examples from logs gathered by the robot (as is often done), we incorporate terrain labelling directly in the driving process.

A human drives the robot via a hand controller for a short period of time on terrain that spans the robot's capabilities. During this training phase, any terrain that the operator drives over is considered traversable. Additionally, the operator may indicate untraversable terrain by driving up to it and pressing

[Figure 5: robot path estimated with VO, with artificially added poses in front of the robot.]

Fig. 5. Stereo VO is used to estimate the robot's path (shown in black) as the human operator drives it via a hand controller. Any terrain that the robot drives over is labelled as traversable (green poses). Occasionally, the human operator stops the robot in front of untraversable terrain and indicates so via a button press on the hand controller. Poses are then artificially added in front of the robot and labelled as untraversable (red poses).


[Figure 6 plot: latent function f versus ||x||, annotated with "flat, certain terrain" near the origin and "rough, uncertain terrain" far from it.]

Fig. 6. A one-dimensional projection of the mean (dark blue) and 3-σ covariance envelope (light blue) of the prior used for GPC learning. At the origin, the terrain is flat and we are therefore certain that it is very safe. Far from the origin, the terrain is generally rougher and so we are less certain of its traversability.

a button on the hand controller.

As the robot drives, it is imaging the terrain in front of it (Figure 4). Stereo image pairs are processed into pointclouds and then stored for later use. As the robot continues to drive forwards, stereo visual odometry (VO) enables the robot to determine on which patches of terrain (previously imaged) it has driven. Our cell-height terrain representation is computed for these patches and they are labelled as traversable.

When the operator indicates that the terrain in front of the robot is unsafe (via a button on the hand controller), several robot poses are added at regular intervals in front of the robot (again computing the terrain representation) and labelled as unsafe (Figure 5). Adding more than one robot pose helps ensure that we capture the correct patch of terrain (in our current implementation, the operator has no way of telling the robot how far in front there is an obstacle).
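The pose-insertion step can be sketched as follows. This is illustrative only; the number of added poses and their spacing are assumed values, not taken from the paper:

```python
import numpy as np

def untraversable_poses(T_robot, n_poses=3, spacing=0.5):
    """Generate poses at regular intervals along the robot's forward (x)
    axis when the operator flags an obstacle; the terrain patches at these
    poses are labelled untraversable. T_robot is the 4x4 robot pose in the
    global frame; n_poses and spacing are illustrative values only."""
    poses = []
    for k in range(1, n_poses + 1):
        offset = np.eye(4)
        offset[0, 3] = k * spacing  # step forwards in the robot frame
        poses.append(T_robot @ offset)
    return poses
```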

C. Learning Method

We use Gaussian-process classification (GPC) as our learning scheme because of its ability to maintain an accurate confidence estimate when compared to other popular learning methods (e.g., support vector machines) [19]. Much of the derivation for GP learning can be found in Rasmussen and Williams [26], but we present a brief summary along with our modifications here.

Given some training inputs, X := {x_i | i = 1 . . . n}, and corresponding traversability labels, y := {y_i | i = 1 . . . n}, we want to predict the traversability, y∗ := {y∗_i | i = 1 . . . m}, at some unlabelled test inputs, X∗ := {x∗_i | i = 1 . . . m}.

In our case, the training labels are classes (traversable and not traversable), so we cannot represent them directly using a GP. Instead, we introduce a latent function f, which we model with the GP, and use this latent function to make predictions about the test class labels.

Thus, the problem is first to determine the value of the latent function at the test inputs, conditioned on the training data. We do so by marginalizing over all possible values of the latent function at the training inputs:

p(f∗ | X∗, X, y) = ∫ p(f∗ | X∗, X, f) p(f | X, y) df.   (1)

If both f and f∗ have Gaussian distributions then this integral is easy to compute. However, because the labels y are classes, p(y | X, f) is non-Gaussian and therefore p(f | X, y) is non-Gaussian. We use the Laplace method as presented in Rasmussen and Williams [26] to approximate the distribution as Gaussian:

p(f | X, y) ≈ q(f | X, y) := N(f̂, A⁻¹),

f̂ := argmax_f p(f | X, y),

A := −∇∇ log p(f | X, y)|_{f = f̂},

where q(f | X, y) is the Gaussian approximation to p(f | X, y) and ∇ is the gradient with respect to f. We can then compute the marginal in (1):

p(f∗ | X∗, X, y) = N(µ̂, Σ̂),

µ̂ := µ∗ + K∗ᵀ K⁻¹ (f̂ − µ),

Σ̂ := K∗∗ − K∗ᵀ K⁻¹ K∗ + K∗ᵀ K⁻¹ A⁻¹ K⁻¹ K∗,

where

p([f; f∗]) = N([µ; µ∗], [K, K∗; K∗ᵀ, K∗∗])

specifies the prior distribution over the latent function (where N(a, B) represents a normal distribution with mean a and covariance B).
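Under the Laplace approximation, the predictive moments above reduce to a few lines of linear algebra. The following is a minimal sketch, not the authors' implementation; it uses explicit inverses for readability, whereas a real implementation would use Cholesky solves for numerical stability, as in Rasmussen and Williams [26]:

```python
import numpy as np

def laplace_predictive_moments(mu, mu_star, K, K_star, K_ss, f_hat, A):
    """Mean and covariance of p(f* | X*, X, y) under the Laplace
    approximation q(f | X, y) = N(f_hat, inv(A)). Shapes: mu (n,),
    mu_star (m,), K (n, n), K_star (n, m), K_ss (m, m), f_hat (n,),
    A (n, n)."""
    K_inv = np.linalg.inv(K)
    G = K_star.T @ K_inv                      # K*^T K^-1, shape (m, n)
    mu_hat = mu_star + G @ (f_hat - mu)
    Sigma_hat = K_ss - G @ K_star + G @ np.linalg.inv(A) @ G.T
    return mu_hat, Sigma_hat
```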

Since traversable examples are gathered underneath the robot continuously as it drives whereas untraversable examples need to be manually indicated, we anticipate that the labelling scheme will produce many more traversable examples than untraversable examples. We specifically design the mean and covariance of our prior (Figure 6) to compensate for this imbalance: we assume that only terrain near the origin in our cell-height representation (i.e., flat terrain) is traversable (with high certainty), and that terrain far from the origin (i.e., rough terrain) is untraversable (with low certainty). As training data are gathered, the posterior is pulled away from the prior to reflect the fact that certain types of non-flat terrain are traversable.

We accomplish this using a logarithmic mean function that takes on large negative (traversable) values at the origin,


Fig. 7. The 40 metre diameter University of Toronto Institute for Aerospace Studies MarsDome, where a dataset for validating the method was collected. The terrain consists of a mixture of sand, gravel, and medium to large rocks. It is varied in difficulty, allowing the full capability of the robot chassis to be tested. A Clearpath A100 robot is pictured on the terrain (not the robot used in our tests).

and quickly grows to large positive values (untraversable) far from the origin:

[µ]_i = θ1 log(θ2 ||x_i||²),

where θ1 and θ2 are two hyperparameters used to control the scaling of the mean function. To model our uncertainty assumptions, we use a combination of the squared-exponential covariance function and the dot-product covariance function:

[K]_ij = θ3 exp(−||x_i − x_j||² / (2θ4)) x_iᵀ x_j,

where θ3 controls how quickly our uncertainty grows away from the origin and θ4 is a smoothing parameter. We chose the squared-exponential covariance function because it produces smooth latent function values (we assume that transitions between traversable and untraversable terrain happen gradually as the terrain gets rougher). The dot-product covariance function is used because it allows us to vary our uncertainty based on the distance from the origin (which generally corresponds to the roughness of the terrain in our terrain representation).
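The mean and covariance functions above can be sketched directly (an illustrative NumPy version; the hyperparameter values in the test below are placeholders, not tuned values from the paper):

```python
import numpy as np

def prior_mean(X, th1, th2):
    """Logarithmic mean: large negative (traversable) near the origin,
    growing to large positive (untraversable) far from it."""
    return th1 * np.log(th2 * np.sum(X**2, axis=1))

def prior_cov(Xa, Xb, th3, th4):
    """Squared-exponential term (smooth latent values) multiplied by a
    dot-product term (uncertainty grows with distance from the origin)."""
    sq_dists = np.sum((Xa[:, None, :] - Xb[None, :, :])**2, axis=2)
    return th3 * np.exp(-sq_dists / (2.0 * th4)) * (Xa @ Xb.T)
```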

To make class predictions, we marginalize over the latent function values at the test inputs:

p(y∗ | X∗, X, y) = ∫ p(y∗ | f∗) p(f∗ | X∗, X, y) df∗,

where p(y∗ | f∗) is typically given by a sigmoid function such as the logistic function (used in our formulation):

p(y∗_i = untraversable | f∗_i) = σ(f∗_i) := 1 / (1 + exp(−f∗_i)).

This integral is one-dimensional for each test case and therefore easy to compute numerically. Our implementation of the algorithm closely follows the one presented in Rasmussen and Williams [26], who have taken care to avoid numerical instabilities and to speed up the computation of necessary matrix inverses.
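The one-dimensional integral can be evaluated, for example, with Gauss-Hermite quadrature. This is a sketch; the paper does not specify its quadrature scheme:

```python
import numpy as np

def predictive_class_prob(mu, var, n_points=32):
    """p(y* = untraversable) = integral of sigmoid(f) * N(f | mu, var) df,
    evaluated with Gauss-Hermite quadrature (one 1-D integral per test
    case). mu and var are the latent predictive mean and variance."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    f = mu + np.sqrt(2.0 * var) * nodes  # change of variables for N(mu, var)
    return np.sum(weights / (1.0 + np.exp(-f))) / np.sqrt(np.pi)
```

Note that the predictive variance matters here: a large var pulls the class probability towards 0.5 even when the mean is far from zero, which is exactly the introspective behaviour the GPC provides.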

IV. DATASET

The method was tested on a dataset gathered at the University of Toronto Institute for Aerospace Studies (UTIAS) MarsDome (Figure 7). The MarsDome is a hemispherical enclosure (approximate diameter of 40 metres) filled with sand, gravel, and rocks. The terrain is arranged into a variety


Fig. 8. Data was gathered over a 258 metre traverse in the MarsDome. Stereo images were processed into pointclouds and then transformed to a global frame. The driven path is shown in red. In total, 1783 stereo pairs were gathered and used to compute 8477 labelled training patches. The traverse is shown from the same perspective as Figure 7 to facilitate comparison.

of features designed to test robot navigation capabilities. Traversability of the terrain ranges from extremely easy flat ground to gravel hills that the robot cannot climb. A Clearpath Grizzly robot, equipped with a Bumblebee XB3 stereo camera and a Microstrain IMU (Figure 1), was driven by a human via hand controller on terrain of varied difficulty. The operator drove the robot on safe terrain and occasionally stopped in front of unsafe terrain and indicated (via a button press on the hand controller) that the terrain was not traversable. Our stereo VO algorithm using SURF features [27] was run in the background to provide robot localization in order to determine over which regions of the imaged terrain the robot drove. Pointclouds were generated from stereo images using a block-matching algorithm, and gravity-aligned using the inclinometer before the cell-height and GESTALT representations were computed.

In total, 1783 stereo pairs were gathered over 15 minutes of driving. The average spacing between image acquisitions was 15 centimetres, resulting in a total path length of 258 metres (Figure 8). The operator indicated untraversable terrain 16 times throughout the training period. The image pairs and button presses were processed to create 8477 labelled training examples with 7726 traversable cases (91%) and 751 untraversable cases (9%).

Terrain patches were computed using a 5 × 5 grid of cells (each with a sidelength of 0.25 metres), resulting in a patch with sidelength 1.25 metres. Only patches whose pointcloud density was above a specific threshold were used so as to make the data more reliable.

V. RESULTS

We present the results from our comparison of different terrain models and learning schemes for terrain assessment. The terrain models used were our cell-height representation and the GESTALT representation prevalent in the literature. The learning methods used were GP classifiers, SVMs1,

1We used the built-in Matlab implementation of an SVM with a radial basis function kernel (since it outperformed the linear SVM in our tests).


[Figure 9 plots: (a) ROC (TPR vs. FPR), (b) precision-recall, and (c) MR percent change vs. fraction of data rejected, each comparing GP Height, SVM Height, GP GESTALT, SVM GESTALT, and Heuristic GESTALT.]

Fig. 9. (a) ROC curves for each of the methods tested. The higher-dimensional cell-height representation in combination with GPC learning outperformed the other methods. (b) Precision-recall curves for the different combinations of learning method and terrain model. Using a GPC in combination with the higher-dimensional cell-height representation produced the best results. (c) As we increased our confidence threshold, we classified less of the data but we decreased our misclassification rate (MR). The GPC methods showed greater improvement in their MR while rejecting less data (on average) than their SVM counterparts.

and the hand-coded GESTALT method [1] with learned parameters.

Standard multifold testing was done with k = 10 folds, each containing a random, equal share of the traversable and untraversable training examples. For each test, the model was trained on 8 of the 10 folds, with the remaining 2 used for validation and testing. The validation fold was used to optimize the hyperparameters (discussed later), which were then used to test the method. The results were averaged across the folds for each of the methods tested [28].
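The fold construction can be sketched as follows (an illustrative stratified split; the paper does not specify its splitting code):

```python
import numpy as np

def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds, each holding a random, roughly
    equal share of each class (0 = traversable, 1 = untraversable)."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = np.flatnonzero(np.asarray(labels) == cls)
        rng.shuffle(idx)
        for i, j in enumerate(idx):
            folds[i % k].append(int(j))  # deal indices round-robin
    return folds
```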

The resulting receiver-operating-characteristic (ROC) curves for each of the terrain model/learning method combinations used are shown in Figure 9(a). ROC curves are a standard measure of classifier performance, comparing the true-positive rate, TPR (true positives/actual positives), and the false-positive rate, FPR (false positives/actual negatives), at different probability thresholds (i.e., the probability above which we consider terrain to be classified as untraversable). The precision-recall (PR) curves (generated by varying the same threshold) are shown in Figure 9(b).
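Computing ROC points from these definitions is straightforward; a sketch (the function name is ours), treating untraversable as the positive class:

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """TPR and FPR at each probability threshold, treating untraversable
    (label 1) as the positive class; scores are predicted probabilities
    of untraversable."""
    scores = np.asarray(scores)
    labels = np.asarray(labels, dtype=bool)
    tpr, fpr = [], []
    for t in thresholds:
        pred = scores >= t  # classify as untraversable above threshold
        tpr.append(np.sum(pred & labels) / np.sum(labels))
        fpr.append(np.sum(pred & ~labels) / np.sum(~labels))
    return np.array(fpr), np.array(tpr)
```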

The results show that our cell-height terrain model in combination with a GPC for learning made better predictions than the other model/learning method combinations. This is evidenced by both the ROC and PR curves, where the GPC cell-height curves dominated the other curves at almost all points.

Comparing the different learning methods on the cell-height terrain representation (i.e., comparing GPC Height and SVM Height), we see that the GPC outperformed the SVM in both the ROC and PR curves. Comparing the two classifiers using the GESTALT representation yields the same result; the GPC outperformed the SVM in classification accuracy.

If we compare the different terrain representations using a GPC for classification, we see that the cell-height model performed better than the GESTALT model. Similarly, SVM Height outperformed SVM GESTALT, as expected, because the cell-height representation is much richer than the GESTALT one and was therefore better able to distinguish between different terrain examples.

We note that the heuristic GESTALT method performed worst in all cases, despite the parameters being learned from the dataset. Furthermore, the learned parameters that performed best in the heuristic model are not intuitive, in that they do not accurately reflect their real-world counterparts. For example, the results shown were obtained by setting the maximum slope to 41 degrees and the robot clearance to 5 cm, when in reality the robot should not tilt that far and has a much higher clearance. Thus, tuning these parameters by trial and error would be difficult and unintuitive. This is partly due to inaccuracies in the data gathered from the sensors, but in real operating conditions noisy sensors are often the reality, and algorithms must take such sensor noise into account.

In addition to how accurately the methods classified the data, we are interested in how confident each of the learning algorithms was when making predictions. Similarly to Grimmett et al. [19], we computed the normalized entropy for both the accurately classified and misclassified cases for each classifier (Figure 10). Normalized entropy is commonly used as a measure of uncertainty in a classifier, and is computed as follows:

Hi,norm = Hi / Hmax,

Hi = −p(yi) logb p(yi) − (1 − p(yi)) logb (1 − p(yi)),

Hmax = logb c,

where p(yi) is the probability of the terrain being untraversable, c is the number of classes (2 in our case), and b can be chosen arbitrarily (b = 2 is a convenient choice for binary classification since Hmax simplifies to 1).
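The computation above can be written out directly. This is a minimal Python sketch of the normalized-entropy formula (the function name is ours; with b = c = 2, Hmax = 1 and the result lies in [0, 1]):

```python
import math

def normalized_entropy(p, b=2, c=2):
    """H_i / H_max for a binary prediction, where p is the predicted
    probability of the terrain being untraversable."""
    if p in (0.0, 1.0):      # limit of p*log(p) as p -> 0 is 0
        return 0.0
    h = -p * math.log(p, b) - (1.0 - p) * math.log(1.0 - p, b)
    return h / math.log(c, b)   # H_max = log_b c; equals 1 when b = c = 2
```

A maximally uncertain prediction (p = 0.5) gives normalized entropy 1, while a fully confident one (p = 0 or 1) gives 0.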

The difference in median normalized entropy was higher for the GP classifiers than the SVM classifiers for both the cell-height and GESTALT terrain representations, showing that the GPC had a better estimate of its uncertainty when making predictions. An example of a misclassified patch of terrain is shown in Figure 11. Both classifiers incorrectly identified this patch of terrain as untraversable, but the GPC


[Figure 10: histograms of normalized entropy (frequency vs. normalized entropy) for each method; panels (a) GPC Height (acc.), (b) SVM Height (acc.), (c) GPC GESTALT (acc.), (d) SVM GESTALT (acc.), (e) Heuristic GESTALT (acc.), (f) GPC Height (miss), (g) SVM Height (miss), (h) GPC GESTALT (miss), (i) SVM GESTALT (miss), (j) Heuristic GESTALT (miss).]

Fig. 10. The normalized entropy (a measure of uncertainty) for the accurate (top row) and misclassified cases (bottom row) for each method. The median normalized entropy is indicated with a black bar. The GPC methods had a greater difference in their median normalized entropy between their correct and incorrect predictions than the SVM and heuristic methods for both the cell-height and GESTALT representations. This shows that GPCs were more certain than SVMs and the heuristic GESTALT when correctly classifying data, and more uncertain when misclassifying data. This means that we should be able to use the normalized entropy to know when our predictions are likely to be incorrect for the GP classifier method; we will know when we do not know.

classifier was much more uncertain of its prediction than the SVM classifier.

Comparing the different terrain models, we can see that the cell-height representation resulted in a higher median normalized entropy than the GESTALT representation for both the GPC and the SVM classifier, showing that the richer cell-height patches lent themselves better to uncertainty estimation than the GESTALT statistics.

We then leveraged the better uncertainty estimates to improve the accuracy of the classifiers by choosing to only consider the most confident predictions. In other words, we only considered predictions where

p(yi = untraversable) < pthres − ε, or
p(yi = untraversable) > pthres + ε,

where pthres is the threshold probability for considering terrain untraversable and ε is a parameter we varied to determine the level of confidence we desire.

The results of this confidence thresholding are shown in Figure 9(c), where pthres = 0.5, and ε was varied from 0 to 0.5, resulting in different amounts of data being classified. We used the misclassification rate (MR) for each method:

MR = (incorrect predictions) / (total predictions),

as a measure of classifier improvement. As we increased the value of ε, we omitted predictions

about which we were not confident, resulting in a decreased MR for our classifiers. Figure 9(c) shows that the GP classifiers improved their misclassification rates more while rejecting less data than their respective SVM classifiers (on average). Thus, the GP classifiers could classify more data with higher accuracy than their SVM counterparts.
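The rejection rule above can be sketched as follows (illustrative Python with the hypothetical helper name `reject_and_score`; pthres = 0.5 as in Figure 9(c)):

```python
def reject_and_score(probs, labels, eps, p_thres=0.5):
    """Keep only confident predictions (those outside [p_thres - eps,
    p_thres + eps]), then compute the misclassification rate (MR)
    and the fraction of data rejected."""
    kept = [(p, y) for p, y in zip(probs, labels)
            if p < p_thres - eps or p > p_thres + eps]
    rejected_frac = 1.0 - len(kept) / len(probs)
    if not kept:
        return None, rejected_frac        # everything rejected; MR undefined
    wrong = sum(1 for p, y in kept if (p > p_thres) != (y == 1))
    return wrong / len(kept), rejected_frac
```

Sweeping eps from 0 to 0.5 traces out one curve in Figure 9(c): a well-calibrated classifier rejects mostly its wrong predictions, so its MR falls quickly as little data is discarded.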

These results depend on the values of various hyperparameters in the learning models. These values were chosen by performing an exhaustive search on a discretized grid of values (initialized randomly). It was found that the hyperparameters varied between the different folds, although not too drastically. The best hyperparameters were chosen by evaluating the area under the ROC curve for each validation set. Similarly, a grid search was used to determine the best parameters for the heuristic GESTALT method.
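The selection procedure above can be sketched as follows. This is an illustrative Python sketch; `train_and_validate` is a hypothetical stand-in for training a classifier with a given hyperparameter setting and returning its validation-set ROC points:

```python
import itertools

def auc(points):
    """Trapezoidal area under an ROC curve given (FPR, TPR) points."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

def grid_search(grid, train_and_validate):
    """Exhaustive search over a discretized hyperparameter grid,
    keeping the setting with the largest validation AUC."""
    best, best_auc = None, -1.0
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = auc(train_and_validate(params))
        if score > best_auc:
            best, best_auc = params, score
    return best, best_auc
```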

The algorithms were compared offline (in Matlab), and the GPC methods required considerably more computational resources than the SVM and GESTALT methods when training, and thus took much longer to compute results. Our GPC formulation took approximately 75 seconds to train the GPC for the cell-height representation, and approximately 90 seconds for the GESTALT representation. Predictions took approximately 2 seconds. In contrast, the SVM method

Fig. 11. An example of a terrain patch that was misclassified by both the GPC and SVM using the cell-height representation. The gravel mound shown in the image was at the limit of rover traversability (it was difficult but possible to drive over). While both methods classified the patch as untraversable, the GP classifier had a much higher normalized entropy (uncertainty) in its prediction than the SVM.


(built-in Matlab function fitcsvm) took approximately 2 seconds for both training and making predictions. We note, however, that training for the learning methods occurred on the entire dataset at once, whereas during online operation the data is gathered incrementally. We plan to explore incremental methods for GPC learning to achieve a speedup.

VI. CONCLUSIONS AND FUTURE WORK

This paper shows that robot terrain assessment in a general, unstructured environment can be successfully learned from human demonstration. The work improves on the literature in two main ways. First, a higher-dimensional input is used to retain more information about the terrain being classified, which yields better classification accuracy (both for the SVMs used in the literature and for the GPC formulation presented here) than the current standard. The terrain model is chosen to complement the learning methods used, and is a natural extension of the GESTALT method that has been widely used to date.

Second, the GPC learning method with custom mean and kernel functions allows the robot to maintain a better estimate of its uncertainty when making predictions about the terrain. In other words, our GP classifier is not only more accurate, but also better at knowing when it might make an incorrect prediction. We show that this knowledge can be exploited to improve the performance of the system.

In the next iteration, we hope to implement this algorithm online on the robot platform. We plan to make use of the GPC uncertainty estimates to allow the robot to decide when it is confident enough to drive on its own and when to ask for help from the human. Further still into the future, we plan to replace the cell-height terrain model with a learned model by applying deep learning to the very high-dimensional sensor data (e.g., directly to the images). We believe that this will take advantage of the inherent underlying structure of the environment that is difficult to capture using hand-coded terrain models. These developments are aimed towards developing a long-term autonomy system that will see the robot adapting to changing conditions by leveraging its confidence estimates.

VII. ACKNOWLEDGMENT

We would like to thank Professor Ben Upcroft (QUT) for many useful discussions during the preparation of this paper. This work was supported by the Natural Sciences and Engineering Research Council (NSERC) through the NSERC Canadian Field Robotics Network (NCFRN) and the CREATE program. The authors would also like to thank Clearpath Robotics for their support with the robot platform.

REFERENCES

[1] S. B. Goldberg, M. Maimone, and L. Matthies, "Stereo vision and rover navigation software for planetary exploration," in IEEE Aerospace Conference Proceedings, vol. 5, 2002, pp. 5-2025.

[2] L. D. Jackel, E. Krotkov, M. Perschbacher, J. Pippine, and C. Sullivan, "The DARPA LAGR program: Goals, challenges, methodology, and phase I results," J. Field Robotics, vol. 23, no. 11-12, pp. 945-973, 2006.

[3] M. Bajracharya, B. Tang, A. Howard, M. Turmon, and L. Matthies, "Learning long-range terrain classification for autonomous navigation," in ICRA 2008. IEEE, 2008, pp. 4018-4024.

[4] M. Bajracharya, A. Howard, L. H. Matthies, B. Tang, and M. Turmon, "Autonomous off-road navigation with end-to-end learning for the LAGR program," J. Field Robotics, vol. 26, no. 1, pp. 3-25, 2009.

[5] K. Iagnemma, H. Shibly, and S. Dubowsky, "On-line terrain parameter estimation for planetary rovers," in ICRA 2002. IEEE, 2002, pp. 3142-3147.

[6] K. Iagnemma, S. Kang, H. Shibly, and S. Dubowsky, "Online terrain parameter estimation for wheeled mobile robots with application to planetary rovers," IEEE Trans. on Robotics, vol. 20, no. 5, pp. 921-927, 2004.

[7] A. Angelova, L. Matthies, D. Helmick, G. Sibley, and P. Perona, "Learning to predict slip for ground robots," in ICRA 2006. IEEE, 2006, pp. 3324-3331.

[8] A. Angelova, L. Matthies, D. Helmick, and P. Perona, "Learning slip behavior using automatic mechanical supervision," in ICRA 2007. IEEE, 2007, pp. 1741-1748.

[9] ——, "Learning and prediction of slip from visual information," J. Field Robotics, vol. 24, no. 3, pp. 205-231, 2007.

[10] C. Wellington and A. Stentz, "Online adaptive rough-terrain navigation in vegetation," in ICRA 2004, vol. 1. IEEE, 2004, pp. 96-101.

[11] C. Wellington, A. C. Courville, and A. Stentz, "Interacting Markov random fields for simultaneous terrain modeling and obstacle detection," in RSS 2005, 2005, pp. 1-8.

[12] C. Wellington and A. Stentz, "Learning predictions of the load-bearing surface for autonomous rough-terrain navigation in vegetation," in FSR 2006. Springer, 2006, pp. 83-92.

[13] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann et al., "Stanley: The robot that won the DARPA Grand Challenge," J. Field Robotics, vol. 23, no. 9, pp. 661-692, 2006.

[14] M. Ollis, W. H. Huang, and M. Happold, "A Bayesian approach to imitation learning for robot navigation," in IROS 2007. IEEE, 2007, pp. 709-714.

[15] D. Kim, J. Sun, S. M. Oh, J. M. Rehg, and A. F. Bobick, "Traversability classification using unsupervised on-line visual learning for outdoor robot navigation," in ICRA 2006. IEEE, 2006, pp. 518-525.

[16] L. Matthies, M. Turmon, A. Howard, A. Angelova, B. Tang, and E. Mjolsness, "Learning for autonomous navigation: Extrapolating from underfoot to the far field," J. Machine Learning Research, vol. 1, pp. 1-48, 2005.

[17] R. Triebel, P. Pfaff, and W. Burgard, "Multi-level surface maps for outdoor terrain mapping and loop closing," in IROS 2006. IEEE, 2006, pp. 2276-2282.

[18] L. Ott and F. Ramos, "Unsupervised incremental learning for long-term autonomy," in ICRA 2012. IEEE, 2012, pp. 4022-4029.

[19] H. Grimmett, R. Paul, R. Triebel, and I. Posner, "Knowing when we don't know: Introspective classification for mission-critical decision making," in ICRA 2013, Karlsruhe, Germany, May 2013.

[20] R. Triebel, H. Grimmett, R. Paul, and I. Posner, "Introspective active learning for scalable semantic mapping," in Workshop on Active Learning in Robotics: Exploration, Curiosity and Interaction, RSS 2013, June 2013.

[21] S. Vasudevan, F. Ramos, E. Nettleton, and H. Durrant-Whyte, "Gaussian process modeling of large-scale terrain," J. Field Robotics, vol. 26, no. 10, pp. 812-840, 2009.

[22] T. Lang, C. Plagemann, and W. Burgard, "Adaptive non-stationary kernel regression for terrain modeling," in RSS 2007, 2007.

[23] S. Martin, L. Murphy, and P. Corke, "Building large scale traversability maps using vehicle experience," in Experimental Robotics. Springer, 2013, pp. 891-905.

[24] K. Ho, T. Peynot, and S. Sukkarieh, "Traversability estimation for a planetary rover via experimental kernel learning in a Gaussian process framework," in ICRA 2013. IEEE, 2013, pp. 3475-3482.

[25] ——, "A near-to-far non-parametric learning approach for estimating traversability in deformable terrain," in IROS 2013. IEEE, 2013, pp. 2827-2833.

[26] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, ser. Adaptive Computation and Machine Learning. University Press Group Limited, 2006. [Online]. Available: http://books.google.ca/books?id=vWtwQgAACAAJ

[27] P. Furgale and T. D. Barfoot, "Visual teach and repeat for long-range rover autonomy," J. Field Robotics, vol. 27, no. 5, pp. 534-560, 2010.

[28] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, 2006.