Airborne Vision-Based Collision-Detection System


John Lai, Luis Mejias, and Jason J. Ford
Australian Research Centre for Aerospace Automation (ARCAA), Queensland University of Technology, GPO Box 2434, Brisbane, Queensland 4001, Australia
e-mail: [email protected], [email protected], [email protected]

Received 10 March 2010; accepted 19 June 2010

Machine vision represents a particularly attractive solution for sensing and detecting potential collision-course targets due to the relatively low cost, size, weight, and power requirements of vision sensors (as opposed to radar and Traffic Alert and Collision Avoidance System). This paper describes the development and evaluation of a real-time, vision-based collision-detection system suitable for fixed-wing aerial robotics. Using two fixed-wing unmanned aerial vehicles (UAVs) to recreate various collision-course scenarios, we were able to capture highly realistic vision (from an onboard camera perspective) of the moments leading up to a collision. This type of image data is extremely scarce and was invaluable in evaluating the detection performance of two candidate target detection approaches. Based on the collected data, our detection approaches were able to detect targets at distances ranging from 400 to about 900 m. These distances (with some assumptions about closing speeds and aircraft trajectories) translate to an advance warning of between 8 and 10 s ahead of impact, which approaches the 12.5-s response time recommended for human pilots. We overcame the challenge of achieving real-time computational speeds by exploiting the parallel processing architectures of graphics processing units (GPUs) found on commercial-off-the-shelf graphics devices. Our chosen GPU device suitable for integration onto UAV platforms can be expected to handle real-time processing of 1,024 × 768 pixel image frames at a rate of approximately 30 Hz. Flight trials using manned Cessna aircraft in which all processing is performed onboard will be conducted in the near future, followed by further experiments with fully autonomous UAV platforms. © 2010 Wiley Periodicals, Inc.

1. INTRODUCTION

The problem of unmanned aerial vehicle (UAV) collision avoidance or “sense-and-avoid” has been identified as one of the most significant challenges facing the integration of UAVs into the national airspace (DeGarmo, 2004; Unmanned Aircraft Systems Roadmap 2007–2032, 2007). The full potential of UAVs can never be realized unless the sense-and-avoid issue is adequately addressed. To this end, machine vision has emerged as a promising means of addressing the “sense” and “detect” aspects of collision avoidance, which demand the ability to automatically detect and track targets in naturally lit, high-noise environments.

Machine vision represents a particularly attractive solution for sensing and detecting potential collision-course targets due to the relatively low cost, size, weight, and power requirements of the sensors involved (Maroney, Boiling, Heffron, & Flathers, 2007). Man-in-the-loop (MITL) solutions have been prototyped and demonstrated (Bryner, 2006). Furthermore, an automated detection system developed by Defense Research Associates was implemented and flight tested, in which the detection approach is based on target pixel movement (or energy flow) in the image frame (Utt, McCalmont, & Deschenes, 2004, 2005). Recently, the problem of aircraft detection using passive vision has been tackled using (1) a combination of morphological filtering and support vector machine classification (Dey, Geyer, Singh, & Digioia, 2009; Geyer, Dey, & Singh, 2009) and (2) a machine learning approach based on AdaBoost that exploits a modified version of the Viola and Jones object detector (Petridis, Geyer, & Singh, 2008).

Multimedia files may be found in the online version of this article.

Many hurdles must be overcome in the use of machine vision for target detection and tracking. We need to contend with not only the inherent noise of imaging sensors but also noise introduced by changing and unpredictable ambient conditions. The desire to overcome these challenges has driven the development of specialized image filtering and processing techniques that are optimized for the type of dim-target characteristics that are experienced in near-collision events between fixed-wing aircraft or fixed-wing aerial robots.

Over the past three decades, a two-stage processing paradigm has emerged for the simultaneous detection and tracking of dim, subpixel-sized targets (Arnold, Shaw, & Pasternack, 1993; Barniv, 1985; Gandhi, Yang, Kasturi, Camps, Coraor, et al., 2003, 2006). These two stages are (1) an image preprocessing stage that, within each frame, highlights potential targets with attributes of interest and (2) a subsequent temporal filtering stage that exploits target dynamics across frames. The latter temporal filtering stage is often based on a track-before-detect processing concept in which target information is collected and collated over a period of time before the detection decision is made.

Generally, the goal of the image preprocessing stage is to enhance potential target features while suppressing background noise and clutter. There is an abundance of techniques and algorithms available that may be considered for this image processing role. In particular, nonlinear spatial techniques such as median subtraction filters (Deshpande, Er, Venkateswarlu, & Chan, 1999) have been widely discussed in the literature. Another nonlinear image filtering approach that has received much attention over the past decade has its basis in mathematical morphology (Dougherty & Lotufo, 2003). Numerous morphology-based filters have been proposed for the detection of small targets in infrared (IR) images (JiCheng, ZhengKang, & Tao, 1996; Tom, Peli, Leung, & Bondaryk, 1993; Zhu, Li, Liang, Song, & Pan, 2000). Specific implementations of the morphological filtering approach include the hit-or-miss filter (Schaefer & Casasent, 1995), close-minus-open filter (Casasent & Ye, 1997), and top-hat filter (Braga-Neto, Choudhary, & Goutsias, 2004). Although a large proportion of research has focused on IR images, there are recent examples of morphological filters being incorporated into target detection algorithms operating on visual spectrum images (Carnie, Walker, & Corke, 2006; Gandhi et al., 2003, 2006). Moreover, a sign of the increasing popularity of morphological filters for small target detection is evident in the host of studies undertaken into the issue of parameter design (Yu, Wu, Wu, & Li, 2003; Zeng, Li, & Peng, 2006). Finally, efforts have been made to compare existing techniques with the morphology-based filters (Barnett, Billard, & Lee, 1993; Gandhi et al., 2006; Tom et al., 1993; Warren, 2002), with the median filtering technique often featuring in the comparison studies.

On the other hand, the temporal filtering stage that follows the image preprocessing is designed to extract image features that possess target-like temporal behavior. For this role, two particular filtering approaches have received much attention in the literature: Viterbi-based approaches and Bayesian-based approaches. The Viterbi algorithm has formed the basis of the temporal filtering stage in numerous track-before-detect algorithms (Arnold et al., 1993; Barniv, 1985; Davey, Rutten, & Cheung, 2008; Gandhi et al., 2006; Tonissen & Evans, 1996). The popularity of the Viterbi algorithm is in part due to its utility in the context of tracking in which, under a number of assumptions, it is able to efficiently determine the optimal target track within a data sequence (Forney, 1973). Some analyses of the Viterbi algorithm’s detection and tracking performance have been conducted (Barniv & Kella, 1987; Johnston & Krishnamurthy, 2000; Tonissen & Evans, 1996), and modifications that enhance the algorithm’s tracking performance in the presence of non-Gaussian clutter noise have been proposed (Arnold et al., 1993). An alternative temporal filter design for track-before-detect algorithms is based on Bayesian filtering (Bruno, 2004; Bruno & Moura, 1996, 2001; Davey et al., 2008). Advances in this filtering approach have considered the relaxation of typical white Gaussian noise assumptions and spatially correlated clutter (Bruno & Moura, 1999). Moreover, the modeling of clutter has been expanded to encompass a variety of Gaussian and non-Gaussian, correlated and uncorrelated clutter types, and the Bayesian algorithm is extended to accommodate multiple targets that may feature randomly varying amplitudes or intensities (Bruno & Moura, 2001). Finally, some comparison between the Viterbi and Bayesian approaches has been made at the theoretical level (Bruno & Moura, 2001), as well as on the practical level via Monte Carlo simulation trials (Davey et al., 2008).

A survey of potential technologies for UAV sense and avoid concluded that the visual/pixel-based technology offered the best chances for regulator approval (Karhoff, Limb, Oravsky, & Shephard, 2006). It is interesting to note that some studies have shown that it is actually difficult to detect and avoid a collision using the human visual system (Australian Transport Safety Bureau, 1991). To date, public domain hardware implementations of vision-based sense-and-avoid systems have been limited to a small number. Arguably, the most significant developments have been made by Utt et al. (2005), where a combination of field-programmable gate array chips and microprocessors using multiple sensors was tested in a twin-engine Aero Commander aircraft. A challenge that faces any vision-based sense-and-avoid system is the requirement of real-time operation. Motivated by this fact, we exploit the capabilities of data-parallel arithmetic architectures such as graphics processing units (GPUs), which have proven to be very capable parallel processing devices that can outperform current CPUs by up to an order of magnitude (Owens, Luebke, Govindaraju, Harris, Kruger, et al., 2005).

The key contributions of this paper are as follows: (1) demonstration of the coordinated flight of fixed-wing UAVs performing collision-course scenarios for collection of suitable test image data, (2) application of hidden Markov model (HMM) and Viterbi-based target detection approaches to a computer vision sense-and-avoid problem, (3) analysis of the target detection approaches in terms of maximum detection range, and (4) implementation of the target detection approaches using GPU-based hardware and the demonstration of real-time detection capabilities. The use of fixed-wing UAVs for data collection presents unique challenges, particularly in relation to the synchronization of aircraft flights to ensure that a realistic scenario is replicated and to maximize the target aircraft’s time spent in the camera field of view. Furthermore, we are particularly interested in determining the performance limits of passive machine vision for detecting collision-course aircraft; hence our focus on the maximum detection range of our candidate detection algorithms. We acknowledge that the detection range will be influenced by various environmental conditions (such as weather and lighting); however, a detailed characterization of the quality of the collected data is beyond the scope of this paper. Finally, we also provide some insight into the impact of image jitter on the performance of our candidate target detection approaches, as it has been previously shown that detection performance can be quite sensitive to jitter effects (Utt et al., 2005).

This paper is structured as follows. Section 2 introduces the basic characteristics of collision behavior that make vision-based collision detection a difficult problem. Section 3 separately introduces the HMM and Viterbi-based detection approaches examined in this paper. Section 4 evaluates the performance of the proposed collision-detection system in three key areas: (1) the resilience of the detection algorithms to undesired camera motion (image jitter), (2) the target detection capability of the detection algorithms, and (3) the real-time processing capacity of a GPU-based hardware implementation. Section 5 describes some of the lessons learned and future work planned.

2. BASIC CHARACTERISTICS OF POTENTIAL COLLISION-COURSE OBJECTS

It may be possible to detect collision-course objects based on their physical appearance or the dynamics that they exhibit. The physical attributes of collision-course objects, such as color, brightness, shape, and size, depend largely on ambient lighting and atmospheric conditions, as well as the distance to the object. Sophisticated models have been devised in an attempt to describe and predict these observed target attributes as a function of distance, taking into account the effects of haze, atmospheric scattering, and potential defocus blur due to poor or mismatched optics (Geyer et al., 2009). However, it is difficult to accurately model the complex and unpredictable nature of lighting and atmospheric conditions, so we choose instead to exploit some simple but reliable characteristics of collision-course objects that are sufficient for our purposes of determining the maximum detection range of our candidate algorithms. More specifically, we elect to exploit the typical size, shape, and dynamics of a collision-course object on the image plane of a camera. We do not claim that any of these characteristics alone can uniquely identify a collision-course object, but we believe that a combination of the three has the ability to eliminate the majority of nongenuine targets. Given the 12.5-s reaction time recommended for human pilots (Federal Aviation Administration, 1983), a collision-course object must be detected at a distance of more than 1 km (assuming a closing velocity of 100 m/s) to avoid a collision.¹ At this distance, aircraft and other objects of similar dimensions may take up an area anywhere from a few pixels to less than one pixel on the image plane.² It can be argued that a few pixels cannot really define any sort of “shape,” but at least it can be deduced that in the context of early detection, collision-course objects will tend to be small, point-like features, becoming smaller and dimmer the earlier that detection is required. Of all the characteristics of collision-course objects, their somewhat unique dynamics is perhaps the most suitable attribute to exploit. Objects on a collision course appear at the output of a fixed onboard vision sensor as relatively stationary features on the image plane (Australian Transport Safety Bureau, 1991). Features that are moving rapidly across the image plane do not correspond to collision-course objects.

¹Even though an automated system is likely to require less time to recognize threats and take evasive action, it can be argued that in the interests of safety, it is best to detect targets as early (i.e., as far away) as possible.
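
To make these numbers concrete, the short calculation below works through the detection-range arithmetic and the approximate on-image size of the 2.1-m wingspan target aircraft used in the experiments of Section 4.2.1, assuming the 1,024 × 768 pixel, 51.4-deg horizontal field-of-view camera described there; it is an illustrative sketch using a small-angle approximation, not an analysis from the paper.

```python
import math

# Assumptions: camera parameters and target wingspan are taken from Section 4.2.1;
# the closing speed and reaction time are the values quoted in the text above.
H_FOV_DEG = 51.4       # horizontal field of view (deg)
H_PIXELS = 1024        # horizontal resolution (pixels)
WINGSPAN_M = 2.1       # Boomerang target aircraft wingspan (m)
CLOSING_SPEED = 100.0  # assumed closing velocity (m/s)
REACTION_TIME = 12.5   # recommended pilot response time (s)

# Range needed to provide the recommended reaction time ("more than 1 km").
required_range_m = CLOSING_SPEED * REACTION_TIME            # 1,250 m

# Approximate pixel subtense of the wingspan at that range (small-angle approximation).
rad_per_pixel = math.radians(H_FOV_DEG) / H_PIXELS
target_width_px = (WINGSPAN_M / required_range_m) / rad_per_pixel

print(f"required detection range: {required_range_m:.0f} m")
print(f"approximate target width: {target_width_px:.1f} pixels")   # roughly 2 pixels
```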

The typical output from a computer vision sensor is a sequence of images (i.e., a video stream). Detecting collision-course objects is then a matter of searching for objects within the image sequence that possess the above characteristics; that is, dim, subpixel-sized targets that are slowly moving in the image frame. These target properties correspond to the two-stage detection paradigm that is described in the next section: morphological filtering to detect point-like targets and temporal filtering to detect persistent or almost stationary features.

We are aware of existing guidelines and requirements (ASTM, 2004; Ebdon & Regan, 2004; Federal Aviation Administration, 1983; Office of the Secretary of Defense, 2004) stipulating functional capabilities that a sense-and-avoid system must satisfy to gain regulatory approval. These are not inconsistent with our earlier characterization of collision-course targets; they are merely specifications at a higher functional level that we will consider at the next stage of our research once we have determined the performance limits of a vision-based detection approach. For a detailed analysis of current regulatory requirements, see Geyer, Singh, and Chamberlain (2008).

3. DETECTION ALGORITHMS

The two candidate detection algorithms that will be considered in this paper are designed to process the sensor measurements in two stages: (1) image preprocessing, followed by (2) track-before-detect temporal filtering. Both algorithms will share a common “close-minus-open” morphological image preprocessing stage, but one approach will be coupled to a HMM temporal filter, whereas the other approach will be coupled to a Viterbi-based temporal filter, as illustrated in Figure 1. We highlight that after detection has occurred, the position estimate output of our candidate detection algorithms could then be passed to a high-level target tracking filter (such as an extended Kalman filter). This has the potential to improve tracking performance, the analysis of which will be considered in future investigations.

²Depending on camera resolution, field of view, etc.

Figure 1. Candidate detection algorithms.

3.1. Morphological Image Preprocessing

This paper considers an image preprocessing technique that exploits grayscale morphological operations in order to process discrete two-dimensional (2D) image data quantized to a finite number of intensity or grayscale levels, such as might be expected from the output of an electro-optical sensor. In particular, the close-minus-open (CMO) morphological filter used is based on image morphology operations known as top-hat and bottom-hat transformations (Gonzalez, Woods, & Eddins, 2004). The effect of the top-hat transformation is to identify positively contrasting (brighter than background) features within an image that are smaller than a certain size (the cutoff size is specified through filtering kernels known as structuring elements), whereas the bottom-hat transformation performs a similar function but instead targets negatively contrasting (darker than background) features. It can be shown that summing the top-hat and bottom-hat transformations of the same image, which defines the CMO filtering operation, simultaneously identifies both positively and negatively contrasting features. This combination of morphological operations has been referred to elsewhere in the literature as a self-complementary top-hat filtering approach (Soille, 2003).

In this paper, the CMO filter is configured to serve as a powerful tool in the identification of small point-like features within the measurement image. For performance and computational reasons, the CMO filter implemented in this paper exploits a directional decomposition technique (Casasent & Ye, 1997). More specifically, the minimum response from a pair of CMO filters using orthogonal one-dimensional (1D) structuring elements is used. Here, one CMO filter operates exclusively in the vertical direction, and the other operates exclusively in the horizontal direction. The vertical and horizontal structuring elements of the CMO morphological preprocessing filter are given by sv = [1, 1, 1, 1, 1]′ and sh = [1, 1, 1, 1, 1], respectively. An example of the output after CMO morphological preprocessing is illustrated in Figure 4(b).
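
As an illustration of the preprocessing stage just described, the Python sketch below realizes a directionally decomposed CMO filter with standard grayscale morphology routines; it assumes OpenCV is available and uses the 5 × 1 and 1 × 5 flat structuring elements given in the text. It is a plausible implementation of the technique, not the authors’ code.

```python
import numpy as np
import cv2

def cmo_1d(gray: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Close-minus-open (sum of top-hat and bottom-hat) with one structuring element."""
    closed = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
    opened = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
    return cv2.subtract(closed, opened)   # highlights features smaller than the kernel

def cmo_preprocess(gray: np.ndarray) -> np.ndarray:
    """Directionally decomposed CMO: minimum response of vertical and horizontal 1D filters."""
    s_v = np.ones((5, 1), np.uint8)       # vertical structuring element, [1, 1, 1, 1, 1]'
    s_h = np.ones((1, 5), np.uint8)       # horizontal structuring element, [1, 1, 1, 1, 1]
    return np.minimum(cmo_1d(gray, s_v), cmo_1d(gray, s_h))

# Example usage on a single 8-bit grayscale frame:
# frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# preprocessed = cmo_preprocess(frame)   # point-like features appear as bright responses
```

Because the two responses are combined with a pixelwise minimum, elongated structures (such as a horizon line) that are small in only one direction are suppressed, while genuine point-like features respond strongly to both 1D filters.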

3.2. Temporal Filtering

In many machine vision–based detection problems, the existence of a target in a three-dimensional (3D) volume of space must be determined from observations of a projection of the target space onto a 2D image plane. Here, target detection can be viewed as evaluating the likelihood of two alternate hypotheses, where H1 denotes the hypothesis that a single target is present in the camera field of view and H2 denotes the hypothesis that no target is present.

The temporal filtering approaches implemented in this paper will assume that under hypothesis H1, the projected target motion resides on a 2D plane fixed in space that is represented by the set of discrete 2D grid points {(i, j) | 1 ≤ i ≤ Nv, 1 ≤ j ≤ Nh}, with vertical and horizontal resolutions Nv and Nh, respectively. Let N = Nv × Nh denote the total number of grid points. The measurements are provided by an imaging sensor whose field of view is represented by a 2D grid of image pixel locations aligned with the target space and denoted {(p, q) | 1 ≤ p ≤ Nv, 1 ≤ q ≤ Nh}.

3.2.1. Hidden Markov Model Filtering

It will be assumed that, when present, the target is located within a particular pixel of the image frame at each time instant. Thus, each pixel (i, j) represents a unique state of the HMM in the target detection problem. For notational convenience, the columns of the image frame are stacked to form a vector of pixel locations. In this way, each state may be referenced by a single index, in the sense that if the target is at pixel location (i, j), this corresponds to it being in the state m = [(j − 1)Nv + i].

Let x_k denote the state (target location) at time k. Between consecutive image frames the target may move to different pixel locations; that is, the target can transition between the states. The likelihood of state transitions can be described by the HMM’s transition probabilities A_mn = P(x_{k+1} = state m | x_k = state n) for 1 ≤ m, n ≤ N, which is the probability of moving from any one pixel position (state) n to any other pixel position (state) m. The transition probabilities can therefore be used to describe the expected mean target motion. For example, in the case of slow-moving targets, low probabilities tend to be assigned for transitions between distant pixels. Moreover, initial probabilities π_m = P(x_1 = state m) for 1 ≤ m ≤ N are used to specify the probability that the target is initially located in state m. Finally, to complete the parameterization of the HMM, there are the measurement probabilities B_m(Y_k) = P(Y_k | x_k = state m) for 1 ≤ m ≤ N that are used to specify the probability of obtaining the observed image measurement Y_k ∈ ℝ^{Nv×Nh}, given that the target is actually in pixel location (state) m [see Elliott, Aggoun, and Moore (1995) for more details about the parameterization of HMMs]. A possible approach to estimate the probabilistic matrices A and B is discussed later in the implementation of a HMM filter bank.

HMM Detection Strategy. The HMM filtering approach performs temporal integration of the input measurements by recursively propagating α_k^m, an unnormalized probabilistic estimate of the target state x_k being in state m, over time. This is achieved via the forward part of the forward–backward procedure (Rabiner, 1989), which can be decomposed into two stages, initialization and recursion. For 1 ≤ m ≤ N:

1. Initialization: Let α_k^m denote the probability P(Y_1, Y_2, . . . , Y_k, x_k = state m). Then α_1^m = π_m B_m(Y_1).
2. Recursion: At time k > 1, set α_k^m = [ Σ_{n=1}^{N} α_{k−1}^n A_{mn} ] B_m(Y_k).

The forward procedure filtering result is closely related to the two probabilistic measures that facilitate the detection of targets: (1) the probability of measurements up to time k assuming H1, given by

P(Y_1, Y_2, . . . , Y_k | H1) = Σ_{m=1}^{N} α_k^m,    (1)

and (2) the conditional mean filtered estimate of the target state m given measurements up to time k and assuming H1, given by

x̂_k^m = E[x_k = state m | Y_1, Y_2, . . . , Y_k, H1] = α_k^m / Σ_{n=1}^{N} α_k^n,    (2)

where E[·|·] denotes the mathematical conditional expectation operation (Billingsley, 1995). The probability P(Y_1, Y_2, . . . , Y_k | H1) may be interpreted as an indicator of target presence [following the probabilistic distance results of Xie, Ugrinovskii, and Petersen (2005)], and the conditional mean estimate can be regarded as an indicator of likely target locations.

In the interest of computational efficiency, the conditional mean estimate is evaluated directly from the following expression (Elliott et al., 1995):

x̂_k = N_k B_k(Y_k) A x̂_{k−1},    (3)

where N_k is a scalar normalization factor, B_k(Y_k) is an N × N matrix whose main diagonal is occupied by the values of B_m(Y_k) for 1 ≤ m ≤ N and whose other elements are zero, A is an N × N matrix with elements A_mn, and x̂_k is an N × 1 vector consisting of elements x̂_k^m for 1 ≤ m ≤ N that are equivalent to those given in Eq. (2). Moreover, note the following relationship between the normalization factor N_k and the probability of measurements up to time k assuming H1:

P(Y_1, Y_2, . . . , Y_k | H1) = Π_{l=1}^{k} (1/N_l).    (4)
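
As a minimal sketch of the update in Eqs. (3) and (4), assuming the transition matrix A and the per-state measurement weights (the diagonal of B_k(Y_k)) are available as arrays, one filter step could be written as follows; a practical implementation would exploit the sparse, patch-structured form of A discussed below.

```python
import numpy as np

def hmm_forward_step(x_prev: np.ndarray, A, b_k: np.ndarray):
    """One step of Eq. (3): x_hat_k = N_k * B_k(Y_k) * A * x_hat_{k-1}.

    x_prev : (N,) conditional mean estimate from the previous frame
    A      : (N, N) transition matrix, A[m, n] = P(x_{k+1} = m | x_k = n)
    b_k    : (N,) diagonal of B_k(Y_k), i.e., the per-state measurement weights
    Returns the normalized estimate x_hat_k and the normalization factor N_k.
    """
    unnormalized = b_k * (A @ x_prev)      # B_k(Y_k) A x_hat_{k-1}
    total = unnormalized.sum()
    N_k = 1.0 / total                      # scalar normalization factor
    return unnormalized * N_k, N_k

# Per Eq. (4), log P(Y_1..Y_k | H1) can be accumulated as sum_l log(1 / N_l),
# which is the quantity smoothed by the test statistic in Eq. (5).
```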

For the HMM filtering approach, let η_k, the test statistic for declaring the presence of a target, be given by the following exponentially weighted moving average filter with a window length of L:

η_k = ((L − 1)/L) η_{k−1} + (1/L) log(1/N_k).    (5)

In the later performance evaluation studies, we found that a window length of L = 10 produced good detection results. A shorter window length may give earlier detections, but this is likely to be at the expense of increased false alarms. When η_k exceeds a predefined threshold, the HMM detection algorithm considers the target to be present and located at state

γ_k = arg max_m (x̂_k^m)    (6)

at time k. The definition of η_k and γ_k is motivated by the filtering quantities discussed earlier.
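
Building on the earlier sketch of Eq. (3), the detection logic of Eqs. (5) and (6) reduces to a few lines; the threshold here is a placeholder to be tuned on representative data rather than a value from the paper.

```python
import numpy as np

def hmm_detect_step(eta_prev: float, N_k: float, x_hat_k: np.ndarray,
                    L: int = 10, threshold: float = 0.0):
    """Update the EWMA test statistic of Eq. (5) and apply the decision rule of Eq. (6)."""
    eta_k = ((L - 1) / L) * eta_prev + (1.0 / L) * np.log(1.0 / N_k)
    target_present = eta_k > threshold            # declare detection when eta_k exceeds threshold
    gamma_k = int(np.argmax(x_hat_k))             # most likely target state (stacked-pixel index)
    return eta_k, target_present, gamma_k
```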

HMM Filter Bank. Previous studies have shown that a multiple-filter approach (filter bank) provides superior detection performance compared with a standard single-filter implementation under a range of target speed and signal-to-noise ratio (SNR) scenarios (Lai, Ford, O’Shea, & Walker, 2008). The superior performance of the filter bank approach can be attributed largely to the collective ability of multiple filter models to provide a more precise description of the target dynamics (however, this more precise modeling of the target dynamics can have drawbacks when the modeling assumptions are violated; see Section 4.1). In this paper, a HMM filter bank consisting of four filters is implemented (Lai et al., 2008). Each HMM filter in the bank uses the same preprocessed image data but otherwise operates independently of all other filters. The filter bank approach is less well characterized than the standard single HMM filter, and its application has not been prevalent in the context of dim-target detection from imaging sensors.

The transition probability parameters of each filter in the HMM filter bank are designed to handle target motion in which the target either remains stationary or moves to an adjacent neighboring pixel between consecutive image frames. In practice, when the system is susceptible to the effects of image jitter, we allow for the possibility of larger interframe target motion. For example, Figure 2 illustrates a 3 × 3 patch of the possible transitions as seen in the image plane (which could handle target motion within 1 pixel per frame). The possibility of larger motion can be handled by larger patch sizes (for example, a 5 × 5 patch could handle target motion within 2 pixels per frame). These types of target motions correspond to transition probability matrices that only have nonzero probabilities for self-transitions and transitions to states nearby in the image plane (all other transitions have zero probability).

Figure 2. Transition patch of size 3 × 3 corresponding to only nine nonzero transition probabilities.

An ad hoc approach to assigning the specific probability values within the patch has been used in this paper; better performance might be possible by designing the parameters according to relative entropy rate–based strategies (Lai & Ford, 2010).
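
As an illustration of the patch-structured transition matrices just described, the sketch below assembles a sparse A from a 3 × 3 patch of displacement probabilities on an Nv × Nh grid, using the column-stacked state index from Section 3.2.1; the uniform patch weights are placeholders, not the values used by the authors.

```python
import numpy as np
from scipy.sparse import csc_matrix

def patch_transition_matrix(Nv: int, Nh: int, patch: np.ndarray):
    """Build A with A[m, n] = P(next state m | current state n) from a small transition patch.

    patch[di + r, dj + r] is the probability of moving (di, dj) pixels between frames,
    where r is the patch radius (r = 1 for a 3 x 3 patch).
    """
    r = patch.shape[0] // 2
    N = Nv * Nh
    rows, cols, vals = [], [], []
    for j in range(Nh):                      # current column
        for i in range(Nv):                  # current row
            n = j * Nv + i                   # column-stacked state index (0-based)
            entries = []
            for dj in range(-r, r + 1):
                for di in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < Nv and 0 <= jj < Nh:
                        entries.append((jj * Nv + ii, patch[di + r, dj + r]))
            total = sum(p for _, p in entries)   # renormalize at the image borders
            for m, p in entries:
                rows.append(m)
                cols.append(n)
                vals.append(p / total)
    return csc_matrix((vals, (rows, cols)), shape=(N, N))

# Placeholder 3 x 3 patch: equal weight on staying put and on each 8-neighbor move.
patch_3x3 = np.full((3, 3), 1.0 / 9.0)
# A = patch_transition_matrix(768, 1024, patch_3x3)   # full-frame A is large but very sparse
```

Because A has this displacement-invariant patch structure (away from the image borders), the prediction step A x̂ can equivalently be computed as a small 2D convolution of the image-shaped estimate, avoiding the explicit matrix altogether.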

Furthermore, note that the implemented HMM filter exploits the following probabilistic relationship between target location x_k and the preprocessed measurements Y_k:

B_m(Y_k) = P(Y_k^m | x_k = state m) / P(Y_k^m | x_k ≠ state m),    (7)

for 1 ≤ m ≤ N. Note the computational advantage that Eq. (7) affords, given that P(Y_k^m | x_k = state m) and P(Y_k^m | x_k ≠ state m) can each be determined on a single-pixel basis [rather than requiring the probability of a whole image; Lai et al. (2008)].

To construct the measurement probability matrix B_k(Y_k), estimates of the probabilities P(Y_k^m | x_k ≠ state m) and P(Y_k^m | x_k = state m) are required. The former describes the prior knowledge about the distribution of pixel values in the absence of a target (i.e., the noise and clutter distribution), and the latter captures the prior knowledge about the distribution of values at pixels containing a target. The required probabilities for B_k(Y_k) are trained directly from sample data. The probability P(Y_k^m | x_k ≠ state m) is estimated as the average frequency that each pixel value resulted from a nontarget location; one way this can be calculated is by sampling pixel values from image sequences without a target. Using a similar procedure, P(Y_k^m | x_k = state m) is estimated as the average frequency that each pixel value measurement resulted from a target location, and this can be calculated based on image sequences containing a target. We acknowledge that in using an empirical approach for estimating the measurement probabilities, the amount of data available will influence the quality of the estimates. In general, estimates of P(Y_k^m | x_k ≠ state m) tend to be easier to obtain, due to the relative abundance of nontarget image data compared with those containing targets. From our experiences, probabilities estimated from at least 100 image frames have provided reasonable performance.
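
One simple way to obtain the empirical estimates described above is to histogram preprocessed pixel values collected from frames with and without a target, as sketched below; the 8-bit value range and the small probability floor (which also keeps assumption (2) of the Remark at the end of this subsection satisfied) are assumptions made for illustration.

```python
import numpy as np

BINS = 256  # assume 8-bit preprocessed pixel values

def estimate_pixel_likelihoods(target_samples, clutter_samples, floor=1e-6):
    """Histogram-based estimates of P(y | target present) and P(y | no target)."""
    p_target = np.histogram(target_samples, bins=BINS, range=(0, BINS), density=True)[0]
    p_clutter = np.histogram(clutter_samples, bins=BINS, range=(0, BINS), density=True)[0]
    return np.maximum(p_target, floor), np.maximum(p_clutter, floor)

def measurement_weights(preprocessed_frame, p_target, p_clutter):
    """Per-pixel likelihood ratio of Eq. (7), flattened to the stacked-state ordering."""
    y = preprocessed_frame.astype(int).clip(0, BINS - 1)
    b = p_target[y] / p_clutter[y]              # elementwise lookup of the ratio
    return b.flatten(order="F")                 # column stacking matches m = (j-1)*Nv + i
```

The returned vector corresponds to the diagonal of B_k(Y_k) used in the forward step sketched earlier.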

Note that in the HMM filter bank, detection events may be triggered by any filter. In particular, if a target is present and this presence is declared in more than one filter, then we look to the dominant filter (the one with the highest test statistic) to provide the estimate of target position. Multimedia extension 1 (see the Appendix) illustrates the typical output from the two stages of our HMM-based detection algorithm: stage 1, morphological preprocessing (point feature detection); and stage 2, temporal filtering (HMM filter bank; the dominant member filter in the filter bank is shown).

Remark. Strictly speaking, the right-hand side of Eq. (7) is proportional to B_m(Y_k) [the applicable scaling factor may be absorbed by N_k in the implementation of Eq. (3)]. Moreover, this relationship holds only under the following assumptions: (1) the statistical properties of pixel values within an image are spatially independent and (2) individual pixels do not allow the opportunity of perfect detection, in the sense that P(Y_k^m | x_k ≠ state m) > 0 whenever P(Y_k^m | x_k = state m) > 0. Admittedly, the presence of extended (multipixel) targets or spatially correlated noise would violate the above assumptions but has been found to have only moderate impact on performance.

3.2.2. Viterbi-Based Filtering

A Viterbi-based temporal filtering approach that is based on a dynamic programming algorithm (Carnie et al., 2006; Gandhi et al., 2006) was also examined. In this approach, preprocessed image frames are integrated over time along possible target trajectories in order to improve the SNR. The output is an image, with large pixel values indicating likely target locations.

The motion of the target is modeled in terms of discrete velocity cells, where each cell (u, v) encompasses a range of possible target velocities. Assuming constant target velocity (i.e., no transitions between velocity cells) and a velocity cell resolution of 1 pixel per frame, it can be shown that if the target is at any particular pixel (i, j) at time k, there exists a neighborhood of four connected pixels within which the target will be located at time k + 1 (Tonissen & Evans, 1996). This neighborhood of four connected pixels is denoted by Q^{ij} and collectively referred to as the forward state transitions. By symmetry, if the target is at any particular pixel (i, j) at time k, there exists a neighborhood of four connected pixels within which the target was located at time k − 1. This neighborhood of four connected pixels is denoted by Q̄^{ij} and collectively referred to as the backward state transitions. Hence, for each velocity cell (u, v) there exists a unique set of forward state transitions Q_{uv}^{ij} and backward state transitions Q̄_{uv}^{ij} corresponding to each particular pixel (i, j). This model for target dynamics can be modified to allow for transitions covering more than four pixels at a time by adjusting the resolution of the velocity cells. However, previous analysis (Tonissen & Evans, 1996) showed that better performance is achieved for smaller numbers of possible transitions, with four transition cell possibilities considered to be a reasonable choice for slow, nonmaneuvering targets (Gandhi et al., 2006).

Viterbi-Based Detection Strategy. The Viterbi-based algorithm performs temporal integration of the input measurements by recursively generating a set of intermediate images D for each velocity cell (u, v) that is considered. This process can be divided into two stages: initialization and recursion.

For all (u, v), 1 ≤ i ≤ Nv, and 1 ≤ j ≤ Nh,

1. Initialization: Let D_k^{ij}(u, v) denote the ijth pixel of the intermediate image frame at time k for velocity cell (u, v). Then D_1^{ij}(u, v) = 0.
2. Recursion: At time k > 1, set

D_k^{ij}(u, v) = (1 − β) Y_k^{ij} + β max_{(i′, j′) ∈ Q̄_{uv}^{ij}} [ D_{k−1}^{i′j′}(u, v) ],

where Y_k^{ij} is the ijth pixel of the preprocessed image at time k and β represents a memory factor that can vary between zero and one.

At any time k when a detection decision is required, take the maximum output across corresponding pixels of the intermediate image frames belonging to each velocity cell:

D_{max,k}^{ij} = max_{(u,v)} [ D_k^{ij}(u, v) ],    (8)

for 1 ≤ i ≤ Nv and 1 ≤ j ≤ Nh. This final image D_{max,k} that consolidates target information from across all velocity cells then serves as the basis for declaring detections.

For the Viterbi-based filtering approach, a test statistic λ_k for declaring the presence of a target at time k is given by

λ_k = max_{ij} (D_{max,k}^{ij}).    (9)

When λ_k exceeds a predefined threshold, the Viterbi-based algorithm considers a target to be present and located at state

ς_k = arg max_{ij} (D_{max,k}^{ij})    (10)

at time k. The definition of λ_k and ς_k follows from the interpretation of the Viterbi-based algorithm’s output as an image, where the pixel values correspond to target signal strength.

Viterbi-Based Filter Implementation. The Viterbi-based filter implemented here mirrors those described in the literature (Carnie et al., 2006; Gandhi et al., 2006), where four velocity cells are used to detect targets that move with constant velocity in any direction but are limited to a maximum speed of 1 pixel per frame (this is analogous to the four filters in the HMM filter bank). Nonmaximal suppression is applied to the output of the filter to reduce an undesirable “dilation” effect in which pixels in the neighborhood of the target also attain significantly large values (Gandhi et al., 2006).

We highlight that the selection of memory factor β has been investigated previously (Carnie et al., 2006; Gandhi et al., 2006). Larger β values give more weighting to past measurements and can reduce the incidence of false alarms at the expense of a delay in detection. In our implementation of the Viterbi-based filter, we let β = 0.75, which has been demonstrated to be a reasonable choice for the memory factor (Carnie et al., 2006).
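
The Viterbi-based recursion of Section 3.2.2 also has a compact array form, sketched below for four velocity cells limited to 1 pixel per frame; the backward neighborhoods are realized here as one-pixel shifts of the previous intermediate image, which is one plausible reading of the four-connected transition sets rather than the authors’ exact implementation, and the nonmaximal suppression step is omitted. The default memory factor matches the β = 0.75 adopted above.

```python
import numpy as np

def shift(img, di, dj):
    """Shift an image by (di, dj) pixels, filling vacated positions with zeros."""
    out = np.zeros_like(img)
    src = img[max(0, -di):img.shape[0] - max(0, di), max(0, -dj):img.shape[1] - max(0, dj)]
    out[max(0, di):img.shape[0] - max(0, -di), max(0, dj):img.shape[1] - max(0, -dj)] = src
    return out

class ViterbiTBD:
    """Track-before-detect integration over four 1-pixel-per-frame velocity cells."""

    # Sign pairs defining the four velocity cells (down/up by right/left motion).
    CELLS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

    def __init__(self, shape, beta=0.75):
        self.beta = beta
        self.D = [np.zeros(shape) for _ in self.CELLS]   # D_1(u, v) = 0

    def update(self, Y):
        """One recursion step on a preprocessed frame Y; returns (lambda_k, sigma_k, D_max)."""
        for idx, (si, sj) in enumerate(self.CELLS):
            prev = self.D[idx]
            # Max over the backward neighborhood of each pixel for this velocity cell.
            neighborhood_max = np.maximum.reduce(
                [prev, shift(prev, si, 0), shift(prev, 0, sj), shift(prev, si, sj)])
            self.D[idx] = (1.0 - self.beta) * Y + self.beta * neighborhood_max
        D_max = np.maximum.reduce(self.D)                          # Eq. (8)
        lambda_k = D_max.max()                                     # Eq. (9)
        sigma_k = np.unravel_index(np.argmax(D_max), D_max.shape)  # Eq. (10)
        return lambda_k, sigma_k, D_max
```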

4. COLLISION-DETECTION SYSTEM EVALUATION

We begin by evaluating our two candidate target detection approaches in terms of robustness to image jitter and target detection performance. We then discuss our proposed hardware implementation of the detection system and evaluate the capacity of this system to support real-time execution of the candidate detection algorithms. The candidate detection algorithms are as follows:

• CMO image preprocessing with HMM temporal filtering (patch size 5 × 5)

• CMO image preprocessing with Viterbi-based temporal filtering

To facilitate the evaluation of the detection algorithms, we will consider two types of SNR quantities: (1) a target distinctness SNR (TDSNR) and (2) a false-alarm distinctness SNR (FDSNR). The TDSNR provides a quantitative measure of the detection capability of the algorithm and is defined as

TDSNR = 20 log10 (P_T / P_N),    (11)

where P_T is the average target pixel intensity and P_N is the average nontarget pixel intensity at the filtering output. In general, the more conspicuous the target is at the output of the filter, the higher the TDSNR value. On the other hand, FDSNR measures the tendency of the algorithm to produce false alarms and is defined as

FDSNR = 20 log10 (P_F / P_N),    (12)

where P_F is the average of the highest nontarget pixel intensity and P_N is the average nontarget pixel intensity at the filtering output. Strong filter responses away from the true target location will tend to increase the FDSNR value. We highlight that TDSNR and FDSNR are calculated directly from the output of the filter and are not based on the SNR quality of the input image measurements. For convenience, we will let ΔDSNR denote the difference between the two SNR metrics:

ΔDSNR = TDSNR − FDSNR = 20 log10 (P_T / P_F).    (13)
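
A per-frame version of these metrics can be computed directly from the filter output and a ground-truth target location, as in the sketch below; excluding a small neighborhood around the target from the nontarget statistics is an assumption made here, and the paper reports values averaged over frames rather than single-frame figures.

```python
import numpy as np

def dsnr_metrics(filter_output, target_rc, exclude_radius=2):
    """Compute TDSNR, FDSNR, and their difference (Eqs. (11)-(13)) for one frame."""
    r, c = target_rc
    mask = np.ones(filter_output.shape, dtype=bool)
    mask[max(0, r - exclude_radius):r + exclude_radius + 1,
         max(0, c - exclude_radius):c + exclude_radius + 1] = False  # nontarget pixels only

    p_t = filter_output[r, c]                 # target pixel intensity
    p_n = filter_output[mask].mean()          # average nontarget intensity
    p_f = filter_output[mask].max()           # highest nontarget intensity (false-alarm candidate)

    tdsnr = 20.0 * np.log10(p_t / p_n)
    fdsnr = 20.0 * np.log10(p_f / p_n)
    return tdsnr, fdsnr, tdsnr - fdsnr        # the last value is the per-frame ΔDSNR
```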

4.1. Impact of Image Jitter

In this paper, we consider image jitter as the apparent motion of the image background between two consecutive frames of an image sequence. This apparent motion is caused by displacement of the camera relative to the objects in the scene and gives the illusion that objects in an image are moving when in fact they are stationary in the environment. Image jitter is inherently present in all image data recorded on moving platforms. Passive motion-dampening devices or actively stabilized camera mounts may serve to reduce the effects of image jitter; however, jitter cannot be completely eliminated, as no platform can be perfectly stable. In many cases, the effects of jitter can be quite severe, particularly for small UAV platforms that are more susceptible to unpredictable disturbances caused by wind gusts and air turbulence.

In addition to the physical mechanisms that can be employed to reduce jitter effects while images are being recorded, there are ego-motion compensation techniques that can be used after the image is captured to produce a jitter-corrected version of the raw image. Ego-motion compensation techniques typically rely on estimating the amount of camera displacement either indirectly via gyroscopes, accelerometers, and other inertial sensors (Jorge & Jorge, 2004) or directly through tracking salient features (Jung & Sukhatme, 2005).

In this study, we are concerned with the inherent robustness of the detection algorithms to jitter-affected data that have not undergone ego-motion compensation. Characterizing this inherent robustness of the detection algorithms is important for two main reasons. First, particularly in an airborne environment, it would not be uncommon to encounter “blue-sky” conditions under which the background is more or less uniform, rendering feature-based compensation techniques much less effective, if not useless. Moreover, no compensation technique is perfect, and hence it would be highly desirable for any practical detection algorithm to at least possess some level of resilience to image jitter in the absence of ego-motion compensation. We will characterize the jitter performance of the two candidate detection algorithms by applying them to some test image sequences containing varying degrees of jitter. The filtering output of the candidate detection algorithms will then be evaluated in terms of the two types of SNR quantities, TDSNR and FDSNR, defined earlier.

4.1.1. Test Data

Figure 3(a) provides a sample of the type of data that were used to characterize the detection algorithms. The typical target size and background contrast can be seen in a close-up region of interest depicted in Figure 3(b). Another close-up sample is provided in Figure 4(a). Here, the target occupies around 6–10 pixels, is quite distinct, and may be considered fairly easy to detect. The relatively high target SNR does not diminish the significance and practical relevance of the performance characterization because investigating the impact of image jitter on algorithm performance is the main interest of this study, as opposed to investigating dim-target detection performance (this is considered separately in the next section).

Figure 3. (a) Sample image from jitter characterization test data set; (b) region of interest containing target.

In the data sets, a range of jitter characteristics has been artificially introduced, ranging from low to moderate to extreme. A low amount of jitter is defined as involving apparent interframe background motion of between 0 and 1 pixels per frame. A moderate amount of jitter is defined as involving apparent interframe motion of between 1 and 3 pixels per frame. Finally, an extreme amount of jitter is defined as involving apparent interframe motion of greater than 4 pixels per frame.

4.1.2. Jitter Test Results

Tables I and II illustrate the typical performance of the HMM and Viterbi-based temporal filtering approaches under various jitter scenarios. In the presence of a low level of jitter, both detection approaches were able to successfully track the target (successful tracking implies that the target is able to be located within two pixels of its true position). Figures 4(c) and 4(d) illustrate the typical output of the HMM and Viterbi-based filter, respectively, under low jitter conditions. We highlight that in this case the target in the HMM filter output is considerably more distinct than in the Viterbi-based filter output, and this is reflected in the larger ΔDSNR value for the HMM filter. However, under moderate jitter conditions, there were periods when the HMM filter failed to track the target successfully. This suggests that the HMM filter is more sensitive to the effects of jitter. Part of this sensitivity may be attributed to the jitter dynamics not being explicitly modeled in the HMM filter. Moreover, we highlight that the HMM algorithm exploits extra target dynamic behavior information (i.e., modeling assumptions) that is not used in the Viterbi-based filter. This extra information improves baseline performance of the HMM filter but also makes the filter more sensitive to how valid these assumptions are, some of which are violated by image jitter. Hence, the impact of jitter is more pronounced on the “optimized” performance of the HMM filter compared to the more “ad hoc” Viterbi-based filter. Finally, in the presence of extreme jitter, both filtering approaches demonstrated poor tracking capabilities and exhibited correspondingly low ΔDSNR values.

Table I. Jitter performance of HMM temporal filtering.

Jitter level   Tracking outcome   TDSNR (dB)   FDSNR (dB)   ΔDSNR (dB)
Low            Success              72.84        40.71         32.13
Moderate       Success period       71.12        40.93         30.19
Moderate       Fail period         −58.61        38.10        −96.71
Extreme        Fail                −57.60        35.61        −93.21

Table II. Jitter performance of Viterbi-based temporal filtering.

Jitter level   Tracking outcome   TDSNR (dB)   FDSNR (dB)   ΔDSNR (dB)
Low            Success              36.20        33.28          2.92
Moderate       Success              35.33        32.30          3.03
Extreme        Fail                 30.00        29.60          0.40

Figure 4. Sample of (a) test data frame, (b) CMO image preprocessing output, (c) HMM filter bank output (dominant member filter), and (d) Viterbi-based filter output, after processing 41 frames.

Overall, the ΔDSNR values seem to provide a reasonable indication of the detection algorithms’ jitter tracking performance. Using as a baseline the value from the low-jitter scenario, which we denote by ΔDSNR_0, the results suggest that, as a rough rule of thumb, successful tracking can be accomplished under a particular jitter scenario x when

ΔDSNR_x > 0.5 ΔDSNR_0,    (14)

where ΔDSNR_x is the ΔDSNR value corresponding to jitter scenario x.

Although neither the HMM nor the Viterbi-based approach handles extreme jitter, we note that if the target SNR is high enough, it may be possible to detect potential collision-course targets from the output of the morphological image preprocessing stage alone.

4.2. Target Detection Performance

In a potential collision scenario it is desirable to identify the other aircraft as early as possible to ensure that enough time is available to plan and carry out appropriate actions. For instance, evasive maneuvers may take a period of time to execute, as the aircraft cannot be expected to respond instantaneously to control inputs. In general, increasing the range at which the threat can be detected will lead to earlier detection and hence more time to react. Thus, detection range may be considered a suitable statistic for characterizing the detection performance of the candidate filtering algorithms.

In this study, we will quantify detection performance in terms of the maximum range at which consistent detection occurs. We define consistent detection to have occurred when a simple threshold on the output of the filter allows the target to be detected in 10 successive frames with zero false alarms. A consistent detection, however, does not preclude the existence of strong filter responses at nontarget locations that are undesirable, as they may be mistaken for the true target. Hence, ΔDSNR values (averaged across the 10 consistent detection frames) will be presented together with the detection range statistics to provide an indication of the detection confidence.
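
The consistent-detection criterion can be checked with a small helper such as the one below, where detection_ok is assumed to be a per-frame flag indicating that the thresholded filter output located the target with zero false alarms.

```python
def first_consistent_detection(detection_ok, window=10):
    """Return the index of the first frame that completes `window` consecutive
    frames of successful, false-alarm-free detection, or None if never achieved."""
    run = 0
    for k, ok in enumerate(detection_ok):
        run = run + 1 if ok else 0
        if run >= window:
            return k - window + 1   # first frame of the consistent-detection run
    return None

# Example: detection_ok = [False, True, True, ...] built from per-frame results.
```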

We highlight that special experiments involving two fixed-wing UAVs flying near-collision-course trajectories were conducted to collect suitable test data for offline postprocessing by the detection algorithms. We elaborate on the details of these data collection experiments in the following paragraphs.

4.2.1. Test Data

Image sequences depicting aircraft on collision course naturally serve as ideal test cases for evaluating collision-detection algorithms; however, due to the inherent risk of flying aircraft on converging paths, this type of image data is extremely scarce. This motivated us to conduct our own special flight experiments to collect suitable test data.

Two fixed-wing UAVs were deployed to collect suitable test data: (1) a Flamingo UAV (Silvertone UAV; www.silvertone.com) measuring 2.9 m from nose to tail with a wingspan of 4 m; and (2) a Boomerang 60 model airplane (Phoenix Models) measuring 1.5 m from nose to tail with a wingspan of 2.1 m. The Flamingo was powered by a 26-cc two-stroke Zenoah engine driving a 16 × 6 in. propeller. The powerplant for the Boomerang was an O.S. 90 FX engine driving a 15 × 8 in. propeller.

The avionics payload of the Flamingo included a MicroPilot® MP2128g flight controller, Microhard radio modems, an Atlantic Inertial SI IMU04 inertial measurement unit (IMU), a separate NovAtel OEMV-1 global positioning system (GPS) device (in addition to the standard GPS capability provided by the MicroPilot® board), and an extensively customized PC104 mission flight computer. In contrast, the Boomerang possessed only a basic setup that featured a MicroPilot® MP2028g flight controller and Microhard radio modems.

In our experiments, the Flamingo served as the image data acquisition platform and was further equipped with a fixed nonstabilized Basler Scout Series scA1300-32fm/fc camera fitted with a Computar H0514-MP lens that offered 51.4-deg horizontal, 39.5-deg vertical, and 62.3-deg diagonal angles of view. The camera could be turned on and off remotely from the ground control station and was configured to record 1,024 × 768 pixel resolution image data at a rate of 15 Hz with a constant shutter speed to maintain consistent lighting in the image frames. A solid-state hard disk was used to store the recorded image data, as opposed to conventional mechanical disk drives, which may be susceptible to vibrations during flight.

Figure 5 shows our UAV platforms (the Boomerang is in front of the Flamingo) configured for data collection. We highlight the streamlined pod just forward of the Flamingo’s wing that houses the camera along with the IMU. Our goal was to use the Flamingo, denoted as the camera aircraft, to record images of the Boomerang, which acted as the target aircraft, while both aircraft were flown autonomously along preset waypoints that defined near-collision-course paths (adequate height and time separation was maintained at all times for safety reasons). In this way we could then capture realistic examples of what would be observed by an aircraft in the moments leading up to a collision. In particular, we sought image data that depicted the target aircraft gradually emerging from a state of being imperceptible to the naked eye to being clearly distinguishable (as opposed to data in which the target aircraft suddenly enters into the field of view). We considered this type of data to be ideal for evaluating the detection range capability of the candidate filtering algorithms. The captured image data were time-stamped during flight so that they could be later correlated with inertial and GPS-based position logs from both aircraft in order to estimate the detection range.

Figure 5. UAV platforms used to collect test image data. The Boomerang is in front of the Flamingo.

Figure 6. Typical flight plan for enacting a head-on collision scenario (not to scale). After exiting from their respective holding patterns (HP), the paths of the camera aircraft (solid line) and the target aircraft (dashed line) followed predefined waypoints (×) before converging at the designated area for capturing data (shaded region). The aircraft are approximately 2 km apart at the start of their passing runs.

We conducted numerous flights at an altitude of approximately 450 m above mean sea level over agricultural and livestock fields in the town of Kingaroy, about 210 km northwest of the city of Brisbane, Australia. Our testing site covered a 4 × 4 km area centered on an unsealed rural airstrip. Apart from obtaining permission from the landowner, no additional approvals or waivers were required, as our flight operations fell under existing civil aviation safety authority regulations (Civil Aviation Safety Regulations 1998, 2009; specifically, Part 101, Subpart 101.F). Figure 6 illustrates our general approach to recreating a head-on collision scenario (not to scale). Typically after takeoff, both aircraft were placed in holding patterns until they were ready to be released to follow predefined waypoints that brought the aircraft onto converging paths (a similar strategy was used in the simulation of other collision geometries). Apart from takeoffs and landings, the aircraft were flown autonomously throughout the collection of data. Figure 7 illustrates the aircraft trajectories in an actual head-on engagement scenario that was enacted.

One of the key challenges of this approach involved timing the release of the aircraft from their respective holding patterns in a way that maximized the time spent by the target aircraft in the field of view of the camera aircraft. In the case of the head-on collision scenario, this meant having both aircraft simultaneously in the shaded region of Figure 6 for as long as possible. However, it was difficult to perform this synchronization, and some of the data collected were not considered suitable for analysis.

Figure 7. Engagement Scenario 1. (a) Camera aircraft and target aircraft (◦) positions leading up to the point of detection (solid line) and shortly after detection (dashed line), plotted against east (m) and north (m) axes; (b) sample image frame containing the target aircraft.


Figure 8. Engagement Scenario 2. (a) Camera aircraft and target aircraft (◦) positions leading up to the point of detection (solid line) and shortly after detection (dashed line), plotted against east (m) and north (m) axes; (b) sample image frame containing the target aircraft.

In spite of this, we were able to locate useful image sequences from three separate engagement scenarios featuring two different types of collision geometries. In two of the three scenarios, the camera aircraft and the target aircraft are converging head-on, and in one scenario the aircraft are converging at right angles to each other. Figures 7–9 illustrate overhead views of the collision geometries flown in each of the three engagement scenarios; also shown is a zoomed-in sample of the typical vision obtained of the target aircraft for each scenario. The aircraft paths are plotted in a local east, north, up (ENU) Cartesian coordinate system relative to a runway used for takeoffs and landings (not shown).

Figure 9. Engagement Scenario 3. (a) Camera aircraft and target aircraft (◦) positions leading up to the point of detection (solid line) and shortly after detection (dashed line), plotted against east (m) and north (m) axes; (b) sample image frame containing the target aircraft.


Analysis of the data in all three scenarios revealed the presence of extreme jitter, with magnitudes exceeding 4 pixels per frame. This level of jitter was not completely unexpected; the usual jitter anticipated from the use of a nonstabilized camera was exacerbated by the relatively lightweight UAV platforms (compared with manned aircraft), which are more susceptible to external disturbances such as wind gusts and air turbulence.

Given the presence of extreme jitter, and in light of the results from the earlier jitter performance characterization, we considered it appropriate to undertake some level of jitter compensation of the image data before applying the detection algorithms. As this paper is not primarily concerned with the subject of camera-motion compensation, we implemented a basic image correction procedure involving a combination of optical flow (Lucas & Kanade, 1981) and template matching (Brunelli, 2009) techniques. Although this image correction procedure accounted only for translational motion, it was sufficient to reduce the overall effect of jitter to a moderate level. Image sequences with this moderate level of "residual" jitter were then processed by the detection algorithms.
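To make the translation-only correction concrete, the following is a minimal sketch in Python using OpenCV, assuming grayscale frames; the feature-tracking parameters, the template size, and the way the two shift estimates are combined are illustrative assumptions rather than the settings used in our system.

import cv2
import numpy as np

def estimate_translation(prev_gray, gray):
    # Estimate a global (dx, dy) interframe shift with sparse Lucas-Kanade
    # optical flow computed on strong corner features.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
    if pts is None:
        return 0.0, 0.0
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = status.ravel() == 1
    if not good.any():
        return 0.0, 0.0
    flow = (new_pts[good] - pts[good]).reshape(-1, 2)
    dx, dy = np.median(flow, axis=0)   # median is robust to the moving target
    return float(dx), float(dy)

def refine_with_template(prev_gray, gray, dx, dy, half=64):
    # Refine the estimate by normalized cross-correlation of a central
    # template from the previous frame against the current frame.
    h, w = prev_gray.shape
    cy, cx = h // 2, w // 2
    templ = prev_gray[cy - half:cy + half, cx - half:cx + half]
    res = cv2.matchTemplate(gray, templ, cv2.TM_CCORR_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)
    dx_tm = max_loc[0] - (cx - half)
    dy_tm = max_loc[1] - (cy - half)
    return 0.5 * (dx + dx_tm), 0.5 * (dy + dy_tm)   # simple blend of the two cues

def compensate(gray, dx, dy):
    # Shift the frame back by the estimated jitter (translation only).
    h, w = gray.shape
    M = np.float32([[1, 0, -dx], [0, 1, -dy]])
    return cv2.warpAffine(gray, M, (w, h))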

4.2.2. Detection Results

Table III illustrates the detection performance of the HMM and Viterbi-based filtering approaches in each of the three engagement scenarios. The detection ranges are calculated based on the camera and target aircraft GPS position information corresponding to the image frame where detection is declared [given that no unusual satellite geometries were encountered during the collection of the data, we estimate the 1-σ 3D position error of the GPS positions to be approximately 17 m, on the basis of standard values for HDOP, VDOP, and UERE (Parkinson & Spilker, 1996)].
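For illustration, a detection range and a nominal GPS uncertainty can be computed as in the short Python sketch below; the ENU coordinates and the HDOP, VDOP, and UERE figures are placeholder values chosen only to show the calculation, not values taken from our flight logs.

import numpy as np

# Placeholder ENU positions (m) of the two aircraft at the frame where
# detection is declared (illustrative values, not flight-log data).
camera_enu = np.array([150.0, -120.0, 450.0])
target_enu = np.array([540.0,   15.0, 455.0])

detection_range = np.linalg.norm(target_enu - camera_enu)   # ~413 m here

# Nominal 1-sigma 3D GPS error: UERE scaled by PDOP = sqrt(HDOP^2 + VDOP^2),
# using assumed "typical" dilution-of-precision and range-error values.
hdop, vdop, uere = 1.3, 1.8, 8.0
sigma_3d = uere * np.hypot(hdop, vdop)                       # ~18 m

# Both positions carry this error, so the range estimate is uncertain by
# roughly sqrt(2) * sigma_3d when the two errors are independent.
sigma_range = np.sqrt(2.0) * sigma_3d
print(f"range = {detection_range:.0f} m, 1-sigma position error = {sigma_3d:.0f} m")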

In all three scenarios detection occurred a few frames earlier for the Viterbi-based algorithm when compared with the HMM algorithm. This is reflected in the slightly greater detection ranges for the Viterbi-based approach. The approximate positions of the aircraft at the time of detection are marked on the flight paths given in Figures 7(a), 8(a), and 9(a) (due to the small differences in detection ranges, the aircraft positions corresponding to when detection is declared are not separately marked for the two algorithms). Multimedia extension 2 (see the Appendix) illustrates the performance of the HMM-based detection algorithm in Scenario 1. Overall, we obtained detection ranges that are generally consistent with the results reported in an earlier study (Carnie et al., 2006), which applied a detection algorithm similar to the Viterbi-based approach discussed in this paper. The earlier study attempted to detect a manned Cessna aircraft using image data captured by a fixed ground-based camera and reported detection ranges out to about 6 km. This range is roughly in proportion to our results in Table III, considering that a Cessna aircraft is around six to seven times the size of our Boomerang target aircraft. Although we do not claim that a linear relationship exists between target size and detection distance, this comparison of detection range results reinforces the intuitive notion that larger targets should be able to be detected at greater ranges. We plan to conduct further experiments using full-scale Cessna aircraft in order to better understand the relationship between target size and detection range.

Furthermore, Table III shows the ΔDSNR values corresponding to the two target detection approaches. The ΔDSNR values for the HMM filter are consistently higher than the values for the Viterbi-based filter. This suggests that the HMM filter may be more effective at suppressing nontarget responses (while maintaining a strong response at the true target location), and as a consequence we may expect, in general, a lower incidence of false alarms for the HMM filter than for the Viterbi-based filter. We may also interpret the ΔDSNR values in terms of the empirical SNR criterion (14) established earlier for successful tracking. For the HMM filtering approach, the SNR criterion is easily satisfied, with the ΔDSNR values in all three scenarios clearly greater than 16.065 = 0.5 × 32.13. On the other hand, the results for the Viterbi-based filtering approach are mixed. Scenario 1 is a borderline case, and in Scenario 2 the ΔDSNR value is just below the threshold of 1.46 = 0.5 × 2.92. This suggests that the consistent detections established by the Viterbi-based filter in Scenarios 1 and 2 are perhaps less reliable, in the sense that the filter may possibly be on the verge of losing track of the target.

Table III. Target detection performance of HMM and Viterbi-based temporal filtering.

                          HMM filter                       Viterbi-based filter
Scenario   Geometry       Detection range (m)   ΔDSNR      Detection range (m)   ΔDSNR
1          Head on        412                   38.04      425                   1.61
2          Head on        562                   31.94      564                   1.23
3          Right angle    881                   19.55      893                   2.83
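As a simple check of the criterion, the hypothetical Python helper below applies the 0.5 × gain thresholds quoted above to the Table III values; the dictionary names and structure are ours, not part of the original implementation.

# Empirical criterion from the text: tracking is considered reliable when
# the measured Delta-DSNR is at least half the nominal filter gain.
FILTER_GAIN = {"hmm": 32.13, "viterbi": 2.92}

def satisfies_snr_criterion(delta_dsnr, filter_name):
    return delta_dsnr >= 0.5 * FILTER_GAIN[filter_name]

hmm_dsnr = {1: 38.04, 2: 31.94, 3: 19.55}        # Table III, HMM filter
viterbi_dsnr = {1: 1.61, 2: 1.23, 3: 2.83}       # Table III, Viterbi-based filter

for scenario in (1, 2, 3):
    print(scenario,
          satisfies_snr_criterion(hmm_dsnr[scenario], "hmm"),
          satisfies_snr_criterion(viterbi_dsnr[scenario], "viterbi"))
# Scenario 2 fails the Viterbi check (1.23 < 1.46), matching the discussion above.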


We conjecture that residual jitter effects present in the image sequences after compensation (after all, no compensation technique is perfect) may be partly responsible for the difference in detection range performance. This is because residual jitter would tend to affect the HMM filter slightly more than the Viterbi-based filter, in view of our results from the earlier jitter performance analysis. We also highlight the significantly larger detection ranges reported for the right-angle collision geometry compared with the head-on cases. One possible explanation for this is that the target aircraft tends to appear as a larger object from the right-angle perspective (cross-section area approximately 0.25 m²) than from head-on (cross-section area approximately 0.1 m²).

We have used detection range as a metric for comparing the performance of the two candidate target detection algorithms. However, from an operational point of view it is the time to impact from the point of detection, rather than the absolute distance to the conflicting aircraft, that is perhaps more informative. This is because, analogous to the time needed by human pilots to respond to a collision situation, an automated system also requires an interval of time to plan and carry out appropriate actions. Depending on the closing speed of the aircraft involved, a particular detection range may or may not provide sufficient time to accommodate both processing needs and aircraft response lag.

The Federal Aviation Administration has issued an advisory circular that provides guidance on the time required for a pilot to recognize an approaching aircraft and execute an evasive maneuver (Federal Aviation Administration, 1983). According to the advisory circular, as a general rule a conflicting aircraft must be detected at least 12.5 s prior to the time of impact. This motivates us to undertake some further analysis to gauge approximately how much time ahead of impact the system was able to detect the target. We chose to conduct our analysis on the worst-case scenarios, i.e., the head-on geometries of Scenarios 1 and 2. In our study we projected, from the point of detection, hypothetical trajectories for each aircraft corresponding to the shortest possible collision-course path (this path is defined by a straight line drawn between the camera and target aircraft positions at the point of detection). Each aircraft's average speed in the last 5 s leading up to the point of detection was used as the speed traveled along the projected trajectories. Under these circumstances, the time to impact at the point of detection in Scenario 1 was estimated to be around 8 s (based on a combined closing speed of 51 m/s), and for Scenario 2 the time was around 10 s (based on a combined closing speed of 53 m/s). It is clear that these times are below the recommended 12.5 s; however, our results must be considered in the appropriate context.
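The time-to-impact estimate reduces to the detection range divided by the combined closing speed; the Python sketch below reproduces the two figures quoted above, and the closing-speed helper only indicates schematically how the last few seconds of ENU fixes could be averaged.

import numpy as np

def closing_speed(camera_enu, target_enu, dt):
    # Average each aircraft's speed over the supplied (N, 3) ENU fixes
    # (sampled every dt seconds) and assume a worst-case head-on closure.
    cam = np.linalg.norm(np.diff(camera_enu, axis=0), axis=1).mean() / dt
    tgt = np.linalg.norm(np.diff(target_enu, axis=0), axis=1).mean() / dt
    return cam + tgt

def time_to_impact(detection_range_m, closing_speed_mps):
    return detection_range_m / closing_speed_mps

print(time_to_impact(412, 51))   # Scenario 1: ~8.1 s
print(time_to_impact(562, 53))   # Scenario 2: ~10.6 s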

The advisory circular information pertains to human pilots, with the 12.5 s broken down into the time taken to perform various subtasks. Over half of the time is allocated to the tasks of recognizing the existence of the collision situation, making a decision to perform an avoidance maneuver, and manipulating the aircraft controls to execute the maneuver. An automated system must also carry out similar tasks, but there is potential for the tasks to be completed much more quickly, given the rapidly improving processing capacities of computing hardware and the near-instantaneous actuation of flight controls. Hence, we expect an equivalent safe detection time ahead of impact for automated systems to be less than 12.5 s.

4.3. Proposed System Hardware Configuration and Performance

Image processing is traditionally a very computationally intensive task. Hence, we considered it a challenge to achieve real-time performance for our proposed vision-based collision-detection system. The increasing complexity of image processing algorithms is pushing at the processing limits of modern CPU-based computing architectures, even with the use of optimized computer vision libraries. Highly specialized hardware such as field-programmable gate arrays (FPGAs) and dedicated digital signal processors has been used to overcome the limitations of the CPU; however, such hardware is typically not readily accessible. In this paper, we investigate the potential of widely available commercial-off-the-shelf (COTS) GPUs for running our candidate target detection algorithms. The parallel processing [single-instruction, multiple-data (SIMD)] capability of GPUs is ideally suited to computer vision tasks, which often require the same calculations to be performed repeatedly on each individual pixel (or group of pixels) over an entire image.

Figure 10(a) illustrates the high-level hardware architecture of our collision-detection system. Particular attention was paid to the interaction between hardware components to enhance processing speed. The key design strategies that we followed included the following: (1) the GPU module should carry out all the computationally intensive image processing tasks, (2) memory transfers between the host computer and the GPU module should be minimized, and (3) the host computer should be kept free to perform other non-image-related tasks. By adhering to these strategies, we produced a system whereby each image frame captured by the vision sensor was directly copied from the host computer to the GPU module for processing. This allowed the entire detection algorithm to be executed on the GPU module and ensured that the GPU was utilized efficiently (in addition, offloading all the image processing to the GPU module eliminated time-consuming memory transfers that would otherwise be required if the processing was shared between the host and GPU). Once processing is completed on the GPU, only one additional memory transfer is needed to return the processing result (for example, a detection probability) to the host computer for further analysis.


Figure 10. (a) High-level system architecture; (b) GPU image processing architecture.

This entire process is repeated at the capture of the next image frame.

Recently, the parallel processing capabilities of GPUs have been exploited by other authors for tracking applications (Ohmer & Redding, 2008). Figure 10(b) illustrates the general way in which we exploited the GPU for processing images. At a high level, this involved dividing the image frame into blocks, with each block containing a number of threads. We aimed to allocate one thread to each pixel of the image frame so that operations could be carried out simultaneously on all pixels, greatly enhancing the processing speed.
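The sketch below illustrates this one-thread-per-pixel layout using Numba's CUDA bindings rather than the CUDA C framework we used; the per-pixel operation is a placeholder threshold standing in for the actual filtering stages, and the block size is an arbitrary example.

import numpy as np
from numba import cuda

@cuda.jit
def per_pixel_stage(frame, out, threshold):
    # One thread per pixel: (i, j) is this thread's absolute grid position.
    i, j = cuda.grid(2)
    if i < frame.shape[0] and j < frame.shape[1]:
        out[i, j] = 1.0 if frame[i, j] > threshold else 0.0

frame = np.random.randint(0, 256, (768, 1024), dtype=np.uint8)

d_frame = cuda.to_device(frame)                        # one upload per frame
d_out = cuda.device_array(frame.shape, dtype=np.float32)

threads_per_block = (16, 16)
blocks_per_grid = ((frame.shape[0] + 15) // 16, (frame.shape[1] + 15) // 16)
per_pixel_stage[blocks_per_grid, threads_per_block](d_frame, d_out, 200)

# In the deployed system only a small result (e.g., a detection probability)
# is returned to the host; copying the full map back is for illustration only.
result = d_out.copy_to_host()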

Overall, as a minimum standard, the system was required to handle real-time processing of image data (1,024 × 768 pixel image frames, encoded at 8 bits per pixel) captured at a rate of 30 Hz. As a first step to characterizing the performance of a GPU-based system, we bench tested an implementation of the system architecture in Figure 10(a) using the following desktop computer components:

• Host computer: Intel Pentium IV 3.2-GHz CPU (with hyper-threading); 1 GB SDRAM at 666 MHz; 32-bit Linux Ubuntu operating system

• GPU module: NVIDIA GeForce GTX 280, 1 GB GDDR3 RAM (GV-N28-1GH-B)

The NVIDIA GeForce GTX 280 has 30 multiprocessors and a compute capability of 1.3 (NVIDIA CUDA, 2008). Programming access to the GPU module was facilitated by NVIDIA's Compute Unified Device Architecture (CUDA) framework (NVIDIA CUDA, 2008). Using this configuration, we were able to achieve an average frame rate of approximately 150 Hz (or an average of 6.51 ms to process each frame) for the Viterbi-based detection algorithm and a frame rate of approximately 130 Hz (or an average of 7.47 ms to process each frame) for the HMM algorithm (these processing times include file input/output operations). It is clear that the processing rates for both algorithms are well in excess of our earlier 30-Hz requirement.

Our bench test with desktop components has demonstrated the enormous potential of GPUs for realizing real-time performance in vision-based systems. Motivated by these encouraging results, we have implemented a flight-ready system architecture using the following components:

• Host computer: Intel Core 2 Duo 2.4-GHz CPU; 2 GB SDRAM at 800 MHz; 32-bit Linux Debian operating system

• GPU module: NVIDIA GeForce 9600 GT, 512 MB GDDR3 RAM (GV-N96TGR-512I)

Figure 11 illustrates the physical hardware layout of the system, which is currently being integrated on a custom-modified Cessna 172 aircraft, as shown in Figure 12.

The NVIDIA GeForce 9600 GT was selected as it offered a good balance between processing performance, power consumption, and volume requirements. Specifically, it was the highest performing (in terms of the number of multiprocessors) GPU device that did not require an external power connection to supplement the supply from the PCI bus. Figure 11 illustrates the GeForce 9600 GT (right) alongside the host computer. The GPU has eight multiprocessors and a compute capability of 1.1 and consumes only 59 W of power (low-power version).

From experience, our implementation of the detection algorithms on the GPU scales roughly linearly with the number of multiprocessors.


Figure 11. Flight-ready hardware configuration.

Figure 12. Cessna 172 aircraft.

Hence, we anticipate processing rates on our flight hardware configuration to be approximately four times less than those reported for the bench tests. This translates to a frame rate of around 37 Hz for the Viterbi-based algorithm and a frame rate of around 32 Hz for the HMM approach, which both still satisfy our 30-Hz requirement. We highlight that there is still room for further improvement in processing speeds, as we have yet to exploit advanced GPU code optimization techniques (such as pipelining and dynamic memory allocation methods).
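As a rough check on this scaling argument, the short calculation below applies the linear multiprocessor-count assumption; the "approximately four times" in the text rounds the 30/8 ratio (3.75), which is why these estimates sit slightly above the 37- and 32-Hz figures quoted.

bench_multiprocessors = 30      # GeForce GTX 280 (bench test)
flight_multiprocessors = 8      # GeForce 9600 GT (flight hardware)
scale = flight_multiprocessors / bench_multiprocessors    # ~1/3.75

bench_rates_hz = {"Viterbi-based": 150.0, "HMM": 130.0}
for name, rate in bench_rates_hz.items():
    print(f"{name}: ~{rate * scale:.0f} Hz expected on the flight GPU")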


Figure 13. Camera-mounting bracket for Cessna aircraft.

5. LESSONS LEARNED AND FUTURE WORK

Our long-term goal extends beyond the ability to simply detect potential collision threats. We aim to establish a well-refined target detection capability before progressively evolving this into a fully autonomous closed-loop collision avoidance system. This closed-loop system will include decision-making and control modules so that the aircraft may automatically and actively respond to any detected threats. In the near term, our plans are to conduct further testing and refinement of the detection system, as discussed in the following sections.

5.1. Hardware Implementation and Evaluation

Implementation of the flight hardware configuration is complete. The next step is to verify the anticipated processing speed of the hardware under realistic operating conditions via several flight trials using full-scale manned Cessna aircraft and UAV platforms. Should the actual speed fall below our expectations or our minimum requirements, we may consider employing advanced code optimization techniques to improve processing speed. Alternatively, we may choose to upgrade to a more powerful GPU device at the expense of increased energy consumption.

5.2. Target Detection

Based on the detection range and image jitter performance results, neither the HMM nor the Viterbi-based algorithm stood out as the decidedly better detection approach. Admittedly, only three data sets were used in the analysis of detection range; hence, we intend to collect more data so that a more comprehensive performance analysis can be conducted. Our next planned flight experiments will be carried out using manned Cessna aircraft, which we anticipate may alleviate the synchronization issues that we encountered with UAV platforms (see next section) and allow higher quality data to be collected in larger quantities. A custom-designed bracket has been manufactured that allows a camera to be mounted near the intersection of the wing and the supporting strut, as shown in Figure 13.

Moreover, our image jitter analysis results, combined with the interframe motion observed in the collected image data, suggest that jitter compensation is necessary for our target detection algorithms to perform effectively on a UAV platform using an unstabilized camera. In addition to the simple image-based compensation approach used in this paper, we are testing more advanced image correction techniques based on aircraft attitude and inertial measurements. If our compensation techniques prove to be inadequate, we may consider the use of a stabilized camera and/or increasing the data capture rate to further mitigate jitter effects in the captured data.

5.3. Data Collection

We encountered many challenges in the collection of suitable test data, particularly in our flight trials that simulated collision-course scenarios.


These experiments required the aircraft flights to be synchronized in time so that the target aircraft is kept in the field of view of the camera aircraft for as long as possible. We found it particularly difficult to achieve a precise synchronization of the flights. On reflection, we have identified several factors that contributed to this: (1) the inability to colocate the ground control stations of each aircraft due to the range limits of aircraft communication links (our suboptimal approach to overcoming this was to station personnel at separate ground control stations and to coordinate the release of the aircraft from their holding patterns via hand-held radios); (2) the use of fixed-wing platforms, which are more difficult to position accurately as they must be kept in constant motion (aircraft positions were monitored by sight and through the MicroPilot® Horizon software); and (3) the inability to make adjustments to flight paths once the aircraft are released from their holding patterns. This is due to the aircraft being operated in an autonomous mode.

In the short term, we will switch to manned Cessna platforms in an attempt to improve the synchronization of flights in future data collection experiments. In the long term, we are considering upgrades to antennas and cabling to improve signal gain so that the UAV ground control stations can be colocated. This will allow the flights to be better coordinated. It may also be possible to improve flight synchronization by introducing more flexibility into the flight plan; for example, by adding a small segment of manual control just after the aircraft's release from the holding pattern so that adjustments and corrections can be made to improve timing if necessary (remotely piloting the aircraft for the entire experiment is not considered feasible, as it is too difficult to maintain a consistent altitude for an extended period of time). An alternative is to develop new software that will allow both UAV platforms to be controlled by a single control station. This will open up the opportunity to have the aircraft flights automatically synchronized without human intervention. Overall, the field and operational experiences gained from our flight trials will be invaluable in fine-tuning future data collection experiments.

6. CONCLUSION

This paper proposes a vision-based collision-detection system for aerial robotics. It considers two target detection approaches and a GPU-based hardware implementation. Special flight experiments were conducted to collect suitable test data for evaluating the detection capabilities of an HMM-based detection algorithm and a Viterbi-based algorithm. The detection algorithms were able to detect targets at distances ranging from 400 to around 900 m (depending on the collision geometry). Based on aircraft closing speeds leading up to the point of detection and a hypothetical extrapolation of aircraft trajectories, these detection distances would provide an advance warning of about 8–10 s prior to impact. These results are not too far from the 12.5-s response time recommended for human pilots and are an encouraging sign for vision-based collision-detection approaches. We exploit the parallel processing capabilities of GPUs to enable our detection algorithms to execute in real time and anticipate that our flight-ready hardware configuration will achieve frame rates of approximately 30 Hz, on the basis of earlier bench-testing results.

APPENDIX: INDEX TO MULTIMEDIA EXTENSIONS

The videos are available as Supporting Information in the online version of this article.

Extension   Media type   Description
1           Video        Illustration of HMM-based detection algorithm in laboratory testing using ground-captured image data
2           Video        Performance of HMM-based detection algorithm in head-on collision Scenario 1 (plus additional aircraft following scenario)

ACKNOWLEDGMENTS

This research was supported under the Australian Research Council's Linkage Projects funding scheme (Project Number LP100100302) and the Smart Skies Project, which is funded, in part, by the Queensland State Government Smart State Funding Scheme. The authors also gratefully acknowledge the contributions of Scott McNamara and Angus Thomson in supporting the GPU experiments.

REFERENCES

Arnold, J., Shaw, S. W., & Pasternack, H. (1993). Efficient target tracking using dynamic programming. IEEE Transactions on Aerospace and Electronic Systems, 29, 44–56.

ASTM. (2004). Standard specification for design and performance of an airborne sense-and-avoid system, F2411-04. West Conshohocken, PA: ASTM International.

Australian Transport Safety Bureau. (1991). Limitations of the see-and-avoid principle (Tech. Rep. 0642 160899). Australian Transport Safety Bureau, Australian Government.

Barnett, J. T., Billard, B. D., & Lee, C. (1993, April). Nonlinear morphological processors for point-target detection versus an adaptive linear spatial filter: A performance comparison. In Proceedings of Signal and Data Processing of Small Targets, Orlando, FL.

Barniv, Y. (1985). Dynamic programming solution for detecting dim moving targets. IEEE Transactions on Aerospace and Electronic Systems, AES-21, 144–156.

Barniv, Y., & Kella, O. (1987). Dynamic programming solution for detecting dim moving targets, part II: Analysis. IEEE Transactions on Aerospace and Electronic Systems, AES-23, 776–788.


Billingsley, P. (1995). Probability and measure, 3rd ed. New York: Wiley.

Braga-Neto, U., Choudhary, M., & Goutsias, J. (2004). Automatic target detection and tracking in forward-looking infrared image sequences using morphological connected operators. Journal of Electronic Imaging, 13, 802–813.

Brunelli, R. (2009). Template matching techniques in computer vision: Theory and practice. Chichester, UK: Wiley.

Bruno, M. G. S. (2004). Bayesian methods for multiaspect target tracking in image sequences. IEEE Transactions on Signal Processing, 52, 1848–1861.

Bruno, M. G. S., & Moura, J. M. F. (1999). The optimal 2D multiframe detector/tracker. International Journal of Electronics and Communications (AEU), 53, 1–17.

Bruno, M. G. S., & Moura, J. M. F. (2001). Multiframe detector/tracker: Optimal performance. IEEE Transactions on Aerospace and Electronic Systems, 37, 925–945.

Bryner, M. (2006). Developing sense & avoid for all classes of UAS. In Unmanned aircraft systems: The global perspective 2007/2008 (pp. 142–146). Paris: Blyenburgh & Co.

Carnie, R., Walker, R., & Corke, P. (2006, May). Image processing algorithms for UAV "sense and avoid." In Proceedings of IEEE International Conference on Robotics and Automation (ICRA), Orlando, FL.

Casasent, D., & Ye, A. (1997). Detection filters and algorithm fusion for ATR. IEEE Transactions on Image Processing, 6, 114–125.

Civil Aviation Safety Regulations 1998. (2009). Tech. Rep. SR 1998 No. 237. Civil Aviation Safety Authority, Australian Government.

Davey, S. J., Rutten, M. G., & Cheung, B. (2008). A comparison of detection performance for several track-before-detect algorithms. EURASIP Journal on Advances in Signal Processing, 2008, 1–10.

DeGarmo, M. T. (2004). Issues concerning integration of unmanned aerial vehicles in civil airspace (Tech. Rep. MP 04W0000323). McLean, VA: The MITRE Corporation.

Deshpande, S. D., Er, M. H., Venkateswarlu, R., & Chan, P. (1999, July). Max-mean and max-median filters for detection of small targets. In Proceedings of Signal and Data Processing of Small Targets, Denver, CO.

Dey, D., Geyer, C., Singh, S., & Digioia, M. (2009, July). Passive, long range detection of aircraft: Towards a field deployable sense and avoid system. In Proceedings of Field and Service Robotics, Cambridge, MA.

Dougherty, E. R., & Lotufo, R. A. (2003). Hands-on morphological image processing. Bellingham, WA: SPIE Optical Engineering Press.

Ebdon, M. D., & Regan, J. (2004). Sense-and-avoid requirement for remotely operated aircraft (ROA). Air Combat Command, Langley Air Force Base.

Elliott, R. J., Aggoun, L., & Moore, J. B. (1995). Hidden Markov models: Estimation and control. Berlin: Springer-Verlag.

Federal Aviation Administration. (1983). FAA Advisory Circular: Pilots' role in collision avoidance (Tech. Rep. AC 90-48C). Federal Aviation Administration, U.S. Department of Transportation.

Forney, G. D. J. (1973). The Viterbi algorithm. Proceedings of the IEEE, 61, 268–278.

Gandhi, T., Yang, M.-T., Kasturi, R., Camps, O., Coraor, L., & McCandless, J. (2003). Detection of obstacles in the flight path of an aircraft. IEEE Transactions on Aerospace and Electronic Systems, 39, 176–191.

Gandhi, T., Yang, M.-T., Kasturi, R., Camps, O., Coraor, L., & McCandless, J. (2006). Performance characterisation of the dynamic programming obstacle detection algorithm. IEEE Transactions on Image Processing, 15, 1202–1214.

Geyer, C., Dey, D., & Singh, S. (2009). Prototype sense-and-avoid system for UAVs (Tech. Rep. CMU-RI-TR-09-09). Robotics Institute, Carnegie Mellon University.

Geyer, C., Singh, S., & Chamberlain, L. (2008). Avoiding collisions between aircraft: State of the art and requirements for UAVs operating in civilian airspace (Tech. Rep. CMU-RI-TR-08-03). Robotics Institute, Carnegie Mellon University.

Gonzalez, R. C., Woods, R. E., & Eddins, S. L. (2004). Morphological image processing. In Digital image processing using MATLAB (pp. 334–377). Upper Saddle River, NJ: Pearson Prentice Hall.

JiCheng, L., ZhengKang, S., & Tao, L. (1996, May). Detection of spot target in infrared clutter with morphological filter. In Proceedings of National Aerospace and Electronics Conference, Dayton, OH.

Johnston, L. A., & Krishnamurthy, V. (2000, June). Performance analysis of a track before detect dynamic programming algorithm. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey.

Jorge, L., & Jorge, D. (2004). Inertial sensed ego-motion for 3D vision. Journal of Robotic Systems, 21, 3–12.

Jung, B., & Sukhatme, G. S. (2005). Real-time motion tracking from a mobile robot (Tech. Rep. CRES-05-008). Center for Robotics and Embedded Systems, University of Southern California.

Karhoff, B., Limb, J., Oravsky, S., & Shephard, A. (2006, April). Eyes in the domestic sky: An assessment of sense and avoid technology for the army's "warrior" unmanned aerial vehicle. In Proceedings of IEEE Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA.

Lai, J., & Ford, J. J. (2010). Relative entropy rate based multiple hidden Markov model approximation. IEEE Transactions on Signal Processing, 58, 165–174.

Lai, J., Ford, J. J., O'Shea, P., & Walker, R. (2008, December). Hidden Markov model filter banks for dim target detection from image sequences. In Proceedings of Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia.

Lucas, B. D., & Kanade, T. (1981, April). An iterative image registration technique with an application to stereo vision. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), Vancouver, Canada.


Maroney, D. R., Boiling, R. H., Heffron, M. E., & Flathers, G. W. (2007, October). Experimental platforms for evaluating sensor technology for UAS collision avoidance. In Proceedings of Digital Avionics Systems Conference (DASC), Dallas, TX.

NVIDIA CUDA. (2008). Compute unified device architecture: Programming guide version 2.0. Retrieved February 10, 2010, from http://developer.nvidia.com/object/cudadocs.html.

Office of the Secretary of Defense (2004). Airspace integration plan for unmanned aviation (Tech. Rep. ADA431348).

Ohmer, J. F., & Redding, N. (2008, December). GPU-accelerated KLT tracking with Monte-Carlo-based feature reselection. In Proceedings of Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia.

Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Kruger, J., Lefohn, A. E., & Purcell, T. J. (2005, August). State-of-the-art reports: A survey of general-purpose computation on graphics hardware. In Proceedings of European Association for Computer Graphics Conference (Eurographics), Dublin, Ireland.

Parkinson, B. W., & Spilker, J. J., Jr. (Eds.) (1996). GPS error analysis. In Global positioning system: Theory and applications (vol. 2, pp. 478–483). Washington, DC: American Institute of Aeronautics and Astronautics.

Petridis, S., Geyer, C., & Singh, S. (2008). Learning to detect aircraft at low resolutions. In Lecture Notes in Computer Science (vol. 5008, pp. 474–483). Berlin: Springer.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.

Schaefer, R., & Casasent, D. (1995). Nonlinear optical hit-miss transform for detection. Applied Optics, 34, 3869–3882.

Soille, P. (2003). Opening and closing. In Morphological image analysis: Principles and applications, 2nd ed. (pp. 105–137). Berlin: Springer.

Tom, V. T., Peli, T., Leung, M., & Bondaryk, J. E. (1993, April). Morphology-based algorithm for point target detection in infrared backgrounds. In Proceedings of Signal and Data Processing of Small Targets, Orlando, FL.

Tonissen, S. M., & Evans, R. J. (1996). Performance of dynamic programming techniques for track-before-detect. IEEE Transactions on Aerospace and Electronic Systems, 32, 1440–1451.

Unmanned Aircraft Systems Roadmap 2007–2032 (2007). Tech. Rep. ADA475002. Office of the Secretary of Defense, Department of Defense.

Utt, J., McCalmont, J., & Deschenes, M. (2004, September). Test and integration of a detect and avoid system. In Proceedings of AIAA Infotech, Chicago, IL.

Utt, J., McCalmont, J., & Deschenes, M. (2005, September). Development of a sense and avoid system. In Proceedings of AIAA Infotech, Arlington, VA.

Warren, R. C. (2002). A Bayesian track-before-detect algorithm for IR point target detection (Tech. Rep. DSTO-TR-1281). Melbourne, Australia: Aeronautical and Maritime Research Laboratory, Defence Science and Technology Organisation.

Xie, L., Ugrinovskii, V. A., & Petersen, I. R. (2005). Probabilistic distances between finite-state finite-alphabet hidden Markov models. IEEE Transactions on Automatic Control, 50, 505–511.

Yu, N., Wu, H., Wu, C., & Li, Y. (2003). Automatic target detection by optimal morphological filters. Journal of Computer Science and Technology, 18, 29–40.

Zeng, M., Li, J., & Peng, Z. (2006). The design of top-hat morphological filter and application to infrared target detection. Infrared Physics and Technology, 48, 67–76.

Zhu, Z., Li, Z., Liang, H., Song, B., & Pan, A. (2000, July). Gray-scale morphological filter for small-target detection. In Proceedings of SPIE International Symposium on Optical Science and Technology: Infrared Technology and Applications, San Diego, CA.
