Enhanced UAS Surveillance Using a Video Utility Metric

Peter Niedfeldt  Brandon Carroll  Joel Howard  Randal Beard  Bryan Morse  Stephen Pledgie ∗

Abstract

A successful mission for an Unmanned Air System (UAS) often depends on the ability of human operators to utilize data collected from onboard imaging sensors. Many hours are spent preparing and executing flight objectives, putting a tremendous burden on human operators both before and during the flight. We seek to automate the planning process to reduce the workload for UAS operators while also optimizing the quality of the collected video stream. We first propose a metric based on an existing image utility metric to estimate the utility of video captured by the onboard cameras. We then use this metric to plan not only the UAS flight path, but also the path of the camera's optical axis projected along the terrain and the zoom level. Since computing an optimal solution is NP-hard and therefore infeasible, we subsequently describe a staged sub-optimal path planning approach to autonomously plan the UAS flight path and sensor schedule. We apply these algorithms to precompute UAS and sensor paths for a surveillance mission over a specified region. Simulated and actual flight test results are included.

1 Introduction

Unmanned Air Systems (UASs) offer a platform to strategically position sensors such as EO/IR cameras, communication equipment, etc. to perform a variety of tasks. The breadth of possible UAS applications and the frequency of their utilization continues to increase. Developing autonomous UAS systems to complete these tasks will reduce the burden and risk for human operators who would otherwise perform them. Current civilian and military applications include Wilderness Search and Rescue [36, 9, 17, 16, 27], law enforcement [5], surveillance [24], reconnaissance [10], and target tracking [34, 35, 22].

A critical need of UAS research is to further reduce the workload for UAS operators. The state-of-the-art UAS technology requires multiple human operators to both plan and fly a UAS, while simultaneously controlling the onboard sensors [26, 2, 35, 14]. While some aspects of the planning process have been automated, we have not found a method that simultaneously plans all flight and sensor paths. By autonomously planning the UAS flight path, the sensor gimbal angles, and the sensor

∗This work was supported by Army SBIR Contract W911W6-09-C-0047 and also by the DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. Peter Niedfeldt is pursuing his PhD and Brandon Carroll received his MS degree from the Electrical and Computer Engineering Department, Joel Howard is an MS student in the Computer Science Department, and Randy Beard and Bryan Morse are Professors of Electrical and Computer Engineering and Computer Science, all at Brigham Young University. Stephen Pledgie is Chief Scientist for Mosaic ATM's Autonomous Systems Group.


resolution, UAS operators would only need to focus their attention on the sensor display and make high-level decisions. In this paper we develop an algorithm to simultaneously plan the flight path and the sensor schedule for surveillance over a region of interest.

Mission goals and other high-level decisions are often based on sensor data collected during the flight. To provide usable video to users, an automated planner would need to predict and plan paths resulting in high-utility sensor measurements. We define high-utility measurements for an EO/IR camera to be video where the operator can perform detection/recognition/identification (DRI) with a high probability of success. Several image utility metrics have been developed, such as in [19, 20, 39, 29]. However, we are unaware of existing metrics that estimate the probability of DRI for aerial video.

In this paper we extend the Targeting Task Performance (TTP) metric for still images proposed in [39] to video sequences. The resulting metric is the Video TTP (VTTP) metric. The VTTP metric is used as a utility function to autonomously plan UAS flight paths and sensor schedules by predicting and measuring video utility. We use VTTP to estimate the probability of DRI of objects as if they were in the video stream. By optimizing the predicted probability of DRI along the planned paths, we can autonomously generate flight paths that improve video utility.

The planning algorithms presented in this paper are formulated for surveillance applications but can be extended to probabilistic search and other applications. All UAS probabilistic search missions require path planning. These missions do not have a specified goal state; rather, we desire to maximize the probability of DRI over the UAS path and sensor schedule. A common solution to this NP-hard problem is to use a multi-step look-ahead algorithm to plan the UAS path [17, 27, 8, 11, 40]. Unfortunately, this approach alone is not computationally feasible when the dimensionality of the search space includes the position of the UAS, gimbal pointing, and sensor zoom level.

Quigley et al. present a semi-autonomous search method to locate targets by fixing the gimbal azimuth and elevation and flying the UAS manually until the target is located. After localizing the target, they switch to fully autonomous target tracking algorithms that orbit the target [34]. We extend their work to further reduce the operator workload by autonomously planning both the UAS flight path and the sensor schedule to acquire high-utility video.

Waharte et al. develop algorithms that allow a hovercraft to vary the number of terrain cells in the sensor field of view (FOV) by adjusting its altitude, effectively controlling the zoom during the search. They then use a greedy algorithm to control both the sensor resolution and the hovercraft north-east coordinates [40]. We extend their work by allowing our fixed-wing aircraft to gimbal the sensor while planning the flight path and the zoom level.

Chung and Burdick propose five methods of path planning for searching applications. Two of interest are the Saccadic search and the Drosophila-inspired search, where the UAS is commanded to point the camera at the locations of highest probability while either ignoring or accounting for gimbal slew-rate constraints, respectively [11]. We improve on these concepts by allowing the planner to maximize video utility along the UAS and sensor paths between high-reward locations.

Our work uses a technique similar to the chain-based method developed by Argyle et al. that plans 2D UAS paths [4]. We extend their work by planning a sensor path with dynamic link lengths to allow the sensor FOV to dwell in regions of interest. We then plan the UAS path and the zoom schedule according to the sensor path.

This paper elaborates and extends our previous work presented in [32] and describes our contributions to the state of the art in four areas. First, we present a novel video utility metric that extends well-established image utility metrics. Second, we describe the theoretical, optimal parameterized UAS flight path and sensor schedule that maximizes the reward or utility over the path. Unfortunately, while this formulation solves for the optimal path in theory, we are unaware of any method that


can solve this infeasible, NP-hard problem. Therefore, we exchange the guarantee of the optimal solution for a computationally feasible solution by developing a staged approach to plan the UAS flight and sensor paths. While we utilize standard non-linear optimization techniques to solve each sub-problem, our third contribution is the reformulation of the theoretical path planning problem into three planning stages. Lastly, we extend the planning algorithm presented by Argyle et al. to account for dynamic link lengths used to plan a sensor path.

We apply these contributions to a surveillance mission to provide higher-utility video footage of the environment. We begin by describing the problem and our notation in Section 2. In Section 3 we describe the VTTP video quality metric. Section 4 presents the path planning and scheduling algorithms. We plan a flight path and sensor schedule in simulation and fly the path in hardware, presenting our results in Section 5. This autonomous planner helps to reduce the workload of the UAS operators.

2 Problem Formulation and Notation

In this paper, the objective is to survey a given location and the surrounding terrain using a UAS with a gimbaled imaging sensor with zoom capabilities. Let the position and attitude of the UAS and the payload configuration be represented by the state vector x, which evolves according to the dynamic equation ẋ = f(x, u) given x(0) = x_0 and the control input u(t). We parameterize the control input u(t), or the desired UAS path and sensor schedule, by a set of N waypoints Θ = (Θ_s^⊤, Θ_u^⊤, Θ_ζ^⊤)^⊤, where

\[
\begin{aligned}
\Theta_s &= (n_{s,1}, e_{s,1}, \ldots, n_{s,N}, e_{s,N})^\top \\
\Theta_u &= (n_{u,1}, e_{u,1}, \ldots, n_{u,N}, e_{u,N})^\top \\
\Theta_\zeta &= (\zeta_1, \ldots, \zeta_N)^\top,
\end{aligned}
\]

and where n_{s,i} and e_{s,i} are the ith north and east sensor waypoints that indicate where the sensor optical axis should intersect the terrain, n_{u,i} and e_{u,i} are the ith UAS north and east waypoints, and ζ_i is the ith focal-length waypoint indicating the zoom level of the sensor at that waypoint.

With an abuse of notation, we let u(Θ) indicate the control input that will be needed to maneuver the vehicle and sensor through the waypoints designated by Θ over a fixed look-ahead horizon T. We define the control input such that the closed-loop UAS dynamics follow a constant roll angle between waypoints, resulting in the UAS flying along a connecting arc. The heading at each waypoint is deterministic if the heading at the previous waypoint is known. Thus, setting the initial heading for the first waypoint in the path recursively determines the headings for all the following waypoints. To calculate the heading ψ_i at Θ_{u,i}, we construct a circle that passes through both Θ_{u,i−1} and Θ_{u,i} and is tangent to the initial heading ψ_{i−1} at Θ_{u,i−1}. Figure 1 illustrates the geometry of this problem.

Using geometric principles, it can be shown that the change in heading ∆ψ_i after traveling from Θ_{u,i−1} to Θ_{u,i} is given by ∆ψ_i = 2ε, where the angle ε is calculated as the angle between the heading ψ_{i−1} and the vector Θ_{u,i} − Θ_{u,i−1}. The UAS position and pose can be determined by calculating the distance traveled along the arc between Θ_{u,i−1} and Θ_{u,i} of the circle with radius

\[
\rho = \frac{L_u}{\sqrt{2(1 - \cos\chi)}},
\]

where χ = ∆ψ_i is the angle between the radii of the circle to each waypoint and L_u = ‖Θ_{u,i} − Θ_{u,i−1}‖. We assume that the UAS flies at a fixed altitude to maximize the endurance of the vehicle [23].
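To make this geometry concrete, the following sketch (ours, not part of the paper's implementation; NumPy-based with hypothetical function and variable names) propagates the heading recursively through a list of UAS waypoints and computes the arc radius ρ for each leg.

```python
import numpy as np

def propagate_headings(waypoints, psi0):
    """Recursively compute the heading and arc radius at each UAS waypoint.

    waypoints: (N, 2) array of (north, east) positions Theta_u.
    psi0: initial heading in radians, measured from north, at the first waypoint.
    """
    headings = [psi0]
    radii = []
    for i in range(1, len(waypoints)):
        d = waypoints[i] - waypoints[i - 1]           # vector Theta_{u,i} - Theta_{u,i-1}
        L_u = np.linalg.norm(d)                       # straight-line distance between waypoints
        bearing = np.arctan2(d[1], d[0])              # direction of the connecting vector
        # epsilon: angle between the previous heading and the connecting vector (wrapped)
        eps = np.arctan2(np.sin(bearing - headings[-1]),
                         np.cos(bearing - headings[-1]))
        dpsi = 2.0 * eps                              # change in heading over the arc
        chi = dpsi
        # radius of the circle through both waypoints, tangent to the previous heading
        rho = np.inf if np.isclose(chi, 0.0) else L_u / np.sqrt(2.0 * (1.0 - np.cos(chi)))
        headings.append(headings[-1] + dpsi)
        radii.append(rho)
    return np.array(headings), np.array(radii)

# Example: three waypoints with an initial heading of 45 degrees east of north.
wps = np.array([[0.0, 0.0], [100.0, 50.0], [150.0, 150.0]])
psi, rho = propagate_headings(wps, np.deg2rad(45.0))
print(np.rad2deg(psi), rho)
```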

The desired sensor state trajectory is given by linearly interpolating between each waypoint in Θ_s. Note that the sensor path is not parameterized using the gimbal angles, as linear interpolation using that parameterization creates inherently non-linear motion of the sensor point of interest along the


Figure 1: Geometry used to calculate the heading ψ_i at UAS waypoint Θ_{u,i} given a previous heading ψ_{i−1} at UAS waypoint Θ_{u,i−1}.

terrain, resulting in optical flow that is unpleasant for the operator to view. The desired zoom state is a linear interpolation between the waypoints defined by Θ_ζ. When planning, we consider both the continuous range of focal lengths [ζ_min, ζ_max] and also a fixed number of discrete zoom levels.

Let ξ(t, τ, Θ) be the family of solutions to ẋ = f(x, u) with initial condition ξ(t, 0, Θ) = x(t) and parameterized by Θ. These paths are planned over the finite look-ahead window τ ∈ [0, T], where T is the look-ahead horizon. In other words, the predicted UAS states at time t + τ when flying a path described by Θ are given by the function ξ(t, τ, Θ). The look-ahead horizon is a function of the number of waypoints N and the fixed time between each waypoint ∆t, such that T = N ∆t.

Let Z be a set of discrete terrain points, where each z ∈ Z is an inertially defined north, east, down position that specifies a point in R³ that resides on the terrain surface. Let z_base be the terrain location representing the center of the region of interest. We define the set FOV(x) to be the set of terrain points z that are in the field of view given the state vector x. We desire to know the probability that an operator will be able to correctly perform DRI tasks for each element in FOV(x). Since we cannot determine the true probability of DRI, we model it using the VTTP metric, which will be formally defined in Section 3.

The VTTP metric estimates the probability of an operator successfully performing a DRI task if an object were located at terrain point z. The specific VTTP value is determined by the state vector x and a set of parameters P that include the terrain elevation model, the dimensions of potential objects of interest, and other parameters that will be discussed in Section 3. Therefore, the VTTP value is given by the function VTTP(z, x, P) ∈ [0, 1]. Section 2.1 introduces an evolving reward map used to denote regions of interest in the map. In Section 4 we describe a path planning algorithm that maximizes the reward weighted by VTTP(z, x, P) along a path to plan the UAS flight path and sensor schedule to observe regions of interest with higher video utility.

2.1 Reward Map

When performing surveillance missions there may be regions of the terrain that are more important to observe than others. We desire to view these regions of interest frequently and with higher quality to provide better utility to the operator. This capability is enabled by an evolving reward map, which


Figure 2: Top and side views of the evolving reward function after 20 seconds.

represents the reward available to the system if it observes the terrain point z. The reward map is denoted as r(t, z, x), which indicates the available reward at the terrain point z at time t. The reward map is updated after sensor measurements according to the current state vector x.

In this paper, and without loss of generality, we define the evolution of r(t, z, x) as

\[
\dot{r}(t, z, x) = \omega(z)\, e^{-\frac{1}{2} (z - z_{\mathrm{base}})^\top \Sigma_r^{-1} (z - z_{\mathrm{base}})}, \qquad (1)
\]

that is, a two-dimensional Gaussian centered on a region of interest or home base z_base, with covariance matrix Σ_r, weighted by ω(z), where each ω(z) > 0. The weighting factor ω(z) is defined such that each terrain point z can have a unique priority level. Again without loss of generality, we choose an ω(z) that assigns one of two reward levels, ω̄ or ω̲, to each location z to represent either a high or a low reward area, respectively. Note that the reward grows at a constant rate for each terrain point z.

The reward map is updated after each sensor measurement. We want to capture how well we view the terrain points z in the FOV, so we use the VTTP metric to scale the reward in the FOV. This process is expressed as

\[
r^{+}(t, z, x) =
\begin{cases}
r^{-}(t, z, x) & z \notin \mathrm{FOV}(x) \\
r^{-}(t, z, x) - \gamma_r\, r^{-}(t, z, x)\, \mathrm{VTTP}(z, x, \mathcal{P}) & z \in \mathrm{FOV}(x),
\end{cases} \qquad (2)
\]

where the reward is unchanged when the point z is not in the FOV, and decreases by γ_r times the observed reward when z is in the FOV. Here 0 < γ_r ≤ 1 is a tuning parameter describing the maximum amount of reward that can be collected per frame.
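As an illustration of Equations (1) and (2), the sketch below (our own simplification, using an isotropic stand-in for Σ_r and a caller-supplied boolean fov_mask) grows a gridded reward map at a constant per-cell rate and then discounts the cells inside the field of view by γ_r times their VTTP values.

```python
import numpy as np

def grow_reward(reward, weights, mahal_sq, dt):
    """Grow the reward map at a constant per-cell rate for dt seconds (cf. Equation (1)).

    weights: per-cell priority weights omega(z) > 0.
    mahal_sq: squared (Mahalanobis-like) distance of each cell from z_base.
    """
    rate = weights * np.exp(-0.5 * mahal_sq)
    return reward + rate * dt

def collect_reward(reward, fov_mask, vttp, gamma_r=0.5):
    """Discount the reward observed inside the sensor footprint (Equation (2)).

    fov_mask: boolean array, True where the cell lies in FOV(x).
    vttp: VTTP(z, x, P) values in [0, 1] for each cell.
    gamma_r: maximum fraction of reward collectable per frame, 0 < gamma_r <= 1.
    """
    updated = reward.copy()
    updated[fov_mask] -= gamma_r * reward[fov_mask] * vttp[fov_mask]
    return updated

# Example on a 100 x 100 grid: grow for 20 s, then observe a 20 x 20 footprint once.
n = 100
yy, xx = np.mgrid[0:n, 0:n]
mahal_sq = ((yy - n / 2) ** 2 + (xx - n / 2) ** 2) / 500.0   # isotropic stand-in for Sigma_r
weights = np.where(xx > 60, 2.0, 1.0)                        # two priority levels (high/low)
r = grow_reward(np.zeros((n, n)), weights, mahal_sq, dt=20.0)
fov = (np.abs(xx - 50) < 10) & (np.abs(yy - 50) < 10)
r = collect_reward(r, fov, vttp=np.full((n, n), 0.8), gamma_r=0.5)
```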

Figure 2 represents reward over the test flight area used in Section 5. The locations z that are part of the road network are defined to be high reward. The figure displays an example of what a reward map initialized to zero looks like after evolving for 20 seconds. Note that the specific parameters for this section and all subsequent parameters are summarized in Table 3 in Appendix A. Also note that all terrain coordinates are described using the Universal Transverse Mercator geographic coordinate system, based in meters.


2.2 Optimal UAS and Sensor Planning

Given the reward map r described above, we want to find the parameterized path Θ that maximizes the reward-weighted VTTP metric collected along the path. The reward-weighted VTTP metric collected by the sensor at time t + τ is given by

\[
R(t, \tau, \Theta, \mathcal{P}) = \sum_{z \in \mathrm{FOV}(\xi(t,\tau,\Theta))} r(t + \tau, z, \xi(t, \tau, \Theta))\, \mathrm{VTTP}(z, \xi(t, \tau, \Theta), \mathcal{P}). \qquad (3)
\]

In other words, R(t, τ, Θ, P) describes the amount of instantaneous reward observed within the sensor field of view at time t + τ.

The optimal path that results in the maximum reward collected over the look-ahead window 0 ≤ τ ≤ T is given by

\[
\Theta^{*} = \arg\max_{\Theta} \int_{\tau=0}^{T} R(t, \tau, \Theta, \mathcal{P})\, d\tau. \qquad (4)
\]

Unfortunately, this problem is NP-hard and can only be solved in trivial cases. In Section 4 we present a suboptimal strategy for approximately solving this problem.

3 Video Utility Metric

We desire to model the probability of an operator successfully performing a DRI task using a video stream. For this purpose we propose the VTTP metric, which is a measure of video utility and relates to the probability of DRI. In other words, the VTTP metric quantifies the ability of an operator to perform a particular DRI task using the sensor video stream. We calculate the VTTP values for the points on the terrain map that fall within the field of view of the camera. These VTTP values then determine how much reward is collected from the corresponding points on the reward map.

For each terrain point visible in a video frame, the VTTP calculation consists of a TTP calculation modified by additional components. These additional components account for various facets of the video imaging and viewing process that TTP does not take into account. We develop two components for the VTTP metric: the motion component M and the observer viewing model O. The motion component of the VTTP metric discounts the probability of DRI based on the motion blur induced by the egomotion of the sensor. The observer viewing model discounts the probability of DRI for terrain points that are far from the video frame center. We also develop a method for estimating the contrast at each terrain point location, which is needed to compute the TTP metric.

For the VTTP metric to be useful, we must be able to predict it for planned paths as well as calculate it for actual paths traversed by the UAS. For this reason, a VTTP value is calculated for each terrain point in the FOV as if a hypothetical object existed at that point. The ability to predict the VTTP value for a planned path allows our planner to search for a path that maximizes the expected reward for the video. Once a path has been executed, we use the VTTP metric to evaluate how well areas on the map were actually seen. Future work will use this evaluation to close the loop for the next stage of planning so that poorly covered areas can be revisited in real-time.

When planning a path, we calculate the predicted VTTP values for points in the field of view as

\[
\mathrm{VTTP}_p(z, \xi, \mathcal{P}) = \mathrm{TTP}_p(z, \xi, \mathcal{P})\, M_p(\xi, \mathcal{P})\, O_p(\xi, \mathcal{P}), \qquad (5)
\]

where ξ(t, τ, Θ) is the predicted telemetry and the subscript 'p' denotes predicted values. Once the video has been obtained from following a path, we calculate the actual VTTP as

\[
\mathrm{VTTP}_a(z, x, \mathcal{P}) = \mathrm{TTP}_a(z, x, \mathcal{P})\, M_a(v)\, O_a(x, \mathcal{P}), \qquad (6)
\]

where x(t) is the actual telemetry, v is the video stream, and the subscript 'a' denotes actual values.


Table 1: Requirements to calculate each component of VTTP.

- Predicted TTP_p: elevation map, predicted telemetry, object dimensions
- Lighting (L_p): elevation map, sun position, ambient & direct lighting
- Motion (M_p): elevation map, predicted telemetry
- Observer viewing model (O_p): predicted telemetry, viewing model
- Actual TTP_a: elevation map, actual telemetry, object dimensions
- Motion (M_a): captured video
- Observer viewing model (O_a): actual telemetry, viewing model

Each of the components of the VTTP metric in Equations (5) and (6) requires certain types of knowledge about the situation. Table 1 summarizes the knowledge that is needed to compute each component. In the remainder of this section, we describe the various components comprising the VTTP metric. The TTP image metric and the underlying contrast estimation are described in Section 3.1. In Sections 3.2 and 3.3 we describe the effects of motion and human viewing tendencies, both of which are needed to calculate video utility. Finally, in Section 3.4 we begin the model validation process by comparing actual and predicted VTTP values during a flight and by conducting a small user study.

3.1 TTP

The Targeting Task Performance (TTP) metric is an image utility metric developed by Vollmerhausen and Jacobs as an improvement over Johnson's criteria [39]. The TTP metric estimates the probability of DRI for humans viewing static images. It accounts for the size of the object in the FOV and considers the contrast between the object and the background over all relevant spatial frequencies. Because the TTP metric was designed for still images, it does not take into account the spatiotemporal nature of video. We build upon the foundation laid by the TTP metric to develop the VTTP metric.

We use the notation TTP(z, x, P) to denote the TTP value of the terrain point z given the UAS states x and a subset of the parameters P, including the dimensions of the object of interest, a contrast model, and empirically determined parameters as explained in [31]. When searching for an unknown object, we calculate the TTP value according to the dimensions of the smallest possible object that the operator wishes to detect. The TTP metric can be calculated in four main steps. In the first step, we calculate the viewed area of the object of interest in the image frame assuming it were at the terrain point z, denoted A(z, x, P), where P contains an estimate of the dimensions of the object of interest.

Secondly, we estimate the contrast Γ(z) of the object with respect to the background. In previous work, we computed the relative contrast of an object of interest with respect to the background using


the Phong Illumination model [31]. Unfortunately, this requires making numerous assumptions about the object and the background's lighting characteristics, which may not be known. In an effort to reduce the number of uncertain parameters, we propose using the lighting information Λ(z) to predict the contrast in an image. This method requires no prior knowledge of the appearance model of the object or background and is described in detail in the Predicted Contrast section that follows.

In the third step we compute the TTP integral proposed in [39]. We use the Minimum Resolvable Contrast curve MRC(κ, P) to quantify the human eye and the sensor's ability to detect contrast at a spatial frequency κ. An exact computation of MRC(κ, P) is given in [3]. However, we approximate MRC(κ, P) using a parabola as described in [31], given by

\[
\mathrm{MRC}(\kappa, \mathcal{P}) = \exp\!\left( \frac{-a_2 \pm \sqrt{a_2^2 - 4 a_1 (a_3 - \kappa\, \measuredangle_{\mathrm{pixel}})}}{2 a_1} \right),
\]

where a_1, a_2, and a_3 are parameters defining the curve and ∡_pixel is the field of view of a single pixel. The resulting TTP integral is given by

\[
\Phi(z, x, \mathcal{P}) = \int_{\kappa_{\mathrm{low}}}^{\kappa_{\mathrm{cut}}} \left[ \frac{\Gamma(z)}{\mathrm{MRC}(\kappa, \mathcal{P})} \right]^{\frac{1}{2}} d\kappa, \qquad (7)
\]

where κ_low and κ_cut are the lower and upper bounds of relevant spatial frequencies, and Γ(z) is the contrast between the target and the background.

The final step in computing the TTP metric is to estimate the probability of DRI according to a logistic curve. The parameters of the logistic curve are empirically determined according to both the type of task being performed and the type of imaging system. The TTP value is given by

\[
\mathrm{TTP}(z, x, \mathcal{P}) = \frac{(N_{\mathrm{resolved}}/N_{50})^{E}}{1 + (N_{\mathrm{resolved}}/N_{50})^{E}}, \qquad (8)
\]

where

\[
E = a + b\,(N_{\mathrm{resolved}}/N_{50}),
\]

and

\[
N_{\mathrm{resolved}} = \frac{\sqrt{A(z, x, \mathcal{P})}\; \Phi(z, x, \mathcal{P})}{s_d(z, x)},
\]

and where A(z, x, P) is the viewed area of the object of interest, s_d(z, x) is the current slant range or Euclidean distance from the UAS to the terrain point, and N_50 is an empirically determined factor that represents the average number of cycles necessary for a group of analysts to correctly identify objects 50% of the time. The coefficients a and b are statistically determined from laboratory and/or field data.
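The four steps above can be summarized in a short sketch. The parabola coefficients, N_50, and the logistic coefficients a and b below are illustrative placeholders rather than the paper's calibrated values, and the integral is evaluated with a simple trapezoidal sum.

```python
import numpy as np

def mrc(kappa, a1, a2, a3, pixel_fov):
    """Parabolic approximation of the Minimum Resolvable Contrast curve."""
    disc = a2**2 - 4.0 * a1 * (a3 - kappa * pixel_fov)
    return np.exp((-a2 + np.sqrt(disc)) / (2.0 * a1))

def ttp_value(area, contrast, slant_range, p):
    """Estimate the probability of DRI for a hypothetical object (Equations (7) and (8))."""
    kappa = np.linspace(p["kappa_low"], p["kappa_cut"], 500)
    integrand = np.sqrt(contrast / mrc(kappa, p["a1"], p["a2"], p["a3"], p["pixel_fov"]))
    phi = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(kappa))  # TTP integral
    n_resolved = np.sqrt(area) * phi / slant_range
    ratio = n_resolved / p["N50"]
    E = p["a"] + p["b"] * ratio                        # empirically determined exponent
    return ratio**E / (1.0 + ratio**E)                 # logistic probability of DRI

# Illustrative parameter values only (not the calibrated values used in the paper).
params = dict(kappa_low=0.5, kappa_cut=10.0, a1=-0.3, a2=1.0, a3=2.0,
              pixel_fov=1e-3, N50=2.0, a=1.51, b=0.24)
print(ttp_value(area=4.0, contrast=0.4, slant_range=300.0, p=params))
```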

Predicted Contrast

A key element of TTP is the contrast between the object of interest and the background. Determining the contrast requires prior knowledge of the characteristics of the specific object, making it less useful for general surveillance problems. If the coloring of a specific object is known, one can use terrain reference imagery to predict the contrast of the object against known, and presumably static, terrain.

Regardless of the inherent and often unknown contrast between the object and the environment, one factor that always affects the imaged contrast of the object is the scene lighting: a fixed object and


background will have higher visual contrast under brighter lighting conditions and less under darker conditions. We thus use a scene lighting model to predict object/background contrast. Without knowledge of object coloring, this at least gives us an estimate that is proportional to the actual contrast.

There are several ways to model scene lighting, with varying complexity and prior knowledge required. Given a terrain map with reasonable resolution, the improvements provided by more complex methods become marginal. We use a simple Lambertian lighting model that considers both ambient lighting as a result of cloud and atmospheric scattering and direct lighting as a result of sunlight falling directly on a given area of the surface. We require the user of the system to set a parameter α ∈ [0, 1] to indicate the relative mix of ambient and direct lighting at the time of flight, which is essentially an estimate of how cloudy the day is. Setting α = 1 indicates complete cloud cover, while setting α = 0 denotes a clear day with full sunlight. However, note that there is always some ambient lighting due to atmospheric scatter, so even on a clear day a small ambient component should be used.

Using this model, the illumination Λ of each terrain point z is thus calculated as a weighted mix of direct and ambient lighting,

\[
\Lambda(z) = (1 - \alpha) D(z) + \alpha, \qquad (9)
\]

where D(z) is the amount of direct lighting per unit area of the terrain surface. The direct lighting is dependent on the angle made between the direction of incoming sunlight ℓ and the terrain normal η(z), subject to terrain shadowing. It is calculated as

\[
D(z) =
\begin{cases}
0 & \text{if } z \text{ is occluded from the sun} \\
\ell \cdot \eta(z) & \text{otherwise.}
\end{cases}
\]

To calculate the lighting direction ℓ and the resulting shadow locations, we assume a known or estimated position of the sun. This can be gathered from readily available astronomical data, such as from the NOAA Solar Position Calculator [12].

Equation (7) requires knowledge of the contrast information Γ(z) in order to compute the TTP integral. In the absence of the information necessary to compute the actual contrast as in [31], we simply estimate it as Γ(z) ≈ Λ(z). If the sun does not move significantly during the time in which the video is collected, the lighting for each point in the area can be precomputed. A sample precomputed lighting map is shown along with the orthoimagery for the area in Figure 3.
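A minimal sketch of the lighting model in Equation (9), assuming unit terrain normals and a precomputed boolean shadow mask; the occlusion test itself (ray casting against the elevation model) is omitted, and negative dot products are clipped to zero.

```python
import numpy as np

def lighting_map(normals, sun_dir, shadow_mask, alpha):
    """Predicted illumination Lambda(z) for each terrain cell (Equation (9)).

    normals: (H, W, 3) array of unit terrain normals eta(z).
    sun_dir: unit vector pointing from the terrain toward the sun (direction ell).
    shadow_mask: boolean (H, W) array, True where the cell is occluded from the sun.
    alpha: ambient/direct mixing factor in [0, 1] (1 = full cloud cover).
    """
    direct = np.clip(np.einsum("hwc,c->hw", normals, sun_dir), 0.0, None)  # ell . eta(z)
    direct[shadow_mask] = 0.0                          # shadowed cells receive no direct light
    return (1.0 - alpha) * direct + alpha

# Example: flat terrain, sun 45 degrees above the horizon, a mostly clear day.
H, W = 64, 64
normals = np.zeros((H, W, 3)); normals[..., 2] = 1.0
sun = np.array([0.0, np.cos(np.deg2rad(45)), np.sin(np.deg2rad(45))])
shadows = np.zeros((H, W), dtype=bool)
Lambda = lighting_map(normals, sun, shadows, alpha=0.2)
```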

3.2 Motion

Motion of the video sensor causes blur in the video and makes objects more difficult for the human eye to track. Because of this, the utility of the video is affected by the gimbaling of the sensor and the path flown by the UAS. The motion component of the VTTP metric calculates the amount of motion in the image plane in pixels per second and adjusts the VTTP metric by a multiplicative factor accordingly. Since the amount of blur is proportional to the amount of motion in the image plane, the multiplicative factor drops off linearly from 1 to 0 as the motion increases from 0 to a threshold. This section shows how to compute both the predicted motion factor M_p and the actual motion factor M_a. We assume the use of an EO video camera for this derivation, but it applies to other types of video sensors as well.


Figure 3: Orthoimagery (left), orthoimagery with the lighting map overlaid (middle), and the lighting map Λ(z) (right) for one of our test areas. The lighting map was computed for 8:20 a.m. on Sep. 15, 2011, near Eureka, UT.

3.2.1 Predicted Motion

To predict the value of M_p, we calculate the apparent motion of 3D points as projected into the image plane of the camera. This calculation, known as a motion field, requires knowledge of the predicted state vector ξ(t, τ, Θ) as well as the terrain elevation model and camera parameters from P. We compute the motion field by taking the time derivative of the computer graphics equation that projects the terrain points into the camera's image plane,

\[
\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}
= C\, P\, R_b^g R_v^b T_i^v
\begin{bmatrix} z_n \\ z_e \\ z_d \\ 1 \end{bmatrix}, \qquad (10)
\]

where (x_c/z_c, y_c/z_c) is the coordinate of the point in the image plane, C is the camera matrix, P is the projection matrix, R_b^g is the rotation matrix from the body frame to the gimbal frame, R_v^b is the rotation matrix from the vehicle to the body frame, T_i^v is the translation matrix from the inertial to the vehicle frame, and (z_n, z_e, z_d) is the inertial north, east, down coordinate of the terrain point. The camera and projection matrices are defined as

\[
C =
\begin{bmatrix}
\frac{\zeta}{\mathrm{dim}_x} & 0 & o_x \\
0 & \frac{\zeta}{\mathrm{dim}_y} & o_y \\
0 & 0 & 1
\end{bmatrix},
\qquad
P =
\begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{bmatrix},
\]

where ζ is the focal length, (o_x, o_y) is the coordinate of the camera's origin, and dim_x and dim_y are the width and height of a pixel on the camera's sensor. The projection matrix rearranges the axes from NED coordinates to X/Y image coordinates. The motion of the terrain point in the image plane (m_x, m_y) is found by taking the derivative with respect to time of the point's image plane coordinate,

\[
\begin{bmatrix} m_x \\ m_y \end{bmatrix}
= \frac{d}{dt}
\begin{bmatrix} \frac{x_c}{z_c} \\[2pt] \frac{y_c}{z_c} \end{bmatrix}
=
\begin{bmatrix}
\dfrac{\dot{x}_c z_c - x_c \dot{z}_c}{z_c^2} \\[8pt]
\dfrac{\dot{y}_c z_c - y_c \dot{z}_c}{z_c^2}
\end{bmatrix}, \qquad (11)
\]


where ẋ_c, ẏ_c, and ż_c are obtained by time-differentiating Equation (10),

\[
\begin{bmatrix} \dot{x}_c \\ \dot{y}_c \\ \dot{z}_c \end{bmatrix}
= \left( \dot{C}\, P\, R_b^g R_v^b T_i^v
+ C\, P \left( \dot{R}_b^g R_v^b T_i^v + R_b^g \dot{R}_v^b T_i^v + R_b^g R_v^b \dot{T}_i^v \right) \right)
\begin{bmatrix} z_n \\ z_e \\ z_d \\ 1 \end{bmatrix}.
\]

The projection matrix P and the coordinates of the terrain point (z_n, z_e, z_d) do not change with time, and thus do not play a part in the product rule used to calculate the derivative. However, the camera matrix C must be included in the derivative because the focal length can change with time if the camera has zoom capabilities.

Once the motion of the terrain point has been calculated, the discount factor M_p is calculated for that point as

\[
M_p(\xi, \mathcal{P}) = \max\!\left( 0,\; 1 - \frac{\sqrt{m_x^2 + m_y^2}}{m_T} \right), \qquad (12)
\]

where m_T is the threshold amount of motion. Note that M_p does not affect the VTTP value when there is no motion and zeros out the VTTP value when the motion exceeds the threshold.
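Rather than forming the analytic derivatives above, the sketch below (ours; the gimbal, body, and inertial transforms are collapsed into a single hypothetical 4 × 4 matrix) approximates the image-plane motion by projecting a terrain point at two nearby times and finite-differencing, then applies the discount of Equation (12).

```python
import numpy as np

def project(point_ned, world_to_camera, zeta, dim_x, dim_y, ox, oy):
    """Project an inertial NED terrain point into pixel coordinates (cf. Equation (10))."""
    C = np.array([[zeta / dim_x, 0.0, ox],
                  [0.0, zeta / dim_y, oy],
                  [0.0, 0.0, 1.0]])
    P = np.array([[0.0, 1.0, 0.0, 0.0],      # reorder NED axes into image X/Y/depth
                  [0.0, 0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0]])
    h = world_to_camera @ np.append(point_ned, 1.0)   # combined gimbal/body/inertial transform
    xc, yc, zc = C @ (P @ h)
    return np.array([xc / zc, yc / zc])

def motion_discount(pix_now, pix_later, dt, m_threshold):
    """Predicted motion factor M_p from two projections dt seconds apart (Equation (12))."""
    speed = np.linalg.norm(pix_later - pix_now) / dt  # apparent motion in pixels per second
    return max(0.0, 1.0 - speed / m_threshold)

# Example: camera at the origin looking north; the UAS advances 1.7 m north over dt = 0.1 s.
def make_transform(north_offset):
    T = np.eye(4)
    T[0, 3] = -north_offset                           # hypothetical world-to-camera transform
    return T

terrain_point = np.array([100.0, 3.0, 5.0])           # NED point roughly 100 m ahead
cam = dict(zeta=0.05, dim_x=1e-5, dim_y=1e-5, ox=320.0, oy=240.0)
p0 = project(terrain_point, make_transform(0.0), **cam)
p1 = project(terrain_point, make_transform(1.7), **cam)
print(motion_discount(p0, p1, dt=0.1, m_threshold=500.0))
```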

3.2.2 Actual Motion

To calculate the actual motion in a captured video, we use the pyramidal Lucas-Kanade optical flow algorithm implemented in OpenCV [37, 7]. Although we could calculate a motion field using the actual telemetry, error from sensor measurements and imperfect syncing between the video and telemetry would make this calculation less accurate. The optical flow algorithm does not depend on the telemetry because it directly tracks features from frame to frame within the video. We configured the optical flow algorithm to search for 60 features with a minimum distance apart of 60, a quality level of 0.01, a 60 × 60 search window, and 6 pyramidal levels. These settings gave us a relatively uniform spread of features across the frame and worked well for all the situations that we tested.

Because the motion tends to be fairly consistent throughout the frame, we simplify our algorithm by using the average motion of all the tracked features in the frame. Without this simplification, we would need to match the tracked features up with the nearest terrain point projected into the image plane. The equation for the actual motion discount factor is given by

\[
M_a(v) = \max\!\left( 0,\; 1 - \frac{w}{N_f\, m_T} \sum_{i=1}^{N_f} \sqrt{m_{x,i}^2 + m_{y,i}^2} \right), \qquad (13)
\]

where v is the video stream, w is the frame rate of the video, N_f is the number of tracked features, and (m_{x,i}, m_{y,i}) is the motion of the ith feature between one frame and the next.
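A sketch of the actual-motion factor using OpenCV's pyramidal Lucas-Kanade tracker; the tracker parameters mirror the values quoted above, while frame acquisition and grayscale conversion are left to the caller.

```python
import cv2
import numpy as np

def actual_motion_factor(prev_gray, curr_gray, frame_rate, m_threshold):
    """Actual motion factor M_a from two consecutive grayscale frames (Equation (13))."""
    features = cv2.goodFeaturesToTrack(prev_gray, maxCorners=60,
                                       qualityLevel=0.01, minDistance=60)
    if features is None:
        return 1.0                                    # no trackable features; assume no blur
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, features, None,
        winSize=(60, 60), maxLevel=6)
    good_prev = features[status.flatten() == 1].reshape(-1, 2)
    good_next = next_pts[status.flatten() == 1].reshape(-1, 2)
    if len(good_prev) == 0:
        return 1.0
    # Average per-frame feature displacement, scaled by the frame rate (pixels per second).
    mean_disp = np.mean(np.linalg.norm(good_next - good_prev, axis=1))
    return max(0.0, 1.0 - frame_rate * mean_disp / m_threshold)
```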

3.3 Observer Viewing Model

When watching video, it is well documented that humans tend to spend more time looking at areas near the center of the video frames. Because of this, they are more likely to detect objects that are located in the middle of the frame. This effect is described by Tatler and is now commonly known as the center-bias effect [38, 6]. While there are many factors affecting where human operators will direct their attention on a video screen, Judd et al. found that a Gaussian distribution is a good model of where observers focus their attention [21].


The VTTP metric incorporates this center-bias effect in order to more accurately estimate the probability of a human operator successfully performing DRI tasks using actual and predicted flight video. We utilize the result presented by Judd et al. by applying a scalar discount to the VTTP value for objects that are further from the center of the image according to a Gaussian distribution with covariance Σ_O. The scalar discount, or observer viewing model O(ξ, P) ∈ [0, 1], is given by

\[
O(\xi, \mathcal{P}) = \exp\!\left( -\tfrac{1}{2} \left( [x, y]^\top - [o_x, o_y]^\top \right)^\top \Sigma_O^{-1} \left( [x, y]^\top - [o_x, o_y]^\top \right) \right), \qquad (14)
\]

where o_x and o_y are the coordinates of the center of the image. The image center coordinates and observer model covariance are contained in the user-defined parameter set P.
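The center-bias discount of Equation (14) reduces to a single Gaussian evaluation once the terrain point has been projected to pixel coordinates; the covariance below is an assumed value, not the one used in the paper.

```python
import numpy as np

def observer_model(pixel, image_center, sigma_o):
    """Center-bias discount O in [0, 1] for a projected point (Equation (14)).

    pixel: (x, y) image coordinates of the projected terrain point.
    image_center: (ox, oy) center of the frame.
    sigma_o: 2x2 covariance of the observer attention model.
    """
    d = np.asarray(pixel, dtype=float) - np.asarray(image_center, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.inv(sigma_o) @ d))

# Example: a point 100 pixels right of center in a 640 x 480 frame.
print(observer_model((420, 240), (320, 240), sigma_o=np.diag([200.0**2, 150.0**2])))
```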

3.4 Validation

We performed two preliminary validations of the VTTP metric. First, we verified that the predicted and actual VTTP values for a flight path are consistent with each other. This consistency is necessary to ensure that flight paths that maximize the predicted video utility also maximize the attained utility. Second, we performed a small user study to verify that the actual VTTP values are correlated with human assessment of video utility. Together, these validations show that the VTTP metric can be used to plan paths that improve the viewer's experience.

3.4.1 Predicted vs Actual

To compare the predicted and actual VTTP metrics, we compute VTTP_p from a simulated path and VTTP_a using actual video and telemetry. We also consider the predicted VTTP metric computed using actual telemetry instead of a planned path, which we denote as

\[
\mathrm{VTTP}_{p^{+}}(z, x, \mathcal{P}) = \mathrm{TTP}(z, x, \mathcal{P})\, M_p(x, \mathcal{P})\, O(x, \mathcal{P}).
\]

The difference between this equation and VTTP_a is the method we use to estimate the motion discount. This allows us to visualize how well the apparent motion predicts optical flow.

The values for VTTP_p, VTTP_{p+}, and VTTP_a computed over a short path are shown in Figure 4. We observe that VTTP_p acts much like an upper bound since it models ideal flight. Using actual telemetry, we observe jitter due to turbulence and other disturbances that reduces the video utility. When we compare VTTP_{p+} to VTTP_a, we conclude that M_p is an adequate approximation of M_a. These results indicate that VTTP_p is a reasonable approximation to VTTP_a and can be used to plan paths that maximize video utility.

3.4.2 User Study

In order to use the VTTP metric with confidence, we need to know whether there is a correlation between the VTTP values and the observer's ability to perform DRI tasks. Ideally, a user study should be conducted where experienced UAS operators could perform DRI tasks on a wide variety of videos. Unfortunately, for this stage of the project we did not have access to trained UAS operators, nor could we attempt to cover the breadth of possible video scenarios that operators are asked to observe. However, we have performed a small user study where users were asked to rate each of a set of videos for DRI quality.

For the user study, ten video clips with durations of 10–14 seconds were selected where a charcoal-gray extended-cab truck was in the video stream for the duration of each clip. The videos were selected


Figure 4: Using simulated and actual flight data, we compare VTTP_p to VTTP_a for a single terrain point. We also compute VTTP_{p+}, which is defined to be the predicted VTTP metric computed using actual telemetry but no video. In this flight, the UAS is approaching the specified terrain point. The increasing VTTP values imply that the slant range has the largest effect on the VTTP value. The VTTP_p curve is smoother because it assumes ideal flight and does not model disturbances such as wind gusts. The spikes in the VTTP_p curve should be examined, but are probably caused by the system switching between waypoints.

to cover many, but not all, combinations of standard viewing conditions. There were videos where the sensor was zoomed in or out, where the truck was either near or far, where the truck was parked in the middle of the road or off to the side, where the truck was in a field or an area with tall shrubs, and in the middle of the day or when there were shadows. In all cases, however, the truck remained stationary.

Twenty participants were administered a test we developed similar to that developed by Likert [25]. In our test, participants were asked how hard it was for them to detect and identify the truck in the video clip, where the presentation order of the videos was randomized. Users scored one of 5 preferences: Very Easy, Easy, Average, Hard, or Very Hard. We compared their responses to the average VTTP value of the truck location during the videos. The participants were all college students and most were familiar with the project and affiliated with the BYU MAGICC lab, where this research was being conducted. However, project members did not participate in the study.

We cannot remove response bias from the users' ratings, which makes it impossible to correlate user answers with a numerical value to compare against the VTTP metric. Instead, for each user, we compute the difference in preference between all pairwise videos. Also, we compute the difference in VTTP scores for each pairwise combination of videos. We expect the difference in Likert rating between two videos to correlate with the difference in their VTTP values: when the user preference is vastly different between videos, we expect the difference between the VTTP values to be larger.

In the original analysis, we found that there were three videos that skewed the results due to modeling error. Two videos were scored highly by users because the truck was large in the FOV due to high zoom and close proximity. However, the truck was constantly shaking around in the video and received a low VTTP value due to large optical flow. This implies that the motion threshold m_T


Table 2: User study results comparing the user preference difference and the corresponding VTTP values between pairwise videos. Notice that as the preference difference increases, so does the mean of the VTTP differences for those same videos.

Preference Difference | Mean VTTP Difference | Std. Deviation of VTTP Difference | Number of Samples | Significance
0 | 3.83 × 10⁻¹⁹ | 0.0940 | 290 | p < 0.0001
1 | 0.0534 | 0.1217 | 153 | p < 0.0001
2 | 0.1178 | 0.1191 | 110 | p < 0.0001
3 | 0.1631 | 0.1131 | 63 | p = 0.0152
4 | 0.2006 | 0.0688 | 19 | p = 0.1757

should be scaled inversely proportionally to the ground sample distance (GSD), where the GSD is the distance between pixel centers as measured on the ground. A third video scored a high VTTP value but received a low score from users because it was hard to distinguish the truck parked up against the road near large bushes. This shows that the metric should be modified by including the effects of clutter and by improving the contrast model.

Removing these three outlier videos from the study, we compare the user preference difference with the mean VTTP difference. We note in Table 2 that as the difference in user preference increases, so does the mean VTTP difference. The standard deviations of the VTTP differences for each test are reasonable, and the p-values from the Student's t-test give a measure of confidence that these values are statistically distinct from the adjacent preference difference. Notice that the statistical significance is weakened when there are fewer samples. While more user studies are required to increase confidence in the validity of the VTTP metric, we feel that there is evidence to support its effectiveness at estimating video utility.

4 Staged Path Planning

The UAS and sensor path planning problem is NP-hard, and the optimal path described in Equation (4) can only be found in trivial cases. To reduce the dimensionality of the search space, we propose a novel staged approach to generate an estimate of Θ* by planning the sensor, UAS, and zoom paths in sequence. We decompose the planning into three stages. In Stage 1 we plan the sensor path Θ†_s, in Stage 2 we plan the UAS path Θ†_u given a fixed sensor path, and in Stage 3 we plan the zoom schedule Θ†_ζ given that the sensor and UAS paths are fixed.

We first plan the sensor path to provide higher-utility video for the operator based on how well we can observe the areas of interest defined by the reward map. We then plan the UAS path, which adjusts the UAS flight path to maximize the VTTP-weighted reward of the sensor path. Lastly, we plan the zoom level to adjust the FOV and the ground sample density. In other words, we solve the


following optimization problems in order:

\[
\Theta_s^{\dagger} = \arg\max_{\Theta_s} \int_{\tau=0}^{T} R\!\left(t, \tau, \left[\Theta_s, \Theta_u^{1}, \Theta_\zeta^{1}\right], \mathcal{P}\right) d\tau, \qquad (15)
\]
\[
\Theta_u^{\dagger} = \arg\max_{\Theta_u} \int_{\tau=0}^{T} R\!\left(t, \tau, \left[\Theta_s^{\dagger}, \Theta_u, \Theta_\zeta^{2}\right], \mathcal{P}\right) d\tau, \qquad (16)
\]
\[
\Theta_\zeta^{\dagger} = \arg\max_{\Theta_\zeta} \int_{\tau=0}^{T} R\!\left(t, \tau, \left[\Theta_s^{\dagger}, \Theta_u^{\dagger}, \Theta_\zeta\right], \mathcal{P}\right) d\tau, \qquad (17)
\]

where for Stage 1 we assume the UAS path and zoom schedule are given by the known paths Θ_u^1 and Θ_ζ^1, while for Stage 2 we assume that the zoom schedule is given by Θ_ζ^2.

By dividing the planning process into multiple stages we reduce the dimensionality of the optimization problem described by Equation (4); however, each stage remains a difficult NP-hard problem. In fact, each stage is still more difficult than the Traveling Salesman Problem, due to the underlying non-linearities and the time-evolving structure of the search space. For this reason, we utilize two judicious simplifications to find solutions more quickly. First, we use a receding horizon approach with a finite time horizon T, where we plan N waypoints and fly the first N_fly waypoints before re-planning. Note that the reward map is updated after each time step based on the predicted flight paths. Second, we use the full, time-varying reward map only when planning the final zoom path in Stage 3. During Stages 1 and 2, we use a static reward map at time t to find the optimal sensor and flight path over the time horizon of T seconds. In the subsections below, we describe algorithms to estimate each optimal sub-path.
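The staged, receding-horizon structure can be summarized as follows; the planner and step callables stand in for Stages 1-3 and the vehicle/reward-map updates described in this section, and all names are ours rather than the paper's.

```python
def staged_receding_horizon(x0, reward_map, planners, step, n_cycles, N=10, N_fly=3, dt=1.0):
    """Receding-horizon loop: plan N waypoints with the three stages, fly N_fly, then re-plan.

    planners: dict of callables 'sensor', 'uas', 'zoom' standing in for Stages 1-3.
    step: callable that advances the state and reward map by one waypoint interval dt.
    """
    x, history = x0, []
    for _ in range(n_cycles):
        theta_s = planners["sensor"](reward_map, x, N)                   # Stage 1: sensor path
        theta_u = planners["uas"](theta_s, reward_map, x, N)             # Stage 2: UAS path
        theta_z = planners["zoom"](theta_s, theta_u, reward_map, x, N)   # Stage 3: zoom schedule
        for i in range(N_fly):                        # fly only the first N_fly waypoints
            x, reward_map = step(x, theta_s[i], theta_u[i], theta_z[i], reward_map, dt)
            history.append(x)
    return history

# Trivial dummies, just to exercise the loop structure.
dummy_planners = {"sensor": lambda r, x, N: [x] * N,
                  "uas":    lambda s, r, x, N: [x] * N,
                  "zoom":   lambda s, u, r, x, N: [1.0] * N}
dummy_step = lambda x, s, u, z, r, dt: (x + 1, r)
print(len(staged_receding_horizon(0.0, None, dummy_planners, dummy_step, n_cycles=4)))
```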

4.1 Stage 1: Sensor Planning

The maximization problem to estimate the optimal sensor path is posed in discrete time as

\[
\begin{aligned}
\underset{\Theta_s}{\text{maximize}} \quad & \sum_{\tau=0}^{T} R_1\!\left(t, \tau, \left[\Theta_s, \Theta_u^{1}, \Theta_\zeta^{1}\right], \mathcal{P}\right) \\
\text{subject to} \quad & \text{sensor dynamic constraints,}
\end{aligned} \qquad (18)
\]

where we optimize over a modified version of Equation (3),

\[
R_1(t, \tau, \Theta, \mathcal{P}) = \sum_{z \in \mathrm{FOV}(\xi(t,\tau,\Theta))} r(t, z, x_0)\, \mathrm{TTP}(z, x_0, \mathcal{P}), \qquad (19)
\]

in which the underlying reward map does not vary with the time-of-flight parameter τ. During Stage 1, the flight path and the zoom path have not yet been planned, so we use the initial UAS states x_0 to account for the viewing conditions. Errors induced by this estimate are negligible if the time horizon T is not too large. The sensor dynamic constraints can be physical limits on the movement of the sensor, or they can represent constraints imposed by the operator to improve the viewing experience. In this section, we describe the method by which we estimate the optimal path Θ_s.

We estimate the solution to Equation (18) by applying a planning technique presented by Argyle et al., where they use a chain-in-a-forcefield technique to plan a UAS path. In their work, the links in the chain are defined as the UAS flight path waypoints acted on by various forces used to estimate the optimal path. Some of the forces acting on the links include link-length constraints, turn angle constraints, and the gradient of an underlying probability density function [4].


Figure 5: High-level concept of a sensor path in a 1D world, where the tick marks are waypoints defined every ∆t seconds. Notice how the optimized sensor path has waypoints that have shifted to clump in the highest regions of reward, while maintaining a maximum link-length constraint and allowing the path to find and explore other nearby local optima. This results in the sensor footprint dwelling over the regions of high reward-weighted TTP, while moving quickly through the regions of low reward-weighted TTP.

We let a chain composed of N links, c = [z_1, z_2, . . . , z_N], represent the sensor path Θ_s, where the position of the ith link is z_i. Argyle et al. use a probability density function to calculate the gradient force acting on the individual links z_i. Instead, we use the time-invariant, TTP-weighted reward map r_t(z) given by

\[
r_t(z) = r(t, z, x_0)\, \mathrm{VTTP}(z, x_0, \mathcal{P}),
\]

or in other words the reward at time t of terrain point z weighted by the VTTP value. By multiplying the static reward by the VTTP value given the initial conditions, we account for the effects of slant range, grazing angle, occlusions, and contrast when planning Θ_s, while avoiding the complexity of the time-varying gradients.

Argyle et al. apply a link-length constraint to enforce equal distances between UAS flight path waypoints. When planning the path of the sensor optical axis, a specific link-length distance is not desired. We extend the work by Argyle et al. to allow for dynamic link lengths so that the sensor footprint can dwell in regions of high reward while minimizing the time spent in regions of low reward. A high-level, one-dimensional illustration of this concept is shown in Figure 5.

We now calculate the forces that act on each link z_i in the chain. Let g_i be the unit vector pointing in the same direction as the gradient, such that

\[
g_i = \frac{1}{\beta} \left( \frac{\partial}{\partial z} r_t(z) \bigg|_{z = z_i} \right),
\]

where β is a normalizing constant. The unconstrained dynamics for link i become ż_i = γ_1 g_i, where γ_1 is a tuning parameter. Since the gradient is calculated from a discretized grid, we use a method similar to the Downhill Simplex Method described in [33], which is a variant of the Flexible Polyhedron Method first proposed in [30]. In essence, our algorithm constructs a triangle from three adjoining terrain points and estimates the gradient of the reward map from the reward corresponding to those points.

The remaining forces acting on each waypoint z_i are the link-length forces λ_i(z_{i−1}) and λ_i(z_{i+1}) due to the previous and next waypoints in the chain, respectively. Conceptually, there is a correlation


between the link-length distance and the motion blur of the image due to sensor and vehicle dynamics. Obviously, a large link-length distance would induce large motion blur, reducing the video quality of terrain in the sensor FOV. The parameter L_max describes the maximum acceptable link-length distance for a given altitude and a particular operator. A minimum link-length distance can also be defined to keep the sensor footprint moving. Applying these constraints, the link-length distance between waypoints of the sensor optical axis will fall between

\[
0 \le L_{\min} \le \| \Theta_{s,i} - \Theta_{s,i-1} \| \le L_{\max}. \qquad (20)
\]

A third parameter L_nom is also defined to be the nominal link-length distance for average conditions. Using the three parameters L_min, L_max, and L_nom, a link-length force is generated based on the distance between waypoints, which falls in one of three regions: first, a repelling zone [L_min, L_r), when the two waypoints are too close together; second, an attracting zone (L_a, L_max], when the two waypoints are too far apart; finally, a nominal zone [L_r, L_a], where we do not apply any link-length forces. The bounds L_r and L_a are determined by the reward level, given by

\[
\begin{aligned}
L_r &= L_{\min} + (1 - r_t(z_{\pm}))\, L_{\mathrm{nom}} \\
L_a &= L_{\max} - r_t(z_{\pm})\, L_{\max},
\end{aligned}
\]

where we abuse notation and let r_t(z_±) be the reward level at either the next or previous link in the chain, depending on which force we are currently calculating.

The link-length forces λ_i(z_{i−1}) and λ_i(z_{i+1}) due to the previous and next waypoints, respectively, are given by

\[
\lambda_i(z_{i-1}) =
\begin{cases}
\dfrac{z_i - z_{i-1}}{s_{\mathrm{prev}}} & \text{if } s_{\mathrm{prev}} < L_{\min} \\[8pt]
\dfrac{L_r - s_{\mathrm{prev}}}{L_r - L_{\min}} \cdot \dfrac{z_i - z_{i-1}}{s_{\mathrm{prev}}} & \text{if } s_{\mathrm{prev}} > L_{\min} \text{ and } s_{\mathrm{prev}} < L_r \\[8pt]
-\dfrac{s_{\mathrm{prev}} - L_a}{L_{\max} - L_a} \cdot \dfrac{z_i - z_{i-1}}{s_{\mathrm{prev}}} & \text{if } s_{\mathrm{prev}} < L_{\max} \text{ and } s_{\mathrm{prev}} > L_a \\[8pt]
-\dfrac{z_i - z_{i-1}}{s_{\mathrm{prev}}} & \text{if } s_{\mathrm{prev}} > L_{\max} \\[8pt]
0 & \text{otherwise,}
\end{cases}
\]

\[
\lambda_i(z_{i+1}) =
\begin{cases}
-\dfrac{z_{i+1} - z_i}{s_{\mathrm{next}}} & \text{if } s_{\mathrm{next}} < L_{\min} \\[8pt]
-\dfrac{L_r - s_{\mathrm{next}}}{L_r - L_{\min}} \cdot \dfrac{z_{i+1} - z_i}{s_{\mathrm{next}}} & \text{if } s_{\mathrm{next}} > L_{\min} \text{ and } s_{\mathrm{next}} < L_r \\[8pt]
\dfrac{s_{\mathrm{next}} - L_a}{L_{\max} - L_a} \cdot \dfrac{z_{i+1} - z_i}{s_{\mathrm{next}}} & \text{if } s_{\mathrm{next}} < L_{\max} \text{ and } s_{\mathrm{next}} > L_a \\[8pt]
\dfrac{z_{i+1} - z_i}{s_{\mathrm{next}}} & \text{if } s_{\mathrm{next}} > L_{\max} \\[8pt]
0 & \text{otherwise,}
\end{cases}
\]

where s_prev = ‖z_i − z_{i−1}‖ and s_next = ‖z_{i+1} − z_i‖. The constrained dynamics of link z_i can now be written as

\[
\dot{z}_i = F = \gamma_1 g_i + \gamma_2 \lambda_i(z_{i-1}) + \gamma_3 \lambda_i(z_{i+1}), \qquad (21)
\]

where the γ_i are tuning parameters. To avoid offsetting link-length forces, we require γ_2 > γ_3. Figure 6 shows graphically the piecewise-linear function describing both the regions of attraction and repulsion and the magnitude of the link-length force for waypoints in both high and low reward regions.
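A sketch of the dynamic link-length force from one neighbor and the combined chain update of Equation (21); the gradient direction is passed in precomputed, the next-neighbor force is obtained from the same helper by symmetry (sign flipped), and the parameter values are illustrative.

```python
import numpy as np

def link_force_prev(z_i, z_prev, r_prev, L_min, L_nom, L_max):
    """Link-length force on z_i due to the previous waypoint (piecewise-linear form above).

    r_prev: reward level r_t at the neighboring link, assumed in [0, 1].
    """
    d = z_i - z_prev
    s = np.linalg.norm(d)
    if s == 0.0:
        return np.zeros_like(z_i)
    unit = d / s
    L_r = L_min + (1.0 - r_prev) * L_nom          # repelling-zone boundary
    L_a = L_max - r_prev * L_max                  # attracting-zone boundary
    if s < L_min:
        return unit                               # strongly repel: links too close together
    if s < L_r:
        return (L_r - s) / (L_r - L_min) * unit   # repelling force tapers to zero at L_r
    if s > L_max:
        return -unit                              # strongly attract: links too far apart
    if s > L_a:
        return -(s - L_a) / (L_max - L_a) * unit  # attracting force tapers to zero at L_a
    return np.zeros_like(z_i)                     # nominal zone: no force

def chain_step(z_i, z_prev, z_next, grad_unit, r_prev, r_next,
               gammas=(1.0, 0.6, 0.4), L=(2.0, 5.0, 11.0), dt=0.1):
    """One Euler step of the constrained link dynamics, Equation (21)."""
    L_min, L_nom, L_max = L
    g1, g2, g3 = gammas                           # require g2 > g3 to avoid offsetting forces
    f = (g1 * grad_unit
         + g2 * link_force_prev(z_i, z_prev, r_prev, L_min, L_nom, L_max)
         - g3 * link_force_prev(z_next, z_i, r_next, L_min, L_nom, L_max))  # sign-flipped reuse
    return z_i + dt * f

# Example: middle link pulled along the reward gradient while keeping link lengths in range.
z = chain_step(np.array([97.0, 105.0]), np.array([95.0, 104.0]), np.array([104.0, 110.0]),
               grad_unit=np.array([0.6, 0.8]), r_prev=0.9, r_next=0.2)
print(z)
```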

To better explore the search space, four initial paths are used to seed the planner, with the best resultant path selected to be Θ_s. Each initial path originates from the initial n_{s,0} and e_{s,0} coordinates, with N equidistant links spread to the edge of the defined map. In many situations, the initial path will violate the link-length constraints, but as the forces act to satisfy them, the path will explore


Figure 6: Link-length constraints of a link in both high and low reward areas. Notice that links in high reward are allowed to clump together without experiencing a repelling force, and they are discouraged from being too far apart. The opposite is true for links in low reward areas, to encourage them to find the high reward areas.

many local maxima to find the best nearby path. To overcome non-smooth reward maps, we can also downsample r_t to create an intermediate path to seed the full-resolution planner and obtain the final path.

Figure 7 shows a simulated sensor path in a reward field containing many local maxima. The parameters for this particular simulation were L_min = 2, L_max = 11, and L_nom = 5, acting on a chain starting at the coordinates [97, 105]. Note that the best path found passes through two local maxima, and that the waypoints are bunched in the high reward regions.

4.2 Stage 2: UAS Planning

The next step is to plan the UAS path Θ_u given the sensor path Θ†_s. The optimization problem at this stage can be stated as

\[
\begin{aligned}
\underset{\Theta_u}{\text{maximize}} \quad & \sum_{\tau=0}^{T} R_2\!\left(t, \tau, \left[\Theta_s^{\dagger}, \Theta_u, \Theta_\zeta^{2}\right], \mathcal{P}\right) \\
\text{subject to} \quad & \text{vehicle dynamic constraints} \\
& \text{optional no-fly zone constraints.}
\end{aligned} \qquad (22)
\]

We again use a static reward map by modifying Equation (3) to become

\[
R_2(t, \tau, \Theta, P) = \sum_{z \in \mathrm{FOV}(\xi(t,\tau,\Theta))} r\big(t, z, \xi(t, \tau, \Theta)\big)\,\mathrm{VTTP}\big(z, \xi(t, \tau, \Theta), P\big),
\]

where the reward map depends only on the time t, and not on the flight parameter τ.

We define the vehicle constraints according to fixed-wing UAS dynamics, specifying a maximum change in heading ∆ψmax and a maximum distance between UAS waypoints Lu. These constraints are given by

\[
|\Delta\psi_i| < \Delta\psi_{\max}, \qquad
\big|\,\|\Theta_{u,i} - \Theta_{u,i-1}\| - L_u\,\big| < \Delta L_u,
\]

where ∆Lu is a tolerance in the length between UAS waypoints that gives the solver the flexibility to adjust paths to find a solution. The zoom path Θ²ζ is simplified such that only one terrain point z, defined by Θs, is in the FOV. This greatly reduces the computation time needed to calculate the reward-weighted VTTP value for each terrain point z in the FOV. Effects due to zoom are considered in Stage 3.

Figure 7: Sensor path planned from the initial location, represented by the white waypoint. Notice that the path traverses two local maxima and that the dynamic link-length constraints are evident as waypoints are spread apart in low reward and bunched together in high reward.
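To make the Stage 2 simplification concrete, the per-pose reward could be evaluated with something like the following sketch, where pose_at, reward_map, vttp, and sensor_point are hypothetical stand-ins for ξ, r, VTTP, and the Stage 1 sensor waypoint, respectively; this is illustrative, not the authors' implementation.

    def stage2_reward_at(t, tau, theta, params, pose_at, reward_map, vttp, sensor_point):
        """Simplified R_2: only one terrain point (the sensor waypoint) lies in the FOV."""
        pose = pose_at(t, tau, theta)       # UAS/camera state xi(t, tau, Theta)
        z = sensor_point(tau)               # the single terrain point defined by Theta_s
        return reward_map(t, z, pose) * vttp(z, pose, params)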

To solve this problem, we utilize a method based on the sequential quadratic programming (SQP) nonlinear solver described in [15]. Since the solution space has many local minima, we seed the solver with five different initial trajectories ranging from hard left and right turns to maintaining a straight orbit. Figures 8 and 9 give examples of the UAS paths generated by Stage 2. The paths in Figures 8 and 9 use flat terrain to simplify the example, but this simplification is removed in Section 5, where a full simulation is conducted. Note that the results indicate that the planner tends to reduce the line-of-sight distance to the terrain point.
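As a rough illustration of this stage (not the solver of [15] or the authors' MATLAB code), the sketch below seeds SciPy's SLSQP routine, an SQP-style method used here as a stand-in, with several initial trajectories and keeps the best result. The callback stage2_reward is a hypothetical placeholder for the reward-weighted VTTP term evaluated at the single terrain point in the FOV, and the constraint functions encode the heading-change and waypoint-spacing limits above.

    import numpy as np
    from scipy.optimize import minimize

    def plan_uas_path(seed_paths, stage2_reward, d_psi_max, L_u, dL_u):
        """Stage 2 sketch: optimize UAS waypoints over several seeds with SLSQP."""
        def objective(x):
            waypoints = x.reshape(-1, 2)
            return -sum(stage2_reward(p) for p in waypoints)   # negate to maximize reward

        def heading_margin(x):
            waypoints = x.reshape(-1, 2)
            d = np.diff(waypoints, axis=0)
            psi = np.arctan2(d[:, 1], d[:, 0])
            dpsi = np.angle(np.exp(1j * np.diff(psi)))         # wrap heading changes to (-pi, pi]
            return d_psi_max - np.abs(dpsi)                    # >= 0 enforces |Delta psi| < Delta psi_max

        def spacing_margin(x):
            waypoints = x.reshape(-1, 2)
            seg = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
            return dL_u - np.abs(seg - L_u)                    # >= 0 enforces the spacing tolerance

        constraints = [{"type": "ineq", "fun": heading_margin},
                       {"type": "ineq", "fun": spacing_margin}]
        best = None
        for seed in seed_paths:                                # e.g. hard left, hard right, straight, ...
            result = minimize(objective, np.asarray(seed, dtype=float).ravel(),
                              method="SLSQP", constraints=constraints)
            if best is None or result.fun < best.fun:
                best = result
        return best.x.reshape(-1, 2)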

4.3 Stage 3: Zoom Planning

The zoom path Θζ is planned during the third stage of the path planner. In this stage we use the time-varying reward map r to let the planner adjust the zoom level to maximize the reward collected during the path. We wait until this stage to vary the reward map because we do not need to make any assumptions about Θs and Θu since they were planned in Stages 1 and 2.

We originally posed the optimization problem as defined in Equation (17). However, when we did this, the planner tended to zoom out as much as possible, probably because, at the altitude we were flying, the VTTP metric was high enough even at low resolution to capture much of the reward for the terrain in the sensor FOV. To allow the operator more direct control of the zoom level, we pose


(Figure 8 plots north position versus east position in meters, showing the UAS path and the sensor position.)

Figure 8: UAS path generated by Stage 2 given a stationary sensor location. Note that the center of the sensor footprint is at (0, 0), and the initial UAS position is at (300 E, −300 N) with a velocity of 17 meters per second and an initial heading of −45°. Total simulated flight time is 400 seconds.

(Figure 9 plots north position versus east position in meters, showing the UAS path and the sensor path.)

Figure 9: UAS path generated by Stage 2 given the above sensor path, which starts at (0, 0) and has a velocity of 10 m/s. Note that the initial UAS position is at (−100 E, −100 N) with a velocity of 17 meters per second and an initial heading of 45°. Total simulated flight time is 400 seconds.


the optimization problem for Stage 3 as

\[
\begin{aligned}
\underset{\Theta_\zeta}{\text{maximize}} \quad & \sum_{\tau=0}^{T} \min\!\left( R\!\left(t, \tau, \big[\Theta_s^{\dagger}, \Theta_u^{\dagger}, \Theta_\zeta\big], P\right),\; \gamma_\zeta\, \frac{R\!\left(t, \tau, \big[\Theta_s^{\dagger}, \Theta_u^{\dagger}, \Theta_\zeta\big], P\right)}{N_z(\tau)} \right) \\
\text{subject to} \quad & \Theta_\zeta \le \zeta_{\max}, \\
& \Theta_\zeta \ge \zeta_{\min},
\end{aligned} \tag{23}
\]

where Nz(τ) is the number of terrain points z in the sensor FOV at time t + τ and γζ is a tuning parameter. In other words, at each time step the planner maximizes the minimum of the total reward in the FOV and the average reward in the FOV scaled by γζ. The scalar value γζ allows the user to control how much the system tends to zoom in or out: high values result in zoom paths that tend to zoom out, while lower values allow the system to zoom in.
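A minimal sketch of this objective, assuming a hypothetical helper fov_reward(tau, zeta) that returns the list of reward-weighted VTTP values for the terrain points inside the FOV at step τ with focal length ζ:

    def zoom_objective(zoom_path, fov_reward, gamma_zeta):
        """Sum over time of min(total FOV reward, gamma_zeta * average FOV reward), as in Eq. (23)."""
        total = 0.0
        for tau, zeta in enumerate(zoom_path):
            values = fov_reward(tau, zeta)
            R = sum(values)                      # total reward-weighted VTTP in the FOV
            N_z = max(len(values), 1)            # number of terrain points in the FOV
            total += min(R, gamma_zeta * R / N_z)
        return total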

Since the time to compute Θζ constitutes the largest amount of time spent planning a path, we considered three different methods of controlling the zoom. The simplest method is to use a fixed zoom level during the entire flight. Another planner uses estimated gradient information to aid a standard SQP nonlinear solver [15]. Finally, we use a greedy search over a fixed number of zoom-level settings.

The SQP implementation took the longest to compute because it required multiple evaluations of the reward level in the FOV during a single iteration of the planner in order to compute the gradients. To reduce the computational complexity, we used a greedy search to implement Equation (23). The greedy search considered only one temporal step at a time instead of the entire time horizon [0, T]. Also, instead of considering the continuous range of zoom levels [ζmin, ζmax], we considered only a finite number of zoom levels. This does not allow the planner to optimally account for future viewing opportunities, but we believe these effects are minimal. We found that the greedy planner can compute a zoom path an order of magnitude faster than the full gradient-based planner with very little degradation in the reward-weighted VTTP values.
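The greedy variant can then be sketched as follows, scoring a small set of candidate focal lengths at each time step with the same per-step term used above; the candidate levels and their number are illustrative, and fov_reward is the same assumed helper as in the previous sketch.

    def greedy_zoom_schedule(fov_reward, num_steps, zoom_levels, gamma_zeta):
        """Choose the best discrete zoom level one time step at a time (no lookahead)."""
        schedule = []
        for tau in range(num_steps):
            best_level, best_score = zoom_levels[0], float("-inf")
            for zeta in zoom_levels:             # e.g. a handful of levels between zeta_min and zeta_max
                values = fov_reward(tau, zeta)
                R = sum(values)
                score = min(R, gamma_zeta * R / max(len(values), 1))
                if score > best_score:
                    best_level, best_score = zeta, score
            schedule.append(best_level)
        return schedule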

5 Results

We apply the staged path planner described in Section 4 to find a parameterized path Θ that maintains surveillance on a specific location of interest and the surrounding area, especially the adjoining road networks. The path Θ consists of 100 waypoints characterizing the UAS flight path, the sensor path, and the sensor resolution, where Θ parameterizes 400 seconds of simulated flight. Section 5.1 explains how we implement these algorithms on a standard desktop computer to develop a flight plan and sensor schedule to maintain surveillance over the region. Section 5.2 discusses our testbed UAS and compares flight test results to predicted results. Finally, in Section 5.3 we discuss the computational complexity of this technique as well as some advantages and disadvantages of using this algorithm.

5.1 Planning and Simulation

The purpose of the staged path planner is to reduce the effort needed to prepare for and fly a surveillance mission by autonomously computing both a flight path and a sensor schedule. We implement Equations (18), (22), and (23) in MATLAB on a standard desktop computer. The set of Z discrete terrain points defining the region of interest is available from the US Geological Survey online database [1]. Specifications and other parameters are listed in Table 3 of Appendix A. The reward map over which the path is planned is the same as was presented in Section 2.1.


Figure 10: UAS and camera path planned in simulation to perform surveillance.

Figure 10 presents the paths generated by the staged planner. Note that the planner successfully kept areas with high reward levels within the field of view of the camera. Specifically, notice that for the majority of the simulation run the camera path followed the road network, occasionally crossing low reward areas to switch to other high reward regions. Also, notice that due to the time-evolving nature of the reward map, the simulated home base, marked by the red dot, was visible on three separate occasions. Finally, it is also evident that the UAS path tends to minimize the line-of-sight distance to the sensor path, resulting in loops around the sensor path.

5.2 Flight Test

The goal of the flight demonstration was to fly the planned path shown in Figure 10 and compare the predicted coverage to what was actually observed. We used a Zagi-style airframe with a 72 in. wingspan equipped with a Kestrel™ autopilot and a Sony FCB-IX11A block color camera, which has a 640 × 480 pixel imager and focal lengths that range from 4.2 mm to 42 mm. The camera was mounted on a gimbal that deployed below the body of the UAS, as shown in Figure 11. We used Virtual Cockpit™ as the ground station controlling the UAS and OnPoint™ software to record and synchronize video with the telemetry. The autopilot, ground station, and image recording software are all produced by Lockheed Martin Procerus Technologies.

During flight, we interfaced MATLAB with Virtual Cockpit to send the next set of desired waypoints generated by the staged planner. The UAS control surfaces, gimbal servos, and zoom level were controlled to follow their respective paths using onboard low-level control algorithms. Unfortunately, we could send only one UAS waypoint, camera waypoint, or zoom level every 300 ms. We therefore set an arrival radius of 80 meters that triggered the UAS to proceed to the next waypoint. Meanwhile, the camera gimbal and zoom levels were adjusted every 600 ms. Real-time telemetry received from the UAS triggered when to change between camera and zoom waypoints, and the algorithm linearly


interpolated the desired sensor position between the waypoints.

Figure 11: The UAS testbed and the extended gimbal.

Figures 12 and 13 show the planned versus actual paths when flown with our testbed UAS. While flying the paths, the UAS endured an average wind speed of 3.66 m/s from a heading of 305°, which caused error between the simulated and actual UAS paths. The worst overshoot due to wind was about 80 meters, but typical errors were less than half the maximum error. The errors in the sensor paths shown in Figure 12 are more pronounced due to wind gusts and other turbulence during flight. However, the low-level gimbal-pointing algorithms did a reasonable job of stabilizing the sensor gimbal and following the path. The zoom paths shown in Figure 13 are very similar. The differences seen in Figure 13 occur because the simulation assumed a constant time between waypoints, whereas the actual flight produced varied times between waypoints due to the zoom level; this corresponds to a time shift and a time scaling of the zoom path.

Errors in the UAS and gimbal position affect the shape of the FOV on the ground, which changes where the reward is collected within the reward map. To view these effects, we include in Figure 14 the evolving reward map after 25, 50, 75, and 100 waypoints for both the simulated and actual flight tests. While the planned and actual reward maps are very similar, there are differences caused by modeling errors and disturbances while following the planned paths.

5.3 Analysis and Future Work

We have successfully demonstrated the ability to plan and execute an autonomous UAS flight path, sensor gimbal schedule, and sensor zoom schedule to perform surveillance over a desired region of interest. Note that we do not claim these paths to be optimal, but the generated paths are intuitive and provide usable video to the operator. Unfortunately, the literature does not contain an alternative method against which we can compare our algorithm. We present our results as a starting point and a motivation for others, with several areas of possible improvement listed below.

The computational complexity of this technique remains high. The planned path shown in Figure 10 was generated in about 2.5 hours using an Intel Core 2 Duo processor running at 3.16 GHz. The subprocess requiring the majority of the computation is the calculation of the zoom path. In Stage 3, the VTTP value is calculated for each terrain point in the camera field of view and is computed for each discrete zoom-level possibility. Though not presented here, using a gradient-based method to plan the zoom level took much longer, about 24 hours, due to the repeated calculation of


(Figure 12 consists of two panels plotting north versus east position: one shows the desired and actual UAS positions, and the other shows the desired and actual sensor positions.)

Figure 12: Comparison of the planned and actual UAS and sensor paths.

(Figure 13 plots zoom level in percent versus waypoint number, showing the desired and actual zoom levels.)

Figure 13: Comparison of the planned and actual zoom levels.


Figure 14: Reward map evolution after flying 25, 50, 75, and 100 waypoints for both simulated and actual flight.


the observed reward needed to compute the gradient.

One area of future research is to develop real-time algorithms in order to include a feedback loop. By using video feedback, the system would be robust to disturbances such as wind gusts and other turbulence by planning paths that account for areas that were not viewed adequately on the initial pass. At the expense of increased computational complexity, an intermediate technique that would allow the paths to converge to more nearly optimal solutions would be to iterate through Equations (15)–(17), using the paths from the previous iteration to seed the next solution.

We acknowledge that the planned UAS flight paths maintain only C1 continuity between the waypoints. In other words, at each waypoint there is a discontinuity in the commanded roll angle. We expect this to cause minor transient effects; however, these effects are smoothed out during real flight. Flying connected arcs is also an improvement over other existing methods such as Dubins paths, which require up to three discontinuities in the commanded roll angle between each pair of waypoints [13, 28].

The VTTP metric should be subjected to a more thorough verification and validation process to ascertain its validity as a model of the probability of completing DRI tasks. Such studies should evaluate the accuracy of both the metric as a whole and its individual components. Also, the various parameters used within the metric need to be tuned. A well-tuned metric might eliminate the need for γζ in Equation (23), such that we could let γζ = 0 and still plan a satisfactory zoom schedule. Finally, we could consider the uncertainty in the parameter set by averaging over the individual parameter distributions to obtain

\[
\mathrm{VTTP}(z, x) = \int_{P} \mathrm{VTTP}(z, x \mid P)\, p(P)\, dP,
\]

where p(P) is the probability density function over the parameters P.
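One simple way to approximate this marginalization is Monte Carlo averaging over samples drawn from p(P); the sampler and the conditional VTTP evaluator in the sketch below are hypothetical placeholders.

    import numpy as np

    def marginal_vttp(z, x, vttp_given_params, sample_params, num_samples=1000):
        """Monte Carlo estimate of VTTP(z, x) = E_P[ VTTP(z, x | P) ]."""
        draws = [vttp_given_params(z, x, sample_params()) for _ in range(num_samples)]
        return float(np.mean(draws))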

The VTTP metric has potential for improvement in its current components and for the addition of new components. We set a constant motion threshold in the motion component, but in reality the threshold should scale proportionally to ζ/s(z, x), where ζ is the focal length and s(z, x) is the slant range from the sensor to the point on the terrain. A clutter component, using, for example, one of the clutter metrics studied by Henderson et al., could be added to model the difficulty of finding an object in a cluttered versus a non-cluttered environment [18]. The actual lighting of an area could be estimated from the video by taking both the brightness of the pixels and the exposure time of the camera into account. A Fourier or wavelet transform of the video could be passed into a machine learning algorithm, such as a neural network, trained to account for factors not covered by other components of the metric. There is also potential for detecting moving objects and accounting for the effects of the shadow cast by the object itself.

6 Conclusions

This paper has extended existing image utility metrics to develop the VTTP metric, which quantifies the utility of video imagery. We have shown through a user study that relative differences in VTTP values correlate with differences in user preferences. We have also developed optimization techniques to plan paths that maximize the predicted VTTP metric, thereby improving the video usability and the operator's viewing experience.

We provide the theoretical framework to plan optimal paths for the UAS, the sensor path on the ground, and the zoom schedule. To bridge theory and application, this paper presents a staged approach that successively plans the sensor path along the terrain, the UAS flight path, and the sensor zoom level. Computation in real time during flight is not yet possible, but these algorithms were


successfully applied to precompute a UAS path and sensor schedule to survey a specified region in both simulation and hardware.

One of the objectives of this work was to reduce the workload for UAS operators. Currently, for any surveillance mission, one or more operators determine a flight plan, then manually control the sensor gimbaling and the zoom level while monitoring the video screen for potential objects of interest. In performing flight tests simulating these movements, we found that three people could perform these tasks comfortably: one person directing the sensor and zoom movements, one person controlling the UAS path, and a final person focused completely on DRI tasks. In contrast, our system requires the operator to load the flight parameters and then execute the planning algorithms. Upon launching the UAS, the planned path is executed autonomously and the operator is free to focus on the video stream. However, we believe that much more work could be done to reduce the operator workload further, and we encourage others to improve on the methods presented here.

Acknowledgments

This research was supported in part by Army SBIR Contract W911W6-09-C-0047. It was also made with Government support under and awarded by the DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.

A List of Parameters

We acknowledge that there are many parameters needed to implement this planning scheme. Table 3 contains a complete list of the parameters used to generate the results presented in this paper. Note that only some of the listed parameters are tuning parameters. The parameters listed in Table 3 fall into three categories: hardware specifications, empirically determined parameters, and user-defined parameters. Parameters defining the UAS velocity or the camera imager are determined solely by the hardware specifications. Empirically determined parameters, such as those needed for the VTTP metric, have been verified in existing work through extensive user studies, as in [39]. Finally, the remaining user-defined parameters are those that the operator can tune to adjust the performance of the planner to suit their preferences or needs.

References

[1] “USGS National Map Viewer.” [Online]. Available: http://nationalmap.gov/viewer.html

[2] “Unmanned Systems Integrated Roadmap FY2011-2036,” Report of the Office of the Secretary of Defense, DoD, USA, Tech. Rep., 2011.

[3] J. S. Accetta and D. L. Shumaker, The Infrared and Electro-Optical Systems Handbook. Ann Arbor, MI, and Bellingham, WA: Infrared Information Analysis Center, 1993.

[4] M. Argyle, C. Chamberlain, and R. Beard, “Chain-Based Path Planning for Multiple UAVs,” in 2011 IEEE Conference on Decision and Control, 2011.


Simulation and Flight Test Parameters

UAS
  Initial north position                          40758 m
  Initial east position                           442080 m
  Altitude (height above launch)                  300 m
  Initial heading                                 −180°
  Velocity                                        17 m/s

Camera
  Minimum focal length ζmin                       4.2 mm
  Maximum focal length ζmax                       42 mm
  Center x pixel ox                               320 pixels
  Center y pixel oy                               240 pixels
  Pixel width dimx                                4.65 µm
  Pixel height dimy                               4.65 µm
  Pixel FOV ∠pixel                                2 arctan((ox dimx)/(2ζ))/ox
  Frame rate w                                    15 frames/sec

Reward
  High reward weighting ω                         2
  Low reward weighting ω                          0.4
  Reward covariance Σr                            [150000 0; 0 150000]
  Home base location zbase                        [40758 m, 442605 m]
  Reward collection rate γr                       1

VTTP
  Object dimensions (height, width, depth)        2 m × 0.5 m × 0.5 m
  Lowest spatial frequency κlow                   0.1
  Highest spatial frequency κcut                  5000
  MRC curve empirical parameters [a1, a2, a3]     [−0.0235, 0.0742, 1.4445]
  TTP integral empirical parameters [b1, b2]      [1.33, 0.23]
  Minimum line pairs on target N50                30 pairs
  Ambient lighting parameter α                    0.2
  Number of image features to track Nf            60 features
  Motion magnitude threshold mT                   40 pixels/frame
  Observer model covariance ΣO                    [40000 0; 0 22500]

Staged Path Planner
  Number of waypoints in lookahead window N       7 waypoints
  Waypoints to fly before re-planning Nfly        4 waypoints
  Time between each waypoint ∆t                   4 sec
  Minimum sensor slew distance Lmin               0 m
  Maximum sensor slew distance Lmax               100 m
  Ideal sensor slew distance Lnom                 20 m
  Chain parameters [γ1, γ2, γ3]                   [1, 2, 1]
  Max heading change ∆ψmax                        0.1545 rad/sec
  Flight path waypoint tolerance ∆Lu              8 m
  Zoom parameter γζ                               550

Table 3: Parameters used to plan UAS flight and sensor paths.


[5] C. Bolkcom, “Homeland Security: Unmanned Aerial Vehicles and Border Surveillance,” CRS Report for Congress, Code RS21698, Tech. Rep., 2008.

[6] A. Borji and L. Itti, “State-of-the-art in Visual Attention Modeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.

[7] J. Bouguet, “Pyramidal Implementation of the Lucas Kanade Feature Tracker. Description of the Algorithm,” Intel Corporation, Microprocessor Research Labs, Tech. Rep., 1999.

[8] F. Bourgault, T. Furukawa, and H. Durrant-Whyte, “Coordinated Decentralized Search for a Lost Target in a Bayesian World,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003, pp. 48–53.

[9] ——, “Optimal Search for a Lost Target in a Bayesian World,” in Proceedings of the International Conference on Field and Service Robotics, 2003, pp. 209–222.

[10] N. Ceccarelli, J. Enright, E. Frazzoli, S. Rasmussen, and C. Schumacher, “Micro UAV Path Planning for Reconnaissance in Wind,” in Proceedings of the 2007 American Control Conference, Jul. 2007, pp. 5310–5315.

[11] T. H. Chung and J. W. Burdick, “A Decision-Making Framework for Control Strategies in Probabilistic Search,” in Proceedings 2007 IEEE International Conference on Robotics and Automation, Apr. 2007, pp. 4386–4393.

[12] C. Cornwall, A. Horiuchi, and C. Lehman, “NOAA Solar Position Calculator.” [Online]. Available: http://www.esrl.noaa.gov/gmd/grad/solcalc/azel.html

[13] L. Dubins, “On Curves of Minimal Length with a Constraint on Average Curvature, and with Prescribed Initial and Terminal Positions and Tangents,” American Journal of Mathematics, vol. 79, no. 3, pp. 497–516, 1957.

[14] J. Franke, V. Zaychik, T. Spura, and E. Alves, “Inverting the operator/vehicle ratio: Approaches to next generation UAV command and control,” in Proceedings of AUVSI, 2005.

[15] P. Gill, W. Murray, and M. Wright, Practical Optimization. London: Academic Press, 1981.

[16] M. Goodrich, B. Morse, C. Engh, J. Cooper, and J. Adams, “Towards Using UAVs in Wilderness Search and Rescue: Lessons from Field Trials,” Interaction Studies, vol. 10, no. 3, 2009.

[17] S. Hansen, T. McLain, and M. Goodrich, “Probabilistic Searching Using a Small Unmanned Aerial Vehicle,” in Proceedings of AIAA Infotech@Aerospace, 2007.

[18] J. Henderson, M. Chanceaux, and T. Smith, “The Influence of Clutter on Real-World Scene Search: Evidence from Search Efficiency and Eye Movements,” Journal of Vision, vol. 9, no. 1, pp. 1–8, 2009.

[19] J. Johnson, “Analysis of Image Forming Systems,” Image Intensifier Symposium, vol. AD 220160, pp. 244–73, 1958.

[20] J. Johnson and W. Lawson, “Performance Modeling Methods and Problems,” in Proceedings of the IRIS Imaging Systems Group, 1974, pp. 40–44.


[21] T. Judd, K. Ehinger, F. Durand, and A. Torralba, “Learning to predict where humans look,” in IEEE 12th International Conference on Computer Vision, 2009, pp. 2106–2113.

[22] J. Kim and Y. Kim, “Moving Ground Target Tracking in Dense Obstacle Areas using UAVs,” in Proceedings of IFAC World Congress, 2008.

[23] G. P. Kladis, J. T. Economou, K. Knowles, J. Lauber, and T.-M. Guerra, “Energy conservation based fuzzy tracking for unmanned aerial vehicle missions under a priori known wind information,” Engineering Applications of Artificial Intelligence, vol. 24, no. 2, pp. 278–294, Mar. 2011.

[24] M. Kontitsis and K. Valavanis, “A UAV Vision System for Airborne Surveillance,” in Proceedings of the 2004 IEEE International Conference on Robotics and Automation, Vol. 1, 2004, pp. 77–83.

[25] R. Likert, “A Technique for the Measurement of Attitudes,” Archives of Psychology, vol. 22, no. 140, pp. 1–55, 1932.

[26] G. Limnaios, N. Tsourveloudis, and K. Valavanis, “Introduction,” in Sense and Avoid in UAS: Research and Applications, P. Angelov, Ed. Chichester, UK: John Wiley & Sons, Ltd, 2012.

[27] L. Lin and M. A. Goodrich, “UAV Intelligent Path Planning for Wilderness Search and Rescue,” in IEEE International Conference on Intelligent Robots and Systems, 2009.

[28] E. Magid, D. Keren, E. Rivlin, and I. Yavneh, “Spline-Based Robot Navigation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct. 2006, pp. 2296–2301.

[29] B. Morse, C. Engh, and M. Goodrich, “Aerial Video Coverage Quality Maps and Prioritized Indexing,” in Proceedings of the 5th ACM/IEEE RAS International Conference on Human-Robot Interaction, 2010.

[30] J. Nelder and R. Mead, “A Simplex Method for Function Minimization,” Computer Journal, vol. 7, no. 4, pp. 308–313, 1965.

[31] P. Niedfeldt, R. Beard, B. Morse, and S. Pledgie, “Integrated Sensor Guidance using Target Probability of Operator Detection,” in Proceedings of the 2010 American Control Conference, 2010, pp. 788–793.

[32] P. C. Niedfeldt, B. T. Carroll, R. W. Beard, and S. Pledgie, “A Staged Path Planner for an Unmanned Air System Performing Surveillance and Tracking,” in Proceedings of the AIAA Guidance, Navigation and Control Conference, 2012.

[33] W. H. Press, S. A. Teukolsky, W. Vetterling, and B. P. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, 2007.

[34] M. Quigley, M. Goodrich, S. Griffiths, A. Eldredge, and R. Beard, “Target Acquisition, Localization, and Surveillance using a Fixed-Wing mini-UAV and Gimbaled Camera,” in Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005, pp. 2600–2605.

[35] F. Rafi, S. Khan, K. Shafiq, and M. Shah, “Autonomous Target Following by Unmanned Aerial Vehicles,” in Proceedings of the SPIE Defense and Security Symposium, 2006, pp. 6230–6242.

[36] T. J. Setnicka and K. Andrasko, Wilderness Search and Rescue. Boston, MA: Appalachian Mountain Club, 1980.


[37] J. Shi and C. Tomasi, “Good Features to Track,” in Proceedings 1994 IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 593–600.

[38] B. Tatler, “The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions,” Journal of Vision, vol. 7, pp. 1–17, 2007.

[39] R. Vollmerhausen and E. Jacobs, “The Targeting Task Performance (TTP) Metric: A New Model for Predicting Target Acquisition Performance,” U.S. Army CERDEC, Tech. Rep., 2004.

[40] S. Waharte, A. Symington, and N. Trigoni, “Probabilistic Search with Agile UAVs,” in Proceedings of the 2010 IEEE International Conference on Robotics and Automation, 2010, pp. 2840–2845.
