Cooperative human-robot search in a partially-known environment using multiple UAVs
Shivam B. Chourey
Thesis submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Master of Science
in
Computer Engineering
Kevin B. Kochersberger, Chair
Creed F. Jones III
Ryan K. Williams
August 13, 2020
Blacksburg, Virginia
Keywords: UAVs, Cooperative search, Path planning, Human-Robot cooperation
Copyright 2020, Shivam B. Chourey
Cooperative human-robot search in a partially-known environment using multiple UAVs
Shivam B. Chourey
(ABSTRACT)
This thesis details a system developed with the objective of conducting a cooperative search
operation in a partially-known environment, with a human operator and two Unmanned
Aerial Vehicles (UAVs) carrying nadir and front-facing on-board cameras. The system uses two
phases of flight operations, where the first phase is aimed at gathering the latest overhead
images of the environment using a UAV's nadir camera. These images are used to generate
and update representations of the environment, including a 3D reconstruction, a mosaic image,
an occupancy image, and a network graph. During the second phase of flight operations, a
human operator marks multiple areas of interest for closer inspection on the mosaic generated
in the previous step, displayed via a UI. These areas are used by the path planner as visitation
goals. The two-step path planner operates on the network graph, utilizing weighted-A*
planning and a solution to the Travelling Salesman Problem to compute an optimal visitation
plan. This visitation plan is then converted into mission waypoints for a second UAV, which
are communicated through a navigation module over a MAVLink connection. A UAV flying
at low altitude executes the mission plan, and streams live video from its front-facing camera
to a ground station over a wireless network. The human operator views the video on the
ground station, and uses it to locate the target object, culminating the mission.
Cooperative human-robot search in a partially-known environment using multiple UAVs
Shivam B. Chourey
(GENERAL AUDIENCE ABSTRACT)
This thesis details work focused on developing a system capable of conducting a search
operation in an environment where prior information has been rendered outdated, while
allowing a human operator and multiple robots to cooperate in the search. The system
operation is divided into two phases of flight operations: the first phase focuses on gathering
current information using a camera-equipped unmanned aircraft, while the second phase
involves utilizing the human operator's intuition to select areas of interest for close
inspection. It is followed by a flight operation using a second unmanned aircraft aimed
at visiting the selected areas and gathering detailed information. The system utilizes the
data acquired through the first phase to generate a detailed map of the target environment.
In the second phase of flight operations, a human uses the detailed map and marks the areas
of interest by drawing over the map. This allows the human operator to guide the search
operation. The path planner generates an optimal plan of visitation, which is executed by
the second unmanned aircraft. The aircraft streams live video to a ground station over
a wireless network, which the human operator uses to detect the target object's location,
concluding the search operation.
Dedication
Dedicated to my Parents - who sacrificed their time, money, health, and retirement for my
dreams.
Acknowledgments
I’d like to thank my advisor Dr. Kevin Kochersberger for giving me the opportunity to join
the Unmanned Systems Lab, and be part of a motivated research group. His continuous
support and guidance enabled me to learn and improve. I’d also like to thank the committee
members, Dr. Ryan Williams, and Dr. Creed Jones for their constant support.
I'd like to thank the lab members - Manav, Felipe, and Avery - for their valuable support
in this project, and for assisting with the numerous flight operations. I'd also like to thank
the former lab tech, Drew.
At Virginia Tech, I have been part of two great departments - ME and ECE. I wish to
thank the department heads and the staff of both departments for their incredible support
and guidance.
Finally, I'd like to thank my family and friends - who made it all possible.
Contents
List of Figures ix
List of Tables xiii
1 Introduction 1
2 Review of Literature 3
2.1 Autonomous UAV systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Human-operated UAV systems . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Cooperative Human-UAV systems . . . . . . . . . . . . . . . . . . . . . . . 6
3 System Design 7
3.1 Target Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.1 UAVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.2 Raspberry Pi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.3 Camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.4 Mounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3.1 Phase-1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3.2 Phase-2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3.3 High Altitude Surveyor (HAS) . . . . . . . . . . . . . . . . . . . . . 15
3.3.4 Low Altitude Inspector (LAI) . . . . . . . . . . . . . . . . . . . . . . 16
3.3.5 Local Ground Stations . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.6 Human-in-the-loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 Results 19
4.1 Phase-1 Flight Operations: HAS Flights . . . . . . . . . . . . . . . . . . . . 19
4.2 3D-Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2.1 Geotagging images . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2.2 Reconstructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 Shadow and road detection . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.2 Texture based segmentation . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 On-line Image mosaicking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.5 Occupancy Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.6 Perimeter Contours . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.7 Network Graph Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.7.1 Network Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.7.2 Non-contour Edge-Cost Factor (ECF) . . . . . . . . . . . . . . . . . 44
4.8 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.9 Path planner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.9.1 Dijkstra’s planning algorithm . . . . . . . . . . . . . . . . . . . . . . 50
4.9.2 A* planning algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.9.3 Travelling Salesman problem . . . . . . . . . . . . . . . . . . . . . . 52
4.9.4 Two-Step Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.10 Pixel to GPS Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.11 LAI Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.12 Phase-2 Flight Operations: LAI Flights . . . . . . . . . . . . . . . . . . . . . 57
5 Summary & Conclusions 63
Bibliography 66
Appendices 69
Appendix A Path Planned for different set of visitation goals 70
Appendix B Pixel to GPS transformations 75
B.1 Data values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
B.2 Homography matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
List of Figures
3.1 S-500 with Raspberry Pi and nadir camera . . . . . . . . . . . . . . . . . . . 9
3.2 X-500 with front-facing camera . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 The Imaging Source USB 2.0 camera mounted on the Quadcopter . . . . . . 12
3.4 The Raspberry-Pi, and the front-facing TIS camera mounted on the X-500
using a mount with vibration isolators . . . . . . . . . . . . . . . . . . . . . 13
3.5 Phase-1 of the system involved capturing images of the environment by the
HAS, which was followed by using the data-set for creating scene representa-
tions. The HAS executed a pre-planned flight mission to fly over the target
environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.6 The tasks involved in Phase-2 of the system . . . . . . . . . . . . . . . . . . 15
4.1 Coverage flight plan over the KEAS lab at the Kentland Farms . . . . . . . 21
4.2 Flowchart displaying the process of Phase-1 flight operations . . . . . . . . . 22
4.3 Metashape reconstruction for geotagged downsampled image-set . . . . . . . 25
4.4 Pix4D reconstruction for non geotagged downsampled image-set . . . . . . . 26
4.5 Overhead mosaic image of the target environment located near KEAS Lab at
the Kentland Farms in Blacksburg. This image was generated from the 3D
reconstruction created by the Pix4D Mapper . . . . . . . . . . . . . . . . . . 27
4.6 Roads identified in mosaic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.7 Shadows identified in mosaic . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.8 A grayscale image showing four different materials. By looking at this image
it is possible for humans to distinctly identify the materials even without color
information. [13] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.9 Figure showing the images of the 48 filters of the Leung-Malik (LM) filter
bank [13] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.10 Image showing the result of texture based segmentation that used LM filters
to generate the feature descriptors and k-means clustering for segmenting the
pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.11 Sub-image (a) shows the last image region isolated in the existing mosaic;
sub-image (b) shows a mask representing the last image in the existing mosaic 36
4.12 Matches detected between an existing mosaic and a new image . . . . . . . . 36
4.13 The image showing the mosaic generated from 25 images of the Kentland
farm area, passed sequentially, using the original code developed. The mosaic
contains multiple inconsistencies including a discontinuous pole, and shed.
Hence, this solution was not used in the system. . . . . . . . . . . . . . . . . 38
4.14 The image showing the mosaic generated from 30 images of the Kentland
farm area using the OpenCV functionality when all the images are provided
at once. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.15 The mosaic generated by the OpenCV functionality for a set of images of the
Kentland farm, passed to it sequentially. The program exited with an error
after processing 15 images, and was unable to finish processing the 25-image set. 39
4.16 Occupancy image shows the sheds and structures taller than 1.5 m. These
structures are obstacles for the LAI flight. . . . . . . . . . . . . . . . . . . . 41
4.17 Image displaying the contours generated enclosing the sheds and other struc-
tures in the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.18 Image displaying the contour and non-contour edges of the graph created for
the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.19 Path generated by the path planner for various values of non-contour edge
cost factor. For values of ECF greater than 1, the planner favors edges along
contours, while for lower values the path favors non-contour edges . . . . . . 47
4.20 The image shows the UI displaying the overhead mosaic image with a rectangle
drawn (in red) over an area of interest. At the top, a "Save visitation goals"
button is present that saves the central coordinates of all the rectangles drawn
by the human operator in a text file on disk. . . . . . . . . . . . . . . . . . . 49
4.21 The image shows the expansion of Dijkstra's algorithm as it finds the
optimal path from the start node (green) to the goal node (red). [20] . . . . . 51
4.22 The image shows the expansion of the A* algorithm as it finds the optimal
path from the start node (green) to the goal node (red). [20] . . . . . . . . . 52
4.23 Tasks carried out by the two steps of the path planner . . . . . . . . . . . . 54
4.24 The image shows the reference points used to compute the transformation
matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.25 The image shows the process flow chart for the phase-2 flight operations . . 58
4.26 Figure (a) shows the areas of interest selected by the human operator for closer
inspection. Figure (b) shows the path planned by the two-step path planner, while
Figure (c) shows the Mission Planner screenshot showing the LAI flight
path. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.27 Figure (a) shows the LAI during the flight and Figure (b) shows the screenshot
of video streamed by the on-board TIS camera to the ground station-2. . . . 62
A.1 Path generated by the path planner for a random set of visitation goals -1 . 72
A.2 Path generated by the path planner for a random set of visitation goals -2 . 74
List of Tables
4.1 Time taken for 3D-reconstruction with Metashape for different options using
the geo-tagged image data-set . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Time taken for 3D-reconstruction with Pix4D for different options for geo-
tagged image data-set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Comparison of the capabilities of the OpenCV Image Stitcher and the on-line
image mosaicking solution developed for this work. . . . . . . . . . . . . . . 38
B.1 The table shows the data for the Pixel coordinates and GPS coordinates for
the reference points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
List of Abbreviations
HAS High Altitude Surveyor
KEAS Kentland Experimental Aviation System
LAI Low Altitude Inspector
QGC QGroundControl
ROS Robot Operating System
SfM Structure from Motion
UAV Unmanned Aerial Vehicle
Chapter 1
Introduction
Unmanned Aerial Vehicles (UAVs), popularly known as drones, have seen a continuous
rise in their utility across various research and commercial applications in recent years.
Reductions in component cost, portable electronics, lighter designs, and longer flight times
have been some of the most important factors contributing to their popularity [7]. UAVs are
capable of carrying application-specific payloads, which makes them a valuable asset in a
wide range of applications. While traditionally UAVs were used primarily for reconnaissance
and surveillance by the military, recent trends indicate their growing use in both research
and commercial domains.
One such research application is disaster management assistance [6], [16]. UAVs are
utilized at various stages of disaster management - from monitoring and scouting to
search and rescue operations. Search and rescue scenarios involve gathering information by
surveying an area to identify the target location and assess the environment before planning
a rescue operation. The ability of drones to cover large areas in a short time [15], and to fly
over obstacles to provide otherwise inaccessible information, makes them indispensable in
such operations. Often such operations involve locating and extracting humans from
situations like building debris, flooded areas, wildfires, and deep forests. Disasters typically
modify the structure of an environment, and any prior information like floor plans, images,
or maps is rendered outdated or inaccurate. Hence, the prior information cannot be relied
upon for conducting rescue operations, and information about the current state of the
environment should be gathered before planning a search or rescue operation.
In search operations, UAV control can be autonomous, human-assisted, or cooperative.
Search environments are often unstructured, which reduces the efficiency of autonomous
systems. Search and rescue operations are time-critical, and since the flight time of UAVs
is currently limited, it is important to develop optimal strategies for UAV-assisted search
and rescue operations. The author in [6] noted that autonomous scouting of a disaster area
using self-learning techniques may not be optimal, and has advocated human guidance
for these operations. Currently, humans are more suitable for versatile tasks than automated
learning algorithms, which often tend to be specific to a single application. Humans are also
more adept at handling contingencies, which are a common occurrence during disaster
scenarios. In such cases, an application-specific program may not be sufficient. Hence, a
cooperative framework is needed where the human operator and the UAV(s) can work
together to fulfill the objectives of a search and rescue operation in an environment that
is partially known.
This work focuses on developing a system capable of performing a search operation in
a partially known environment while allowing UAVs and a human operator to work together
efficiently. This involves using one UAV to survey the environment and gather its latest
features, a human operator to identify areas of interest within the environment, and another
UAV to visit these areas of interest for a closer inspection.
Chapter 2
Review of Literature
As discussed in the Introduction, UAVs are being increasingly used in a wide range of
research applications, one of them being search and rescue operations. A growing number
of research groups have worked in this area and published their findings, adopting different
strategies for utilizing the vehicles during these operations. Based on the level of autonomy
of the UAVs during the operations, these search and rescue systems can be broadly divided
into three categories - autonomous systems, manual systems, and cooperative systems. In
autonomous systems, the path planning, navigation, and target detection are carried out
autonomously by systems on board the aircraft, without human intervention. A second
group of researchers have used a human operator to control or assist path planning and
navigation; in these systems, the human operator is also responsible for target detection.
A third category of researchers have used UAVs for search operations in collaboration with
humans; in these systems, the human operator and autonomous systems may both be
involved in the mission planning, navigation, or target detection stages. The following
subsections discuss these approaches in further detail.
2.1 Autonomous UAV systems
Various research groups have worked on the development of completely autonomous systems
for performing search and rescue operations using UAVs without human intervention.
In [21], the researchers attempted to create a fully autonomous system using a quadcopter
UAV for urban search and rescue missions. The group focused on developing a system
with high on-board processing capacity and modular, flexible sensing and planning
capabilities. The UAV is equipped with a nadir stereo camera system, a front-facing camera,
an upward-facing camera, and a laser scanner, which are used in combination for odometry,
depending on lighting and other factors. This is done to enable flight in indoor as
well as outdoor environments. The stereo camera system is also used for target identification
in the search mission. The autonomous system relies upon recognition and action modules
for exploring the environment and finding the target. While all the processing in this system
is done on board the aircraft, it lacks obstacle avoidance capabilities. In [18], the research
group used a UAV equipped with thermal/visual cameras and avalanche beacon sensors for
autonomous search and rescue during avalanches. The group developed autonomous
terrain-following capabilities for the drone using a laser distance sensor. The UAV flies
forward and backward, changing direction as needed in order to acquire the maximum
beacon signal strength. The flight is guided by the beacon sensor signal and ends at signal
acquisition. The aircraft performs an automated landing using the altitude from the laser
distance sensor. The system is capable of conducting autonomous flights in order to perform
a search and rescue mission.
In [21] and [18], the research groups developed systems capable of performing autonomous
flights for the search and rescue mission with a single UAV. From take-off to target
detection, all activities are carried out by autonomous systems without any human
intervention. Both research groups used quadcopter UAVs. While [21] used autonomous
vision-based systems to identify the target, [18] used the beacon signal strength to locate
its goal.
2.2 Human-operated UAV systems
In contrast to the previous section 2.1, where the UAVs are equipped with autonomous
mapping, planning, navigation, and target detection systems for performing search and
rescue operations, another group of researchers, at Brigham Young University (BYU), have
used human-guided UAVs for such operations. In [11] and [10], the authors discuss systems
for UAV-assisted wilderness search and rescue missions. These operations typically involve
locating a missing person, or signs of one, in the wilderness or deep forests. The system
developed for this operation used a single fixed-wing UAV for the flight operations. The UAV
was equipped with an on-board nadir camera used to gather signs of the missing person.
The path planning and flight operations are assisted by a human operator, who guides the
vehicle through a series of waypoints over the areas of interest to gather information during
the flight missions. The images and videos acquired by the nadir camera are analysed by a
human to detect the missing person or signs of one. This search operation is driven by
human operators, from planning and navigation through to target detection.
While autonomous systems face difficulties in unstructured environments, having a human
operator limits the speed of operations. A trade-off can be made in order to bring the best
of both worlds together and develop cooperative human-robot systems. In [6], the author
concedes that autonomous scouting of a disaster area using a UAV is not optimal. In
[4], the author discusses how effective human-robot interaction can improve search
and rescue operations, resulting in an increased success rate.
Humans are currently more efficient than autonomous systems at assessing a wide variety of
situations, handling contingencies, and using environmental cues for decision making.
Utilising a human operator for these tasks in a cooperative system could prove more efficient
than both the fully autonomous systems and the human-operated UAV systems.
2.3 Cooperative Human-UAV systems
In [17], the authors developed a multi-UAV autonomous system for search and rescue
operations. For field experimentation, the authors used 4 UAVs. Two of these
were equipped with Matrixvision cameras and Mastermind processors, while the other two
UAVs were equipped with Logitech webcams and Atom processors. All the UAVs featured
nadir (downward-facing) cameras. The system operated on the Robot Operating System
(ROS), and videos from all the cameras were streamed to observer stations on the ground
during the flight, using the wireless communication infrastructure developed. The flight
paths for the UAVs were pre-planned to optimise the search time, and the flight plans were
sent to each of the UAVs as waypoints through wireless communication. The search objective
for the on-board visual detection modules was to detect a person wearing a red jacket. In
this system, the path planning was manual, but the navigation and target detection
activities were carried out by autonomous systems.
These examples display the versatile approaches to UAV involvement in search and rescue
operations. While the autonomous systems perform the operations ranging from take-off to
target detection without human intervention, the manual systems involve a human operator
at every stage of the operation. The cooperative systems divide the tasks between human
and autonomous systems, leveraging human intuition and the efficiency of autonomous
systems. The cooperative systems have the potential to perform more efficiently than manual
systems, while being more flexible and versatile than the autonomous systems.
Chapter 3
System Design
3.1 Target Environment
The goal of this work is to develop a cooperative system capable of efficiently searching for
an object of interest in a partially-known environment, while allowing a human operator and
UAVs to work together in cooperation. The cooperation between the human operator and the
UAVs allows the system to leverage the intuitive abilities of humans during planning,
and the efficiency of autonomous systems during the flight operations. This makes for an
efficient search and rescue system.
The KEAS lab testing facility at the Kentland Farms in Blacksburg was selected for the field
experimentation. It consisted of disparate features including open grassy areas, gravel roads,
equipment sheds of different shapes and sizes, and agricultural containers. The equipment
sheds were of two varieties: open sheds and closed sheds. In total, the target area consists of
4 sheds and 2 metallic agricultural storage containers. The environment and its features were
assumed to be static for this work. For this problem area, the preferred areas of interest for
closer inspection included the equipment sheds and the containers, as they could house the
object of interest, concealing it from an overhead camera.
3.2 Hardware
This section discusses the hardware components used for conducting the two phases
of the flight operations.
3.2.1 UAVs
Multi-rotor UAVs are lightweight and inexpensive, which has made them a popular choice
among researchers. These UAVs are also capable of vertical take-off and landing (VTOL),
which makes them useful in environments with limited flight space. One of the disadvantages
of multi-rotors is their low endurance, or flight time. Among multi-rotor UAVs, quadcopters
and hexacopters are widely used in research and commercial applications. Quadcopters are
generally smaller and lighter than hexacopters. For the field experimentation of this
project, a UAV with a light payload capacity and a flight time of 10 minutes was needed.
Hence, a decision was made to use quadcopters for both the High Altitude Surveyor (HAS)
and the Low Altitude Inspector (LAI) to fly the field missions.
Two different quadcopter UAVs were used for the field operations of this project. For the first
phase of operations, a Holybro S-500 quadcopter was used as the HAS. The S-500 is lightweight
at 1.3 kg, and came with a telemetry radio, an assembled power management board with ESCs,
a PixHawk 4 autopilot, and a GPS system. With dimensions of 383 x 385 x 240 mm, it is a
small UAV that provided 12 minutes of flight time, without payload, with the 5000 mAh LiPo
batteries. Figure 3.1 shows the Holybro S-500, with the Raspberry Pi and nadir-facing TIS
USB camera mounted on it.
Figure 3.1: S-500 with Raspberry Pi and nadir camera
For the second phase of flight operations, a HolyBro X-500 quadcopter was selected as the
LAI. The X-500 quadcopter is an improved version of the S-500, featuring carbon-fibre arms
in place of the plastic arms of the S-500. This resulted in increased durability and a
lighter aircraft at 0.98 kg. It also included an updated version of the Holybro power
management unit, and was slightly larger with dimensions of 410 x 410 x 300 mm. With a
similar battery, this UAV provided an additional 2-3 minutes of flight time compared to the
S-500. This quadcopter version was unavailable during the first-phase flights. The X-500,
equipped with a front-facing TIS USB camera and Raspberry Pi, is displayed in Figure 3.2.
3.2.2 Raspberry Pi
The Raspberry Pi is a widely used single-board computer in the field of robotics research.
It is lightweight, portable, and relatively inexpensive, and provides support for a wide
range of peripheral devices, which accounts for its popularity.
Figure 3.2: X-500 with front-facing camera
For this project, a Raspberry Pi version 4B with 2 GB of RAM was selected. It had on-board
wireless networking, which allowed the creation of a local WiFi network. It also featured a
gigabit Ethernet port, two USB 2.0 ports, two USB 3.0 ports, and two micro-HDMI ports.
The USB ports allowed connectivity to peripheral devices like a camera, mouse, and keyboard,
while the micro-HDMI ports could be used for connecting to a display monitor.
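The Pi's on-board wireless was used to host the local network linking the aircraft to the
ground station. The thesis does not reproduce its network configuration; purely as an
illustrative sketch, a Raspberry Pi can be placed in access-point mode with a hostapd
configuration along these lines, where the interface name, SSID, and passphrase are
placeholder assumptions:

```ini
# /etc/hostapd/hostapd.conf -- illustrative access-point configuration.
# The interface name, SSID, and passphrase below are placeholders.
interface=wlan0
driver=nl80211
ssid=uav-ground-link
# 2.4 GHz band, fixed channel
hw_mode=g
channel=7
# WPA2-PSK security
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_passphrase=changeme123
rsn_pairwise=CCMP
```

A DHCP server (e.g. dnsmasq) would typically run alongside hostapd so that the ground
station receives an address automatically when it joins the network.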
3.2.3 Camera
During the first phase of flight operations, two different camera systems were used for
gathering the image data-set.
The first was The Imaging Source (TIS) DFM 42BUC03-ML, a USB 2.0 camera,
shown in Figure 3.3. It is a 1.2 MP camera capable of capturing up to 25 frames per second.
The primary reasons for selecting this camera were its global shutter, its USB 2.0 support,
and an SDK for interacting with the camera that was supported on Linux-based systems.
The global shutter avoided motion blur, allowing acquisition of high-quality images,
especially during relative motion between the camera and the target object. Images taken
with a rolling-shutter camera are susceptible to inconsistencies due to wobble (also known
as the Jello effect) and skew distortions. From previous experience, USB 3.0 ports were
known to interfere with wireless signals. Since the Raspberry Pi on board the UAV was
expected to provide the wireless network for connecting to the local ground station system,
a USB 2.0 camera was preferable to a USB 3.0 camera. Having an associated SDK with a
set of APIs was necessary to customize various camera parameters for acquiring high-quality
data, and to remotely trigger image/video acquisition.
Apart from the TIS camera, a Sony RX0 camera was also used during the first phase of
flight operations. The RX0, a compact digital camera, featured an Exmor RS CMOS sensor
capable of taking up to 15 MegaPixel (MP) images. The images captured by the RX0 were
high resolution (15 MP as opposed to the 1.2 MP of the TIS camera) and well exposed, owing
to its well-tuned firmware. These images were more suitable for the 3D reconstruction of
the target environment at the Kentland Farms.
3.2.4 Mounts
To place the Raspberry Pi and the USB camera on-board the UAV, a mount was designed
to fix on the same rods that hold the battery mount. This location was chosen as it was
sufficiently away from the propellers which ensured safety of the assembly, and also clear the
UAV components from the field of view of the camera. The mount was designed and 3-D
printed in the lab by, colleague in the lab, Felipe. It allowed the Raspberry Pi and the USB
Figure 3.3: The Imaging Source USB 2.0 camera mounted on the Quadcopter
camera to be screwed in place, fixing them securely, while the mount itself rested on the bars that support the battery mount. Since the HAS needed a nadir-facing camera while the LAI needed a front-facing camera, the mount was designed to be versatile, allowing the camera orientation to be changed between vertical and horizontal. This was done by changing the position of a sub-mount to which the camera was attached. The sub-mount also had a cut-out that allowed the USB cable to be plugged into the camera from behind while mounted, as shown in Figure 3.3. In a later version of the mount, vibration isolators were incorporated into the design to protect the camera images from distortion due to vibrations during flight. The mount is displayed in Figure 3.4.
Figure 3.4: The Raspberry Pi and the front-facing TIS camera mounted on the X-500 using a mount with vibration isolators
3.3 Architecture
The system design has been divided into two phases based on the flight operations. The following subsections describe the objectives of the two phases and the roles of their components in detail.
3.3.1 Phase-1 Design
The objective of the first phase of operations was to collect the latest information about the
environment. This was done by using a High Altitude Surveyor (HAS) UAV equipped with
a nadir camera. The data-set obtained by the flight operations was used to generate and
update the representations of the environment. This work used the data-set to generate three
different representations: a 3D reconstruction for scene representation, an occupancy image to represent the obstacles, and a network graph to represent the traversable paths. These representations were generated after the data collection through the flight operations.
Figure 3.5: Phase-1 of the system involved capturing images of the environment by the HAS, which was followed by using the data-set for creating scene representations. The HAS executed a pre-planned flight mission to fly over the target environment.
3.3.2 Phase-2 Design
The second phase of the flight operations was designed to allow a human operator to guide the search operation, and to execute a mission plan to obtain detailed information for decision making. The human operator identified and marked areas of interest for closer inspection by clicking and dragging the mouse over the mosaic image displayed by a specially designed user-interface. Using these areas of interest, a path planner developed an optimal mission plan, which was then executed by the Low Altitude Inspector (LAI) aircraft. During the flight,
the LAI communicated the detailed information to a ground station. The human operator used this information for manual target location and detection, and for further decision making. Figure 3.6 shows the tasks carried out in Phase-2.
Figure 3.6: The tasks involved in Phase-2 of the system
3.3.3 High Altitude Surveyor (HAS)
A Holybro S-500 was used as the High Altitude Surveyor (HAS) for obtaining updated scene
data. The payload on the HAS included a Raspberry Pi 4B and a camera. The HAS was used with two different cameras - The Imaging Source (TIS) USB 2.0 and the Sony RX0 - carrying only one of the two in each flight operation. During missions where the HAS carried the TIS camera, the on-board Raspberry Pi was used to create the WiFi network, which a ground station computer used to establish a connection and remotely run a Python script to start image acquisition. In the other flight operations, where the HAS carried the RX0, the camera was set to capture images continuously by holding the camera trigger down prior to the start of the flight. The HAS carried out a pre-planned coverage
mission that was uploaded to its PixHawk using the QGroundControl software on the ground
station. The mission altitude was preset to 20 meters based on the camera-lens parameters,
vehicle ground-speed, and the lawnmower mission plan parameters.
3.3.4 Low Altitude Inspector (LAI)
A Holybro X-500 was used as the Low Altitude Inspector (LAI), which flew at low altitude (2-3 meters) within the environment. The LAI received its visitation goals as mission waypoints
over the MavLink connection from a ground station computer. These visitation goals were
generated by the path planner by using the areas of interest, marked by the human operator
on the UI. The payload for the LAI included a Raspberry Pi 4B and a TIS USB camera. The Raspberry Pi's wireless network was used to establish a connection between the on-board computer and a second ground station. At the start of the flight operation, the ground station started a remote GStreamer pipeline on the on-board computer to begin a video stream from the on-board camera, over a wireless UDP connection, to the ground station.
3.3.5 Local Ground Stations
The HAS flights and the LAI flights were conducted in two different phases, which are discussed in detail in Section 4.1 and Section 4.12.
For the first phase of flight operations, which involved only the HAS, a single ground station was used when the HAS carried the TIS USB 2.0 camera. During this operation, the ground station ran QGroundControl (QGC), which was used to upload the flight plan to the HAS over MavLink, and to monitor the UAV vitals and the flight mission. The ground station was also connected to the on-board computer over its wireless network, and was used to remotely execute a Python script on the on-board computer that started the image acquisition. This
Python script utilized the TIS Software Development Kit (SDK) APIs to set the camera parameters, and to trigger the image acquisition at regular intervals.
In the second phase of flight operations, the LAI visited the areas of interest within the
environment. During this operation, two ground stations were used. The local Ground
Station-1 ran a User-Interface (UI) which displayed the mosaic image of the environment,
and allowed the remote human operator to mark the areas of interest by drawing rectangles
over the mosaic. Once the areas of interest were identified, a Python program was executed on Ground Station-1 that generated a flight path for the LAI as a series of mosaic-image pixel coordinates, converted these pixel coordinates into GPS coordinates, and then communicated the GPS locations as waypoint goals to the LAI using the DroneKit library APIs, which use the MavLink protocol to communicate with the UAV. Ground Station-1 also ran the Mission Planner software to monitor the flight mission and UAV vitals. To allow the Mission Planner software and the DroneKit code to communicate with the LAI simultaneously, it also executed MavProxy. The local Ground Station-2 connected to the wireless network of the Raspberry Pi on-board the LAI, and remotely executed a GStreamer pipeline that sent a video stream from the on-board TIS camera, over a UDP connection, to Ground Station-2. Locally, Ground Station-2 executed another GStreamer pipeline that received and displayed the video stream.
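The pixel-to-GPS conversion step can be sketched as follows. This is an illustrative simplification assuming a north-aligned mosaic georeferenced by the GPS coordinates of its top-left and bottom-right corners; the function name and interface are hypothetical, not the actual Ground Station-1 code.

```python
def pixel_to_gps(px, py, img_w, img_h, top_left, bottom_right):
    """Linearly interpolate a mosaic pixel coordinate to (lat, lon).

    Assumes a north-aligned mosaic georeferenced by the GPS coordinates of
    its top-left and bottom-right corners - an illustrative simplification.
    """
    lat_tl, lon_tl = top_left
    lat_br, lon_br = bottom_right
    lat = lat_tl + (py / img_h) * (lat_br - lat_tl)  # latitude varies with row
    lon = lon_tl + (px / img_w) * (lon_br - lon_tl)  # longitude varies with column
    return lat, lon
```

The resulting coordinates would then be sent to the LAI as sequential waypoints, e.g. through DroneKit's mission-upload APIs over MavLink.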
3.3.6 Human-in-the-loop
The human operator is an integral component of this system's design. During the second phase of flight operations, the human operator guided the search operation, and observed a live video stream for target identification and decision making. The operator used a remote ground station to remotely log in to the local Ground Station-1, present at the target environment
site during the experimentation. The operator viewed the mosaic image displayed through the UI, and marked the potential areas of interest by clicking and dragging the mouse to draw rectangles over the mosaic image. The marked areas were used as visitation goals by the planner module, which generated an optimal visitation plan, discussed in more detail in the section on the path planner. The operator then executed the program on Ground Station-1 that called the path planner, converted the planner's results from image pixels to GPS coordinates, and communicated the GPS locations as sequential waypoint missions to the LAI. The human operator also observed the video stream from the LAI on Ground Station-2. The operator manually identified the goal in the video stream, which completed the objective of the operation.
Chapter 4
Results
This chapter discusses the results of the various steps involved in the operation. Each section in this chapter is devoted to a discussion of the methods used and the outputs obtained.
4.1 Phase-1 Flight Operations: HAS Flights
A partially-known environment implies the lack of accurate or complete information about
the environment. It may include a lack of a floor-plan, or topography information of the
environment. In this scenario, the term ’partially-know environment’ refers to the outdated
prior scene information. The outdated information could imply absence of accurate location
information of various objects in the target environment. This could be due to addition
or removal of objects, or change in the position of objects within the environment, relative
to the prior. These objects could either be obstacles during exploration with the Low Altitude Inspector (LAI) or potential objects of interest in the mission. The prior information may
be available in the form of previous maps of the environment, overhead photos from Google
Images, data-set from previous flight operations, etc.
In this case, the KEAS lab facility at the Kentland Farms in Blacksburg was chosen for
conducting field experiments. As the first step, flight operations were conducted in order to
acquire the latest information in the form of overhead images of the area.
The flight operations were carried out using a Holybro S-500 UAV, with a Raspberry Pi
computer, and a USB camera on-board. The Raspberry Pi was used to create a local wireless
network. The local ground station connected to the Raspberry Pi over the wireless network, and was used to remotely log in and execute a Python script to trigger image acquisition using the USB camera. The local ground station also ran QGroundControl
(QGC) software, which communicated with the HAS via the telemetry radio. QGC was used
to upload, and start a pre-planned mission on the HAS, and monitor the vitals of the UAV
including the location, altitude, battery level, and progress of the mission during the flight.
The mission plan was aimed at covering the complete target area in order to capture the overhead images. A back-and-forth coverage strategy, more commonly known as a raster or "lawnmower" scan, which is a type of cellular-decomposition coverage strategy, was used to generate a flight path for this mission [5]. A flight path was generated
in QGC by specifying the extremities of the area to be covered. The parameters of this
flight path including the spacing between passes, and turning distances were set to ensure
the images obtained from the camera covered the complete region, and provided sufficient
overlap between images of adjacent regions. The image overlap was vital in ensuring the
quality of the map during the 3D-reconstruction step. The altitude of the UAV was kept
constant for the duration of the flight at 20m, and the spacing between two consecutive
passes was kept to 5m. These values ensured that a minimum recommended overlap (>60%)
was maintained. Using these parameters, a final path for the mission was generated as
displayed in Figure 4.1.
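The relationship between altitude, pass spacing, and side overlap can be checked with a short calculation. The 60 degree horizontal field of view used in the example below is an assumed value for illustration, not the actual lens specification.

```python
import math

def side_overlap(altitude_m, hfov_deg, pass_spacing_m):
    """Fraction of the ground footprint shared by two adjacent lawnmower passes."""
    # ground footprint width of one image from a nadir-facing camera
    footprint = 2 * altitude_m * math.tan(math.radians(hfov_deg) / 2)
    return 1 - pass_spacing_m / footprint

# At 20 m altitude with an assumed 60 degree FOV and 5 m pass spacing,
# the footprint is ~23.1 m and the side overlap ~78%, above the 60% minimum.
```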
The flight operations were conducted using two different cameras - The Imaging Source (TIS)
USB 2.0 camera and a Sony RX0 camera. For the operations with the TIS camera, the ground station was connected to the on-board Raspberry Pi computer via its wireless network. A remote login session into the Pi was started from the ground station before take-off.
Figure 4.1: Coverage flight plan over the KEAS lab at the Kentland Farms
Once the HAS reached near the start position of the mission plan, the on-board camera was
triggered remotely to start image acquisition. This was done by remotely starting a python
script on the Raspberry Pi that used The Imaging Source SDK APIs to interact with the
camera. The Python code was written to take an image every second and save it on the Pi's SD card. When the HAS reached the last waypoint in the mission plan, the script was remotely closed. After completion of the mission and landing of the HAS, the saved images were transferred to the ground station via FTP. The images obtained from this operation were not geotagged. A flowchart of this operation is displayed in Figure 4.2.
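The acquisition loop can be sketched as below. The `capture` and `save` callables stand in for the TIS SDK snap and write calls, whose actual API is not shown in this text; the interface here is hypothetical.

```python
import time

def acquire_images(capture, save, interval_s=1.0, stop=lambda: False):
    """Capture and save one image per interval until told to stop.

    `capture` and `save` are stand-ins for the TIS SDK snap/write calls
    (hypothetical interface; the real SDK API differs).
    """
    count = 0
    while not stop():
        frame = capture()      # grab a frame from the camera
        save(frame, count)     # e.g. write to the Pi's SD card
        count += 1
        time.sleep(interval_s)
    return count
```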
Another set of flights were conducted using the Sony RX0 camera instead of the TIS camera.
For these flights, the ground station was used to upload, control, and monitor the flight mis-
sion. The on-board Raspberry Pi was not used during these flights. The Sony RX0 camera was set to take images at regular intervals before the beginning of the flight operation. The
images were saved on the camera's SD card and transferred to the ground station after landing.
Figure 4.2: Flowchart displaying the process of Phase-1 flight operations
4.2 3D-Reconstruction
The second phase of flight operations involved flying the Low Altitude Inspector (LAI) in the environment and visiting the areas of interest for closer inspection. Before proceeding to this stage, however, it was important to obtain detailed information about the environment. A high-resolution 3D reconstruction of the environment provided the necessary confidence for the low-altitude flights, and was essential for locating and avoiding obstacles during flight missions.
Structure from Motion (SfM) is a popular technique for generating a 3D reconstruction of a
scene from image data. Structure from Motion attempts to reconstruct the scene geometry
using feature matching across multiple images. Since recent feature-matching algorithms
are relatively robust and less prone to errors from inconsistent image acquisition, it is a popular method for image sets obtained from UAVs [8]. Inconsistencies in the image set can be introduced by location and altitude errors during flight due to GPS, camera vibrations, etc. Region overlap between images helps this technique overcome these inconsistencies by improving the feature matching.
4.2.1 Geotagging images
Geotagged images are known to produce better results when used for 3D reconstruction, as they make feature matching and determining the relative positions of the images easier. The images obtained from the first phase of flight operations were, however, not geotagged. The Imaging Source (TIS) camera used did not have an inbuilt GPS system. The image metadata does, however, contain a timestamp of when the image was first saved to disk. To improve the quality of the 3D reconstruction, as a preprocessing step the downsampled images were geotagged using the data from the PixHawk flight logs. Python code was written to find the GPS location of the UAV from the flight logs at the time corresponding to the timestamp of each image. These GPS coordinates were then added to copies of the original images as metadata. Using this process, an alternate geotagged data-set was created.
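The timestamp-to-GPS matching at the core of this step can be sketched as follows; log parsing and metadata writing are omitted, and the function interface is illustrative rather than the actual code.

```python
import bisect

def nearest_gps(log, image_ts):
    """Return the (timestamp, lat, lon) log entry closest to an image timestamp.

    `log` is a list of (timestamp, lat, lon) tuples sorted by timestamp, as
    parsed from the flight logs (the parsing itself is omitted here).
    """
    times = [entry[0] for entry in log]
    i = bisect.bisect_left(times, image_ts)
    if i == 0:
        return log[0]
    if i == len(log):
        return log[-1]
    before, after = log[i - 1], log[i]
    # pick whichever fix is closer in time to the image
    return before if image_ts - before[0] <= after[0] - image_ts else after
```

The matched coordinates would then be written into the EXIF GPS tags of a copy of each image, for example with a library such as piexif or the exiftool utility.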
4.2.2 Reconstructions
The commercial software packages Agisoft Metashape and Pix4D were used to generate the 3D reconstruction of the KEAS Lab environment. Before the reconstruction, the data-set was downsampled by converting the resolution of each image from 16 MP to 1 MP. A Python script was written to resize each image of the data-set using the OpenCV library's inbuilt function.
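The downsampling preserves the aspect ratio while reducing the pixel count; a sketch of the target-dimension calculation (the helper name is illustrative, not the thesis code):

```python
import math

def downsample_dims(width, height, target_pixels=1_000_000):
    """Output (width, height) preserving aspect ratio with roughly
    `target_pixels` total pixels (e.g. 16 MP -> ~1 MP)."""
    scale = math.sqrt(target_pixels / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))

# The resize itself used OpenCV's inbuilt function, roughly:
#   small = cv2.resize(img, downsample_dims(img.shape[1], img.shape[0]))
```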
This was done to reduce the processing time of the 3D reconstruction. Both software packages were fed the same downsampled image data-set. The results obtained from each are discussed in the following subsections.
Agisoft Metashape Reconstruction
Reconstruction was first attempted with Agisoft Metashape on a machine with 32 GB of RAM, using the non-geotagged images. This did not produce a usable 3D reconstruction. Hence, the geotagged image dataset was used for reconstruction in Metashape.
The 3D reconstruction was attempted with different accuracy settings for the image alignment and the dense point cloud, and the time taken for the process was observed. The results are displayed in Table 4.1.
Align Photos: Accuracy    Build Dense Cloud: Accuracy    Time taken (hours)
High                      High                           26
High                      Low                            15
Table 4.1: Time taken for 3D-reconstruction with Metashape for different options using the geo-tagged image data-set
Figure 4.3 displays the point-cloud map of the Kentland farm after 3D reconstruction using high accuracy in both the photo-alignment and dense-cloud steps. Visual inspection of the point-cloud map generated by Metashape found that points were missing from several scene features, such as shed walls and the ground, and in some cases the walls and roofs of the sheds were distorted or discontinuous. Due to these errors, the point cloud generated by Metashape was not used in the subsequent steps of the project.
Figure 4.3: Metashape reconstruction for geotagged downsampled image-set
Pix4D Reconstruction
The Pix4D Mapper software is another popular commercial software that has the capability
of generating a reconstruction of a scene using a set of images. This software was used to
generate a 3D reconstruction for two different image datasets. The first set was the downsampled version of the images obtained from the Sony RX0 flight missions over the Kentland farms; these images were not geotagged. The second set consisted of geotagged downsampled images, with the geotag metadata added by the process described in Section 4.2.1. The resulting 3D reconstruction is displayed in Figure 4.4. This reconstruction was better in quality than the point cloud generated by Metashape, and captured various features of the Kentland Farm area with higher accuracy.
Experiments with the different 'point density' options available in the Pix4D software were conducted to see their impact on the quality of the reconstruction and the processing time taken. This was done using the geotagged image data set.
Table 4.2 compares the time taken.
Point Density    Time taken (minutes)
High             35
Low              25
Table 4.2: Time taken for 3D-reconstruction with Pix4D for different options using the geotagged image data-set
Figure 4.4: Pix4D reconstruction for the non-geotagged downsampled image-set
An overhead mosaic image of the complete environment was generated by Pix4D Mapper.
The mosaic is shown in Figure 4.5.
Figure 4.5: Overhead mosaic image of the target environment located near the KEAS Lab at the Kentland Farms in Blacksburg. This image was generated from the 3D reconstruction created by the Pix4D Mapper
4.3 Image Analysis
Beyond generating scene representations, the data gathered in the first phase of flight operations was also used to develop a better understanding of the environment through computer vision techniques. A detailed understanding of the scene could be vital for the planning stage of a search and rescue operation. Once a mosaic image of the complete area was generated, it was analysed to isolate or segment the contents of the scene.
Two different methods were used to segment the scene using the overhead images, or the mosaic generated from them, and an attempt was made to identify the roads and shadows in the images. This solution, however, was not integrated into the final system design due to its lack of robustness. The following subsections discuss the two methods in further detail.
4.3.1 Shadow and road detection
Color-based segmentation is a popular method for segmenting an image. Segmentation in the Hue-Saturation-Value (HSV) color-space is often preferred over segmentation in the Red-Green-Blue (RGB) color-space, because human eyes are more sensitive to the intensity of light than to color, and the HSV color-space resembles human vision more closely than the RGB color-space [9]. In [9] and [19], the authors used color thresholding in the HSV color-space to segment images. A similar approach was used for segmenting the overhead mosaic image obtained from the Pix4D point cloud.
The roads and shadows could be identified in the mosaic by converting it into the HSV color-space and then applying thresholds. Python code was written using OpenCV functions to read the mosaic image, convert it to the HSV color-space, and segment the image based on
the threshold values. A binary image was generated as output, in which pixels within the threshold range were white and the remaining pixels were black. This was done to identify shadows and roads in the mosaic image.
The shadows were identified by thresholding the 'Value' channel of the pixels: pixels with a 'Value' between 5 and 50 in the HSV color-space were identified as shadow pixels. The results obtained with this technique are displayed in Figure 4.7. To identify pixels belonging to the road, thresholds on Hue (between 100 and 160), Saturation (between 8 and 40), and Value (between 100 and 160) were used. The output image identifying the roads is displayed in Figure 4.6.
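The thresholding can be sketched with a NumPy equivalent of OpenCV's inRange, using the empirically deduced thresholds from this section (the Hue bounds below assume OpenCV's 0-179 Hue scale):

```python
import numpy as np

def hsv_mask(hsv, lo, hi):
    """Binary mask of pixels whose (H, S, V) all fall within [lo, hi].

    `hsv` is an (H, W, 3) uint8 array; this mirrors cv2.inRange.
    """
    lo, hi = np.asarray(lo), np.asarray(hi)
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1).astype(np.uint8) * 255

# Empirically deduced thresholds from this section:
shadow_mask = lambda hsv: hsv_mask(hsv, (0, 0, 5), (179, 255, 50))    # V in [5, 50]
road_mask = lambda hsv: hsv_mask(hsv, (100, 8, 100), (160, 40, 160))  # H, S, V ranges
```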
Although this technique generated quick results, which made it a good candidate for real-time or on-line processing, its drawback was that the threshold values were not absolute or objective. The thresholds depended on the lighting conditions on the day the images were taken and on the camera-lens system used to acquire them. The threshold values used to identify shadows and roads in the two cases discussed above were therefore empirically deduced, and cannot be expected to work with a different data set. Hence, this technique was not robust enough to be included in the final system design.
4.3.2 Texture based segmentation
Another popular technique for segmentation of aerial images is based on textures. In [12], texture is defined as "repeating patterns of local variations in image intensity which are too fine to be distinguished as separate objects at the observed resolution." Mathematically, textures in an image are represented using feature descriptors that capture the local variations in image intensity.
By using texture, it is possible to differentiate between materials, even in a grayscale
Figure 4.6: Roads identified in mosaic
Figure 4.7: Shadows identified in mosaic
image, as shown in Figure 4.8. Textures in images are often indicative of a material's properties, and serve as an important cue for distinguishing objects, shapes, and boundaries [13]. Image segmentation using texture is often done in two steps: first, analyse or represent the textures present in the image, and then subdivide the image into regions of consistent texture.
To represent textures mathematically, a feature descriptor is used. Filters or filter banks (sets of filters) are applied to the neighbourhood of a pixel through convolution, generating a response for each filter at each pixel of the image. By using statistics of the filter responses, such as the mean and standard deviation over each local window, a feature descriptor of local texture can be formed for each pixel. The feature descriptor has the same dimensionality
Figure 4.8: A grayscale image showing four different materials. By looking at this image it is possible for humans to distinctly identify the materials even without color information. [13]
as the number of filters in the filter bank. Textures can also be characterized by forming a histogram of the filter responses over a region of interest.
One of the many filter banks used to generate texture feature descriptors is the Leung-Malik (LM) filter bank, shown in Figure 4.9. This filter bank consists of 48 filters designed to isolate a variety of textures, including patterns composed of edges, bars, or spots with different orientations and sizes.
The image could be segmented using the feature descriptors. One of the ways to achieve this
is to assign pixels with similar feature descriptors to a common group. This can be done by
Figure 4.9: Figure showing the images of the 48 filters of the Leung-Malik (LM) filter bank [13]
using the k-means clustering algorithm over the multi-dimensional texture feature descriptors. The clustering algorithm groups similar features together into clusters. One disadvantage of this method, however, is that the number of groups, k, is usually not known beforehand. The number of unique textures present in an image depends on the area being imaged, so a pre-defined number of clusters is not very effective. The results of image segmentation using k-means clustering are shown in Figure 4.10.
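The two-step pipeline (texture features, then clustering) can be illustrated with a deliberately simplified sketch: per-block mean and standard deviation stand in for the 48-dimensional LM filter-bank responses, and a minimal k-means implementation does the grouping. None of this is the thesis code; it only shows the structure of the method.

```python
import numpy as np

def texture_features(gray, win=8):
    """Per-block (mean, std) descriptors - a deliberately tiny stand-in
    for the 48-dimensional LM filter-bank responses."""
    feats, coords = [], []
    h, w = gray.shape
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            patch = gray[i:i + win, j:j + win].astype(float)
            feats.append([patch.mean(), patch.std()])
            coords.append((i, j))
    return np.array(feats), coords

def kmeans(feats, k, iters=20):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = [feats[0]]
    for _ in range(1, k):
        dists = np.min([((feats - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(feats[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return labels
```

A flat region and a high-variance (e.g. checkerboard-like) region then fall into different clusters, which is exactly the grouping behaviour described above.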
One drawback of this implementation of texture-based segmentation was that evaluating textures using the LM filter bank, and segmenting the image based on them using the k-means clustering algorithm, required more processing time than was suitable for a real-time system. In addition, choosing a value for the number of clusters for the k-means algorithm was difficult: the number of distinct textures varies from image to image, as each image captures a different area, so the clustering did not provide an objective method for segmenting an image. An unsupervised algorithm could be used to find the optimal clusters before segmenting each image; however, it would be computationally expensive and require more processing time. Due to its inability to process images quickly, and its need for an external parameter, this solution could not be integrated into the system to be
(a) Original Image 1 (b) Texture Segmentation of Image 1
(c) Original Image 2 (d) Texture segmentation of Image 2
Figure 4.10: Image showing the result of texture based segmentation that used LM filters togenerate the feature descriptors and k-means clustering for segmenting the pixels
used in the field experiments. However, the segmentation results are encouraging, and this method could be improved upon and included in the system in the future.
4.4 On-line Image mosaicking
Image mosaicking is the process of stitching together multiple overlapping images of a scene to obtain a larger combined image. This process typically involves identifying common features between the images, evaluating the homography transformation matrices, and warping the images by applying the transforms to create a panoramic image.
During flight operations, the camera captures new images at regular time intervals as the flight mission is executed. On-line image mosaicking is a technique for generating the mosaic progressively, integrating each new image as it is captured rather than waiting for the complete set of images to be collected. An on-line image mosaicking solution was created with the intent of assisting the integration of the two phases of flight operations. Although this solution was not integrated into the final system design due to its lack of robustness, it featured some useful techniques for implementing on-line mosaicking. The steps involved included ORB feature identification, brute-force feature matching, homography matrix evaluation for various parameters, selection of the best homography matrix, and image warping for mosaicking. The following text details the implementation.
Two popular feature detection algorithms are the Scale Invariant Feature Transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) [3], [13]. While SIFT is considered a robust algorithm for identifying keypoints, it is computationally expensive [13]. The SIFT algorithm is also patented, which restricts its usage. ORB, on the other hand, is a non-patented alternative that matches SIFT in performance and is faster to compute. ORB uses a combination of FAST for keypoint detection and a modified version of the BRIEF descriptor to enhance performance.
Using Python's OpenCV library, a program was written with the objective of generating a mosaic on-line from the overhead photos captured by the UAV. The program iteratively generated a mosaic image, where the inputs were the mosaic from the previous iteration and a new image. This process was repeated as long as new input images were provided. The code used OpenCV's ORB implementation to identify keypoints and generate BRIEF descriptors for the two images. OpenCV's brute-force
matcher was used to match the keypoints across the images.
The flight mission parameters, such as the spacing between passes of the lawnmower scan, the overshoot distance, and the vehicle ground-speed, were set to ensure sufficient overlap in sequential images. Using this information, the features of the new image were matched only against the features of the existing mosaic that belonged to the region where the last image was integrated. This was done by keeping track of the homography transformation from the previous step and generating a mask: a binary image the size of the existing mosaic, in which the white region corresponded to the region of the last image in the existing mosaic. Figure 4.11 shows the last image isolated from the existing mosaic, and the corresponding mask. Figure 4.12 shows the feature matches drawn between a new image and an existing mosaic. Notice that the features in the mosaic belong only to the region where the last image was integrated.
This technique helped reduce incorrect feature matches, and made the code substantially faster, as it restricted the features to a small region of the existing mosaic instead of the whole mosaic image. This was especially vital because the mosaic image grows with each iteration, which could have drastically affected performance if feature matching were carried out over the complete mosaic. This technique instead led to a near-constant processing time for each iteration of the mosaicking process, regardless of the size of the existing mosaic image.
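The mask-generation idea can be sketched by projecting the last image's corners into mosaic coordinates with its homography. For simplicity, a bounding box approximates the warped region here; the actual mask follows the warped quadrilateral.

```python
import numpy as np

def last_image_mask(mosaic_shape, H, img_w, img_h):
    """White-on-black mask approximating the last image's region in the mosaic.

    The last image's corners are projected into mosaic coordinates using its
    homography H; a bounding box stands in for the warped quadrilateral.
    """
    corners = np.array([[0, 0, 1], [img_w, 0, 1],
                        [img_w, img_h, 1], [0, img_h, 1]], dtype=float).T
    proj = H @ corners
    xy = proj[:2] / proj[2]                      # perspective divide
    x0, y0 = np.floor(xy.min(axis=1)).astype(int)
    x1, y1 = np.ceil(xy.max(axis=1)).astype(int)
    mask = np.zeros(mosaic_shape[:2], dtype=np.uint8)
    mask[max(y0, 0):max(y1, 0), max(x0, 0):max(x1, 0)] = 255
    return mask
```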
For the evaluation of homography matrices, OpenCV's 'findHomography' function was used. A set of homography matrices was generated using combinations of the different optimization methods provided by the OpenCV function and the number of matches considered in the computation. From this set, the best homography matrix was chosen using two criteria: least mean-square projection error for the matches, and near-constant scaling. Since all the images were taken from nearly the same altitude, the homography matrix could be expected
(a) (b)
Figure 4.11: Sub-image (a) shows the last image region isolated in the existing mosaic; sub-image (b) shows the mask representing the last image in the existing mosaic
Figure 4.12: Matches detected between an existing mosaic and a new image
to have scaling factors closer to 1. Hence, any matrices that had high value of scaling
factors were rejected. Out of the remaining matrices the matrix with the least mean square
projection error for the matched features was selected. In the next step, this transformation
matrix was used to warp the new image and integrate it to the existing mosaic to form a
new mosaic image.
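The two selection criteria can be sketched in plain Python. This is an illustrative sketch, not the thesis code: the function names, the scale tolerance of 0.2, and the nested-list matrix format are all assumptions.

```python
import math

def project(H, pt):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def mean_square_error(H, src_pts, dst_pts):
    """Mean squared reprojection error of the matched points under H."""
    errs = []
    for s, d in zip(src_pts, dst_pts):
        px, py = project(H, s)
        errs.append((px - d[0]) ** 2 + (py - d[1]) ** 2)
    return sum(errs) / len(errs)

def near_unit_scaling(H, tol=0.2):
    """Accept H only if its x/y scale factors are close to 1, as expected
    for images taken from nearly the same altitude."""
    sx = math.hypot(H[0][0], H[1][0])
    sy = math.hypot(H[0][1], H[1][1])
    return abs(sx - 1.0) <= tol and abs(sy - 1.0) <= tol

def select_best(candidates, src_pts, dst_pts):
    """Reject high-scaling candidates, then pick the lowest-error one."""
    valid = [H for H in candidates if near_unit_scaling(H)]
    return min(valid, key=lambda H: mean_square_error(H, src_pts, dst_pts))
```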
This program was capable of generating mosaics on-line. Integrating a new image into the existing mosaic to create a new mosaic took approximately 0.3 seconds, and the processing time remained almost constant even as the existing mosaic grew. However, the mosaic quality achieved by this method was not suitable for field operations. Figure 4.13 shows the mosaic generated from 25 images provided to the program sequentially. Consequently, this solution was not used in the final system design.
The results of this method were compared to the OpenCV image stitching functionality. The OpenCV functionality worked well when all the images to be stitched were provided at once, as shown in Figure 4.14. However, during flight operations the images would be acquired sequentially. When this condition was simulated by providing the images to the OpenCV function sequentially, it produced a poor-quality mosaic (Figure 4.15) before exiting with errors.
For on-line mosaicking, the solution developed for this work was superior to the OpenCV function in both processing time and mosaic quality. However, due to inconsistencies in the mosaic image, the solution lacked the robustness of a finished product and was not integrated into the final system design. The results are summarized in Table 4.3.
Figure 4.13: The mosaic generated from 25 images of the Kentland farm area, passed sequentially, using the original code developed. The mosaic contains multiple inconsistencies, including a discontinuous pole and shed. Hence, this solution was not used in the system.
                          OpenCV Image Stitcher    New Solution
Off-line    Capability    Yes                      No
mosaicking  Quality       High                     NA
On-line     Capability    No, exits with error     Yes
mosaicking  Quality       NA                       Medium

Table 4.3: Comparison of the capabilities of the OpenCV Image Stitcher and the on-line image mosaicking solution developed for this work.
Figure 4.14: The mosaic generated from 30 images of the Kentland farm area using the OpenCV functionality when all the images are provided at once.

Figure 4.15: The mosaic generated by the OpenCV functionality for a set of images of the Kentland farm, passed to it sequentially. The program exited with an error after processing 15 images and was unable to finish processing the 25-image set.
4.5 Occupancy Image
An occupancy grid map is a popular representation of an environment and is commonly used for path planning. It divides the environment into multiple cells, and each cell is associated with a value that represents the probability of that cell being occupied. The occupancy grid carries information about the obstacles and free cells of the environment, which is crucial for path planning.
In the current system design, it was assumed that the environment was static, and that the locations of obstacles and free areas of the scene were correctly captured in the overhead mosaic image and the 3D reconstruction created in the earlier steps. With this assumption, the overhead mosaic image and the reconstruction were used to generate a binary version of the occupancy grid map, where cells were represented by the pixels of the image and the color of each pixel represented the presence or absence of an obstacle. The probabilities of occupancy in each cell or pixel were replaced with binary values: a white pixel represented an obstacle, while a black pixel represented a free area in the environment. This binary occupancy grid was used for the path planning.
To create the binary occupancy image, the point cloud generated by Pix4D Mapper was used. Since the flight altitude for the LAI was predetermined, the point cloud was used to determine the points that belonged to objects high enough to be an obstacle for the LAI. The point cloud data was filtered using a z-coordinate threshold: any point higher than the threshold was considered an obstacle, while points below the threshold were marked as unoccupied. The points of the point cloud were then projected onto a horizontal plane, assigning white to the points above the threshold and black to points below it. Through this method, an occupancy grid image was created, as shown in Figure 4.16, that divided the map into regions of obstacles and non-obstacles.

Figure 4.16: The occupancy image shows the sheds and structures taller than 1.5 m. These structures are obstacles for the LAI flight.
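The thresholding and projection steps above can be sketched as follows. This is an illustrative sketch, not the thesis code: the function name, the grid resolution argument, and the assumption that the point coordinates are already aligned with the image frame are all assumptions.

```python
def occupancy_image(points, z_threshold, resolution, width, height):
    """Project a point cloud onto a horizontal plane: cells containing a
    point above z_threshold become obstacles (1, white); all other cells
    stay free (0, black)."""
    grid = [[0] * width for _ in range(height)]
    for x, y, z in points:
        col = int(x / resolution)
        row = int(y / resolution)
        # Keep only in-bounds points that are tall enough to block the LAI.
        if 0 <= row < height and 0 <= col < width and z > z_threshold:
            grid[row][col] = 1
    return grid
```

With a 1.5 m threshold, a point on a shed roof marks its cell as an obstacle, while ground-level points leave their cells free.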
4.6 Perimeter Contours
As discussed before, the target object could be located either in the open areas, where it would be visible from the HAS camera, or in regions next to a structure or under one of the equipment sheds at the farm, which may obscure it from the HAS camera. If the object of interest is not present in the open areas, the structures and sheds become areas of interest for a closer inspection by the LAI. Hence, the path planning algorithm needed to favor paths that allowed inspection of the sheds. For inspecting the sheds and structures, a flight plan was needed that favored flights along the perimeter of these structures, allowing the LAI to inspect their contents.
To achieve this, contours enclosing the equipment sheds and other structures present in the environment were created, as shown in Figure 4.17. These contours were returned by an OpenCV API that identified contours belonging to the obstacles in the occupancy image. The contours obtained by this method were then simplified using the Discrete Curve Evolution (DCE) technique, which simplifies curves (such as contours) by removing the contour segments that contribute the least to the overall shape [13]. The simplified contours allowed for coverage of the perimeter of the structures while optimizing the contour length.
Incorporating the optimized perimeter contours into the path planning produced optimal visitation plans that included closer inspection of the sheds and structures while visiting the areas of interest selected by the human operator.
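The DCE idea can be sketched in plain Python. This is an illustrative sketch of the technique described in [13], not the thesis code: the relevance measure (turn angle at a vertex weighted by the lengths of its two incident segments) and the function names are assumptions, and OpenCV's contour extraction step is not shown.

```python
import math

def _relevance(prev, cur, nxt):
    """DCE relevance of vertex `cur`: the turn angle at the vertex,
    weighted by the lengths of its two incident segments. Collinear
    vertices score zero and are removed first."""
    l1 = math.dist(prev, cur)
    l2 = math.dist(cur, nxt)
    a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    turn = abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))
    return turn * l1 * l2 / (l1 + l2)

def dce_simplify(contour, target_len):
    """Repeatedly delete the vertex contributing least to the overall
    shape until only target_len vertices remain. `contour` is a closed
    polygon given as a list of (x, y) points."""
    pts = list(contour)
    while len(pts) > target_len:
        n = len(pts)
        scores = [_relevance(pts[i - 1], pts[i], pts[(i + 1) % n])
                  for i in range(n)]
        del pts[scores.index(min(scores))]
    return pts
```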
4.7 Network Graph Map
A network graph is one way of mathematically representing a discretized version of an environment. A network graph consists of nodes and edges connecting the nodes. The nodes represent locations or landmarks in the environment, and an edge between two nodes represents a path between the corresponding locations. Each edge has an associated weight, interpreted as the cost of traveling from one node to the other; this is also known as the edge cost. A network graph may have bi-directional or uni-directional edges, and the cost of visitation may depend on the direction of traversal.
Figure 4.17: The contours generated enclosing the sheds and other structures in the environment
To create a discretized network graph of the Kentland farm area, the perimeter contours discussed in Section 4.6 were used. The contour points of all the contours were assigned as the nodes of the network graph. The position of the UAV at the beginning of the mission was also added to the graph as a node.
4.7.1 Network Edges
The network edges of this graph can be divided into two categories: contour edges and non-contour edges. The contour edges are the segments that comprise the perimeter contours generated in Section 4.6. These edges, however, only connect nodes that are part of the same contour.
Using Python code, all possible edges between each pair of nodes (contour points) not already connected by contour edges were generated. These non-contour edges were created by drawing an edge between each pair of points while ensuring that the edge did not intersect any obstacle identified in the occupancy image (Figure 4.16). If an edge intersected an obstacle at one or more points, the edge was rejected. Hence, only legal non-contour edges that did not intersect any obstacle were generated. Figure 4.18 displays all the edges generated for the graph. The cost of an edge in the graph was proportional to the length of the edge itself and independent of the traversal direction. This incentivized the planner to generate shorter optimal paths of visitation for the search operation.
4.7.2 Non-contour Edge-Cost Factor (ECF)
As discussed in the previous sections, to increase the efficiency of the search, the flight path of the LAI needed to follow the perimeters of the sheds and other structures present in the scene. This would allow a closer inspection of the contents of the sheds, increasing the possibility of finding the target object. To assist this, the flight paths needed to be biased toward the contour edges generated in Section 4.6. Hence, a non-contour edge-cost factor (ECF) was introduced: a constant multiplier applied to the cost of traversal along a non-contour edge. Increasing the cost of these edges penalized the planner for using them, and thus biased the optimal path generated by the planner toward including contour edges.
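The edge-generation and cost-assignment logic above can be sketched as follows. This is an illustrative sketch, not the thesis code: the segment-sampling legality test, the default ECF of 3.0, and the data layout are all assumptions.

```python
import math

def segment_is_free(p, q, occupancy, samples=50):
    """Check that the straight segment p-q does not cross an obstacle by
    sampling points along it against the binary occupancy grid
    (1 = obstacle, 0 = free)."""
    for i in range(samples + 1):
        t = i / samples
        x = int(round(p[0] + t * (q[0] - p[0])))
        y = int(round(p[1] + t * (q[1] - p[1])))
        if occupancy[y][x] == 1:
            return False
    return True

def build_edges(nodes, contour_edges, occupancy, ecf=3.0):
    """Return a dict of legal edges with costs. Contour edges (which run
    along structure perimeters) cost their length; legal non-contour
    edges cost length * ECF, biasing the planner toward the perimeters."""
    edges = {}
    for u, v in contour_edges:
        edges[(u, v)] = math.dist(nodes[u], nodes[v])
    for u in range(len(nodes)):
        for v in range(u + 1, len(nodes)):
            if (u, v) in edges:
                continue
            if segment_is_free(nodes[u], nodes[v], occupancy):
                edges[(u, v)] = ecf * math.dist(nodes[u], nodes[v])
    return edges
```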
Figure 4.18: The contour and non-contour edges of the graph created for the environment
Experiments were conducted with different values of the non-contour edge cost factor for the same set of visitation goals. Since the cost of a non-contour edge was multiplied by the ECF, for ECF values greater than 1 the planner would favor the contour edges, while for ECF values smaller than 1 it would favor the non-contour edges. It was observed that for small values of the ECF, the planner tended to generate a shorter path made of short edges, regardless of whether they were contour or non-contour edges. For factor values of 3 and above, however, the planner tended to generate a slightly longer optimal path that included the contour edges more often. Figure 4.19 shows the results of this experiment.
(a) Non-contour edge cost factor = 0.1 (b) Non-contour edge cost factor = 0.5
(c) Non-contour edge cost factor = 1.0 (d) Non-contour edge cost factor = 2.0
(e) Non-contour edge cost factor = 3.0 (f) Non-contour edge cost factor = 5.0

Figure 4.19: Paths generated by the path planner for various values of the non-contour edge cost factor. For ECF values greater than 1, the planner favors edges along contours, while for lower values the path favors non-contour edges.
4.8 User Interface
The human operator is an important component of this system and has two major roles. The first is to guide the search operation by identifying the areas of interest for visitation in the environment. These areas of interest are then incorporated by the path planner, which generates a series of visitation waypoints for the LAI to gather detailed information in the form of a close-up video stream. This video stream is manually analysed by the human operator, which constitutes their second task in the system. This required designing a user interface that allowed the human operator to communicate the areas of interest to the path planner. The challenge was to develop an interface that is easy to use and effectively translates the areas of interest identified by the human into a format usable by the path planner without much processing.
To enable this human-robot interaction, a user interface (UI) was developed in Python. The UI displayed the overhead mosaic image and allowed the user to draw rectangular boxes over it. Drawing boxes allowed the human operator in the loop to conveniently mark the areas of interest on the displayed mosaic. The UI allowed drawing multiple boxes, one at a time; when a new box was being drawn, the previous box was cleared from the screen to avoid confusion.
When the operator presses the 'Save visitation goals' button, the central coordinates of all the rectangles are written to a text file on disk. When the mission is executed, this text file is first read to extract the rectangle coordinates, which are then passed to the path planner to generate an optimal visitation plan. Figure 4.20 displays the user interface developed.
4.9 Path planner
The path planner designed for this work is a two-step planner. It uses the A* planning algorithm to find the optimal path between goals in the first step, while a solution to the Travelling Salesman Problem is used to deduce the optimal visitation order in the second step. These algorithms are described in detail in the following sections.
Figure 4.20: The UI displaying the overhead mosaic image with a rectangle drawn (in red) over an area of interest. At the top, a 'Save visitation goals' button saves the central coordinates of all the rectangles drawn by the human operator to a text file on disk.
4.9.1 Dijkstra’s planning algorithm
Dijkstra's algorithm finds an optimal path between any two nodes in a discretized environment represented as a network graph. It is an iterative algorithm that is complete and optimal [20]. Completeness implies that it either returns a solution, if one exists, or declares failure in finite processing time. Optimality implies that the algorithm returns the best possible solution to the problem. The objective of the algorithm is to find a path from the start node to the goal node with minimum cost of traversal. As discussed in Section 4.7, each edge in a network graph has an associated cost. The cost of traversal of a path is the sum of the costs of all the edges on the path, and the optimal path between the start and goal nodes has the minimum cost among all possible paths between the two nodes.
The cost-to-come V(xi) of any node xi in the network graph is defined as the optimal cost to reach that node from the start node. The cost-to-come version of Dijkstra's algorithm maintains an open list (O), initialized with the start node, containing the nodes to be explored. The nodes in the open list are prioritized by their cost-to-come values. In every iteration, the node xi with the lowest cost-to-come is removed from O, and all its neighbouring nodes are added to O. While doing so, the cost-to-come V(xj) of each neighbour xj of xi is evaluated as V(xj) = C(xj, xi) + V(xi), where C(xj, xi) is the cost of the edge joining xj and xi, and V(xi) is the cost-to-come of xi. If the newly evaluated value of V(xj) is lower than the current cost-to-come of xj, it is updated and xi is set as the best neighbour of xj. Once this step is done, the chosen node xi is moved from the open list (O) to a closed list (C). This process is repeated until the goal node is moved to the closed list, at which point the optimal cost-to-come of the goal node is given by V(xgoal). By back-tracking the best neighbours of the nodes, from the goal node to the start node, the optimal path can be deduced. The optimal path consists only of vertices present in the closed list (C).

Figure 4.21: The expansion of Dijkstra's algorithm as it finds the optimal path from the start node (green) to the goal node (red). [20]
Dijkstra's algorithm has time complexity O(|E| + |V| log |V|), where |E| is the number of edges in the graph and |V| is the number of nodes. An example of Dijkstra's expansion from the start node to the goal node is shown in Figure 4.21. The green node in the figure corresponds to the start node, while the red node represents the goal node. The sub-figures show the progress of the algorithm as it finds the optimal path from the start node to the goal node.
4.9.2 A* planning algorithm
The A* algorithm is an extension of Dijkstra's algorithm that reduces the number of nodes explored. This is done by incorporating a heuristic estimate of the cost to get from the current node to the goal node when prioritizing the nodes in the open list [14]. In Dijkstra's algorithm, the nodes are prioritized solely by their cost-to-come values; in the A* algorithm, the nodes are prioritized by the sum of the cost-to-come from the start node and the heuristic estimate of the cost-to-go to the goal node.
Figure 4.22: The expansion of the A* algorithm as it finds the optimal path from the start node (green) to the goal node (red). [20]
Other than the criterion for prioritizing nodes in the open list (O), the A* algorithm works like Dijkstra's algorithm. The algorithm is optimal, however, only if the heuristic used is admissible. A heuristic estimate is admissible if it is an under-estimate of the actual minimum cost-to-go: the heuristic estimate h(xi) of the cost-to-go from a node must satisfy h(xi) <= C(xi, xgoal).
An expansion of the A* algorithm is displayed in Figure 4.22. The start and goal nodes are the same as in the Dijkstra example of Figure 4.21, yet comparing the two figures shows that the A* algorithm needs fewer expansions to reach the goal node. Here, the heuristic estimate used was the Euclidean distance between a node and the goal node; since the Euclidean distance is the smallest possible distance between two nodes, this heuristic is admissible.
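A minimal sketch of the algorithm, assuming a simple adjacency-list graph encoding (not the thesis code): `nodes` maps node ids to (x, y) positions, `edges` maps a node id to a list of (neighbor, edge cost) pairs, and the Euclidean distance to the goal is the admissible heuristic.

```python
import heapq
import math

def a_star(nodes, edges, start, goal):
    """A* search over a graph, returning (path, cost) or (None, inf)."""
    def h(n):  # Euclidean cost-to-go estimate (admissible)
        return math.dist(nodes[n], nodes[goal])

    cost_to_come = {start: 0.0}        # V(x_i)
    best_neighbor = {}                 # back-pointers for path recovery
    open_list = [(h(start), start)]    # prioritized by V + heuristic
    closed = set()
    while open_list:
        _, xi = heapq.heappop(open_list)
        if xi in closed:               # skip stale heap entries
            continue
        closed.add(xi)
        if xi == goal:
            break
        for xj, c in edges.get(xi, []):
            new_cost = cost_to_come[xi] + c    # V(xj) = C(xj, xi) + V(xi)
            if new_cost < cost_to_come.get(xj, math.inf):
                cost_to_come[xj] = new_cost
                best_neighbor[xj] = xi
                heapq.heappush(open_list, (new_cost + h(xj), xj))
    if goal not in cost_to_come:
        return None, math.inf          # complete: declare failure
    path, n = [goal], goal
    while n != start:                  # back-track best neighbors
        n = best_neighbor[n]
        path.append(n)
    return path[::-1], cost_to_come[goal]
```

Setting h to zero everywhere recovers Dijkstra's algorithm of Section 4.9.1.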
4.9.3 Travelling Salesman problem
The objective of the Travelling Salesman Problem (TSP) is to find a route for a travelling salesman that starts from a city, visits each goal city exactly once, and returns to the starting city. For n cities, there are (n-1)!/2 possible tours, which makes it expensive to evaluate the cost of every tour and select the one with minimum cost. To solve this problem, the existing Python library MLRose was used [2]. MLRose uses randomized optimization algorithms to solve the TSP.
Solving the TSP with MLRose involves three steps: defining a fitness function object, defining an optimization problem object, and running a randomized optimization algorithm. The MLRose package offers APIs to solve the TSP using genetic algorithms, and this method was used to solve the TSP for the LAI's visitation goals.
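The system itself used MLRose's genetic algorithm. As a self-contained illustration of the problem being solved, here is a brute-force solver that fixes the start city and enumerates the remaining visitation orders; the function name and cost-matrix layout are assumptions, and this approach is practical only for a handful of goals.

```python
from itertools import permutations

def solve_tsp(cost, start=0):
    """Exhaustive TSP: visit every city exactly once and return to start.
    `cost[i][j]` is the symmetric travel cost between cities i and j.
    Feasible only for small n, since (n-1)!/2 distinct tours exist."""
    cities = [c for c in range(len(cost)) if c != start]
    best_tour, best_cost = None, float("inf")
    for order in permutations(cities):
        tour = [start, *order, start]   # closed tour back to the start
        total = sum(cost[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
        if total < best_cost:
            best_tour, best_cost = tour, total
    return best_tour, best_cost
```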
4.9.4 Two-Step Planning
After the human operator marks the areas of interest on the UI and presses the 'Save visitation goals' button, the central pixel coordinates of these rectangular areas are saved to disk. These coordinates are then passed as arguments to the planner. As a first step, the path planner computes the graph nodes (contour points) in the network graph (Section 4.7) that are nearest to each of these central pixel coordinates. These graph nodes are used as the visitation goals by the planner, which helps the planner generate an optimal path that allows closer inspection of the sheds present near the marked areas.
The path planner designed for this work is a two-step planner. In the first step, an optimal path is found between each pair of visitation goals in the network graph using the A* algorithm with a Euclidean distance heuristic. In the second step, an optimal visitation plan covering all the goal nodes is deduced from the optimal paths computed in the first step: the visitation goals are treated as the cities of a travelling salesman problem, while the cost of visitation between any two goals is the corresponding optimal path cost from the first step. This problem is then converted into the format accepted by the MLRose functionality before solving the TSP; in this work, a genetic algorithm with default parameters was used. The output of the second step is the optimal order of visitation of the goals. Figure 4.23 shows the tasks carried out by the two steps of the planner.

Figure 4.23: Tasks carried out by the two steps of the path planner
The path planner combines the solutions of the two steps described above to generate an optimal path of visitation to the areas marked by the human operator for closer inspection. The path is generated as a sequence of pixel coordinates comprising the contour nodes along the edges of the optimal path. These pixel coordinates are later converted into GPS coordinates, which are then communicated to the LAI as waypoint visitation goals.
4.10 Pixel to GPS Transformation
The path planner generates the optimal path in the form of a sequence of pixel coordinates of the overhead mosaic image. These pixel coordinates need to be converted to GPS coordinates before they can be used by the Dronekit navigation code to guide the LAI to the areas of interest, so another module was created for this transformation. First, the GPS coordinates of the corner points of 5 different sheds in the Kentland Farm area were determined using the PixHawk GPS mounted on the UAVs. Then, the pixel coordinates of the same corners were obtained from the overhead mosaic image. The corners are indicated in Figure 4.24, and the pixel and GPS coordinates for these points are presented in Table B.1 in Appendix B. Using in-built Python functionality, a homography matrix was computed from these two sets of coordinates. The homography matrix allowed transforming pixel coordinates into GPS coordinates, and its inverse allowed the reverse transformation. The transformation was obtained using the RANSAC method, which reduced the errors and computed an optimal solution. The transformation matrix obtained through this process is given in Appendix B.
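Applying the homography to a mosaic pixel can be sketched as follows; this is an illustrative sketch, and the matrix values in the example below are made up for illustration, not the actual matrix from Appendix B.

```python
def pixel_to_gps(H, px):
    """Map a mosaic pixel (u, v) to (lat, lon) using a 3x3 homography H
    (nested lists), normalizing by the projective coordinate w."""
    u, v = px
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    lat = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
    lon = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
    return lat, lon

# Hypothetical affine-style matrix: each pixel spans 0.001 degrees,
# offset to a point near Blacksburg, VA.
H_example = [[0.001, 0.0, 37.0],
             [0.0, 0.001, -80.0],
             [0.0, 0.0, 1.0]]
```

The inverse matrix maps GPS coordinates back to mosaic pixels in the same way.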
The transformation matrix was validated before the phase-2 flight operations were conducted. This was done by generating GPS coordinates from the pixel coordinates of several identifiable features in the mosaic, and comparing the results with the GPS locations of these features obtained using the PixHawk GPS on board the UAV.
4.11 LAI Navigation
Both UAVs used in the field operations, the S-500 and the X-500, used the PixHawk 4, which communicates over the MavLink protocol. The MavLink connection is also used by the QGroundControl and Mission Planner software to exchange data, upload mission plans, and monitor the state of the UAV.
Figure 4.24: The reference points used to compute the transformation matrix
The Dronekit-Python library is an open-source library that provides a high-level API to communicate with a drone over the MavLink protocol [1]. Its functionality allows querying the connected drone's telemetry and state information, and the Dronekit APIs also allow mission management and motion control of the connected vehicle.
Using the Dronekit-Python APIs, a navigation module was developed to command the vehicle to take off and visit a set of waypoints. The module was used to establish the connection with the vehicle and set the vehicle mode, vehicle parameters, and mission parameters. A function was created that accepted a series of 2D GPS points (latitude and longitude) and a desired altitude, and commanded the UAV to visit these GPS locations in order. The location of the vehicle relative to the current waypoint goal was tracked, and when the distance between the two fell below an acceptable threshold, the vehicle was commanded to visit the next waypoint in the list. When all the GPS points in the list had been visited, the mission was complete and a Return To Launch (RTL) command was issued to the vehicle.
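The distance bookkeeping in this loop can be sketched as follows. Dronekit itself is not shown here; the flat-earth degrees-to-meters constant (a common approximation in DroneKit examples, valid over the short distances of a farm survey) and the 2 m threshold are assumptions, not values from the thesis.

```python
import math

EARTH_DEG_M = 1.113195e5  # approx. meters per degree of lat/lon

def distance_to_waypoint(cur, target):
    """Flat-earth approximation of the metric distance between two
    (lat, lon) pairs; adequate for short waypoint legs."""
    dlat = target[0] - cur[0]
    dlon = target[1] - cur[1]
    return math.hypot(dlat, dlon) * EARTH_DEG_M

def reached(cur, target, threshold_m=2.0):
    """True once the vehicle is close enough to advance to the
    next waypoint in the list."""
    return distance_to_waypoint(cur, target) < threshold_m
```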
During the flight operation, the vehicle mode was checked constantly. If the vehicle mode was found to be anything other than Guided, execution was paused for 20 seconds and the mode was rechecked. If the mode was still not Guided, the current waypoint was skipped and the next waypoint visitation command was issued. This allowed a human operator to take over control of the vehicle for a short duration if necessary, which was important because GPS errors, barometric errors, or wind gusts would occasionally cause the vehicle to lose control, requiring external intervention to set it back on the right path.
4.12 Phase-2 Flight Operations: LAI Flights
The goal of the second phase of flight operations was to conduct an LAI flight following the path generated by the path planner from the areas of interest marked by the human operator, and to receive a live video stream from the on-board camera at a ground station. A process flow chart of the phase-2 flight operations is shown in Figure 4.25.

Figure 4.25: Process flow chart for the phase-2 flight operations
For this operation, an X-500 UAV equipped with a front-facing TIS camera and an on-board Raspberry Pi was used. The setup also included two local ground stations, and a remote ground station with a human operator.
The local ground station-1 ran the UI discussed in Section 4.8, and used the MavProxy software to allow the same telemetry radio port to be used by Mission Planner as well as by the Dronekit code for establishing a connection with the LAI and communicating the waypoint commands over MavLink. The human operator remotely logged in to ground station-1 and marked areas of interest for closer inspection by drawing rectangles over the mosaic image displayed in the UI. Figure 4.26a shows the areas of interest marked by the human operator. When the human operator pressed the 'Save visitation goals' button on the UI, a text file containing the central pixel coordinates of all the drawn rectangles was saved to the memory of local ground station-1.
Next, the remote human operator started a Python script called 'Run_Mission' on local ground station-1. This code read the pixel coordinates saved in the previous step and passed them as arguments to the path planner. The path planner first generated the network graph using the occupancy grid map and then triggered the two-step path planning process, which generated and returned an optimal plan of visitation in terms of mosaic image pixel coordinates. The two-step path planner required 0.9 seconds of processing time to generate the mission plan for the 5 areas of interest selected by the human operator (16 GB RAM, i7 processor). Once the path planner finished executing, another function was called to transform the pixel coordinates into GPS coordinates. The plan generated by the planner is displayed in Figure 4.26b. Once the GPS coordinates were obtained, the DroneKit code was called to communicate the sequence of GPS coordinates as waypoints to the LAI. During the rest of the mission, ground station-1 monitored the vitals of the UAV and the mission progress through the Mission Planner software. Figure 4.26c shows the path followed by the LAI.
Meanwhile, ground station-2 connected to the camera on board the LAI and received the video stream over a UDP connection through the wireless network of the on-board Raspberry Pi. Both the Raspberry Pi and ground station-2 used the GStreamer application for video streaming. The Raspberry Pi executed a GStreamer pipeline that encoded the TIS camera video and packaged it before sending it to the UDP port of ground station-2, which executed another pipeline that received the stream from the UDP port, depackaged it, decoded it, and displayed it on screen. A lag of roughly 1-2 seconds was observed in the video stream during the mission. Figure 4.27a shows the UAV flying during the mission in front of a shed, and Figure 4.27b shows a screenshot of the video streamed by the on-board camera of the UAV to ground station-2.

Figure 4.26: Figure (a) shows the areas of interest selected by the human operator for closer inspection. Figure (b) shows the path planned by the two-step path planner, while Figure (c) shows the Mission Planner screenshot of the LAI flight path.
The LAI took off, visited all the goals communicated to it by the Dronekit code, and landed back at the starting position. Throughout, the on-board Raspberry Pi streamed live video to local ground station-2 over the wireless network.
This flight operation successfully demonstrated the capability of the system to let a human user select areas of interest for closer inspection through a user interface, and to remotely trigger the path planning and autonomous flight navigation of the UAV along an optimal visitation plan. The video streamed to local ground station-2 was observed by the human operator to locate the target and complete the search operation.
Figure 4.27: Figure (a) shows the LAI during the flight, and Figure (b) shows a screenshot of the video streamed by the on-board TIS camera to ground station-2.
Chapter 5
Summary & Conclusions
This work discussed a cooperative system capable of performing a search and rescue operation in a partially-known environment using multiple UAVs with a human operator in the loop. Two field experiments, conducted on a target area near the KEAS Lab on the Kentland Farms, demonstrated these capabilities.
The first phase of flight operations gathered information about the latest state of the environment. This data was used to generate or update representations of the environment such as the 3D reconstruction, mosaic image, occupancy image, and network graph. Contours were drawn around the obstacles to facilitate their exploration; these contours served as the basis of the network graph, with the contour vertices as its nodes. Non-contour edges, in addition to the contour edges, were drawn to connect all the vertices without intersecting any obstacles. A non-contour edge cost factor (ECF) was introduced as a multiplier to the cost of non-contour edges. By using this factor to control the cost of non-contour edges, the degree of exploration of the sheds could be controlled: a high ECF forced the path planner to include more contour edges in the optimal plan, while a low ECF led to an optimal path that avoided contour edges, and in turn shed exploration.
A user interface was created that allowed the human operator to work cooperatively with the
UAV and guide the search operation by marking areas of interest. This was done by simply
drawing rectangles over the areas of interest on the mosaic image displayed by the UI;
equipment sheds and other structures were the preferred sites of exploration. A two-step path
planner was designed to find an optimal mission plan for visiting the multiple goals for closer
inspection. The system utilizes human perception and intuition while maintaining the
autonomy and efficiency of the robots, and is an example of a system that leverages the
complementary strengths of humans and robots. A navigation module was developed that
communicated the target visitation goals to the UAV, guiding it to follow the optimal mission
plan generated by the planner.
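The two-step structure of the planner can be sketched as follows: step one produces a matrix of pairwise path costs between the start location and every goal (in the thesis, via weighted-A* on the network graph), and step two orders the visits by solving a small Travelling Salesman Problem. The cost values and goal names below are illustrative, and brute-force enumeration stands in for the mlrose-based TSP solver; enumeration is only practical for a handful of goals.

```python
from itertools import permutations

def optimal_visitation_order(cost, start, goals):
    """Brute-force TSP: cheapest tour that starts and ends at `start`
    and visits every goal exactly once. `cost[a][b]` is the pairwise
    path cost produced by step one of the two-step planner."""
    best_order, best_cost = None, float("inf")
    for order in permutations(goals):
        stops = [start, *order, start]
        total = sum(cost[a][b] for a, b in zip(stops, stops[1:]))
        if total < best_cost:
            best_order, best_cost = list(order), total
    return best_order, best_cost

# Illustrative symmetric cost matrix (e.g. weighted-A* path lengths).
cost = {
    "home": {"g1": 4.0, "g2": 2.0, "g3": 5.0},
    "g1":   {"home": 4.0, "g2": 3.0, "g3": 1.0},
    "g2":   {"home": 2.0, "g1": 3.0, "g3": 6.0},
    "g3":   {"home": 5.0, "g1": 1.0, "g2": 6.0},
}

order, total = optimal_visitation_order(cost, "home", ["g1", "g2", "g3"])
print(order, total)  # → ['g2', 'g1', 'g3'] 11.0
```

The resulting order would then be converted into mission waypoints for the low-altitude UAV by the navigation module.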
The second phase of flight operations demonstrated the cooperative search capability by
allowing a remote human operator to identify areas of interest and remotely command
the UAV to visit these areas autonomously, gathering detailed information while following
an optimal visitation plan. The live video streaming capability from the UAV's on-board
camera system to the local ground station over the wireless network was also successfully
tested.
This system serves as an enabler for human-robot cooperation during search operations.
This work has not explored the possibility of using multiple human operators; however,
with some modifications, the system could provide an important platform for multi-human,
multi-robot cooperation. This could include scenarios where multiple humans and UAVs are
actively involved in the search operation on the target environment while communicating
and sharing their observations for an efficient search.
The system, in its current state, does not operate in real time. However, this work sets
up a framework that could be enhanced into a real-time cooperative system. One of the
major components needed to achieve this would be an efficient on-line image mosaicking
module. Upgrading the HAS with a stereo-camera system could also provide the ability to
generate 3D maps of the environment in real time, which could be used for path planning
of the LAI in 3D space and would add to the exploration capabilities of the system.

Upgrading or replacing the camera system to ensure more reliable SDK support could also
be investigated: based on the experience during field operations, the current camera system
and its associated SDK were found to be prone to inconsistencies. A more reliable means of
communication between the on-board computer and the ground station could be explored as
well; this would likely improve the quality and latency of the video stream from the on-board
camera to the local ground station. Options include using antennas to enhance the WiFi
signal of the Raspberry Pi, or setting up a high-strength local WiFi network using a WiFi
router, although the latter would make the field experimentation less authentic, since such
infrastructure may not be available during a real search and rescue operation.

Improved communication would also help in scaling the system up to larger search areas.
When the search area is increased, additional UAVs can be added to the system; in such
cases, a UAV may be unable to communicate with the ground station directly because of
the large distance between them. To overcome this, the communication system could be
designed to use the UAVs as relays between a remote UAV and the ground station.

Finally, the system's workflow and the communication between its modules are currently
unidirectional. This could be changed by adopting a Robot Operating System (ROS) based
architecture, which would allow modules to communicate through messages. This would
make the system more flexible and allow pausing and modifying the mission during the
operation.
Bibliography
[1] DroneKit-Python documentation. URL https://dronekit-python.readthedocs.io/en/latest/about/overview.html.
[2] mlrose documentation. URL https://mlrose.readthedocs.io/en/stable/source/tutorial2.html.
[3] OpenCV documentation. URL https://docs.opencv.org/3.4/d1/d89/tutorial_py_orb.html.
[4] J. L. Burke and R. R. Murphy. Human-robot interaction in usar technical search: two
heads are better than one. In RO-MAN 2004. 13th IEEE International Workshop on
Robot and Human Interactive Communication (IEEE Catalog No.04TH8759), pages
307–312, 2004.
[5] T. M. Cabreira, L. B. Brisolara, and P. R. Ferreira Jr. Survey on coverage path planning
with unmanned aerial vehicles. Drones, 3(1), 2019.
[6] M. Erdelj and E. Natalizio. Uav-assisted disaster management: Applications and open
issues. In 2016 International Conference on Computing, Networking and Communica-
tions (ICNC), pages 1–5, 2016.
[7] Dario Floreano and Robert J. Wood. Science, technology and the future of small
autonomous drones. Nature, 521(7553):460–466, May 2015. ISSN 1476-4687. doi:
10.1038/nature14542. URL https://doi.org/10.1038/nature14542.
[8] Francesco Mancini, Marco Dubbini, Mario Gattelli, Francesco Stecchi, Stefano Fabbri,
and Giovanni Gabbianelli. Using unmanned aerial vehicles (UAV) for high-resolution
reconstruction of topography: The structure from motion approach on coastal environments.
Remote Sensing, 5(12), 2013.
[9] P. Ganesan, V. Rajini, B. S. Sathish, and K. B. Shaik. Hsv color space based seg-
mentation of region of interest in satellite images. In 2014 International Conference
on Control, Instrumentation, Communication and Computational Technologies (ICCI-
CCT), pages 101–105, 2014.
[10] M. A. Goodrich, J. L. Cooper, J. A. Adams, C. Humphrey, R. Zeeman, and B. G.
Buss. Using a mini-uav to support wilderness search and rescue: Practices for human-
robot teaming. In 2007 IEEE International Workshop on Safety, Security and Rescue
Robotics, pages 1–6, 2007.
[11] Michael A. Goodrich, Bryan S. Morse, Damon Gerhardt, Joseph L. Cooper, Morgan
Quigley, Julie A. Adams, and Curtis Humphrey. Supporting wilderness search and
rescue using a camera-equipped mini uav. Journal of Field Robotics, 25(1‐2):89–110,
2008. doi: 10.1002/rob.20226. URL https://onlinelibrary.wiley.com/doi/abs/
10.1002/rob.20226.
[12] Ramesh Jain, Rangachar Kasturi, and Brian Schunck. Machine Vision. 01 1995. ISBN
978-0-07-032018-5.
[13] Dr. Creed Jones. Lecture notes in computer vision, Fall 2019.
[14] Steven M. LaValle. Planning Algorithms. Cambridge University Press, 2006.
[15] Sven Mayer, Lars Lischke, and Pawel W. Woźniak. Drones for Search and Res-
cue. In 1st International Workshop on Human-Drone Interaction, Glasgow, United
Kingdom, May 2019. Ecole Nationale de l’Aviation Civile [ENAC]. URL https:
//hal.archives-ouvertes.fr/hal-02128385.
[16] Agoston Restas. Drone applications for supporting disaster management. World Journal
of Engineering and Technology, 3(3), 2015. doi: 10.4236/wjet.2015.33C047.
[17] Jürgen Scherer, Saeed Yahyanejad, Samira Hayat, Evsen Yanmaz, Torsten Andre, Asif
Khan, Vladimir Vukadinovic, Christian Bettstetter, Hermann Hellwagner, and Bern-
hard Rinner. An autonomous multi-uav system for search and rescue. In Proceedings
of the First Workshop on Micro Aerial Vehicle Networks, Systems, and Applications
for Civilian Use, DroNet ’15, page 33–38, New York, NY, USA, 2015. Association for
Computing Machinery. ISBN 9781450335010. doi: 10.1145/2750675.2750683. URL
https://doi.org/10.1145/2750675.2750683.
[18] Mario Silvagni, Andrea Tonoli, Enrico Zenerino, and Marcello Chiaberge. Multipurpose
uav for search and rescue operations in mountain avalanche events. Geomatics, Natural
Hazards and Risk, 8(1):18–33, 2017. doi: 10.1080/19475705.2016.1238852. URL https:
//doi.org/10.1080/19475705.2016.1238852.
[19] S. Sural, Gang Qian, and S. Pramanik. Segmentation and histogram generation using
the hsv color space for image retrieval. In Proceedings. International Conference on
Image Processing, volume 2, pages II–II, 2002.
[20] Dr. Pratap Tokekar. Lecture notes in advanced robot motion planning, Fall 2018.
[21] T. Tomic, K. Schmid, P. Lutz, A. Domel, M. Kassecker, E. Mair, I. L. Grixa, F. Ruess,
M. Suppa, and D. Burschka. Toward a fully autonomous uav: Research platform for
indoor and outdoor urban search and rescue. IEEE Robotics Automation Magazine,
19(3):46–56, 2012.
Appendices
Appendix A
Paths planned for different sets of visitation goals
The following figures display the optimal paths generated by the path planner for sets of
randomly chosen visitation goals in the mosaic image. Each subfigure displays one step of
the visitation plan, i.e., an optimal path between two consecutive visitation goals. The paths
are drawn in green over the mosaic image, while the target locations for visitation are
identified by small red circles. The blue circle next to each red circle indicates the contour
vertex closest to the corresponding visitation goal; each visitation goal has an associated
closest contour vertex.

The optimal visitation plan starts and ends at the same location while visiting each of the
goal locations only once. These paths were generated with an ECF value of 3, and they show
that the generated path does not intersect any of the obstacles (equipment sheds and
structures) in the scene.

Figures A.1 and A.2 display the optimal visitation plans generated for two different sets
of visitation goals.
Figure A.1: Path generated by the path planner for a random set of visitation goals (set 1). Subfigures (a)-(f) show Steps 1-6 of the visitation plan.
Figure A.2: Path generated by the path planner for a random set of visitation goals (set 2). Subfigures (a)-(f) show Steps 1-6 of the visitation plan.
Appendix B
Pixel to GPS transformations
B.1 Data values
The following data shows the pixel coordinates and GPS coordinates for various reference
locations displayed in Figure 4.24.
Point   Pixel coordinates   GPS coordinates
A       [214, 1286]         [37.1971799, -80.5790688]
B       [294, 1351]         [37.1972791, -80.5791855]
C       [555, 1023]         [37.1969814, -80.5795444]
D       [474, 959]          [37.1968928, -80.5794747]
E       [254, 803]          [37.1967096, -80.5794233]
F       NA                  NA
G       [307, 473]          [37.1963707, -80.5793546]
H       [436, 584]          [37.1965260, -80.5795215]
I       [515, 416]          [37.1963677, -80.5793674]
J       [561, 353]          [37.1963524, -80.5796696]
K       [639, 410]          [37.1963836, -80.5797568]
L       [590, 476]          [37.1964408, -80.5797005]
M       [731, 598]          [37.1965692, -80.5798473]
N       [795, 631]          [37.1966120, -80.5799179]
O       [767, 686]          [37.1966660, -80.5798896]
P       [700, 651]          [37.1966153, -80.5798234]
Q       [700, 702]          [37.1966526, -80.5798036]
R       [764, 739]          [37.1967054, -80.5798775]
S       [727, 801]          [37.1967735, -80.5798201]
T       [664, 766]          [37.1967258, -80.5797420]

Table B.1: Pixel coordinates and GPS coordinates for the reference points.
B.2 Homography matrix
The Homography matrix obtained from the above process is displayed below.
Homography Matrix =
[ -5.53684599e-03  -4.68124529e-03   3.71960092e+01 ]
[  1.19938171e-02   1.01428956e-02  -8.05791246e+01 ]
[  1.48856639e-04  -1.25873081e-04   1.00000000e+00 ]
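Applying such a homography to a pixel coordinate follows the standard projective mapping: multiply [u, v, 1]^T by H and divide by the third homogeneous component. The sketch below demonstrates the mechanics with a simple illustrative matrix (a pure translation), not the matrix above, since reproducing the exact pixel-to-GPS values would also depend on the coordinate conventions used when the matrix was estimated.

```python
def apply_homography(H, u, v):
    """Map a pixel (u, v) through a 3x3 homography H (row-major nested
    lists) and normalize by the third homogeneous component."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w

# Illustrative homography: a pure translation by (+10, -5).
H_demo = [
    [1.0, 0.0, 10.0],
    [0.0, 1.0, -5.0],
    [0.0, 0.0, 1.0],
]
print(apply_homography(H_demo, 214.0, 1286.0))  # (224.0, 1281.0)
```

Substituting the matrix above for H_demo would map pixel coordinates in the mosaic image to (latitude, longitude) pairs for the reference points in Table B.1, under the same conventions used when the matrix was computed.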