Cooperative human-robot search in a partially-known environment using multiple UAVs
Shivam B. Chourey
Thesis submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Master of Science
in
Computer Engineering
Kevin B. Kochersberger, Chair
Creed F. Jones III
Ryan K. Williams
August 13, 2020
Blacksburg, Virginia
Keywords: UAVs, Cooperative search, Path planning, Human-Robot cooperation
Copyright 2020, Shivam B. Chourey
Cooperative human-robot search in a partially-known environment using multiple UAVs
Shivam B. Chourey
(ABSTRACT)
This thesis details a system developed with the objective of conducting a cooperative search
operation in a partially-known environment, with a human operator and two Unmanned
Aerial Vehicles (UAVs) carrying nadir and front-facing on-board cameras. The system uses two
phases of flight operations, where the first phase is aimed at gathering the latest overhead
images of the environment using a UAV's nadir camera. These images are used to generate
and update representations of the environment, including a 3D reconstruction, a mosaic image,
an occupancy image, and a network graph. During the second phase of flight operations, a
human operator marks multiple areas of interest for closer inspection on the mosaic generated
in the previous step, displayed via a UI. These areas are used by the path planner as visitation
goals. The two-step path planner operates on the network graph, utilizing weighted-A*
planning and a solution to the Travelling Salesman Problem to compute an optimal visitation
plan. This visitation plan is then converted into mission waypoints for a second UAV, which
are communicated through a navigation module over a MAVLink connection. A UAV flying
at low altitude executes the mission plan, and streams live video from its front-facing camera
to a ground station over a wireless network. The human operator views the video on the
ground station, and uses it to locate the target object, culminating the mission.
Cooperative human-robot search in a partially-known environment using multiple UAVs
Shivam B. Chourey
(GENERAL AUDIENCE ABSTRACT)
This thesis details work focused on developing a system capable of conducting a search
operation in an environment where prior information has been rendered outdated, while
allowing a human operator and multiple robots to cooperate in the search. The system
operation is divided into two phases of flight operations: the first phase focuses on gathering
current information using a camera-equipped unmanned aircraft, while the second phase
involves utilizing the human operator's intuition to select areas of interest for close
inspection. It is followed by a flight operation using a second unmanned aircraft aimed
at visiting the selected areas and gathering detailed information. The system utilizes the
data acquired through the first phase to generate a detailed map of the target environment.
In the second phase of flight operations, a human uses the detailed map and marks the areas
of interest by drawing over the map. This allows the human operator to guide the search
operation. The path planner generates an optimal plan of visitation, which is executed by
the second unmanned aircraft. The aircraft streams live video to a ground station over
a wireless network, which the human operator uses to detect the target object's location,
concluding the search operation.
Dedication
Dedicated to my Parents - who sacrificed their time, money, health, and retirement for my
dreams.
Acknowledgments
I’d like to thank my advisor Dr. Kevin Kochersberger for giving me the opportunity to join
the Unmanned Systems Lab, and be part of a motivated research group. His continuous
support and guidance enabled me to learn and improve. I’d also like to thank the committee
members, Dr. Ryan Williams, and Dr. Creed Jones for their constant support.
I'd like to thank the lab members - Manav, Felipe, and Avery - for their valuable support
in this project, and for assisting with the numerous flight operations. I'd also like to thank
the former lab tech, Drew.
At Virginia Tech, I have been part of two great departments - ME and ECE. I wish to
thank the department heads and the staff of both departments for their incredible support
and guidance.
Finally, I'd like to thank my family and friends - who made it all possible.
Contents
List of Figures ix
List of Tables xiii
1 Introduction 1
2 Review of Literature 3
2.1 Autonomous UAV systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Human-operated UAV systems . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Cooperative Human-UAV systems . . . . . . . . . . . . . . . . . . . . . . . 6
3 System Design 7
3.1 Target Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.1 UAVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.2 Raspberry Pi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.3 Camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.4 Mounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3.1 Phase-1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3.2 Phase-2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3.3 High Altitude Surveyor (HAS) . . . . . . . . . . . . . . . . . . . . . 15
3.3.4 Low Altitude Inspector (LAI) . . . . . . . . . . . . . . . . . . . . . . 16
3.3.5 Local Ground Stations . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.6 Human-in-the-loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 Results 19
4.1 Phase-1 Flight Operations: HAS Flights . . . . . . . . . . . . . . . . . . . . 19
4.2 3D-Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2.1 Geotagging images . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2.2 Reconstructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 Shadow and road detection . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.2 Texture based segmentation . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 On-line Image mosaicking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.5 Occupancy Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.6 Perimeter Contours . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.7 Network Graph Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.7.1 Network Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.7.2 Non-contour Edge-Cost Factor (ECF) . . . . . . . . . . . . . . . . . 44
4.8 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.9 Path planner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.9.1 Dijkstra’s planning algorithm . . . . . . . . . . . . . . . . . . . . . . 50
4.9.2 A* planning algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.9.3 Travelling Salesman problem . . . . . . . . . . . . . . . . . . . . . . 52
4.9.4 Two-Step Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.10 Pixel to GPS Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.11 LAI Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.12 Phase-2 Flight Operations: LAI Flights . . . . . . . . . . . . . . . . . . . . . 57
5 Summary & Conclusions 63
Bibliography 66
Appendices 69
Appendix A Path Planned for different set of visitation goals 70
Appendix B Pixel to GPS transformations 75
B.1 Data values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
B.2 Homography matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
List of Figures
3.1 S-500 with Raspberry Pi and nadir camera . . . . . . . . . . . . . . . . . . . 9
3.2 X-500 with front-facing camera . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 The Imaging Source USB 2.0 camera mounted on the Quadcopter . . . . . . 12
3.4 The Raspberry-Pi, and the front-facing TIS camera mounted on the X-500
using a mount with vibration isolators . . . . . . . . . . . . . . . . . . . . . 13
3.5 Phase-1 of the system involved capturing images of the environment by the
HAS, which was followed by using the data-set for creating scene representa-
tions. The HAS executed a pre-planned flight mission to fly over the target
environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.6 The tasks involved in Phase-2 of the system . . . . . . . . . . . . . . . . . . 15
4.1 Coverage flight plan over the KEAS lab at the Kentland Farms . . . . . . . 21
4.2 Flowchart displaying the process of Phase-1 flight operations . . . . . . . . . 22
4.3 Metashape reconstruction for geotagged downsampled image-set . . . . . . . 25
4.4 Pix4D reconstruction for non geotagged downsampled image-set . . . . . . . 26
4.5 Overhead mosaic image of the target environment located near KEAS Lab at
the Kentland Farms in Blacksburg. This image was generated from the 3D
reconstruction created by the Pix4D Mapper . . . . . . . . . . . . . . . . . . 27
4.6 Roads identified in mosaic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.7 Shadows identified in mosaic . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.8 A grayscale image showing four different materials. By looking at this image
it is possible for humans to distinctly identify the materials even without color
information. [13] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.9 Figure showing the images of the 48 filters of the Leung-Malik (LM) filter
bank [13] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.10 Image showing the result of texture based segmentation that used LM filters
to generate the feature descriptors and k-means clustering for segmenting the
pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.11 Sub-image (a) shows the last image region isolated in the existing mosaic;
sub-image (b) shows a mask representing the last image in the existing mosaic 36
4.12 Matches detected between an existing mosaic and a new image . . . . . . . . 36
4.13 The image showing the mosaic generated from 25 images of the Kentland
farm area, passed sequentially, using the original code developed. The mosaic
contains multiple inconsistencies including a discontinuous pole, and shed.
Hence, this solution was not used in the system. . . . . . . . . . . . . . . . . 38
4.14 The image showing the mosaic generated from 30 images of the Kentland
farm area using the OpenCV functionality when all the images are provided
at once. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.15 The mosaic generated by the OpenCV functionality for a set of images of the
Kentland farm, passed to it sequentially. The program exited with an error
after processing 15 images, and was unable to finish processing the 25-image set. 39
4.16 Occupancy image shows the sheds and structures taller than 1.5 m. These
structures are obstacles for the LAI flight. . . . . . . . . . . . . . . . . . . . 41
4.17 Image displaying the contours generated enclosing the sheds and other struc-
tures in the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.18 Image displaying the contour and non-contour edges of the graph created for
the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.19 Path generated by the path planner for various values of non-contour edge
cost factor. For values of ECF greater than 1, the planner favors edges along
contours, while for lower values the path favors non-contour edges . . . . . . 47
4.20 The image shows the UI displaying the overhead mosaic image with a rectangle
drawn (in red) over an area of interest. At the top, a "Save visitation goals"
button is present that saves the central coordinates of all the rectangles drawn
by the human operator in a text file on disk. . . . . . . . . . . . . . . . . . . 49
4.21 The image shows the expansion of Dijkstra's algorithm as it finds the
optimal path from the start node (green) to the goal node (red). [20] . . . . . 51
4.22 The image shows the expansion of the A* algorithm as it finds the optimal
path from the start node (green) to the goal node (red). [20] . . . . . . . . . 52
4.23 Tasks carried out by the two steps of the path planner . . . . . . . . . . . . 54
4.24 The image shows the reference points used to compute the transformation
matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.25 The image shows the process flow chart for the phase-2 flight operations . . 58
4.26 Figure (a) shows the areas of interest selected by the human operator for closer
inspection. Figure (b) shows the path planned by the two-step path planner, while
Figure (c) shows the Mission Planner screenshot showing the LAI flight
path. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.27 Figure (a) shows the LAI during the flight and Figure (b) shows the screenshot
of video streamed by the on-board TIS camera to the ground station-2. . . . 62
A.1 Path generated by the path planner for a random set of visitation goals -1 . 72
A.2 Path generated by the path planner for a random set of visitation goals -2 . 74
List of Tables
4.1 Time taken for 3D-reconstruction with Metashape for different options using
the geo-tagged image data-set . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Time taken for 3D-reconstruction with Pix4D for different options for geo-
tagged image data-set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Comparison of the capabilities of the OpenCV Image Stitcher and the on-line
image mosaicking solution developed for this work. . . . . . . . . . . . . . . 38
B.1 The table shows the data for the Pixel coordinates and GPS coordinates for
the reference points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
List of Abbreviations
HAS High Altitude Surveyor
KEAS Kentland Experimental Aviation System
LAI Low Altitude Inspector
QGC QGroundControl
ROS Robot Operating System
SfM Structure from Motion
UAV Unmanned Aerial Vehicle
Chapter 1
Introduction
Unmanned Aerial Vehicles (UAVs), popularly known as drones, have seen a continuous
rise in their utility across various research and commercial applications in recent years.
Reductions in component cost, portable electronics, lighter designs, and longer flight times
have been some of the most important factors contributing to their popularity [7]. UAVs are
capable of carrying application-specific payloads, which makes them a valuable asset in a
wide range of applications. While traditionally UAVs were used primarily for reconnaissance
and surveillance by the military, recent trends indicate their growing use in both research
and commercial domains.
One such research application is disaster management assistance [6], [16]. UAVs are
utilized at various stages of disaster management - from monitoring and scouting to
search and rescue operations. Search and rescue scenarios involve gathering information by
surveying an area to identify the target location and assess the environment before planning
a rescue operation. The ability of drones to cover large areas in a short time [15], and to fly
over obstacles to provide otherwise inaccessible information, makes them indispensable in
such operations. Often such operations involve locating and extracting humans from
situations like building debris, flooded areas, wildfires, and deep forests. Disasters typically
modify the structure of an environment, and any prior information like floor plans, images,
or maps is rendered outdated or inaccurate. Hence, the prior information cannot be relied
upon for conducting rescue operations, and information about the current state of the
environment should be gathered before planning a search or rescue operation.
In search operations, UAV control can be autonomous, human-assisted, or cooperative.
Search environments are often unstructured, which reduces the efficiency of autonomous
systems. Search and rescue operations are time-critical, and since the flight time of UAVs
is currently limited, it is important to develop optimal strategies for UAV-assisted search
and rescue operations. The author in [6] noted that autonomous scouting of a disaster area
using self-learning techniques may not be optimal, and has advocated human guidance
for these operations. Currently, humans are more suitable for versatile tasks than automated
learning algorithms, which often tend to be specific to a single application. Humans are also
more adept at handling contingencies, which are a common occurrence during disaster
scenarios. In such cases, an application-specific program may not be sufficient. Hence, a
cooperative framework is needed where the human operator and the UAV(s) can work
together to fulfill the objectives of a search and rescue operation in an environment that
is partially known.
This work focuses on developing a system capable of performing a search operation in
a partially known environment while allowing UAVs and a human operator to work together
efficiently. This involves using one UAV to survey the environment and gather its latest
features, a human operator to identify areas of interest within the environment, and another
UAV to visit these areas of interest for a closer inspection.
Chapter 2
Review of Literature
As discussed in the Introduction, UAVs are being increasingly used in a wide range of
research applications, one of them being search and rescue operations. A growing number
of research groups have worked in this area and published their findings, adopting different
strategies for utilizing the vehicles during these operations. Based on the level of autonomy
of the UAVs during the operations, these search and rescue systems can be broadly divided
into three categories - autonomous systems, manual systems, and cooperative systems. In
autonomous systems, the path planning, navigation, and target detection are carried out
autonomously by systems on board the aircraft, without human intervention. A second
group of researchers have used a human operator to control or assist path planning and
navigation; in these systems, the human operator is also responsible for target detection.
A third category of researchers have used UAVs for search operations in collaboration with
humans; in these systems, the human operator and autonomous systems may both be
involved in the mission planning, navigation, or target detection stages. The following
subsections discuss these approaches in further detail.
2.1 Autonomous UAV systems
Various research groups have worked on the development of completely autonomous systems
for performing search and rescue operations using UAVs without human intervention.
In [21], the researchers attempted to create a fully autonomous system using a quadcopter
UAV for urban search and rescue missions. The group focused on developing a system
with high on-board processing capacity and modular, flexible sensing and planning
capabilities. The UAV is equipped with a nadir stereo camera system, a front-facing camera,
an upward-facing camera, and a laser scanner, which are used in combination for odometry,
depending on lighting and other factors. This is done to enable flight in indoor as
well as outdoor environments. The stereo camera system is also used for target identification
in the search mission. The autonomous system relies upon recognition and action modules
for exploring the environment and finding the target. While all the processing in this system
is done on board the aircraft, it lacks obstacle avoidance capabilities. In [18], the research
group used a UAV equipped with thermal/visual cameras and avalanche beacon sensors for
autonomous search and rescue during avalanches. The group developed autonomous
terrain-following capabilities for the drone using a laser distance sensor. The UAV flies
forward and backward, changing direction as needed in order to acquire the maximum
beacon signal strength. The flight is guided by the beacon sensor signal and ends at signal
acquisition. The aircraft performs an automated landing using the altitude from the laser
distance sensor. The system is capable of conducting autonomous flights in order to perform
a search and rescue mission.
In [21] and [18], the research groups developed systems capable of performing autonomous
flights for the search and rescue mission with a single UAV. From take-off to target
detection, all activities are carried out by autonomous systems without any human
intervention. Both research groups used quadcopter UAVs. While [21] used autonomous
vision-based systems to identify the target, [18] used the beacon signal strength to locate
its goal.
2.2 Human-operated UAV systems
In contrast to the previous section 2.1, where the UAVs are equipped with autonomous
mapping, planning, navigation, and target detection systems for performing search and
rescue operations, another group of researchers, at Brigham Young University (BYU), have
used human-guided UAVs for such operations. In [11] and [10], the authors discuss systems
for UAV-assisted wilderness search and rescue missions. These operations typically involve
locating a missing person, or signs of one, in the wilderness or deep forests. The system
developed for this operation used a single fixed-wing UAV for the flight operations. The UAV
was equipped with an on-board nadir camera used to gather signs of the missing person.
The path planning and flight operations are assisted by a human operator, who guides the
vehicle through a series of waypoints over the areas of interest to gather information during
the flight missions. The images and videos acquired by the nadir camera are analysed by a
human to detect the missing person or signs of one. This search operation is driven by
human operators, from planning and navigation through to target detection.
While autonomous systems face difficulties in unstructured environments, having a human
operator limits the speed of operations. A trade-off can be made in order to bring the best
of both worlds together and develop cooperative human-robot systems. In [6], the author
concedes that autonomous scouting of a disaster area using a UAV is not optimal. In
[4], the author discusses how effective human-robot interaction can improve search
and rescue operations, resulting in an increased success rate.
Humans are currently more efficient than autonomous systems at assessing a wide variety of
situations, handling contingencies, and using environmental cues for decision making.
Utilising a human operator for these tasks in a cooperative system could prove more efficient
than both the fully autonomous systems and the human-operated UAV systems.
2.3 Cooperative Human-UAV systems
In [17], the authors developed a multi-UAV autonomous system for search and rescue
operations. For field experimentation, the authors used 4 UAVs. Two of these
were equipped with Matrixvision cameras and Mastermind processors, while the other two
UAVs were equipped with Logitech webcams and Atom processors. All the UAVs featured
nadir (downward-facing) cameras. The system operated on the Robot Operating System
(ROS), and videos from all the cameras were streamed to observer stations on the ground
during the flight, using the wireless communication infrastructure developed. The flight
paths for the UAVs were pre-planned to optimise the search time, and the flight plans were
sent to each of the UAVs as waypoints through wireless communication. The search objective
for the on-board visual detection modules was to detect a person wearing a red jacket. In
this system, the path planning was manual, but the navigation and target detection
activities were carried out by autonomous systems.
These examples display the versatile approaches to UAV involvement in search and rescue
operations. While the autonomous systems perform the operations ranging from take-off to
target detection without human intervention, the manual systems involve a human operator
at every stage of the operation. The cooperative systems divide the tasks between human
and autonomous systems, leveraging human intuition and the efficiency of autonomous
systems. The cooperative systems have the potential to perform more efficiently than manual
systems, while being more flexible and versatile than the autonomous systems.
Chapter 3
System Design
3.1 Target Environment
The goal of this work is to develop a cooperative system capable of efficiently searching for
an object of interest in a partially-known environment, while allowing a human operator and
UAVs to work together in cooperation. The cooperation between the human operator and the
UAVs allows the system to leverage the intuitive abilities of humans during planning,
and the efficiency of autonomous systems during the flight operations. This makes for an
efficient search and rescue system.
The KEAS lab testing facility at the Kentland Farms in Blacksburg was selected for the field
experimentation. It consisted of disparate features including open grassy areas, gravel roads,
equipment sheds of different shapes and sizes, and agricultural containers. The equipment
sheds were of two varieties: open sheds and closed sheds. In total, the target area consists of
4 sheds and 2 metallic agricultural storage containers. The environment and its features were
assumed to be static for this work. For this problem area, the preferred areas of interest for
closer inspection included the equipment sheds and the containers, as they could house the
object of interest, concealing it from an overhead camera.
3.2 Hardware
This section discusses the hardware components used for conducting the two phases
of the flight operations.
3.2.1 UAVs
Multi-rotor UAVs are lightweight and inexpensive, which has made them a popular choice
among researchers. These UAVs are also capable of vertical take-off and landing (VTOL),
which makes them useful in environments with limited flight space. One of the disadvantages
of multi-rotors is their low endurance, or flight time. Among multi-rotor UAVs, quadcopters
and hexacopters are widely used in research and commercial applications. Quadcopters are
generally smaller and lighter than hexacopters. For the field experimentation of this
project, a UAV with a light payload capacity and a flight time of 10 minutes was needed.
Hence, a decision was made to use quadcopters for both the High Altitude Surveyor (HAS)
and the Low Altitude Inspector (LAI) to fly the field missions.
Two different quadcopter UAVs were used for the field operations of this project. For the first
phase of operations, a Holybro S-500 quadcopter was used as the HAS. The S-500 is lightweight
at 1.3 kg, and came with a telemetry radio, an assembled power management board with ESCs,
a PixHawk 4 autopilot, and a GPS system. With dimensions of 383 x 385 x 240 mm, it is a
small UAV that provided 12 minutes of flight time, without payload, with the 5000 mAh LiPo
batteries. Figure 3.1 shows the Holybro S-500, with the Raspberry Pi and nadir-facing TIS
USB camera mounted on it.
Figure 3.1: S-500 with Raspberry Pi and nadir camera
For the second phase of flight operations, a HolyBro X-500 quadcopter was selected as the
LAI. The X-500 quadcopter is an improved version of the S-500, featuring carbon-fibre arms
in place of the plastic arms of the S-500. This resulted in increased durability and a
lighter aircraft at 0.98 kg. It also included an updated version of the Holybro power
management unit, and was slightly larger with dimensions of 410 x 410 x 300 mm. With a
similar battery, this UAV provided an additional 2-3 minutes of flight time compared to the
S-500. This quadcopter version was unavailable during the first-phase flights. The X-500,
equipped with a front-facing TIS USB camera and Raspberry Pi, is displayed in Figure 3.2.
3.2.2 Raspberry Pi
The Raspberry Pi is a widely used single-board computer in the field of robotics research.
It is lightweight, portable, and relatively inexpensive, and provides support for a wide
range of peripheral devices, which accounts for its popularity.
Figure 3.2: X-500 with front-facing camera
For this project, a Raspberry Pi version 4B with 2 GB of RAM was selected. It had on-board
wireless networking, which allowed the creation of a local WiFi network. It also featured a
gigabit Ethernet port, two USB 2.0 ports, two USB 3.0 ports, and two micro-HDMI ports.
The USB ports allowed connectivity to peripheral devices like a camera, mouse, and keyboard,
while the micro-HDMI ports could be used for connecting to a display monitor.
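The Pi's on-board wireless was used to host the local network linking the aircraft to the
ground station. The thesis does not reproduce its network configuration; purely as an
illustrative sketch, a Raspberry Pi can be placed in access-point mode with a hostapd
configuration along these lines, where the interface name, SSID, and passphrase are
placeholder assumptions:

```ini
# /etc/hostapd/hostapd.conf -- illustrative access-point configuration.
# The interface name, SSID, and passphrase below are placeholders.
interface=wlan0
driver=nl80211
ssid=uav-ground-link
# 2.4 GHz band, fixed channel
hw_mode=g
channel=7
# WPA2-PSK security
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_passphrase=changeme123
rsn_pairwise=CCMP
```

A DHCP server (e.g. dnsmasq) would typically run alongside hostapd so that the ground
station receives an address automatically when it joins the network.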
3.2.3 Camera
During the first phase of flight operations, two different camera systems were used for
gathering the image data-set.
The first was The Imaging Source (TIS) DFM 42BUC03-ML, a USB 2.0 camera,
shown in Figure 3.3. It is a 1.2 MP camera capable of capturing up to 25 frames per second.
The primary reasons for selecting this camera were its global shutter, its USB 2.0 support,
and an SDK for interacting with the camera that was supported on Linux-based systems.
The global shutter avoided motion blur, allowing acquisition of high-quality images,
especially during relative motion between the camera and the target object. Images taken
with a rolling-shutter camera are susceptible to inconsistencies due to wobble (also known
as the Jello effect) and skew distortions. From previous experience, USB 3.0 ports were
known to interfere with wireless signals. Since the Raspberry Pi on board the UAV was
expected to provide the wireless network for connecting to the local ground station system,
a USB 2.0 camera was preferable to a USB 3.0 camera. Having an associated SDK with a
set of APIs was necessary to customize various camera parameters for acquiring high-quality
data, and to remotely trigger image/video acquisition.
Apart from the TIS camera, a Sony RX0 camera was also used during the first phase of
flight operations. The RX0, a compact digital camera, featured an Exmor RS CMOS sensor
capable of taking up to 15 MegaPixel (MP) images. The images captured by the RX0 were
high resolution (15 MP as opposed to the 1.2 MP of the TIS camera) and well exposed, owing
to its well-tuned firmware. These images were more suitable for the 3D reconstruction of
the target environment at the Kentland Farms.
3.2.4 Mounts
To place the Raspberry Pi and the USB camera on-board the UAV, a mount was designed
to fix on the same rods that hold the battery mount. This location was chosen as it was
sufficiently away from the propellers which ensured safety of the assembly, and also clear the
UAV components from the field of view of the camera. The mount was designed and 3-D
printed in the lab by, colleague in the lab, Felipe. It allowed the Raspberry Pi and the USB
Figure 3.3: The Imaging Source USB 2.0 camera mounted on the Quadcopter
camera to be screwed in place, fixing them securely, while the mount itself rested on the bars that support the battery mount. Since the HAS needed a nadir-facing camera while the LAI needed a front-facing camera, the mount was designed to be versatile, allowing the camera orientation to be changed between vertical and horizontal. This was done by changing the position of a sub-mount to which the camera was attached. The sub-mount also had a cut-out that allowed the USB cable to be plugged into the camera from behind while mounted, as shown in Figure 3.3. In a later version of the mount, vibration isolators were incorporated into the design to protect the camera images from distortion due to vibrations during flight. The mount is displayed in Figure 3.4.
Figure 3.4: The Raspberry Pi and the front-facing TIS camera mounted on the X-500 using a mount with vibration isolators
3.3 Architecture
The system design has been divided into two phases based on the flight operations. The following subsections describe the objectives of the two phases and the roles of their components in detail.
3.3.1 Phase-1 Design
The objective of the first phase of operations was to collect the latest information about the
environment. This was done by using a High Altitude Surveyor (HAS) UAV equipped with
a nadir camera. The data-set obtained by the flight operations was used to generate and
update the representations of the environment. This work used the data-set to generate three
different representations: a 3D reconstruction for scene representation, an occupancy image to represent the obstacles, and a network graph to represent the traversable paths. These representations were generated after the data collection through the flight operations.
Figure 3.5: Phase-1 of the system involved capturing images of the environment by the HAS, which was followed by using the data-set for creating scene representations. The HAS executed a pre-planned flight mission to fly over the target environment.
3.3.2 Phase-2 Design
The second phase of the flight operations was designed to allow a human operator to guide the search operation, and to execute a mission plan to obtain detailed information for decision making. The human operator identified and marked areas of interest for closer inspection by clicking and dragging the mouse over the mosaic image displayed by a specially designed user-interface. Using these areas of interest, a path planner developed an optimal mission plan, which was then executed by the Low Altitude Inspector (LAI) aircraft. During the flight,
the LAI communicated the detailed information to a ground station. The human operator used this information for manual target location and detection, and for further decision making. Figure 3.6 shows the tasks carried out in Phase-2.
Figure 3.6: The tasks involved in Phase-2 of the system
3.3.3 High Altitude Surveyor (HAS)
A Holybro S-500 was used as the High Altitude Surveyor (HAS) for obtaining updated scene
data. The payload on the HAS included a Raspberry Pi 4B and a camera. The HAS was used with two different cameras - The Imaging Source (TIS) USB 2.0 and the Sony RX0 - carrying only one of the two in each flight operation. During missions where the HAS carried the TIS camera, the on-board Raspberry Pi was used to create the WiFi network, which a ground station computer used to establish a connection and remotely run a Python script to start image acquisition. In the other flight operations, where the HAS carried the RX0, the camera was set to capture images continuously by holding the camera trigger down prior to the start of the flight. The HAS carried out a pre-planned coverage
mission that was uploaded to its PixHawk using the QGroundControl software on the ground
station. The mission altitude was preset to 20 meters based on the camera-lens parameters,
vehicle ground-speed, and the lawnmower mission plan parameters.
3.3.4 Low Altitude Inspector (LAI)
A Holybro X-500 was used as the Low Altitude Inspector (LAI), which flew at low altitude (2-3 meters) within the environment. The LAI received its visitation goals as mission waypoints
over the MavLink connection from a ground station computer. These visitation goals were
generated by the path planner by using the areas of interest, marked by the human operator
on the UI. The payload for the LAI included a Raspberry Pi 4B and a TIS USB camera. The Raspberry Pi's wireless network was used to establish a connection between the on-board computer and a second ground station. At the start of the flight operation, the ground station started a remote GStreamer pipeline on the on-board computer to begin a video stream from the on-board camera, over a wireless UDP connection, to the ground station.
3.3.5 Local Ground Stations
The HAS flights and the LAI flights were conducted in two different phases, which are discussed in detail in Section 4.1 and Section 4.12.
For the first phase of flight operations, which involved only the HAS, a single ground station was used when the HAS carried the TIS USB 2.0 camera. During this operation, the ground station ran QGroundControl (QGC), which was used to upload the flight plan to the HAS over MavLink, and to monitor the UAV vitals and the flight mission. The ground station was also connected to the on-board computer over its wireless network, and was used to remotely execute a Python script on the on-board computer that started the image acquisition. This
Python script utilized the TIS Software Development Kit (SDK) APIs to set the camera parameters, and to trigger the image acquisition at regular intervals.
In the second phase of flight operations, the LAI visited the areas of interest within the
environment. During this operation, two ground stations were used. The local Ground
Station-1 ran a User-Interface (UI) which displayed the mosaic image of the environment,
and allowed the remote human operator to mark the areas of interest by drawing rectangles
over the mosaic. Once the areas of interest were identified, a Python program was executed on Ground Station-1 that generated a flight path for the LAI as a series of mosaic-image pixel coordinates, converted these pixel coordinates into GPS coordinates, and then communicated the GPS locations as waypoint goals to the LAI using the DroneKit library APIs, which use the MavLink protocol to communicate with the UAV. Ground Station-1 also ran the Mission Planner software to monitor the flight mission and UAV vitals. To allow the Mission Planner software and the DroneKit code to communicate with the LAI simultaneously, it also executed MavProxy. The local Ground Station-2 connected to the wireless network of the Raspberry Pi on-board the LAI, and remotely executed a GStreamer pipeline that sent a video stream from the on-board TIS camera, over a UDP connection, to Ground Station-2. Locally, Ground Station-2 executed another GStreamer pipeline that received and displayed the video stream.
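The pixel-to-GPS conversion step can be sketched as follows. This is an illustrative simplification assuming a north-aligned mosaic georeferenced by the GPS coordinates of its top-left and bottom-right corners; the function name and interface are hypothetical, not the actual Ground Station-1 code.

```python
def pixel_to_gps(px, py, img_w, img_h, top_left, bottom_right):
    """Linearly interpolate a mosaic pixel coordinate to (lat, lon).

    Assumes a north-aligned mosaic georeferenced by the GPS coordinates of
    its top-left and bottom-right corners - an illustrative simplification.
    """
    lat_tl, lon_tl = top_left
    lat_br, lon_br = bottom_right
    lat = lat_tl + (py / img_h) * (lat_br - lat_tl)  # latitude varies with row
    lon = lon_tl + (px / img_w) * (lon_br - lon_tl)  # longitude varies with column
    return lat, lon
```

The resulting coordinates would then be sent to the LAI as sequential waypoints, e.g. through DroneKit's mission-upload APIs over MavLink.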
3.3.6 Human-in-the-loop
The human operator is an integral component of this system's design. During the second phase of flight operations, the human operator guided the search operation, and observed a live video stream for target identification and decision making. The operator used a remote ground station to remotely log in to the local Ground Station-1, present at the target environment
site during the experimentation. The operator viewed the mosaic image displayed through the UI, and marked the potential areas of interest by clicking and dragging the mouse to draw rectangles over the mosaic image. The marked areas were used as visitation goals by the planner module, which generated an optimal visitation plan, discussed in more detail in the section on the path planner. The operator then executed the program on Ground Station-1 that called the path planner, converted the planner's results from image pixels to GPS coordinates, and communicated the GPS locations as sequential waypoint missions to the LAI. The human operator also observed the video stream from the LAI on Ground Station-2. The operator manually identified the goal in the video stream, which completed the objective of the operation.
Chapter 4
Results
This chapter discusses the results of the various steps involved in the operation. Each section in this chapter is devoted to a discussion of the methods used and the outputs obtained.
4.1 Phase-1 Flight Operations: HAS Flights
A partially-known environment implies the lack of accurate or complete information about
the environment. It may include a lack of a floor-plan, or topography information of the
environment. In this scenario, the term ’partially-know environment’ refers to the outdated
prior scene information. The outdated information could imply absence of accurate location
information of various objects in the target environment. This could be due to addition
or removal of objects, or change in the position of objects within the environment, relative
to the prior. These objects could either be obstacles during exploration with the Low Altitude Inspector (LAI) or potential objects of interest in the mission. The prior information may
be available in the form of previous maps of the environment, overhead photos from Google
Images, data-set from previous flight operations, etc.
In this case, the KEAS lab facility at the Kentland Farms in Blacksburg was chosen for
conducting field experiments. As the first step, flight operations were conducted in order to
acquire the latest information in the form of overhead images of the area.
The flight operations were carried out using a Holybro S-500 UAV, with a Raspberry Pi
computer, and a USB camera on-board. The Raspberry Pi was used to create a local wireless
network. The local ground station connected to the Raspberry Pi over the wireless network, and was used to remotely log in and execute a Python script to trigger image acquisition using the USB camera. The local ground station also ran QGroundControl
(QGC) software, which communicated with the HAS via the telemetry radio. QGC was used
to upload, and start a pre-planned mission on the HAS, and monitor the vitals of the UAV
including the location, altitude, battery level, and progress of the mission during the flight.
The mission plan was aimed at covering the complete target area in order to capture the overhead images. A back-and-forth coverage strategy, more commonly known as a raster or "lawnmower" scan, which is a type of cellular-decomposition coverage strategy, was used to generate a flight path for this mission [5]. A flight path was generated
in QGC by specifying the extremities of the area to be covered. The parameters of this
flight path including the spacing between passes, and turning distances were set to ensure
the images obtained from the camera covered the complete region, and provided sufficient
overlap between images of adjacent regions. The image overlap was vital in ensuring the
quality of the map during the 3D-reconstruction step. The altitude of the UAV was kept
constant for the duration of the flight at 20m, and the spacing between two consecutive
passes was kept to 5m. These values ensured that a minimum recommended overlap (>60%)
was maintained. Using these parameters, a final path for the mission was generated as
displayed in Figure 4.1.
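The relationship between altitude, pass spacing, and side overlap can be checked with a short calculation. The 60 degree horizontal field of view used in the example below is an assumed value for illustration, not the actual lens specification.

```python
import math

def side_overlap(altitude_m, hfov_deg, pass_spacing_m):
    """Fraction of the ground footprint shared by two adjacent lawnmower passes."""
    # ground footprint width of one image from a nadir-facing camera
    footprint = 2 * altitude_m * math.tan(math.radians(hfov_deg) / 2)
    return 1 - pass_spacing_m / footprint

# At 20 m altitude with an assumed 60 degree FOV and 5 m pass spacing,
# the footprint is ~23.1 m and the side overlap ~78%, above the 60% minimum.
```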
The flight operations were conducted using two different cameras - The Imaging Source (TIS)
USB 2.0 camera and a Sony RX0 camera. For the operations with the TIS camera, the ground station was connected to the on-board Raspberry Pi computer via its wireless network. A remote login session into the Pi was started from the ground station before take-off.
Figure 4.1: Coverage flight plan over the KEAS lab at the Kentland Farms
Once the HAS reached near the start position of the mission plan, the on-board camera was
triggered remotely to start image acquisition. This was done by remotely starting a python
script on the Raspberry Pi that used The Imaging Source SDK APIs to interact with the
camera. The Python code was written to take an image every second and save it on the Pi's SD card. When the HAS reached the last waypoint in the mission plan, the script was remotely closed. After completion of the mission and landing of the HAS, the saved images were transferred to the ground station via FTP. The images obtained from this operation were not geotagged. A flowchart of this operation is displayed in Figure 4.2.
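The acquisition loop can be sketched as below. The `capture` and `save` callables stand in for the TIS SDK snap and write calls, whose actual API is not shown in this text; the interface here is hypothetical.

```python
import time

def acquire_images(capture, save, interval_s=1.0, stop=lambda: False):
    """Capture and save one image per interval until told to stop.

    `capture` and `save` are stand-ins for the TIS SDK snap/write calls
    (hypothetical interface; the real SDK API differs).
    """
    count = 0
    while not stop():
        frame = capture()      # grab a frame from the camera
        save(frame, count)     # e.g. write to the Pi's SD card
        count += 1
        time.sleep(interval_s)
    return count
```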
Another set of flights were conducted using the Sony RX0 camera instead of the TIS camera.
For these flights, the ground station was used to upload, control, and monitor the flight mis-
sion. The on-board Raspberry Pi was not used during these flights. The Sony RX0 camera was set to take images at regular intervals before the beginning of the flight operation. The
images were saved on the camera's SD card and transferred to the ground station after landing.
Figure 4.2: Flowchart displaying the process of Phase-1 flight operations
4.2 3D-Reconstruction
The second phase of flight operations involved flying the Low Altitude Inspector (LAI) in the environment and visiting the areas of interest for closer inspection. Before proceeding to this stage, however, it was important to obtain detailed information about the environment. A high-resolution 3D reconstruction of the environment provided the necessary confidence for the low-altitude flights, and was essential for locating and avoiding obstacles during flight missions.
Structure from Motion (SfM) is a popular technique for generating a 3D reconstruction of a
scene from image data. Structure from Motion attempts to reconstruct the scene geometry
using feature matching across multiple images. Since recent feature-matching algorithms
are relatively robust and less prone to errors from inconsistent image acquisition, it is a popular method for image sets obtained from UAVs [8]. Inconsistencies in the image set can be introduced by location and altitude errors during flight due to GPS, camera vibrations, etc. Region overlap between images helps this technique overcome these inconsistencies by improving the feature matching.
4.2.1 Geotagging images
Geotagged images are known to produce better results when used for 3D reconstruction, as they make feature matching and determining the relative positions of the images easier. The images obtained from the first phase of flight operations were, however, not geotagged. The Imaging Source (TIS) camera used did not have an inbuilt GPS system. The image metadata does, however, contain a timestamp of when the image was first saved to disk. To improve the quality of the 3D reconstruction, as a preprocessing step the downsampled images were geotagged using the data from the PixHawk flight logs. Python code was written to find the GPS location of the UAV from the flight logs at the time corresponding to the timestamp of each image. These GPS coordinates were then added to copies of the original images as metadata. Using this process, an alternate geotagged data-set was created.
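The timestamp-to-GPS matching at the core of this step can be sketched as follows; log parsing and metadata writing are omitted, and the function interface is illustrative rather than the actual code.

```python
import bisect

def nearest_gps(log, image_ts):
    """Return the (timestamp, lat, lon) log entry closest to an image timestamp.

    `log` is a list of (timestamp, lat, lon) tuples sorted by timestamp, as
    parsed from the flight logs (the parsing itself is omitted here).
    """
    times = [entry[0] for entry in log]
    i = bisect.bisect_left(times, image_ts)
    if i == 0:
        return log[0]
    if i == len(log):
        return log[-1]
    before, after = log[i - 1], log[i]
    # pick whichever fix is closer in time to the image
    return before if image_ts - before[0] <= after[0] - image_ts else after
```

The matched coordinates would then be written into the EXIF GPS tags of a copy of each image, for example with a library such as piexif or the exiftool utility.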
4.2.2 Reconstructions
The commercial software packages Agisoft Metashape and Pix4D were used to generate the 3D reconstruction of the KEAS Lab environment. Before the reconstruction, the data-set was downsampled by converting the resolution of each image from 16 MP to 1 MP. A Python script was written to resize each image of the data-set using the OpenCV library's inbuilt function.
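The downsampling preserves the aspect ratio while reducing the pixel count; a sketch of the target-dimension calculation (the helper name is illustrative, not the thesis code):

```python
import math

def downsample_dims(width, height, target_pixels=1_000_000):
    """Output (width, height) preserving aspect ratio with roughly
    `target_pixels` total pixels (e.g. 16 MP -> ~1 MP)."""
    scale = math.sqrt(target_pixels / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))

# The resize itself used OpenCV's inbuilt function, roughly:
#   small = cv2.resize(img, downsample_dims(img.shape[1], img.shape[0]))
```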
This was done to reduce the processing time of the 3D reconstruction. Both software packages were fed the same downsampled image data-set. The results obtained from each are discussed in the following subsections.
Agisoft Metashape Reconstruction
Reconstruction was first attempted with Agisoft Metashape on a machine with 32 GB of RAM, using the non-geotagged images. This did not produce a usable 3D reconstruction. Hence, the geotagged image dataset was used for reconstruction in Metashape.
The 3D reconstruction was attempted with different accuracy settings for the image alignment and the dense point cloud, and the time taken for the process was observed. The results are displayed in Table 4.1.
Align Photos: Accuracy    Build Dense Cloud: Accuracy    Time taken (hours)
High                      High                           26
High                      Low                            15
Table 4.1: Time taken for 3D-reconstruction with Metashape for different options using the geo-tagged image data-set
Figure 4.3 displays the point-cloud map of the Kentland farm after 3D reconstruction using high accuracy in both the photo-alignment and dense-cloud steps. Visual inspection of the point-cloud map generated by Metashape found that points were missing from several scene features, such as shed walls and the ground, and in some cases the walls and roofs of the sheds were distorted or discontinuous. Due to these errors, the point cloud generated by Metashape was not used in the subsequent steps of the project.
Figure 4.3: Metashape reconstruction for geotagged downsampled image-set
Pix4D Reconstruction
The Pix4D Mapper software is another popular commercial software that has the capability
of generating a reconstruction of a scene using a set of images. This software was used to
generate a 3D reconstruction for two different image datasets. The first set was the downsampled version of the images obtained from the Sony RX0 flight missions over the Kentland farms; these images were not geotagged. The second set consisted of geotagged downsampled images, with the geotag metadata added by the process described in Section 4.2.1. The resulting 3D reconstruction is displayed in Figure 4.4. This reconstruction was better in quality than the point cloud generated by Metashape, and captured various features of the Kentland Farm area with higher accuracy.
Experiments with the different 'point density' options available in the Pix4D software were conducted to see their impact on the quality of the reconstruction and the processing time taken. This was done using the geotagged image data set.
Table 4.2 compares the time taken.
Point Density    Time taken (minutes)
High             35
Low              25
Table 4.2: Time taken for 3D-reconstruction with Pix4D for different options using the geotagged image data-set
Figure 4.4: Pix4D reconstruction for the non-geotagged downsampled image-set
An overhead mosaic image of the complete environment was generated by Pix4D Mapper.
The mosaic is shown in Figure 4.5.
Figure 4.5: Overhead mosaic image of the target environment located near the KEAS Lab at the Kentland Farms in Blacksburg. This image was generated from the 3D reconstruction created by the Pix4D Mapper
4.3 Image Analysis
Beyond generating scene representations, the data gathered in the first phase of flight operations was also used to develop a better understanding of the environment through computer vision techniques. A detailed understanding of the scene could be vital for the planning stage of a search and rescue operation. Once a mosaic image of the complete area was generated, it was analysed to isolate or segment the contents of the scene.
Two different methods were used to segment the scene using the overhead images, or the mosaic generated from them, and an attempt was made to identify the roads and shadows in the images. This solution, however, was not integrated into the final system design due to its lack of robustness. The following subsections discuss the two methods in further detail.
4.3.1 Shadow and road detection
Color-based segmentation is a popular method for segmenting an image. Segmentation in the Hue-Saturation-Value (HSV) color-space is often preferred over segmentation in the Red-Green-Blue (RGB) color-space, because human eyes are more sensitive to the intensity of light than to color, and the HSV color-space resembles human vision more closely than the RGB color-space [9]. In [9] and [19], the authors used color thresholding in the HSV color-space to segment images. A similar approach was used for segmenting the overhead mosaic image obtained from the Pix4D point cloud.
The roads and shadows could be identified in the mosaic by converting it into the HSV color-space and then applying thresholds. Python code was written using OpenCV functions to read the mosaic image, convert it to the HSV color-space, and segment the image based on
the threshold values. A binary image was generated as output, in which pixels within the threshold range were white and the remaining pixels were black. This was done to identify shadows and roads in the mosaic image.
The shadows were identified by thresholding the 'Value' channel of the pixels: pixels with a 'Value' between 5 and 50 in the HSV color-space were identified as shadow pixels. The results obtained with this technique are displayed in Figure 4.7. To identify pixels belonging to the road, thresholds on Hue (between 100 and 160), Saturation (between 8 and 40), and Value (between 100 and 160) were used. The output image identifying the roads is displayed in Figure 4.6.
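The thresholding can be sketched with a NumPy equivalent of OpenCV's inRange, using the empirically deduced thresholds from this section (the Hue bounds below assume OpenCV's 0-179 Hue scale):

```python
import numpy as np

def hsv_mask(hsv, lo, hi):
    """Binary mask of pixels whose (H, S, V) all fall within [lo, hi].

    `hsv` is an (H, W, 3) uint8 array; this mirrors cv2.inRange.
    """
    lo, hi = np.asarray(lo), np.asarray(hi)
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1).astype(np.uint8) * 255

# Empirically deduced thresholds from this section:
shadow_mask = lambda hsv: hsv_mask(hsv, (0, 0, 5), (179, 255, 50))    # V in [5, 50]
road_mask = lambda hsv: hsv_mask(hsv, (100, 8, 100), (160, 40, 160))  # H, S, V ranges
```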
Although this technique generated quick results, which made it a good candidate for real-time or on-line processing, its drawback was that the threshold values were not absolute or objective. The thresholds depended on the lighting conditions on the day the images were taken and on the camera-lens system used to acquire them. The threshold values used to identify shadows and roads in the two cases discussed above were therefore empirically deduced, and cannot be expected to work with a different data set. Hence, this technique was not robust enough to be included in the final system design.
4.3.2 Texture based segmentation
Another popular technique for segmentation of aerial images is based on textures. In [12], texture is defined as "repeating patterns of local variations in image intensity which are too fine to be distinguished as separate objects at the observed resolution." Mathematically, textures in an image are represented using feature descriptors that capture the local variations in image intensity.
By using texture, it is possible to differentiate between materials, even in a grayscale
Figure 4.6: Roads identified in mosaic
Figure 4.7: Shadows identified in mosaic
image, as shown in Figure 4.8. Textures in images are often indicative of a material's properties, and serve as an important cue for distinguishing objects, shapes, and boundaries [13]. Image segmentation using texture is often done in two steps: first, analyse or represent the textures present in the image, and then subdivide the image into regions of consistent texture.
To represent textures mathematically, a feature descriptor is used. Filters or filter banks (sets of filters) are applied to the neighbourhood of a pixel through convolution, generating a response for each filter at each pixel of the image. By using statistics of the filter responses, such as the mean and standard deviation over each local window, a feature descriptor of local texture can be formed for each pixel. The feature descriptor has the same dimensionality
Figure 4.8: A grayscale image showing four different materials. By looking at this image it is possible for humans to distinctly identify the materials even without color information. [13]
as the number of filters in the filter bank. Textures can also be characterized by forming a histogram of the filter responses over a region of interest.
One of the many filter banks used to generate texture feature descriptors is the Leung-Malik (LM) filter bank, shown in Figure 4.9. This filter bank consists of 48 filters designed to isolate a variety of textures, including patterns composed of edges, bars, or spots with different orientations and sizes.
The image could be segmented using the feature descriptors. One of the ways to achieve this
is to assign pixels with similar feature descriptors to a common group. This can be done by
Figure 4.9: Figure showing the images of the 48 filters of the Leung-Malik (LM) filter bank [13]
using the k-means clustering algorithm over the multi-dimensional texture feature descriptors. The clustering algorithm groups similar features together into clusters. One disadvantage of this method, however, is that the number of groups, k, is usually not known beforehand. The number of unique textures present in an image depends on the area being imaged, so a pre-defined number of clusters is not very effective. The results of image segmentation using k-means clustering are shown in Figure 4.10.
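The two-step pipeline (texture features, then clustering) can be illustrated with a deliberately simplified sketch: per-block mean and standard deviation stand in for the 48-dimensional LM filter-bank responses, and a minimal k-means implementation does the grouping. None of this is the thesis code; it only shows the structure of the method.

```python
import numpy as np

def texture_features(gray, win=8):
    """Per-block (mean, std) descriptors - a deliberately tiny stand-in
    for the 48-dimensional LM filter-bank responses."""
    feats, coords = [], []
    h, w = gray.shape
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            patch = gray[i:i + win, j:j + win].astype(float)
            feats.append([patch.mean(), patch.std()])
            coords.append((i, j))
    return np.array(feats), coords

def kmeans(feats, k, iters=20):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = [feats[0]]
    for _ in range(1, k):
        dists = np.min([((feats - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(feats[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return labels
```

A flat region and a high-variance (e.g. checkerboard-like) region then fall into different clusters, which is exactly the grouping behaviour described above.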
One drawback of this implementation of texture-based segmentation was that evaluating textures using the LM filter bank, and segmenting the image based on them using the k-means clustering algorithm, required more processing time than was suitable for a real-time system. In addition, choosing a value for the number of clusters for the k-means algorithm was difficult: the number of distinct textures varies from image to image, as each image captures a different area, so the clustering did not provide an objective method for segmenting an image. An unsupervised algorithm could be used to find the optimal clusters before segmenting each image; however, it would be computationally expensive and require more processing time. Due to its inability to process images quickly, and its need for an external parameter, this solution could not be integrated into the system to be
(a) Original Image 1 (b) Texture Segmentation of Image 1
(c) Original Image 2 (d) Texture segmentation of Image 2
Figure 4.10: Image showing the result of texture based segmentation that used LM filters togenerate the feature descriptors and k-means clustering for segmenting the pixels
used in the field experiments. However, the segmentation results are encouraging, and this method could be improved upon and included in the system in the future.
4.4 On-line Image mosaicking
Image mosaicking is the process of stitching together multiple overlapping images of a scene to obtain a larger combined image. This process typically involves identifying common features between the images, evaluating the homography transformation matrices, and warping the images by applying the transforms to create a panoramic image.
During flight operations, the camera captures new images at regular time intervals as the flight mission is executed. On-line image mosaicking is a technique for generating the mosaic progressively, integrating each new image as it is captured rather than waiting for the complete set of images to be collected. An on-line image mosaicking solution was created with the intent of assisting the integration of the two phases of flight operations. Although this solution was not integrated into the final system design due to its lack of robustness, it featured some useful techniques for implementing on-line mosaicking. The steps involved included ORB feature identification, brute-force feature matching, homography matrix evaluation for various parameters, selection of the best homography matrix, and image warping for mosaicking. The following text details the implementation.
Two popular feature detection algorithms are the Scale Invariant Feature Transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) [3], [13]. While SIFT is considered a robust algorithm for identifying keypoints, it is computationally expensive [13]. The SIFT algorithm is also patented, which restricts its usage. ORB, on the other hand, is a non-patented alternative that matches SIFT in performance and is faster to compute. ORB uses a combination of FAST for keypoint detection and a modified version of the BRIEF descriptor to enhance performance.
Using Python's OpenCV library, a program was written with the objective of generating a mosaic on-line from the overhead photos captured by the UAV. The program iteratively generated a mosaic image, where the inputs were the mosaic from the previous iteration and a new image. This process was repeated as long as new input images were provided. The code used OpenCV's ORB implementation to identify keypoints and generate BRIEF descriptors for the two images. OpenCV's brute-force
matcher was used to match the keypoints across the images.
The flight mission parameters, such as the spacing between passes of the lawnmower scan, the overshoot distance, and the vehicle ground-speed, were set to ensure sufficient overlap in sequential images. Using this information, the features of the new image were matched only against the features of the existing mosaic that belonged to the region where the last image was integrated. This was done by keeping track of the homography transformation from the previous step and generating a mask: a binary image the size of the existing mosaic, in which the white region corresponded to the region of the last image in the existing mosaic. Figure 4.11 shows the last image isolated from the existing mosaic, and the corresponding mask. Figure 4.12 shows the feature matches drawn between a new image and an existing mosaic. Notice that the features in the mosaic belong only to the region where the last image was integrated.
This technique helped reduce incorrect feature matches, and made the code substantially faster, as it restricted the features to a small region of the existing mosaic instead of the whole mosaic image. This was especially vital because the mosaic image grows with each iteration, which could have drastically affected performance if feature matching were carried out over the complete mosaic. This technique instead led to a near-constant processing time for each iteration of the mosaicking process, regardless of the size of the existing mosaic image.
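The mask-generation idea can be sketched by projecting the last image's corners into mosaic coordinates with its homography. For simplicity, a bounding box approximates the warped region here; the actual mask follows the warped quadrilateral.

```python
import numpy as np

def last_image_mask(mosaic_shape, H, img_w, img_h):
    """White-on-black mask approximating the last image's region in the mosaic.

    The last image's corners are projected into mosaic coordinates using its
    homography H; a bounding box stands in for the warped quadrilateral.
    """
    corners = np.array([[0, 0, 1], [img_w, 0, 1],
                        [img_w, img_h, 1], [0, img_h, 1]], dtype=float).T
    proj = H @ corners
    xy = proj[:2] / proj[2]                      # perspective divide
    x0, y0 = np.floor(xy.min(axis=1)).astype(int)
    x1, y1 = np.ceil(xy.max(axis=1)).astype(int)
    mask = np.zeros(mosaic_shape[:2], dtype=np.uint8)
    mask[max(y0, 0):max(y1, 0), max(x0, 0):max(x1, 0)] = 255
    return mask
```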
For the evaluation of homography matrices, OpenCV's 'findHomography' function was used. A set of homography matrices was generated using combinations of the different optimization methods provided by the OpenCV function and the number of matches considered in the computation. From this set, the best homography matrix was chosen using two criteria: least mean-square projection error for the matches, and near-constant scaling. Since all the images were taken from nearly the same altitude, the homography matrix could be expected
(a) (b)
Figure 4.11: Sub-image (a) shows the last image region isolated in the existing mosaic; sub-image (b) shows the mask representing the last image in the existing mosaic
Figure 4.12: Matches detected between an existing mosaic and a new image
to have scaling factors closer to 1. Hence, any matrices that had high value of scaling
factors were rejected. Out of the remaining matrices the matrix with the least mean square
projection error for the matched features was selected. In the next step, this transformation
matrix was used to warp the new image and integrate it to the existing mosaic to form a
new mosaic image.
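The two selection criteria can be sketched in plain Python. This is an illustrative sketch, not the thesis code: the function names, the scale tolerance of 0.2, and the nested-list matrix format are all assumptions.

```python
import math

def project(H, pt):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def mean_square_error(H, src_pts, dst_pts):
    """Mean squared reprojection error of the matched points under H."""
    errs = []
    for s, d in zip(src_pts, dst_pts):
        px, py = project(H, s)
        errs.append((px - d[0]) ** 2 + (py - d[1]) ** 2)
    return sum(errs) / len(errs)

def near_unit_scaling(H, tol=0.2):
    """Accept H only if its x/y scale factors are close to 1, as expected
    for images taken from nearly the same altitude."""
    sx = math.hypot(H[0][0], H[1][0])
    sy = math.hypot(H[0][1], H[1][1])
    return abs(sx - 1.0) <= tol and abs(sy - 1.0) <= tol

def select_best(candidates, src_pts, dst_pts):
    """Reject high-scaling candidates, then pick the lowest-error one."""
    valid = [H for H in candidates if near_unit_scaling(H)]
    return min(valid, key=lambda H: mean_square_error(H, src_pts, dst_pts))
```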
This program was capable of generating mosaics on-line. Integrating a new image into the existing mosaic to create a new mosaic took approximately 0.3 seconds, and the processing time remained almost constant even as the existing mosaic grew. However, the mosaic quality achieved by this method was not suitable for field operations. Figure 4.13 shows the mosaic generated from 25 images provided to the program sequentially. Consequently, this solution was not used in the final system design.
The results of this method were compared to the OpenCV image stitching functionality. The OpenCV functionality worked well when all the images to be stitched were provided at once, as shown in Figure 4.14. However, during flight operations the images would be acquired sequentially. When this condition was simulated by providing the images to the OpenCV function sequentially, it produced a poor-quality mosaic (Figure 4.15) before exiting with errors.
For on-line mosaicking, the solution developed for this work was superior to the OpenCV function in both processing time and mosaic quality. However, due to inconsistencies in the mosaic image, the solution lacked the robustness of a finished product and was not integrated into the final system design. The results are summarized in Table 4.3.
Figure 4.13: The mosaic generated from 25 images of the Kentland farm area, passed sequentially, using the original code developed. The mosaic contains multiple inconsistencies, including a discontinuous pole and shed. Hence, this solution was not used in the system.
                          OpenCV Image Stitcher    New Solution
Off-line    Capability    Yes                      No
mosaicking  Quality       High                     NA
On-line     Capability    No, exits with error     Yes
mosaicking  Quality       NA                       Medium

Table 4.3: Comparison of the capabilities of the OpenCV Image Stitcher and the on-line image mosaicking solution developed for this work.
Figure 4.14: The mosaic generated from 30 images of the Kentland farm area using the OpenCV functionality when all the images are provided at once.

Figure 4.15: The mosaic generated by the OpenCV functionality for a set of images of the Kentland farm, passed to it sequentially. The program exited with an error after processing 15 images and was unable to finish processing the 25-image set.
4.5 Occupancy Image
An occupancy grid map is a popular representation of an environment and is commonly used for path planning. It divides the environment into multiple cells, and each cell is associated with a value that represents the probability of that cell being occupied. The occupancy grid carries information about the obstacles and free cells of the environment, which is crucial for path planning.
In the current system design, it was assumed that the environment was static, and that the locations of obstacles and free areas of the scene were correctly captured in the overhead mosaic image and the 3D reconstruction created in the earlier steps. With this assumption, the overhead mosaic image and the reconstruction were used to generate a binary version of the occupancy grid map, where cells were represented by the pixels of the image and the color of each pixel represented the presence or absence of an obstacle. The probabilities of occupancy in each cell or pixel were replaced with binary values: a white pixel represented an obstacle, while a black pixel represented a free area in the environment. This binary occupancy grid was used for the path planning.
To create the binary occupancy image, the point cloud generated by Pix4D Mapper was used. Since the flight altitude for the LAI was predetermined, the point cloud was used to determine the points that belonged to objects high enough to be an obstacle for the LAI. The point cloud data was filtered using a z-coordinate threshold: any point higher than the threshold was considered an obstacle, while points below the threshold were marked as unoccupied. The points of the point cloud were then projected onto a horizontal plane, assigning white to the points above the threshold and black to points below it. Through this method, an occupancy grid image was created, as shown in Figure 4.16, that divided the map into regions of obstacles and non-obstacles.

Figure 4.16: The occupancy image shows the sheds and structures taller than 1.5 m. These structures are obstacles for the LAI flight.
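The thresholding and projection steps above can be sketched as follows. This is an illustrative sketch, not the thesis code: the function name, the grid resolution argument, and the assumption that the point coordinates are already aligned with the image frame are all assumptions.

```python
def occupancy_image(points, z_threshold, resolution, width, height):
    """Project a point cloud onto a horizontal plane: cells containing a
    point above z_threshold become obstacles (1, white); all other cells
    stay free (0, black)."""
    grid = [[0] * width for _ in range(height)]
    for x, y, z in points:
        col = int(x / resolution)
        row = int(y / resolution)
        # Keep only in-bounds points that are tall enough to block the LAI.
        if 0 <= row < height and 0 <= col < width and z > z_threshold:
            grid[row][col] = 1
    return grid
```

With a 1.5 m threshold, a point on a shed roof marks its cell as an obstacle, while ground-level points leave their cells free.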
4.6 Perimeter Contours
As discussed before, the target object could be located either in the open areas, where it would be visible from the HAS camera, or in regions next to a structure or under one of the equipment sheds at the farm, which may obscure it from the HAS camera. If the object of interest is not present in the open areas, the structures and sheds become areas of interest for a closer inspection by the LAI. Hence, the path planning algorithm needed to favor paths that allowed inspection of the sheds. For inspecting the sheds and structures, a flight plan was needed that favored flights along the perimeter of these structures, allowing the LAI to inspect their contents.
To achieve this, contours enclosing the equipment sheds and other structures present in the environment were created, as shown in Figure 4.17. These contours were returned by an OpenCV API that identified contours belonging to the obstacles in the occupancy image. The contours obtained by this method were then simplified using the Discrete Curve Evolution (DCE) technique, which simplifies curves (such as contours) by removing the contour segments that contribute the least to the overall shape [13]. The simplified contours allowed for coverage of the perimeter of the structures while optimizing the contour length.
Incorporating the optimized perimeter contours into the path planning produced optimal visitation plans that included closer inspection of the sheds and structures while visiting the areas of interest selected by the human operator.
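The DCE idea can be sketched in plain Python. This is an illustrative sketch of the technique described in [13], not the thesis code: the relevance measure (turn angle at a vertex weighted by the lengths of its two incident segments) and the function names are assumptions, and OpenCV's contour extraction step is not shown.

```python
import math

def _relevance(prev, cur, nxt):
    """DCE relevance of vertex `cur`: the turn angle at the vertex,
    weighted by the lengths of its two incident segments. Collinear
    vertices score zero and are removed first."""
    l1 = math.dist(prev, cur)
    l2 = math.dist(cur, nxt)
    a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    turn = abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))
    return turn * l1 * l2 / (l1 + l2)

def dce_simplify(contour, target_len):
    """Repeatedly delete the vertex contributing least to the overall
    shape until only target_len vertices remain. `contour` is a closed
    polygon given as a list of (x, y) points."""
    pts = list(contour)
    while len(pts) > target_len:
        n = len(pts)
        scores = [_relevance(pts[i - 1], pts[i], pts[(i + 1) % n])
                  for i in range(n)]
        del pts[scores.index(min(scores))]
    return pts
```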
4.7 Network Graph Map
A network graph is one way of mathematically representing a discretized version of an environment. A network graph consists of nodes and edges connecting the nodes. The nodes represent locations or landmarks in the environment, and an edge between two nodes represents a path between the corresponding locations. Each edge has an associated weight, interpreted as the cost of traveling from one node to the other; this is also known as the edge cost. A network graph may have bi-directional or uni-directional edges, and the cost of visitation may depend on the direction of traversal.
Figure 4.17: The contours generated enclosing the sheds and other structures in the environment
To create a discretized network graph of the Kentland farm area, the perimeter contours discussed in Section 4.6 were used. The contour points of all the contours were assigned as the nodes of the network graph. The position of the UAV at the beginning of the mission was also added to the graph as a node.
4.7.1 Network Edges
The network edges of this graph can be divided into two categories: contour edges and non-contour edges. The contour edges are the segments that comprise the perimeter contours generated in Section 4.6. These edges, however, only connect nodes that are part of the same contour.
Using Python code, all possible edges between each pair of nodes (contour points) not already connected by contour edges were generated. These non-contour edges were created by drawing an edge between each pair of points while ensuring that the edge did not intersect any obstacle identified in the occupancy image (Figure 4.16). If an edge intersected an obstacle at one or more points, the edge was rejected. Hence, only legal non-contour edges that did not intersect any obstacle were generated. Figure 4.18 displays all the edges generated for the graph. The cost of an edge in the graph was proportional to the length of the edge itself and independent of the traversal direction. This incentivized the planner to generate shorter optimal paths of visitation for the search operation.
4.7.2 Non-contour Edge-Cost Factor (ECF)
As discussed in the previous sections, to increase the efficiency of the search, the flight path of the LAI needed to follow the perimeters of the sheds and other structures present in the scene. This would allow a closer inspection of the contents of the sheds, increasing the possibility of finding the target object. To assist this, the flight paths needed to be biased toward the contour edges generated in Section 4.6. Hence, a non-contour edge-cost factor (ECF) was introduced: a constant multiplier applied to the cost of traversal along a non-contour edge. Increasing the cost of these edges penalized the planner for using them, and thus biased the optimal path generated by the planner toward including contour edges.
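The edge-generation and cost-assignment logic above can be sketched as follows. This is an illustrative sketch, not the thesis code: the segment-sampling legality test, the default ECF of 3.0, and the data layout are all assumptions.

```python
import math

def segment_is_free(p, q, occupancy, samples=50):
    """Check that the straight segment p-q does not cross an obstacle by
    sampling points along it against the binary occupancy grid
    (1 = obstacle, 0 = free)."""
    for i in range(samples + 1):
        t = i / samples
        x = int(round(p[0] + t * (q[0] - p[0])))
        y = int(round(p[1] + t * (q[1] - p[1])))
        if occupancy[y][x] == 1:
            return False
    return True

def build_edges(nodes, contour_edges, occupancy, ecf=3.0):
    """Return a dict of legal edges with costs. Contour edges (which run
    along structure perimeters) cost their length; legal non-contour
    edges cost length * ECF, biasing the planner toward the perimeters."""
    edges = {}
    for u, v in contour_edges:
        edges[(u, v)] = math.dist(nodes[u], nodes[v])
    for u in range(len(nodes)):
        for v in range(u + 1, len(nodes)):
            if (u, v) in edges:
                continue
            if segment_is_free(nodes[u], nodes[v], occupancy):
                edges[(u, v)] = ecf * math.dist(nodes[u], nodes[v])
    return edges
```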
Figure 4.18: The contour and non-contour edges of the graph created for the environment
Experiments were conducted with different values of the non-contour edge cost factor for the same set of visitation goals. Since the cost of a non-contour edge was multiplied by the ECF, for ECF values greater than 1 the planner would favor the contour edges, while for ECF values smaller than 1 it would favor the non-contour edges. It was observed that for small values of the ECF, the planner tended to generate a shorter path made of short edges, regardless of whether they were contour or non-contour edges. For factor values of 3 and above, however, the planner tended to generate a slightly longer optimal path that included the contour edges more often. Figure 4.19 shows the results of this experiment.
(a) Non-contour edge cost factor = 0.1 (b) Non-contour edge cost factor = 0.5
(c) Non-contour edge cost factor = 1.0 (d) Non-contour edge cost factor = 2.0
(e) Non-contour edge cost factor = 3.0 (f) Non-contour edge cost factor = 5.0

Figure 4.19: Paths generated by the path planner for various values of the non-contour edge cost factor. For ECF values greater than 1, the planner favors edges along contours, while for lower values the path favors non-contour edges.
4.8 User Interface
The human operator is an important component of this system and has two major roles. The first is to guide the search operation by identifying the areas of interest for visitation in the environment. These areas of interest are then incorporated by the path planner, which generates a series of visitation waypoints for the LAI to gather detailed information in the form of a close-up video stream. This video stream is manually analysed by the human operator, which constitutes their second task in the system. This required designing a user interface that allowed the human operator to communicate the areas of interest to the path planner. The challenge was to develop an interface that is easy to use and effectively translates the areas of interest identified by the human into a format usable by the path planner without much processing.
To enable this human-robot interaction, a user interface (UI) was developed in Python. The UI displayed the overhead mosaic image and allowed the user to draw rectangular boxes over it. Drawing boxes allowed the human operator in the loop to conveniently mark the areas of interest on the displayed mosaic. The UI allowed drawing multiple boxes, one at a time; when a new box was being drawn, the previous box was cleared from the screen to avoid confusion.
When the operator presses the 'Save visitation goals' button, the central coordinates of all the rectangles are written to a text file on disk. When the mission is executed, this text file is first read to extract the rectangle coordinates, which are then passed to the path planner to generate an optimal visitation plan. Figure 4.20 displays the user interface developed.
4.9 Path planner
The path planner designed for this work is a two-step planner. It uses the A* planning algorithm to find the optimal path between goals in the first step, while a solution to the Travelling Salesman Problem is used to deduce the optimal visitation order in the second step. These algorithms are described in detail in the following sections.
Figure 4.20: The UI displaying the overhead mosaic image with a rectangle drawn (in red) over an area of interest. At the top, a 'Save visitation goals' button saves the central coordinates of all the rectangles drawn by the human operator to a text file on disk.
4.9.1 Dijkstra’s planning algorithm
Dijkstra's algorithm finds an optimal path between any two nodes in a discretized environment represented as a network graph. It is an iterative algorithm that is complete and optimal [20]. Completeness implies that it either returns a solution, if one exists, or declares failure in finite processing time. Optimality implies that the algorithm returns the best possible solution to the problem. The objective of the algorithm is to find a path from the start node to the goal node with minimum cost of traversal. As discussed in Section 4.7, each edge in a network graph has an associated cost. The cost of traversal of a path is the sum of the costs of all the edges on the path, and the optimal path between the start and goal nodes has the minimum cost among all possible paths between the two nodes.
The cost-to-come V(xi) of any node xi in the network graph is defined as the optimal cost to reach that node from the start node. The cost-to-come version of Dijkstra's algorithm maintains an open list (O), initialized with the start node, containing the nodes to be explored. The nodes in the open list are prioritized by their cost-to-come values. In every iteration, the node xi with the lowest cost-to-come is removed from O, and all its neighbouring nodes are added to O. While doing so, the cost-to-come V(xj) of each neighbour xj of xi is evaluated as V(xj) = C(xj, xi) + V(xi), where C(xj, xi) is the cost of the edge joining xj and xi, and V(xi) is the cost-to-come of xi. If the newly evaluated value of V(xj) is lower than the current cost-to-come of xj, it is updated and xi is set as the best neighbour of xj. Once this step is done, the chosen node xi is moved from the open list (O) to a closed list (C). This process is repeated until the goal node is moved to the closed list, at which point the optimal cost-to-come of the goal node is given by V(xgoal). By back-tracking the best neighbours of the nodes, from the goal node to the start node, the optimal path can be deduced. The optimal path consists only of vertices present in the closed list (C).

Figure 4.21: The expansion of Dijkstra's algorithm as it finds the optimal path from the start node (green) to the goal node (red). [20]
Dijkstra's algorithm has time complexity O(|E| + |V| log |V|), where |E| is the number of edges in the graph and |V| is the number of nodes. An example of Dijkstra's expansion from the start node to the goal node is shown in Figure 4.21. The green node in the figure corresponds to the start node, while the red node represents the goal node. The sub-figures show the progress of the algorithm as it finds the optimal path from the start node to the goal node.
4.9.2 A* planning algorithm
The A* algorithm is an extension of Dijkstra's algorithm that reduces the number of nodes explored. This is done by incorporating a heuristic estimate of the cost to get from the current node to the goal node when prioritizing the nodes in the open list [14]. In Dijkstra's algorithm, the nodes are prioritized solely by their cost-to-come values; in the A* algorithm, the nodes are prioritized by the sum of the cost-to-come from the start node and the heuristic estimate of the cost-to-go to the goal node.
Figure 4.22: The expansion of the A* algorithm as it finds the optimal path from the start node (green) to the goal node (red). [20]
Other than the criterion for prioritizing nodes in the open list (O), the A* algorithm works like Dijkstra's algorithm. The algorithm is optimal, however, only if the heuristic used is admissible. A heuristic estimate is admissible if it is an under-estimate of the actual minimum cost-to-go: the heuristic estimate h(xi) of the cost-to-go from a node must satisfy h(xi) <= C(xi, xgoal).
An expansion of the A* algorithm is displayed in Figure 4.22. The start and goal nodes are the same as in the Dijkstra example of Figure 4.21, yet comparing the two figures shows that the A* algorithm needs fewer expansions to reach the goal node. Here, the heuristic estimate used was the Euclidean distance between a node and the goal node; since the Euclidean distance is the smallest possible distance between two nodes, this heuristic is admissible.
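A minimal sketch of the algorithm, assuming a simple adjacency-list graph encoding (not the thesis code): `nodes` maps node ids to (x, y) positions, `edges` maps a node id to a list of (neighbor, edge cost) pairs, and the Euclidean distance to the goal is the admissible heuristic.

```python
import heapq
import math

def a_star(nodes, edges, start, goal):
    """A* search over a graph, returning (path, cost) or (None, inf)."""
    def h(n):  # Euclidean cost-to-go estimate (admissible)
        return math.dist(nodes[n], nodes[goal])

    cost_to_come = {start: 0.0}        # V(x_i)
    best_neighbor = {}                 # back-pointers for path recovery
    open_list = [(h(start), start)]    # prioritized by V + heuristic
    closed = set()
    while open_list:
        _, xi = heapq.heappop(open_list)
        if xi in closed:               # skip stale heap entries
            continue
        closed.add(xi)
        if xi == goal:
            break
        for xj, c in edges.get(xi, []):
            new_cost = cost_to_come[xi] + c    # V(xj) = C(xj, xi) + V(xi)
            if new_cost < cost_to_come.get(xj, math.inf):
                cost_to_come[xj] = new_cost
                best_neighbor[xj] = xi
                heapq.heappush(open_list, (new_cost + h(xj), xj))
    if goal not in cost_to_come:
        return None, math.inf          # complete: declare failure
    path, n = [goal], goal
    while n != start:                  # back-track best neighbors
        n = best_neighbor[n]
        path.append(n)
    return path[::-1], cost_to_come[goal]
```

Setting h to zero everywhere recovers Dijkstra's algorithm of Section 4.9.1.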
4.9.3 Travelling Salesman problem
The objective of the Travelling Salesman Problem (TSP) is to find a route for a travelling salesman that starts from a city, visits each goal city exactly once, and returns to the starting city. For n cities, there are (n-1)!/2 possible tours, which makes it expensive to evaluate the cost of every tour and select the one with minimum cost. To solve this problem, the existing Python library MLRose was used [2]. MLRose uses randomized optimization algorithms to solve the TSP.
Solving the TSP with MLRose involves three steps: defining a fitness function object, defining an optimization problem object, and running a randomized optimization algorithm. The MLRose package offers APIs to solve the TSP using genetic algorithms, and this method was used to solve the TSP for the LAI's visitation goals.
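The system itself used MLRose's genetic algorithm. As a self-contained illustration of the problem being solved, here is a brute-force solver that fixes the start city and enumerates the remaining visitation orders; the function name and cost-matrix layout are assumptions, and this approach is practical only for a handful of goals.

```python
from itertools import permutations

def solve_tsp(cost, start=0):
    """Exhaustive TSP: visit every city exactly once and return to start.
    `cost[i][j]` is the symmetric travel cost between cities i and j.
    Feasible only for small n, since (n-1)!/2 distinct tours exist."""
    cities = [c for c in range(len(cost)) if c != start]
    best_tour, best_cost = None, float("inf")
    for order in permutations(cities):
        tour = [start, *order, start]   # closed tour back to the start
        total = sum(cost[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
        if total < best_cost:
            best_tour, best_cost = tour, total
    return best_tour, best_cost
```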
4.9.4 Two-Step Planning
After the human operator marks the areas of interest on the UI and presses the 'Save visitation goals' button, the central pixel coordinates of these rectangular areas are saved to disk. These coordinates are then passed as arguments to the planner. As a first step, the path planner computes the graph nodes (contour points) in the network graph (Section 4.7) that are nearest to each of these central pixel coordinates. These graph nodes are used as the visitation goals by the planner, which helps the planner generate an optimal path that allows closer inspection of the sheds present near the marked areas.
The path planner designed for this work is a two-step planner. In the first step, an optimal path is found between each pair of visitation goals in the network graph using the A* algorithm with a Euclidean distance heuristic. In the second step, an optimal visitation plan covering all the goal nodes is deduced from the optimal paths computed in the first step: the visitation goals are treated as the cities of a travelling salesman problem, while the cost of visitation between any two goals is the corresponding optimal path cost from the first step. This problem is then converted into the format accepted by the MLRose functionality before solving the TSP; in this work, a genetic algorithm with default parameters was used. The output of the second step is the optimal order of visitation of the goals. Figure 4.23 shows the tasks carried out by the two steps of the planner.

Figure 4.23: Tasks carried out by the two steps of the path planner
The path planner combines the solutions of the two steps described above to generate an optimal path of visitation to the areas marked by the human operator for closer inspection. The path is generated as a sequence of pixel coordinates comprising the contour nodes along the edges of the optimal path. These pixel coordinates are later converted into GPS coordinates, which are then communicated to the LAI as waypoint visitation goals.
4.10 Pixel to GPS Transformation
The path planner generates the optimal path in the form of a sequence of pixel coordinates of the overhead mosaic image. These pixel coordinates need to be converted to GPS coordinates before they can be used by the Dronekit navigation code to guide the LAI to the areas of interest, so another module was created for this transformation. First, the GPS coordinates of the corner points of 5 different sheds in the Kentland Farm area were determined using the PixHawk GPS mounted on the UAVs. Then, the pixel coordinates of the same corners were obtained from the overhead mosaic image. The corners are indicated in Figure 4.24, and the pixel and GPS coordinates for these points are presented in Table B.1 in Appendix B. Using in-built Python functionality, a homography matrix was computed from these two sets of coordinates. The homography matrix allowed transforming pixel coordinates into GPS coordinates, and its inverse allowed the reverse transformation. The transformation was obtained using the RANSAC method, which reduced the errors and computed an optimal solution. The transformation matrix obtained through this process is given in Appendix B.
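Applying the homography to a mosaic pixel can be sketched as follows; this is an illustrative sketch, and the matrix values in the example below are made up for illustration, not the actual matrix from Appendix B.

```python
def pixel_to_gps(H, px):
    """Map a mosaic pixel (u, v) to (lat, lon) using a 3x3 homography H
    (nested lists), normalizing by the projective coordinate w."""
    u, v = px
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    lat = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
    lon = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
    return lat, lon

# Hypothetical affine-style matrix: each pixel spans 0.001 degrees,
# offset to a point near Blacksburg, VA.
H_example = [[0.001, 0.0, 37.0],
             [0.0, 0.001, -80.0],
             [0.0, 0.0, 1.0]]
```

The inverse matrix maps GPS coordinates back to mosaic pixels in the same way.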
The transformation matrix was validated before the phase-2 flight operations were conducted. This was done by generating GPS coordinates from the pixel coordinates of several identifiable features in the mosaic, and comparing the results with the GPS locations of these features obtained using the PixHawk GPS on board the UAV.
4.11 LAI Navigation
Both UAVs used in the field operations, the S-500 and the X-500, used the PixHawk 4, which communicates over the MavLink protocol. The MavLink connection is also used by the QGroundControl and Mission Planner software to exchange data, upload mission plans, and monitor the state of the UAV.
Figure 4.24: The reference points used to compute the transformation matrix
The Dronekit-Python library is an open-source library that provides a high-level API to communicate with a drone over the MavLink protocol [1]. Its functionality allows querying the connected drone's telemetry and state information, and the Dronekit APIs also allow mission management and motion control of the connected vehicle.
Using the Dronekit-Python APIs, a navigation module was developed to command the vehicle to take off and visit a set of waypoints. The module was used to establish the connection with the vehicle and set the vehicle mode, vehicle parameters, and mission parameters. A function was created that accepted a series of 2D GPS points (latitude and longitude) and a desired altitude, and commanded the UAV to visit these GPS locations in order. The location of the vehicle relative to the current waypoint goal was tracked, and when the distance between the two fell below an acceptable threshold, the vehicle was commanded to visit the next waypoint in the list. When all the GPS points in the list had been visited, the mission was complete and a Return To Launch (RTL) command was issued to the vehicle.
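The distance bookkeeping in this loop can be sketched as follows. Dronekit itself is not shown here; the flat-earth degrees-to-meters constant (a common approximation in DroneKit examples, valid over the short distances of a farm survey) and the 2 m threshold are assumptions, not values from the thesis.

```python
import math

EARTH_DEG_M = 1.113195e5  # approx. meters per degree of lat/lon

def distance_to_waypoint(cur, target):
    """Flat-earth approximation of the metric distance between two
    (lat, lon) pairs; adequate for short waypoint legs."""
    dlat = target[0] - cur[0]
    dlon = target[1] - cur[1]
    return math.hypot(dlat, dlon) * EARTH_DEG_M

def reached(cur, target, threshold_m=2.0):
    """True once the vehicle is close enough to advance to the
    next waypoint in the list."""
    return distance_to_waypoint(cur, target) < threshold_m
```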
During the flight operation, the vehicle mode was checked constantly. If the vehicle mode was found to be anything other than Guided, execution was paused for 20 seconds and the mode was rechecked. If the mode was still not Guided, the current waypoint was skipped and the next waypoint visitation command was issued. This allowed a human operator to take over control of the vehicle for a short duration if necessary, which was important because GPS errors, barometric errors, or wind gusts would occasionally cause the vehicle to lose control, requiring external intervention to set it back on the right path.
4.12 Phase-2 Flight Operations: LAI Flights
The goal of the second phase of flight operations was to conduct an LAI flight following the path generated by the path planner from the areas of interest marked by the human operator, and to receive a live video stream from the on-board camera at a ground station. A process flow chart of the phase-2 flight operations is shown in Figure 4.25.

Figure 4.25: Process flow chart for the phase-2 flight operations
For this operation, an X-500 UAV equipped with a front-facing TIS camera and an on-board Raspberry Pi was used. The setup also included two local ground stations, and a remote ground station with a human operator.
The local ground station-1 ran the UI discussed in Section 4.8, and used the MavProxy software to allow the same telemetry radio port to be used by Mission Planner as well as by the Dronekit code for establishing a connection with the LAI and communicating the waypoint commands over MavLink. The human operator remotely logged in to ground station-1 and marked areas of interest for closer inspection by drawing rectangles over the mosaic image displayed in the UI. Figure 4.26a shows the areas of interest marked by the human operator. When the human operator pressed the 'Save visitation goals' button on the UI, a text file containing the central pixel coordinates of all the drawn rectangles was saved to the memory of local ground station-1.
Next, the remote human operator started a Python script called 'Run_Mission' on local ground station-1. This code read the pixel coordinates saved in the previous step and passed them as arguments to the path planner. The path planner first generated the network graph using the occupancy grid map and then triggered the two-step path planning process, which generated and returned an optimal plan of visitation in terms of mosaic image pixel coordinates. The two-step path planner required 0.9 seconds of processing time to generate the mission plan for the 5 areas of interest selected by the human operator (16 GB RAM, i7 processor). Once the path planner finished executing, another function was called to transform the pixel coordinates into GPS coordinates. The plan generated by the planner is displayed in Figure 4.26b. Once the GPS coordinates were obtained, the DroneKit code was called to communicate the sequence of GPS coordinates as waypoints to the LAI. During the rest of the mission, ground station-1 monitored the vitals of the UAV and the mission progress through the Mission Planner software. Figure 4.26c shows the path followed by the LAI.
Meanwhile, ground station-2 connected to the camera on board the LAI and received the video stream over a UDP connection through the wireless network of the on-board Raspberry Pi. Both the Raspberry Pi and ground station-2 used the GStreamer application for video streaming. The Raspberry Pi executed a GStreamer pipeline that encoded the TIS camera video and packaged it before sending it to the UDP port of ground station-2, which executed another pipeline that received the stream from the UDP port, depackaged it, decoded it, and displayed it on screen. A lag of roughly 1-2 seconds was observed in the video stream during the mission. Figure 4.27a shows the UAV flying during the mission in front of a shed, and Figure 4.27b shows a screenshot of the video streamed by the on-board camera of the UAV to ground station-2.

Figure 4.26: Figure (a) shows the areas of interest selected by the human operator for closer inspection. Figure (b) shows the path planned by the two-step path planner, while Figure (c) shows the Mission Planner screenshot of the LAI flight path.
The LAI took off, visited all the goals communicated to it by the Dronekit code, and landed back at the starting position. Throughout, the on-board Raspberry Pi streamed live video to local ground station-2 over the wireless network.
This flight operation successfully demonstrated the capability of the system to let a human user select areas of interest for closer inspection through a user interface, and to remotely trigger the path planning and autonomous flight navigation of the UAV along an optimal visitation plan. The video streamed to local ground station-2 was observed by the human operator to locate the target and complete the search operation.
Figure 4.27: Figure (a) shows the LAI during the flight, and Figure (b) shows a screenshot of the video streamed by the on-board TIS camera to ground station-2.
Chapter 5
Summary & Conclusions
This work discussed a cooperative system capable of performing a search and rescue operation in a partially-known environment using multiple UAVs with a human operator in the loop. Two field experiments, conducted on a target area near the KEAS Lab on the Kentland Farms, demonstrated these capabilities.
The first phase of flight operations gathered information about the latest state of the environment. This data was used to generate or update representations of the environment such as the 3D reconstruction, mosaic image, occupancy image, and network graph. Contours were drawn around the obstacles to facilitate their exploration; these contours served as the basis of the network graph, with the contour vertices as its nodes. Non-contour edges, in addition to the contour edges, were drawn to connect all the vertices without intersecting any obstacles. A non-contour edge cost factor (ECF) was introduced as a multiplier to the cost of non-contour edges. By using this factor to control the cost of non-contour edges, the degree of exploration of the sheds could be controlled: a high ECF forced the path planner to include more contour edges in the optimal plan, while a low ECF led to an optimal path that avoided contour edges, and in turn shed exploration.
A user interface was created that allowed the human operator to work cooperatively with the
UAV and guide the search operation by marking areas of interest. This was done by simply
drawing rectangles over the areas of interest on the mosaic image displayed by the UI;
equipment sheds and other structures were the preferred sites of exploration. A two-step path
planner was designed to find an optimal mission plan for visiting the multiple goals for closer
inspection. The system utilizes human perception and intuition while maintaining the
autonomy and efficiency of the robots, and is an example of a system that leverages the
complementary strengths of humans and robots. A navigation module was developed that
communicated the target visitation goals to the UAV, guiding it to follow the optimal mission
plan generated by the planner.
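The two-step structure of the planner can be sketched as follows: step one produces a matrix of pairwise path costs between the start location and every goal (in the thesis, via weighted-A* on the network graph), and step two orders the visits by solving a small Travelling Salesman Problem. The cost values and goal names below are illustrative, and brute-force enumeration stands in for the mlrose-based TSP solver; enumeration is only practical for a handful of goals.

```python
from itertools import permutations

def optimal_visitation_order(cost, start, goals):
    """Brute-force TSP: cheapest tour that starts and ends at `start`
    and visits every goal exactly once. `cost[a][b]` is the pairwise
    path cost produced by step one of the two-step planner."""
    best_order, best_cost = None, float("inf")
    for order in permutations(goals):
        stops = [start, *order, start]
        total = sum(cost[a][b] for a, b in zip(stops, stops[1:]))
        if total < best_cost:
            best_order, best_cost = list(order), total
    return best_order, best_cost

# Illustrative symmetric cost matrix (e.g. weighted-A* path lengths).
cost = {
    "home": {"g1": 4.0, "g2": 2.0, "g3": 5.0},
    "g1":   {"home": 4.0, "g2": 3.0, "g3": 1.0},
    "g2":   {"home": 2.0, "g1": 3.0, "g3": 6.0},
    "g3":   {"home": 5.0, "g1": 1.0, "g2": 6.0},
}

order, total = optimal_visitation_order(cost, "home", ["g1", "g2", "g3"])
print(order, total)  # → ['g2', 'g1', 'g3'] 11.0
```

The resulting order would then be converted into mission waypoints for the low-altitude UAV by the navigation module.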
The second phase of flight operations demonstrated the cooperative search capability by
allowing a remote human operator to identify areas of interest and remotely command
the UAV to visit these areas autonomously, gathering detailed information while following
an optimal visitation plan. The live video streaming capability from the UAV's on-board
camera system to the local ground station over the wireless network was also successfully
tested.
This system serves as an enabler for human-robot cooperation during search operations.
This work has not explored the possibility of using multiple human operators; however,
with some modifications, the system could provide an important platform for multi-human,
multi-robot cooperation. This could include scenarios where multiple humans and UAVs are
actively involved in the search operation on the target environment while communicating
and sharing their observations for an efficient search.
The system, in its current state, does not operate in real time. However, this work sets
up a framework that could be enhanced into a real-time cooperative system. One of the
major components needed to achieve this would be an efficient on-line image mosaicking
module. Upgrading the HAS with a stereo-camera system could also provide the ability to
generate 3D maps of the environment in real time, which could be used for path planning
of the LAI in 3D space and would add to the exploration capabilities of the system.

Upgrading or replacing the camera system to ensure more reliable SDK support could also
be investigated: based on the experience during field operations, the current camera system
and its associated SDK were found to be prone to inconsistencies. A more reliable means of
communication between the on-board computer and the ground station could be explored as
well; this would likely improve the quality and latency of the video stream from the on-board
camera to the local ground station. Options include using antennas to enhance the WiFi
signal of the Raspberry Pi, or setting up a high-strength local WiFi network using a WiFi
router, although the latter would make the field experimentation less authentic, since such
infrastructure may not be available during a real search and rescue operation.

Improved communication would also help in scaling the system up to larger search areas.
When the search area is increased, additional UAVs can be added to the system; in such
cases, a UAV may be unable to communicate with the ground station directly because of
the large distance between them. To overcome this, the communication system could be
designed to use the UAVs as relays between a remote UAV and the ground station.

Finally, the system's workflow and the communication between its modules are currently
unidirectional. This could be changed by adopting a Robot Operating System (ROS) based
architecture, which would allow modules to communicate through messages. This would
make the system more flexible and allow pausing and modifying the mission during the
operation.
Bibliography
[1] DroneKit-Python documentation. URL https://dronekit-python.readthedocs.io/en/latest/about/overview.html.
[2] mlrose documentation. URL https://mlrose.readthedocs.io/en/stable/source/tutorial2.html.
[3] OpenCV documentation. URL https://docs.opencv.org/3.4/d1/d89/tutorial_py_orb.html.
[4] J. L. Burke and R. R. Murphy. Human-robot interaction in usar technical search: two
heads are better than one. In RO-MAN 2004. 13th IEEE International Workshop on
Robot and Human Interactive Communication (IEEE Catalog No.04TH8759), pages
307–312, 2004.
[5] T. M. Cabreira, L. B. Brisolara, and P. R. Ferreira Jr. Survey on coverage path planning
with unmanned aerial vehicles. Drones, 3(1), 2019.
[6] M. Erdelj and E. Natalizio. Uav-assisted disaster management: Applications and open
issues. In 2016 International Conference on Computing, Networking and Communica-
tions (ICNC), pages 1–5, 2016.
[7] Dario Floreano and Robert J. Wood. Science, technology and the future of small
autonomous drones. Nature, 521(7553):460–466, May 2015. ISSN 1476-4687. doi:
10.1038/nature14542. URL https://doi.org/10.1038/nature14542.
[8] Francesco Mancini, Marco Dubbini, Mario Gattelli, Francesco Stecchi, Stefano Fabbri,
and Giovanni Gabbianelli. Using unmanned aerial vehicles (UAV) for high-resolution
reconstruction of topography: The structure from motion approach on coastal environments.
Remote Sensing, 5(12), 2013.
[9] P. Ganesan, V. Rajini, B. S. Sathish, and K. B. Shaik. Hsv color space based seg-
mentation of region of interest in satellite images. In 2014 International Conference
on Control, Instrumentation, Communication and Computational Technologies (ICCI-
CCT), pages 101–105, 2014.
[10] M. A. Goodrich, J. L. Cooper, J. A. Adams, C. Humphrey, R. Zeeman, and B. G.
Buss. Using a mini-uav to support wilderness search and rescue: Practices for human-
robot teaming. In 2007 IEEE International Workshop on Safety, Security and Rescue
Robotics, pages 1–6, 2007.
[11] Michael A. Goodrich, Bryan S. Morse, Damon Gerhardt, Joseph L. Cooper, Morgan
Quigley, Julie A. Adams, and Curtis Humphrey. Supporting wilderness search and
rescue using a camera-equipped mini uav. Journal of Field Robotics, 25(1‐2):89–110,
2008. doi: 10.1002/rob.20226. URL https://onlinelibrary.wiley.com/doi/abs/
10.1002/rob.20226.
[12] Ramesh Jain, Rangachar Kasturi, and Brian Schunck. Machine Vision. 01 1995. ISBN
978-0-07-032018-5.
[13] Dr. Creed Jones. Lecture notes in computer vision, Fall 2019.
[14] Steven M. LaValle. Planning Algorithms. Cambridge University Press, 2006.
[15] Sven Mayer, Lars Lischke, and Pawel W. Woźniak. Drones for Search and Res-
cue. In 1st International Workshop on Human-Drone Interaction, Glasgow, United
Kingdom, May 2019. Ecole Nationale de l’Aviation Civile [ENAC]. URL https:
//hal.archives-ouvertes.fr/hal-02128385.
[16] Agoston Restas. Drone applications for supporting disaster management. World Journal
of Engineering and Technology, 3(3), 2015. doi: 10.4236/wjet.2015.33C047.
[17] Jürgen Scherer, Saeed Yahyanejad, Samira Hayat, Evsen Yanmaz, Torsten Andre, Asif
Khan, Vladimir Vukadinovic, Christian Bettstetter, Hermann Hellwagner, and Bern-
hard Rinner. An autonomous multi-uav system for search and rescue. In Proceedings
of the First Workshop on Micro Aerial Vehicle Networks, Systems, and Applications
for Civilian Use, DroNet ’15, page 33–38, New York, NY, USA, 2015. Association for
Computing Machinery. ISBN 9781450335010. doi: 10.1145/2750675.2750683. URL
https://doi.org/10.1145/2750675.2750683.
[18] Mario Silvagni, Andrea Tonoli, Enrico Zenerino, and Marcello Chiaberge. Multipurpose
uav for search and rescue operations in mountain avalanche events. Geomatics, Natural
Hazards and Risk, 8(1):18–33, 2017. doi: 10.1080/19475705.2016.1238852. URL https:
//doi.org/10.1080/19475705.2016.1238852.
[19] S. Sural, Gang Qian, and S. Pramanik. Segmentation and histogram generation using
the hsv color space for image retrieval. In Proceedings. International Conference on
Image Processing, volume 2, pages II–II, 2002.
[20] Dr. Pratap Tokekar. Lecture notes in advanced robot motion planning, Fall 2018.
[21] T. Tomic, K. Schmid, P. Lutz, A. Domel, M. Kassecker, E. Mair, I. L. Grixa, F. Ruess,
M. Suppa, and D. Burschka. Toward a fully autonomous uav: Research platform for
indoor and outdoor urban search and rescue. IEEE Robotics Automation Magazine,
19(3):46–56, 2012.
Appendices
Appendix A
Paths planned for different sets of visitation goals
The following figures display the optimal paths generated by the path planner for sets of
randomly chosen visitation goals in the mosaic image. Each subfigure displays one step of
the visitation plan, i.e., an optimal path between two consecutive visitation goals. The paths
are drawn in green over the mosaic image, while the target locations for visitation are
identified by small red circles. The blue circle next to each red circle indicates the contour
vertex closest to the corresponding visitation goal; each visitation goal has an associated
closest contour vertex.

The optimal visitation plan starts and ends at the same location while visiting each of the
goal locations only once. These paths were generated with an ECF value of 3, and they show
that the generated path does not intersect any of the obstacles (equipment sheds and
structures) in the scene.

Figures A.1 and A.2 display the optimal visitation plans generated for two different sets
of visitation goals.
Figure A.1: Path generated by the path planner for a random set of visitation goals (set 1). Subfigures (a)-(f) show Steps 1-6 of the visitation plan.
Figure A.2: Path generated by the path planner for a random set of visitation goals (set 2). Subfigures (a)-(f) show Steps 1-6 of the visitation plan.
Appendix B
Pixel to GPS transformations
B.1 Data values
The following data shows the pixel coordinates and GPS coordinates for various reference
locations displayed in Figure 4.24.
Point   Pixel coordinates   GPS coordinates
A       [214, 1286]         [37.1971799, -80.5790688]
B       [294, 1351]         [37.1972791, -80.5791855]
C       [555, 1023]         [37.1969814, -80.5795444]
D       [474, 959]          [37.1968928, -80.5794747]
E       [254, 803]          [37.1967096, -80.5794233]
F       NA                  NA
G       [307, 473]          [37.1963707, -80.5793546]
H       [436, 584]          [37.1965260, -80.5795215]
I       [515, 416]          [37.1963677, -80.5793674]
J       [561, 353]          [37.1963524, -80.5796696]
K       [639, 410]          [37.1963836, -80.5797568]
L       [590, 476]          [37.1964408, -80.5797005]
M       [731, 598]          [37.1965692, -80.5798473]
N       [795, 631]          [37.1966120, -80.5799179]
O       [767, 686]          [37.1966660, -80.5798896]
P       [700, 651]          [37.1966153, -80.5798234]
Q       [700, 702]          [37.1966526, -80.5798036]
R       [764, 739]          [37.1967054, -80.5798775]
S       [727, 801]          [37.1967735, -80.5798201]
T       [664, 766]          [37.1967258, -80.5797420]

Table B.1: Pixel coordinates and GPS coordinates for the reference points.
B.2 Homography matrix
The Homography matrix obtained from the above process is displayed below.
Homography Matrix =
[ -5.53684599e-03  -4.68124529e-03   3.71960092e+01 ]
[  1.19938171e-02   1.01428956e-02  -8.05791246e+01 ]
[  1.48856639e-04  -1.25873081e-04   1.00000000e+00 ]
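Applying such a homography to a pixel coordinate follows the standard projective mapping: multiply [u, v, 1]^T by H and divide by the third homogeneous component. The sketch below demonstrates the mechanics with a simple illustrative matrix (a pure translation), not the matrix above, since reproducing the exact pixel-to-GPS values would also depend on the coordinate conventions used when the matrix was estimated.

```python
def apply_homography(H, u, v):
    """Map a pixel (u, v) through a 3x3 homography H (row-major nested
    lists) and normalize by the third homogeneous component."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w

# Illustrative homography: a pure translation by (+10, -5).
H_demo = [
    [1.0, 0.0, 10.0],
    [0.0, 1.0, -5.0],
    [0.0, 0.0, 1.0],
]
print(apply_homography(H_demo, 214.0, 1286.0))  # (224.0, 1281.0)
```

Substituting the matrix above for H_demo would map pixel coordinates in the mosaic image to (latitude, longitude) pairs for the reference points in Table B.1, under the same conventions used when the matrix was computed.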