A Low Cost, Vision Based Micro Helicopter System for
Education and Control Experiments
Jonathan Meyer
April 2014
UNIVERSITY OF OKLAHOMA
GRADUATE COLLEGE
A LOW COST, VISION BASED MICRO HELICOPTER SYSTEM
FOR EDUCATION AND CONTROL EXPERIMENTS
A THESIS
SUBMITTED TO THE GRADUATE FACULTY
in partial fulfillment of the requirements for the
Degree of
MASTER OF SCIENCE
By
JONATHAN W. MEYER
Norman, Oklahoma
2014
A LOW COST, VISION BASED MICRO HELICOPTER SYSTEM
FOR EDUCATION AND CONTROL EXPERIMENTS
A THESIS APPROVED FOR THE
SCHOOL OF AEROSPACE AND MECHANICAL ENGINEERING
BY
Dr. David Miller, Chair
Dr. Zahed Siddique
Dr. Harold Stalford
© Copyright by JONATHAN W. MEYER 2014. All Rights Reserved.
DEDICATION
To my parents, Randy and Kathy. Thank you for all your support.
Acknowledgements
I would like to thank Dr. David Miller, my advisor and mentor, for his advice and infinite
patience. I would also like to thank my committee members, Dr. Harold Stalford and Dr.
Zahed Siddique.
To Braden McDorman and Nafas Zaman, thank you for your help in developing software
and interfacing with KISS IDE. Also of invaluable help were the many enthusiasts of computer
vision, electronics, and toy helicopters whose collective knowledge made this work possible.
Last, but not least, I want to thank my parents for their patience, understanding, and
support.
Contents
List of Figures

1 Introduction
  1.1 Statement of Problem
  1.2 Thesis Road Map
2 Literature Review
  2.1 Micro Air Vehicle Research
  2.2 Alternative Platforms
  2.3 Computer Vision and Object Tracking
  2.4 Pose Estimation
  2.5 Control Systems
  2.6 Concluding Thoughts
3 Methodology
  3.1 Requirements and Specifications
  3.2 Hardware Platform
  3.3 Software Platform
  3.4 Algorithms and Implementation
    3.4.1 Fiducial Markers
    3.4.2 Vision-Based Helicopter Tracking
    3.4.3 Control System
    3.4.4 Helicopter Signaling
  3.5 Summary
4 The Tracking Component
  4.1 Reference Implementation
  4.2 By-Estimate Blob Detection
  4.3 By-Threshold Blob Detection
  4.4 Marker Candidate Detection
  4.5 Candidate Selection
  4.6 Closing Comments
5 Feedback Control Component
  5.1 The Reference Implementation
  5.2 State Estimation
  5.3 Master Controller
  5.4 Channel Controllers
    5.4.1 Yaw Control
    5.4.2 Pitch Control
    5.4.3 Throttle Control
  5.5 Closing Comments
6 Tests and Results
  6.1 Software Validation of Chosen Algorithms
    6.1.1 Software Profiling
  6.2 Vision Component Tests
    6.2.1 Rotation Detection Tests
    6.2.2 Speed of Tracking
    6.2.3 General Tracking Performance
  6.3 Feedback Controller Tests
    6.3.1 Feedback Control Analysis
  6.4 Failure Modes
  6.5 Currie's Thesis
  6.6 Concluding Thoughts
7 Conclusion
  7.1 Future Work
    7.1.1 New Hardware Platforms
    7.1.2 Improved Marker Design
    7.1.3 Tracking Algorithm Improvements
    7.1.4 Control Algorithm Improvements
    7.1.5 Signal Transmission Device Improvements
    7.1.6 Education Modules
8 Bibliography
A User Manual
  A.1 Installation
    A.1.1 Tracking Software Compilation
    A.1.2 Micro-Controller Setup and Programming
    A.1.3 Visual Marker Construction
    A.1.4 Running the System
  A.2 API
  A.3 Hacking
List of Figures
1.1 Stock Syma S107G Toy Helicopter
2.1 Binary Thresholding Operation
2.2 A Histogram Tracking Operation to Detect a Human Face [40]
2.3 Background Subtraction Before and After
2.4 Example of Canny Edge Detector
2.5 Feature Matching Example
3.1 Arduino Uno R3 [3]
3.2 Reference 3-Color Fiducial Marker Constructed From Styrofoam
3.3 Fiducial Marker Attached to Syma Helicopter
3.4 Syma S107 Signal Indicating Full Throttle and Neutral Yaw, Pitch, Trim
4.1 A Tracked Helicopter Marker with Rotation Estimate in Degrees
4.2 Result of By-Estimate Method on Figure 4.1
4.3 Result of By-Threshold Method on Figure 4.1
4.4 Example of Single Color Fixation
6.1 Graphical Summary of Most Expensive Functions During Profiling
6.2 Measured and Ground Truth Helicopter Rotation at 3 ft
6.3 Error in Degrees between Measured and Ground Truth Rotation at 3 ft
6.4 Measured and Ground Truth Helicopter Rotation at 6 ft
6.5 Error in Degrees between Measured and Ground Truth Rotation at 6 ft
6.6 Demonstration of Helicopter Motion Blur at 30 Frames per Second
6.7 Throttle and Elevation Error for Hovering Helicopter
6.8 Throttle and X-Axis Error for Hovering Helicopter
6.9 Normalized and Unnormalized Elevation Error
6.10 Throttle and Elevation Error for Hover Command with Integral Control
6.11 Throttle and X-Axis Error for Hover Command with Integral Control
Abstract
Due to a push to improve K-12 STEM education using robotics, and the popularity and high
cost of autonomous flying robots, there exists a niche for a small, low cost, accessible, and
hackable drone platform designed for students and enthusiasts. This work presents a low cost
(<$50 USD) supplement to a laptop that allows a student to achieve closed-loop feedback control
of a Syma S107G micro helicopter. Helicopter tracking is achieved by detecting a color-based
fiducial marker, attached to the bottom of the helicopter, in a digital image captured from a
standard web-camera. A custom software system, written in C++ with OpenCV, detects the
presence of each color on the marker in an image and assembles these colors into an estimation
of the pose of the helicopter. A set of PID controllers then computes an appropriate signal for
the helicopter and relays it using an Arduino microcontroller and an IR LED. The resulting
system is capable of hovering a Syma helicopter, indoors, at a specific image coordinate and
rotation, to within a few inches and 20 degrees, until battery failure. The vision system runs
in real time at 30 Hz and tracks reliably with even lighting and no significant environmental
influences. The helicopter can be programmatically moved by altering the controller setpoint.
Also developed is a simple-to-use C interface to the control system and documentation for
underlying components. Most importantly, the resulting system lowers the barrier for students
and enthusiasts to explore concepts in robotics, such as computer vision and control systems,
on an aerial robot.
Chapter 1
Introduction
This thesis presents the design and implementation of a low cost, vision based micro helicopter
system for education and control experiments. In this chapter, the topic, motivation, and
organization of this thesis are briefly introduced. For students and instructors interested in
how to get set up quickly, please see Appendix A.
1.1 Statement of Problem
In 2007, the United States Congress commissioned a pre-eminent committee of educators,
scientists, and engineers to produce a report on how the United States can maintain its
position as a world leader in technology and science, and maintain a bright economic future
[49]. This committee’s number one recommendation was to vastly improve K-12 science
and mathematics education. As a part of this push, the National Science Foundation, the
Department of Education, state governments and many corporations have devoted money
and expertise toward improving STEM education with robotics programs [13][22].
Robotics naturally integrates many technical disciplines: the design and production of
a robot involves a wide range of skills, including science, mathematics, programming,
electronics, mechanics, signal processing, and computer science.
Furthermore, students working in teams to solve problems in robotics must also practice
essential life skills such as teamwork, design, problem solving, and resource management.
As a tool for teaching programming and computer science, robotics provides an exciting
opportunity for students to see the results of their learning come to life before them. One of
the primary challenges in teaching and learning programming is the practice’s fundamentally
abstract nature. Students may struggle to see the practical relevance of their learning, espe-
cially early in their education. On the other hand, one of programming’s greatest educational
aspects is its rapid feedback cycle for a creative endeavor. Robotics education offers a way
to make programming concepts less abstract without harming the creative, short-feedback
cycle. As an example, the programming concepts relating to loops and conditionals might be
reinforced by having a student implement navigation by sensory servoing.
Unfortunately, getting started in robotics often requires significant knowledge, money, and
hardware, presenting a high barrier to entry for students of the discipline. To counteract
this, various organizations have produced robotics kits that come with instructions and all of
the pieces required to get started. These kits may be divided into two major categories: those
meant for teams of students such as KIPR [9] and FIRST [7], and those meant for individual
students or enthusiasts. Often, these platforms share hardware, especially the computer
processors, as in the case of the Lego Mindstorm [11]. The Arduino [3] is an 8-bit micro-
controller platform that has enjoyed a surge of popularity among electronic hobbyists and
should also be considered when talking about robotics kits, though it requires other hardware
to be interesting. There exists a niche among individual educational robotics products for
something in the low cost market. In particular, there is a niche for flying robots, a domain
currently dominated by expensive research equipment and augmented reality video games.
Unmanned aerial vehicles, or drones, are one of the hot topics of the day and their use has
rapidly expanded beyond military operations to new domains like fast product shipping [1]
and aerial photography [18]. There is opportunity for a low cost, accessible, and hackable
“small scale” drone platform to make a difference in the educational robotics movement.
To that end, the goal of this thesis is the creation of a flying robot system suitable for use
in education that:
1. is inexpensive,
2. is programmable,
3. is easy for a single person to use,
4. is hackable,
5. provides closed loop control, and
6. works in a classroom environment.
Stated succinctly, this thesis demonstrates that it is possible to build a low cost (≈ $50
USD) supplement to a laptop, exhibiting these properties, that will allow a STEM student
to experiment with aerial robotics in a classroom environment. This thesis presents a
complete control system that satisfies this requirement using computer vision techniques with
a fiducial marker to exert positive control on a Syma S107G micro helicopter, seen in Figure
1.1. This system is divided into generic components that perform tasks of sensing/pose
estimation, control calculation, and signal transmission with reference implementations for
each one that work with the Syma S107G. These components can be swapped to provide
control for new hardware platforms.
Figure 1.1: Stock Syma S107G Toy Helicopter
1.2 Thesis Road Map
This thesis discusses the construction of a feedback control system for a toy helicopter in
terms of both philosophy/architecture and actual implementation. Those interested in get-
ting started with the software as quickly as possible may skip directly to Appendix A, the
User Manual. This appendix provides simple examples and documentation of the reference
implementation being used to control a Syma Helicopter. It also discusses the interfaces of
each component and is an excellent starting point for those looking to hack on this system or
extend control to a new device.
For those interested in the theory behind the vision and control components of the system,
chapter 2 provides a literature survey of fields relevant to the control of micro air vehicles and
object tracking through computer vision. Chapter 3 outlines the general methodology and
decision making process behind the hardware choices and each component of the reference
control system. In addition, this chapter discusses the signal transmission component of the
control system, responsible for digitally transmitting control signals over an IR interface.
Chapter 3 provides an excellent starting point for those looking to understand how the entire
system works under the hood. Chapters 4 and 5 go into detail about the precise logic behind
the operation of the vision/tracking system and the control/feedback systems respectively.
Chapter 6 presents tests and experiments that verify the functionality of different components
of the reference system. It also presents a discussion on the merits and shortcomings of the
reference system as well as suggestions for future improvements. Appendix A documents
the reference implementation's application programming interface as well as the necessary
mechanisms to begin developing new components that will fit in this system.
Chapter 2
Literature Review
The following section presents an investigation into existing options and methods for flying
robots. A complete, autonomous robot consists of a hardware platform, a means of sensing the
environment, and logic that integrates sensing with hardware (often expressed in software).
This review is organized into two sections. The first section examines existing micro air
vehicle research with respect to the project goals. Different system architectures, specifically
on-board versus external sensing, are considered; additionally, there is an investigation into
low-end consumer products such as toy helicopters and their viability as an autonomous
platform. The second section is an investigation into existing solutions for sensing and control,
including relevant computer vision methodologies.
2.1 Micro Air Vehicle Research
The GRASP Laboratory at the University of Pennsylvania is perhaps the most visible re-
search group developing autonomous micro-helicopter systems. Their research focuses on the
creation of robust control algorithms for flying robots, using purchased platforms ranging
from 750 grams in weight to more than 2 kilograms. This work has applications for the fields
of surveillance, precision farming, search and rescue, and more. Most relevant to this thesis
is GRASP’s multiple MAV test bed [44], and its primary flying platform, the off-the-shelf
ASCTEC Hummingbird MAV. For state estimation, the lab is equipped with a 20 camera
VICON motion capture system which operates at 375 Hz, measures positions to an incredible
accuracy of 20 micrometers, and maintains tracking even if all but one camera is occluded.
Each MAV is equipped with inertial measurement units (IMUs), electronic devices that mea-
sure velocity, orientation, and gravitational forces with a combination of accelerometers and
gyroscopes. These devices inform an internal control loop which in turn sets motor speeds at
600 Hz to maintain a desired heading and position. The motion capture system records MAV
positions at 100 Hz and relays global commands to individual units at the same rate. Each
Hummingbird then performs any further processing with a 600 MHz ARM processor running
a version of the Robot Operating System (ROS).
For many research groups, the ultimate goal is the production of MAVs that are fully
capable of operating autonomously in the real world under possible harsh conditions. Toward
this goal, these expensive and elaborate sensor mechanisms are a means to ensure that the
feedback and state estimation components of a tested control system are as precise as possible.
Other research groups focus on techniques for state estimation using only on-board sensing
with an aim of producing fully autonomous helicopters. [51] combines control algorithms
developed in the GRASP test bed with on-board sensing in the form of two wide-angle cameras
operating at 20 Hz and an IMU updating at 100 Hz. Other researchers at ETH Zurich have
produced systems that rely not on stereo vision or depth cameras, but on optical flow, a
technique that refers to the pattern of apparent motion in a camera scene when there is
relative motion between the camera and scene. Even so, these systems still rely on IMUs to
provide stability while using their more advanced techniques for pose estimation.
While inspiration may certainly be drawn from these pioneers in the field of micro air
vehicle control, the expensive hardware they employ and the technical expertise they require
puts them well outside the reach of most primary schools and enthusiasts. For example, the
Hummingbird MAV vehicle used by GRASP retails for approximately $5000 [4]. A Vicon
motion capture system costs tens of thousands or more [55]. Even less expensive sensor
systems, such as the depth cameras used on Kumar’s autonomous vehicles, cost at least a
hundred dollars per unit [45]. Low cost units such as the Parrot AR Drone and the crowd
funded R10 platforms typically cost no less than $200 and often much more when equipped
with all necessities for autonomous flight [2]. Schools and enthusiasts, however, do not have
the same set of requirements for rigor and accuracy that cutting edge research does. There
may be alternative platforms and sensing models better suited to their needs.
2.2 Alternative Platforms
Many of the previously mentioned platforms are notable for having their robots perform at
least some of the calculations required for controlled motion. This architecture is natural
for the development of a robust and autonomous system but there is a primary drawback.
For the purposes of educational research, the equipment needed for effective on-board sensing
tends to make the associated platform more complex, more expensive, and physically larger.
To mitigate these issues, many systems (such as those at the GRASP
lab) use a hybrid approach where high frequency information such as IMU data is handled
on-board and other jobs are performed off-board (e.g., expensive ones like path planning).
If an on-board sensing and control approach is not feasible because of weight limits, size
constraints, cost considerations, or design choice, then external sensing is required to
close the loop. A particularly relevant research group is the Automatic Control
Laboratory at ETH Zurich where small co-axial helicopters are controlled via external cameras
[50]. In particular, this setup used a single VICON depth camera tracking four optical markers
attached at cardinal directions on a small co-axial helicopter. The camera provides feedback
information about the system pose and an external controller interpreted that information
and issued appropriate commands. Students at Stanford have also constructed a tracking
system for a larger-scale flying helicopter that uses an array of three uncalibrated digital
cameras at known locations. These cameras relay images to a central computer that performs
background subtraction to identify the helicopter and triangulate its position at 30 Hz [43].
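Background subtraction of the kind used in that system can be sketched as simple frame differencing. The following minimal C++ fragment is illustrative only (it is not the Stanford implementation): it marks every pixel whose gray level changed by more than a threshold between a stored background frame and the current frame.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mark pixels that differ from the background frame by more than
// `thresh` gray levels. Output is a binary mask: 255 = motion, 0 = static.
std::vector<uint8_t> frameDifference(const std::vector<uint8_t>& background,
                                     const std::vector<uint8_t>& current,
                                     int thresh) {
    std::vector<uint8_t> mask(current.size(), 0);
    for (size_t i = 0; i < current.size(); ++i) {
        if (std::abs(int(current[i]) - int(background[i])) > thresh)
            mask[i] = 255;
    }
    return mask;
}
```

Real systems also update the background model over time to cope with illumination drift; this sketch assumes a fixed reference frame.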
The hardware on these external-sensing based systems tends to be simpler and cheaper
than the hardware on their on-board counterparts. This comes at the cost of generality; that
is, having to set up external sensors means that robots only operate in specific areas and
under conditions that allow for the platform to be effectively sensed. Furthermore, external
cameras and sensing equipment typically operate at lower frequencies than an on-board IMU
at similar price points, which can present difficulties in highly dynamic systems. If this is
not an issue, however, the limited range, low cost, and simplicity are appealing benefits for a
small, educational system meant for a classroom environment.
Hardware options for this sensing/control model are far more diverse. An entire world
of ‘toy’ helicopters that are inexpensive, simple, widely available, and robust is available.
These systems are typically divided along feature lines such as the control interface (radio vs.
infrared) and platform style (co-axial vs. quad-rotor). Radio controlled interfaces operate on a
radio spectrum that allows many units to be individually flown at a time, typically over longer
ranges than infrared, without line of sight, and allow operation both indoors and outdoors.
On the other hand, infrared interfaces implement different channels either in hardware by
changing the signal modulation or in the protocol of the signal itself. This means that if
multiple infrared hardware platforms are used at once, there may be interference. Infrared
also has shorter range and outside operation is limited due to the effects of the sun. For all
its disadvantages, infrared is cheaper and often easier to implement, typically requiring only
a single LED and resistor (the total cost of which is often less than a dollar) in addition to
a device to drive the LED. In certain cases, its narrow broadcast angle and limited range
can be advantageous if multiple devices accept the same input and only one should be controlled.
A micro air vehicle platform built from components has the luxury of choosing its means of
communication and control. If using an existing platform, then that platform should be chosen
with consideration for its communication protocol because it is usually difficult to change.
In the platform-style debate, quad-rotors are traditionally more stable, easier to tune and
repair, but are more expensive. Examples of viable off-the-shelf radio controlled helicopters
include the $40 Syma X1 [52] and the similarly priced Estes 4606 Proto X Nano [23]. On
the co-axial side, the Chinese brand Syma has inundated the market with a huge variety of
model helicopters. Their least expensive model, the $20 Syma S107, is the most popular;
this may be attributed to its inexpensiveness, robust construction, available replacement
parts, and friendliness toward hackers. In particular, at the beginning of this research, there
was already approximately 800 pages of discussion on RCGroups.com forums regarding the
Syma helicopter, much of it devoted to creating alternative controllers [19]. In May 2011, a
blogger named Agustin Vergottini published an article discussing his efforts to sniff out the
Syma protocol and create a simple controller for it [54]. Many other articles followed suit,
using more advanced equipment such as logic analyzers and field-programmable gate arrays
(FPGAs) to bring Syma controllers to greater sophistication [38].
Despite the interest, there did not appear to be significant effort in the generation of
automated control systems, merely alternative forms of remote control. Late in the process
of writing this thesis, very similar research by Currie [34] used an Arduino MCU, an IR LED, a Syma S107,
and a Microsoft Kinect depth camera to achieve an autopilot for this system. Currie’s work
with respect to this thesis will be discussed in greater detail in the Results section.
2.3 Computer Vision and Object Tracking
Whatever the system architecture for a flying robot, there exists a need for sensors to ‘close
the loop’ and provide feedback for control inputs. This section of the thesis focuses on a
particular class of methods for closing this loop. The choice of hardware for this task
must be guided by the physics of the system. For example, the sensor must be capable of
detecting the helicopter in the entire operational zone. If the helicopter can be constrained
to flying up and down, then a variety of range finding transducers from sonar to IR to laser
are viable. These sensors can be polled very quickly and return information that represents
physical units of distance with little or no processing. On the other hand, if the helicopter is
allowed to fly unconstrained, then it is unlikely that any sensor with a narrow field of view
will be useful for general control. The existing literature on the control of micro air vehicles
focuses almost entirely on the use of computer vision to solve this problem.
Digital cameras offer the flexibility, resolution, and wide sensing range that is required to
provide control for a dynamic system like a micro helicopter. The cameras themselves come
in many varieties. Much of the existing cutting edge research in camera based control systems
use motion capture cameras: purpose-built cameras that operate at hundreds of hertz and (most
frequently) use special IR reflective tags attached to the tracked medium to locate it [56].
In recent years, depth sensing cameras have become far more prevalent after being heavily
pushed by the game industry. These cameras, such as the Microsoft Kinect [45] or Asus Xtion
[5], provide a user with a stream of pixel images where each pixel is characterized by a red
value, a green value, a blue value, and a depth approximation. Other researchers have used the
simpler, commercially available cameras with which most modern computers come equipped.
While these cameras do not provide depth information, they are far more common and much
less expensive. Motion capture cameras are typically priced in the thousands of dollars;
Vicon’s own ‘affordable’ brand, the Bonita, starts at $12,500 USD for two cameras and simple
supporting hardware/software [55]. Depth cameras begin at approximately $100 [45]. Web
cameras, however, cost tens of dollars and more often than not are a sunk cost for users.
An important side note regarding camera selection is that chosen web cameras should ideally
support the USB Device Class Definition for Video Devices (UVC) standard so that generic
drivers can be used. A list of compatible devices for Linux systems, which should also work
on other systems, is maintained by the Linux UVC Development community [12]. Modern
laptops are typically equipped with cameras that are compliant with this standard.
The interpretation of information from these devices falls to the techniques of computer
vision which involves the acquisition, processing, analyzing, and understanding of images
often with the assistance of knowledge from the domains of physics, statistics, geometry, and
learning theory [39]. The field is a wide one, but of particular interest to this thesis is the
sub-domain of video tracking. A tracking algorithm for closed loop control of a helicopter
needs to perform the following tasks in real time:
1. identify target object in scene,
2. follow the object as it moves through the video sequence, and
3. estimate the state of the system (including positions, derivatives, etc.).
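As one concrete illustration of the third task, derivatives of the state can be obtained by finite differencing successive pose measurements. The C++ sketch below is illustrative only (the `Measurement` and `Velocity` types are invented for this example, not taken from the thesis software):

```cpp
// A 2D pose measurement taken at time t (seconds).
struct Measurement {
    double x, y;   // image coordinates (pixels)
    double t;      // timestamp in seconds
};

struct Velocity { double vx, vy; };

// Finite-difference velocity estimate from two successive measurements.
Velocity estimateVelocity(const Measurement& prev, const Measurement& curr) {
    double dt = curr.t - prev.t;
    if (dt <= 0.0) return {0.0, 0.0};  // guard against bad timestamps
    return {(curr.x - prev.x) / dt, (curr.y - prev.y) / dt};
}
```

In practice such raw differences are noisy at camera frame rates, so estimators typically smooth or filter them before feeding a controller.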
Common challenges to all vision algorithms are the effects of variations in illumination,
occlusion of objects, and background noise. Illumination variation refers to a scenario where
a significant change in brightness occurs in an image stream which may affect the color and
appearance of objects in a scene over time. Such disturbances may result from changes in the
environment or changes to the camera’s internal properties such as a change in exposure time.
Occlusion of objects refers to a case where the tracked object is at least partially obscured
by some other feature of the image such as if a helicopter were to fly behind a chair in the
middle of an image. Background noise refers to features or changes in the environment (not
the tracked object) that lead to difficulties in tracking the object. Examples include extra
objects or colors similar to the tracked target, or excessive background motion. Low light
conditions can also cause excessive noise in images [57].
There is no general solution to object recognition in computer vision; instead, there exists
a huge collection of algorithms that work well for certain domains and restrictions and poorly
for others. What follows is a brief survey of some popular methods for object recognition and
tracking using two-dimensional RGB-encoded-color digital images.
Perhaps the simplest technique for recognition and tracking is to use the color information
encoded into images directly by way of thresholding [28]. This is a binary operation that asks
whether a given pixel is above a given value: if it is, that pixel is updated with a designated
value indicating a positive; otherwise, it is given a designated value indicating a negative. A
popular convention in computer vision is to set a pixel to the greatest meaningful value for the
image type to represent “true” and to the minimum meaningful value to represent “false”:
thresh(pixel) = maxVal,  if pixel > thresh
                minVal,  if pixel ≤ thresh
Two threshold operations can be combined to determine if a pixel’s value is in a given
range. One operator tests the lower bound of the range and the second tests the upper value.
The results of both operations are combined with a ‘logical and’ operation to produce the
final result. Range thresholding is typically not implemented in this manner because of speed
concerns, but this abstraction is useful for mathematically understanding range thresholds.
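To make the two operations concrete, the following is a minimal C++ sketch; the pixel values and cutoffs are arbitrary illustrations, and `threshold`, `rangeThreshold`, and `rangeThresholdImage` are illustrative names rather than the API of any particular library:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Single binary threshold: maxVal if the pixel exceeds the cutoff, minVal otherwise.
uint8_t threshold(uint8_t pixel, uint8_t cutoff, uint8_t maxVal = 255, uint8_t minVal = 0) {
    return pixel > cutoff ? maxVal : minVal;
}

// Range threshold: the 'logical and' of a lower-bound test and an upper-bound test.
uint8_t rangeThreshold(uint8_t pixel, uint8_t lo, uint8_t hi) {
    bool aboveLower = pixel >= lo;  // first threshold operation
    bool belowUpper = pixel <= hi;  // second threshold operation
    return (aboveLower && belowUpper) ? 255 : 0;
}

// Applying the range threshold to every pixel yields a binary image.
std::vector<uint8_t> rangeThresholdImage(const std::vector<uint8_t>& img,
                                         uint8_t lo, uint8_t hi) {
    std::vector<uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) out[i] = rangeThreshold(img[i], lo, hi);
    return out;
}
```

A production implementation would fuse both bound tests into one pass, as noted above; the separation here is purely for exposition.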
The result of a threshold operation is a binary image, a black and white image where
each pixel has been classified as within tolerances or not. Binary images greatly reduce the
complexity of an image, can be achieved through many methods, and as such are typically
used as the inputs for all sorts of vision algorithms, including contour detection. They can
also be used together with a priori knowledge of the object that is being tracked to detect
that object. This kind of registration requires a model of the object being tracked, especially
if that model’s appearance in an image is a function of its pose. This could be accomplished
by looking for a certain shape or silhouette, for example. The simplest case is tracking a
brightly colored ball in a scene composed of starkly different colors. A range based threshold
can be applied to the scene with a range corresponding to the approximate color of the ball
plus or minus some tolerance. The resulting binary image could be searched for a circular
white patch that would represent the 2D projection of the sphere onto the image [15].
A constraint of this method is that the object will need to be brightly colored and in
sufficient lighting for that color to be detected. This method is particularly sensitive to
noise in the forms of lighting changes, shadows, and background color. There are some
pre-processing stages that can help alleviate this issue, including converting RGB images to
the Hue-Saturation-Value (HSV) color space, where the color of a pixel is represented by a
single value on the color wheel. This value, the hue, is fairly invariant to lighting changes.
Color equalization can also help produce stark contrast in the image and make colors more
detectable.

Figure 2.1: Binary Thresholding Operation
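The hue computation behind such a conversion can be sketched as follows. This is a simplified standalone version of the standard RGB-to-HSV hue formula (a library such as OpenCV provides the full conversion); it illustrates why scaling all three channels by a common brightness factor leaves the hue unchanged:

```cpp
#include <algorithm>
#include <cmath>

// Hue of an RGB pixel, in degrees [0, 360). Hue depends only on the ratios
// between the channels, so multiplying all three channels by a common
// brightness factor leaves it unchanged.
double rgbToHue(double r, double g, double b) {
    double mx = std::max({r, g, b});
    double mn = std::min({r, g, b});
    double c = mx - mn;                   // chroma
    if (c == 0.0) return 0.0;             // gray pixel: hue undefined, report 0
    double h;
    if (mx == r)      h = std::fmod((g - b) / c, 6.0);
    else if (mx == g) h = (b - r) / c + 2.0;
    else              h = (r - g) / c + 4.0;
    h *= 60.0;
    return h < 0.0 ? h + 360.0 : h;
}
```

For example, a bright red pixel (255, 0, 0) and a dim red pixel (100, 0, 0) both map to a hue of 0 degrees, which is the lighting invariance exploited above.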
An alternate means of color tracking uses histograms to perform color based object detec-
tion. Histogram tracking involves computing the color histogram of the target object(s) from
a sample image. This reference histogram can cover one or multiple channels. The bins of the
histogram can then be used as a probability look-up table for future images. Each pixel of a
new image can be classified into the bin it belongs to and the relative height of that bin can be
used as a probability estimation. This resulting probability map can be used in combination
with a variety of cluster finding algorithms to match the center of the target object. Such al-
gorithms include MEANSHIFT and CAMSHIFT [33]. An advantage over pure blob tracking
is that this histogram can represent non-rigid objects that have more than one predominant
color. This method is slightly more computationally expensive than basic thresholding but
is more robust and allows for the estimation of confidences. Intel has successfully used this
method to track human faces on consumer web cams in real time since 1998 [29].
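A minimal sketch of this back-projection idea over a single hue channel follows; the bin count is an arbitrary illustrative choice, and real implementations (such as OpenCV's histogram back-projection used by CAMSHIFT) operate over full images and possibly multiple channels:

```cpp
#include <algorithm>
#include <array>
#include <vector>

constexpr int kBins = 12;  // 30-degree hue bins (the bin count is an arbitrary choice)

int hueBin(double hueDeg) {
    int bin = static_cast<int>(hueDeg / (360.0 / kBins));
    return std::min(kBins - 1, bin);
}

// Build a reference histogram from sample hues taken from the target object,
// normalized so the tallest bin has height 1.0.
std::array<double, kBins> buildHistogram(const std::vector<double>& sampleHues) {
    std::array<double, kBins> hist{};
    for (double h : sampleHues) hist[hueBin(h)] += 1.0;
    double mx = *std::max_element(hist.begin(), hist.end());
    if (mx > 0.0)
        for (double& v : hist) v /= mx;
    return hist;
}

// Back-projection: the probability that a new pixel belongs to the target is
// taken as the relative height of the histogram bin its hue falls into.
double pixelProbability(const std::array<double, kBins>& hist, double hueDeg) {
    return hist[hueBin(hueDeg)];
}
```

The per-pixel probabilities form the probability map that a cluster finder such as MEANSHIFT then searches for the densest region.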
Another popular method for object detection and tracking is background subtraction.
This method takes as input a reference image and a second image to search, and it produces
a gray-scale image where pixel values indicate the magnitude of that pixel's change (since the
reference). Background subtraction works by analyzing the difference between a reference
image of a scene before any objects are added and a second image taken after the object has
been added.

(a) Histogram of Skin Color (b) Probability Map From Histogram
Figure 2.2: A Histogram Tracking Operation to Detect a Human Face [40]

The algorithm assumes that the parts of the images that change from the first
to the second image are regions of interest that might contain the object to be tracked. In
effect, a third image, gray-scaled and the same size as the first two, is created where each pixel's
value is equal to the difference between the corresponding pixels in the first and second images.
The resulting gray-scale image could then be further processed, such as by thresholding with
some small value that would reduce noise without disrupting object detection [57]. This
method works best when the background is very static and of sufficiently different color from
the tracked object for there to be a meaningful difference. Movement in the background can
obscure object detection or give false positives. To counteract this, some ‘adaptive’ methods
apply a low-pass filter to the input images and produce a background representation that
morphs over time allowing for more robust detection [59].
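Both the basic difference operation and the adaptive variant can be sketched in a few lines; `alpha`, the low-pass filter constant, is a tuning parameter chosen here purely for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Basic background subtraction: the absolute per-pixel difference between a
// reference (background) frame and a new frame. Large values mark pixels that
// changed since the reference and may belong to the tracked object.
std::vector<uint8_t> frameDifference(const std::vector<uint8_t>& background,
                                     const std::vector<uint8_t>& frame) {
    std::vector<uint8_t> diff(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        int d = static_cast<int>(frame[i]) - static_cast<int>(background[i]);
        diff[i] = static_cast<uint8_t>(d < 0 ? -d : d);
    }
    return diff;
}

// Adaptive variant: a first-order low-pass filter slowly morphs the stored
// background toward each new frame, so gradual scene changes are absorbed
// while fast-moving objects still register as differences.
void updateBackground(std::vector<double>& background,
                      const std::vector<uint8_t>& frame, double alpha) {
    for (std::size_t i = 0; i < frame.size(); ++i)
        background[i] = (1.0 - alpha) * background[i] + alpha * frame[i];
}
```

A small `alpha` makes the background adapt slowly (robust to moving objects, slow to absorb lighting changes); a large `alpha` does the opposite.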
Another general approach to extracting significant information from images is to look not
at the color values directly but at the change in those values across the image. Gradient
detection algorithms can be used to provide a representation of the outline or shape of objects
in the scene. There are many approaches to detecting edges in an image [25]. Some, such
as the Sobel operator, approximate the magnitude of the first derivative of an image channel
and the results are then thresholded to retain only the edge pixels.

Figure 2.3: Background Subtraction Before and After

Other techniques involve some local averaging, approximation of noise, and higher level
derivatives. Examples of these
operations are the Canny edge detector and the Laplacian of Gaussian method. They are
generally more accurate than the Sobel operator in that they produce thinner lines, are
more robust to noise, and avoid detecting lines multiple times but are more computationally
expensive [31].
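As an illustration of the first kind of operator, the following sketch evaluates the 3x3 Sobel gradient magnitude at a single interior pixel; border handling, which a full implementation must address, is omitted for brevity:

```cpp
#include <cmath>
#include <vector>

// 3x3 Sobel operator: approximate the horizontal and vertical first
// derivatives of a grayscale image at interior pixel (x, y) and return the
// gradient magnitude. Pixels on the image border are not handled here.
double sobelMagnitude(const std::vector<std::vector<int>>& img, int x, int y) {
    const int gxK[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};   // d/dx kernel
    const int gyK[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};   // d/dy kernel
    int gx = 0, gy = 0;
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i) {
            gx += gxK[j + 1][i + 1] * img[y + j][x + i];
            gy += gyK[j + 1][i + 1] * img[y + j][x + i];
        }
    return std::sqrt(static_cast<double>(gx) * gx + static_cast<double>(gy) * gy);
}
```

Thresholding this magnitude over the whole image retains only the edge pixels, as described above.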
The output of such an edge detector is a binary image in which positive values mark likely
edges. The resulting contours can be compared to a database of known shapes using shape
contexts and shape context distance [27]. The best matching reference shape (that with the lowest
as the classification of the sampled shape. These algorithms take advantage of the observation
that edges in images are largely unaffected by changes in light and color.
An important and more general tracking approach, which draws on many of the image
processing techniques above, is feature tracking.

(a) Source Image (b) Output of Canny Edge Detector
Figure 2.4: Example of Canny Edge Detector

[53] defines a local feature to be
an image pattern that differs from its immediate neighborhood as a result of a change in one
or more properties of the image. Commonly considered image properties are intensity, color,
and texture, but a feature can also be a corner, edge, or patch of pixels. Once identified as
a candidate, a feature is analyzed by an algorithm that converts the feature to a descriptor
which can be used for matching later. These features should ideally be detectable even after
changes in image scale, noise, illumination, and even some rotation. Feature tracking is
commonly used by computer vision algorithms to stitch several overlapping photos of a scene
into a single, larger photo.
Two images of the same scene or object can be correlated with this technique even if
the images were taken from slightly different perspectives. This is accomplished by running
the same feature detecting algorithm on each image independently. The resulting feature
descriptors of each image can be compared with one another and pairs of descriptors that
refer to the same feature identified. By analyzing the change in position of the same feature
sets between images, it is possible to estimate change in camera orientation relative to the
object between the images. If the feature detecting algorithm finds the same features in two
images despite a change in the object's pose or image properties such as brightness, it is said
to be invariant to those changes. State of the art algorithms like David Lowe's SIFT [42]
and Bay and Tuytelaars' SURF [26] are examples of such detectors and can match
descriptors of an object through up to 60 degrees of out-of-plane rotation. These algorithms are
fairly complex to implement and can be computationally expensive. As a side note, many
feature tracking methods including the two mentioned here are patent protected and not
available by default in most open source computer-vision algorithm distributions.
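While the descriptor computations themselves are involved, the matching step between two images is comparatively simple. The sketch below pairs a query descriptor with its nearest neighbour and applies the ratio test proposed in the SIFT paper [42], rejecting ambiguous matches; the two-element descriptors are tiny stand-ins for real 64- or 128-element SURF/SIFT vectors, and the function names are illustrative:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two feature descriptors of equal length.
double descriptorDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Match a query descriptor against the descriptors of a second image.
// Following the ratio test, the nearest neighbour is accepted only if it is
// clearly closer than the second-nearest; otherwise -1 (no reliable match).
int matchDescriptor(const std::vector<double>& query,
                    const std::vector<std::vector<double>>& candidates,
                    double ratio = 0.75) {
    int best = -1;
    double d1 = 1e300, d2 = 1e300;  // best and second-best distances so far
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double d = descriptorDistance(query, candidates[i]);
        if (d < d1) { d2 = d1; d1 = d; best = static_cast<int>(i); }
        else if (d < d2) { d2 = d; }
    }
    return (best >= 0 && d1 < ratio * d2) ? best : -1;
}
```

The ratio test discards features whose two best candidates are nearly equidistant, which is exactly the situation that produces false correspondences between images.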
The above algorithms are not mutually exclusive and for some applications robustness
may be improved by applying a combination of methods. Zhou and Aggarwal found that
applying both color histogram tracking and adaptive background subtraction gave them
the best chance to detect and classify people, cars, motorcycles, and groups of people moving
through space [58]. It is also common for tracking information to be incorporated into some
statistical algorithm to make more reasoned classifications and improve robustness. To classify
objects and resolve problems associated with occlusion, Zhou and Aggarwal fused frame by
frame tracking information with an extended Kalman filter. Other researchers at ETH Zurich
have incorporated histogram based tracking with a particle filter to reliably track non-rigid
colored objects [47].
Figure 2.5: An example of a reference image, left, being matched to a complex scene using features
2.4 Pose Estimation
In some cases, it may be possible to estimate the 3D pose of an object from its 2D image in
addition to detecting that object's presence. The literature concerned with pose estimation
is very large and a full survey is beyond the scope of this thesis, but an excellent overview
may be found in [25]. Frequently, the workings of both object tracking and pose estimation
systems will be tightly coupled. Real time tracking often involves some form of ‘model’ of the
object being tracked (e.g. a specific set of colors, edges, features). If there is an understanding
of how that model varies with pose, it can be used to attempt to classify the pose of the same
object in some given image.
The nature of the model is one way to divide the field of algorithms that estimate pose
[41]. Some algorithms store 2D representations of an object and look for the presence and
orientation of a specific face of a 3D object. An example might be a reference image with
corresponding feature points pre-calculated that would then be compared to an input to see
if a match occurs, and if so, a partial orientation can be obtained. An extension of this
method is to create a composite model from a set of 2D models representing different views
of the same 3D object. By correlating the input image with the best of the different views, a
better estimate of pose can be obtained; however, this requires extensive training with images
from many views. [32] represents the state of the art in this field where an object model is
learned from the descriptors gleaned from a set of images and can be used to classify groups
of objects for the purposes of robot grasping.
A more general approach is to use a 3D computer model, such as one that could be
developed in a CAD program. This model can be projected onto an arbitrary 2D surface
by the computer and correspondences can then be drawn between that projection and the
reference image. [41] uses such a method to precisely predict the pose of automobiles using
complex reference models in a way that is largely invariant to lighting. Despite being one
of the fastest methods available, a precise fit still took about one minute to compute. A
great variety of other methods have been developed under the topic of 2D-3D pose estimation
using models of different types.
A subset of these algorithms drops the requirement to work with a general model and
instead elects to use specially designed ‘landmarks’ which have known geometry and are easily
segmented from a reference image. What these algorithms lose in generality, they make up for
in speed: recovering 6D position information (x, y, z, roll, pitch, yaw) in real time. Appearing
as early as 1995, the POSIT algorithm [35] has been used to track a precisely modeled object
with at least four identifiable key points with known geometry relative to the model in all six
dimensions in real time. Relevant examples of systems with this philosophy have been used by [46]
and [30]. [46] developed a system for real-time position and attitude control of space robots
using a geometric landmark detected with color vision at 60 Hz. The landmark consisted of
three color markers positioned at (1, 0), (-1, 0), and (1, 0) meters in a plane. Each frame
produced by the controlled robot is color segmented to identify the markers present in the
frame. By examining the order and relative position of the markers in the segmented image,
6D position relative to the marker can be calculated. Particularly interesting to this thesis
was the fact that this was accomplished with an uncalibrated camera: the angle calculations
were accomplished by examining relative lengths. [30] implemented a similar scheme to control
MAVs using a four marker ground-based landmark. The markers themselves were attached
to ground robots and came in a variety of types, including IR, active color LEDs, and passive
colored markers. This system also uses visual segmentation to detect markers in the frame
and then solves the Perspective-Three-Point problem (P3P) using a closed-form solution also
developed at ETH.
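The geometric core of these landmark methods is similar-triangles reasoning. The sketch below illustrates two of the underlying relationships; a pinhole model with a known focal length (in pixels) is assumed purely for illustration, whereas systems such as [46] deliberately work with relative lengths to avoid calibration, and the function names are hypothetical:

```cpp
#include <cmath>

// Pinhole similar triangles: an object of known physical size that spans
// `pixelSize` pixels in the image lies at depth f * realSize / pixelSize,
// where f is the focal length expressed in pixels.
double depthFromSize(double focalPx, double realSize, double pixelSize) {
    return focalPx * realSize / pixelSize;
}

// Relative-length reasoning: a segment of known length appears foreshortened
// by cos(yaw) when rotated out of the image plane, so the ratio of apparent
// to maximum apparent length recovers |yaw|. Note the quadrant ambiguity:
// a single length ratio cannot distinguish +yaw from -yaw.
double yawFromForeshortening(double apparentLen, double maxLen) {
    return std::acos(apparentLen / maxLen);  // radians
}
```

Landmark systems with multiple markers resolve the sign ambiguity by comparing the order and relative positions of the markers, as the systems above do.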
2.5 Control Systems
A state measurement or pose estimation component is only one piece of a functioning control
loop. Also required is some logic that will take the output of the pose estimation and use
that information to bring the pose to the desired value. The parameter being controlled, the
pose of a helicopter, is referred to as the process variable. The desired value of the process
variable is the setpoint of the controller. The controller itself is technically called a feedback
controller because it adjusts its output based on information it receives about the current state
of the system. Error is the difference between the setpoint and the process variable at a given
moment in time.
A classical approach to implementing this feedback logic, for systems that are or can be
modeled as linear systems, is the proportional-integral-derivative (or PID) controller. This type
of controller sets the input to the controlled system based on a measurement of the current
error (proportional), the accumulated error in the past (integral), and the predicted error
in the future (derivative). This has historically been the method by which feedback control
was developed for systems without dynamic models [48]. Another popular form of control is
a technique called fuzzy control, which more closely resembles the decision making and
inference process that people follow when controlling systems manually [37]. Information
about control systems, and PIDs in particular, can be found in many references, including
Modern Control Engineering [48]. For relatively simple systems and simple tasks, classic
control techniques can be tuned to work well. For some very complex systems and tasks,
it is extremely difficult if not impossible to develop a classical control system for the job.
For this class of problems, the techniques of the field of machine learning have shown some
success. For example, researchers at Stanford have employed machine learning to teach a
model helicopter to do incredible acrobatic maneuvers such as inverted takeoffs and flips in
place [20].
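A minimal discrete-time PID controller of the kind described above can be sketched as follows; the gains are placeholders that would have to be tuned for any particular system:

```cpp
// Minimal discrete PID controller: the output is a weighted sum of the
// current error (proportional), its running accumulation (integral), and its
// most recent rate of change (derivative). The gains kp, ki, kd must be
// tuned for the particular system being controlled.
struct Pid {
    double kp, ki, kd;
    double integral = 0.0;
    double prevError = 0.0;

    // error = setpoint - process variable; dt = time since the last update.
    double update(double error, double dt) {
        integral += error * dt;
        double derivative = (error - prevError) / dt;
        prevError = error;
        return kp * error + ki * integral + kd * derivative;
    }
};
```

On each control cycle, the measured error and elapsed time are fed to `update`, and the result is applied as the actuator command.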
2.6 Concluding Thoughts
The preceding sections presented an investigation into existing options and methods for
developing a flying micro robot. Topics discussed included system architectures and sensing
methodologies, along with a brief survey of popular computer vision techniques for object
recognition and pose estimation and a brief look at control methodologies. The next chapter
lays out the logic behind the design decisions made for this thesis in light of the research
presented in this chapter.
Chapter 3
Methodology
3.1 Requirements and Specifications
In the introduction, the goal of this thesis was defined as the creation of a flying
robot system that:
1. is inexpensive,
2. is programmable,
3. is easy for a single person to use,
4. is hackable,
5. provides closed loop control, and
6. works in a classroom environment.
Stated more qualitatively, the goal of this project is the creation of a system that enables
the programmatic control of micro-helicopters. The envisioned target audience for this work
is young students, ages 13+, learning to program in organized classrooms. Educators and
students are often pressed for time and resources, so low costs and ease of use are important
qualities. Cost considerations should include not just the initial cost of the flying platform,
the controller, and other supporting items, but also long term cost factors like maintenance
and durability. Ease of use refers to the ability of individual students to get started on a
helicopter control project with a minimum of help.
A secondary audience for this system is more advanced programmers, students, enthu-
siasts, and hackers that are curious about computer vision, control systems, and generally
extending the capabilities of machines around them. To this demographic, the quality of soft-
ware architecture, availability of hardware, and documentation are particularly important.
If the software is designed in such a way that components are self-contained and the entire
system is assembled from a well documented set of those components, then it becomes far
easier for a curious individual to effect the changes that he/she is interested in implementing.
This is what is meant by ‘hackable’.
The classroom setting imposes several more restrictions. Components of the system need
to be fairly robust, they must be replaceable, and they must be safe. Furthermore, the control
system should ideally work with multiple systems running simultaneously in a single room
without interference.
A working control system would be defined as the combination of software running on
a user’s computer, a helicopter, and any extra hardware that enables positive control of the
helicopter platform for an easily observed amount of time (many seconds). This system should
ideally be able to move the helicopter with stability between different 3D points and withstand
minor disturbances such as small air currents or momentary loss of vision.
Target conditions for operation, defined as those under which the system performs best,
include a large indoor room with bright, even lighting and a consistent color palette. A single
helicopter and its communication should be able to be effectively constrained to a five to ten
foot bubble to allow for the simultaneous operation of several helicopters in the same room.
Above all, this thesis and associated tools should provide an opportunity to learn about
computer vision, control systems, programming, and robotics.
3.2 Hardware Platform
The Chinese brand Syma is a popular manufacturer of a range of toy helicopters that are
inexpensive (with the cheapest models starting around $17 USD), rugged, and widely available,
as are replacement parts [24]. Besides the brand's low cost and durability, the Syma S107
helicopter, in particular, has a few distinct advantages.
Each helicopter is controlled via an infrared light (IR) interface which, as discussed in
the previous section, is inexpensive, easy to use, and because of its limited (and sometimes
narrow) range, appealing for use in a possibly crowded room. As an added advantage, the
same hardware and techniques used to issue commands to the helicopter may offer an excel-
lent opportunity to discuss digital communication protocols and create remote controls for a
variety of household items such as air conditioners, cameras, televisions, and more.
Perhaps the greatest advantage of the Syma over similar offerings is the large user
community that it enjoys. The helicopters are, at least comparatively, ubiquitous and
many hobbyists have already investigated control schemes. As part of this community, many
hackers have worked on various means to control and modify these helicopters. This is an
invaluable resource for working with and debugging issues with the helicopters.
There are two primary disadvantages to Syma helicopters as platforms for educational
robotics exercises. First, they are light and have very little lift capacity (a matter of grams),
and second, they have no direct way to control roll. Unlike full-scale helicopters, which are
equipped with swash plates that can tilt the main rotors, the Syma helicopter is stabilized by
a bar attached to the main rotor shaft and a gyroscope mounted to the controlling electronics.
This presents a distinct challenge to effectively controlling these helicopters in tight spaces.
Still, the Syma's widespread availability and low cost make it the platform of choice for this
thesis. Note that the techniques used in this thesis should, with appropriate adaptations,
transfer to other micro air vehicles.
One half of the equation necessary to create a working control system is feedback. Given
that the fundamental requirements set forth for this educational system include ‘low-cost’
and ‘simple’, and that Syma brand helicopters and their ilk come with no available on-board
telemetry, it falls entirely to some external device to close the loop. This thesis thus adopts
the external sensing model discussed in the literature review.
It is a requirement of this thesis that some form of personal computer be used to program
the helicopters, and given this requirement, the web cameras that come built into most laptops
are prime candidates for closing the sensory loop. A camera image of a helicopter contains, in
theory, all that is necessary to control that helicopter: position, rotation, distance. Extensive
examples and software APIs already exist for collecting and processing computer images.
These algorithms are also an excellent source of additional educational opportunities.
The disadvantage of cameras, especially USB web cameras built into laptops, is that their
specifications are difficult to control. Laptops feature widely ranging models of cameras with
widely ranging properties. Some of these properties are linked with the physical characteristics
of the camera such as field of view and focal length. Others relate to the hardware and
controlling firmware such as, but not limited to, resolution, exposure timing, white balance,
saturation, and brightness. Some cameras conform to a uniform standard that is a subset
of the USB protocol [12], but just as many do not. Furthermore, programmatically controlling
these properties is difficult because the interface to do so is operating system dependent and
the physical meaning of values given to each of these parameters itself varies from camera
to camera. A brightness of 50 might mean something different on camera A versus camera
B. This puts a burden on the user to control their own camera, if possible, and otherwise
dictates that any computer vision algorithm used should be fairly robust to variations in
camera properties.
Other options for sensing include adding on-board telemetry, or using a more complicated
vision setup such as a motion tracking system or a depth camera such as a Microsoft Kinect
or Asus Xtion. The difficulty with on board sensing is that the helicopters themselves have
a very small payload, and there is little room or capacity for additional hardware. Replacing
the existing hardware would be a large project unto itself. Furthermore, adding components
adds expense and there’s little chance that students or educators will have the tools and know-
how to perform the modifications themselves. A similar constraint prevents the use of more
complicated vision systems; the cost and expertise required to run them will erect a barrier
to entrance that few will overcome. Thus, this thesis adopts the simple two dimensional web
camera as its sensing instrument.
The final piece of the puzzle is the connecting element between the sensing and logic
system, and the helicopters themselves. For this purpose an Arduino Uno, a single IR LED,
and a USB cable were used. The Arduino Uno is a hobbyist micro-controller platform built
using an 8 bit Atmel AVR 328p processor running at 16 MHz [3]. It is shipped with a library
of software that effectively abstracts much of the low level knowledge about programming
and hardware that is required to program micro-controllers. Much like the Syma helicopters,
the Arduino platform’s real advantage is the extensive user base and the wealth of knowledge
that accompanies it. Arduinos represent the single most popular unified micro-controller
solution, and as such both the hardware and supporting software libraries are widely available.
Any Arduinos purchased for this project in an educational setting could be re-purposed for
any multitude of other projects.
Figure 3.1: Arduino Uno R3 [3]
That said, the Arduino Uno retails for $30 USD, has limited processing power, and does not
include the necessary USB cable (an expensive additional purchase) or an LED (an inexpensive
additional purchase), making the Arduino Uno quite expensive for its components. Still, there
exist many alternative options, but few match the level of polish and consistency that Arduinos
feature. For example, Arduinos come ready to plug-and-play. They have pre-soldered female
headers that allow resistors and LEDs to be plugged directly in. Other platforms require some
amount of soldering. The Arduino libraries work on a wide range of AVR platforms, however,
meaning that anything developed for the actual Arduino should be straightforward to port
to alternate platforms. Possible alternatives include offerings from Digispark [6], retailing for
under $10, and the Teensy, a more powerful processor and platform for less than $20 USD
[21]. While these options are appealing, they both require some soldering and have some
incompatibilities with existing Arduino libraries. The development of an improved signal
transmission device is left as future work.
3.3 Software Platform
For the software interface to camera hardware, this project uses the open-source and cross-
platform computer vision library OpenCV [14]. Started in 1999 by the Intel Corporation,
OpenCV is one of the oldest and most mature libraries in computer vision and is usually the
first choice for academic research. Interfacing with hardware attached to a computer requires
inter-operation with that computer's operating system, and each OS is different. The
advantage of OpenCV, aside from its wealth of optimized algorithms for all common computer
vision operations, is that it handles hardware interfacing for the programmer. While adding
dependencies to a project should always be done with caution, replicating the functionality
available in OpenCV would be a huge project in and of itself. OpenCV's native languages are
C and C++, and this thesis follows suit. The C++ language allows for programming
with high level concepts while maintaining minimal computational overhead during runtime.
For a real-time video processing system such as this thesis, this is a necessary trait. Note that
OpenCV bindings are available for most popular languages.
Any software developed as a part of this thesis will itself be free and open-source. For users
who are familiar with the process of building C/C++ programs using traditional command
line techniques, makefiles are available. For those who desire a more hassle free installation,
this thesis’ software is designed to work with the KISS Institute for Practical Robotics’ KISS
Platform Editor [10]. This integrated development environment ships with cross platform
software libraries to access OpenCV and serial communication, alleviating the need for a user
to build and install these him/herself.
3.4 Algorithms and Implementation
This section discusses the chosen system architecture, tracking methodology, and control
scheme. It also discusses more logistical concerns such as the Syma helicopter signal trans-
mission and the layer between the control application and transmission. Please note that the
term “transmission”, in this thesis, is used to indicate the process by which a digital signal is
broadcast to a receiving helicopter using infrared light pulses. It does not refer to mechanical
linkage between the motors and rotors on-board the helicopter which is a black box with
respect to this work.
3.4.1 Fiducial Markers
The following section discusses the means by which a helicopter’s pose information is extracted
from a camera image of that helicopter. This thesis’ literature review, section 2.4, discusses a
class of 2D-3D pose estimation algorithms, designed to run in real time, that use landmarks,
or purpose designed markers that are simple to detect, to measure a camera's position against.
This thesis inverts this system by fixing the position of the camera and allowing the marker
to move. The remainder of this section outlines the specifications for this marker.
An effective marker should be easy to detect under a variety of conditions and provide
unambiguous information about the helicopter’s pose. It should also exert minimal influence
on the dynamics of the helicopter during flight. The most natural marker for the helicopter,
exerting no influence on dynamics and requiring no additional materials, is the helicopter
itself. That is, an image of the helicopter should encode enough information about the
helicopter to extract its pose without any additional hardware. There are significant issues,
however, that make this difficult to implement.
One such difficulty is ambiguity in orientation measurements. Given a binary image where
positive pixel values indicate the presence of the helicopter, the position of the helicopter can
be taken as the mean position of the positive values. The height of the helicopter, or the
difference between the highest and lowest positive pixel position, provides an estimation for
the helicopter’s distance from the camera. Rotation of the helicopter (with a zero rotation
defined as facing the camera) is correlated with the height to width ratio of the bounding
box of the helicopter. The bounding box is defined as the smallest rectangle that entirely
encompasses the positive pixels in a binary image.
The height to width ratio is maximized when the helicopter is facing toward or directly
away from the camera. Conversely, it is minimized when the helicopter orientation is perpen-
dicular to the camera. If these minimum and maximum ratios are known, then a measured
ratio can be transformed to rotation through interpolation. The quadrant that the rotation
is in, however, is not clear. For example, a rotation of thirty degrees counterclockwise from
zero would have the same ratio as thirty degrees clockwise.
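The interpolation step might be sketched as follows. A linear mapping from ratio to angle is assumed here for simplicity (the true relationship is trigonometric), the function name is illustrative, and the quadrant ambiguity shows up directly in that only the magnitude of the rotation can be returned:

```cpp
// Interpolate a yaw magnitude (degrees) from the bounding-box height/width
// ratio, given the ratio measured when the helicopter faces the camera
// (maxRatio) and when it is perpendicular to it (minRatio). A linear mapping
// is assumed for simplicity; the true relationship is trigonometric. Only
// the magnitude is recoverable: +30 and -30 degrees yield the same ratio,
// which is exactly the quadrant ambiguity described above.
double yawMagnitudeDeg(double ratio, double minRatio, double maxRatio) {
    double t = (maxRatio - ratio) / (maxRatio - minRatio);  // 0 facing, 1 perpendicular
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return 90.0 * t;
}
```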
Affixing some form of marker to each side of the helicopter, or painting the left and right
sides different colors provides a way to help resolve this ambiguity. However, this technique has
some issues from a practical standpoint. First, it requires modification of the helicopter itself,
which may make it more difficult to use. Second, the helicopter’s tail becomes very difficult
to detect, leading to ambiguous situations. Is a small width to height ratio the product of a
helicopter facing the camera or a failure to detect the tail? Using a single visual feature per
side gives no way to have confidence in the values being measured. An easier solution would
be to affix some form of marker which, regardless of orientation, always has two easy to detect
features with a known relationship.
Modifying the helicopter with some visual landmark, referred to from here on as a fiducial
marker, provides such a solution. Inspiration for the design of this marker can be drawn from
real-world navigation lights, or colored lights mounted to many boats and aircraft used to
indicate position and heading under low light conditions. Traditionally, the left (port) side of a
craft displays a red light and the right (starboard) side displays a green light [8]. Active markers such as
these would require either batteries or modification of the helicopter’s existing wiring. A
reasonable constraint on the system is operation in good lighting, so a passive colored marker
is a suitable solution.
Figure 3.2: Reference 3-Color Fiducial Marker Constructed From Styrofoam
The passive fiducial marker settled on is a circular ring (seen in Figure 3.2) attached to,
or in place of, the landing gear of the Syma helicopter (Figure 3.3). The surface of this ring
is divided into three equally sized rectangular areas and each area is painted with a unique
color. The colors are selected such that they are approximately equidistant on a color wheel.
No matter the orientation of the ring, it keeps the same rectangular profile in a camera image.
Furthermore, the combination and ratio of colors allows for the identification of any rotation
of the helicopter without ambiguity, provided the helicopter cannot roll. The fact that
this marker design has equal height across its width makes it more robust to noise than a
shape similar to a sphere, resulting in more accurate estimations of depth.
Figure 3.3: Fiducial Marker Attached to Syma Helicopter
Finally, if the helicopter does not roll or pitch significantly, the
two dimensional projection of each color element on the ring is
itself a rectangle of equal height. Any point inside one of these rectangles should contain the
same color hue. These properties confer computational benefits: For one, only a single method
is necessary to determine the position and rotation of a helicopter in an image (as opposed
to multiple methods depending on relative orientation to the image plane). Additionally,
each color can be represented by a rectangle on the image plane as opposed to a potentially
complicated or noisy contour of individual points.
If one is aware of a point p inside of a rectangular color element r, then the dimensions of
r can be found simply by looking for a color gradient above, below, left, and right of p. Such
a gradient can be found by traversing in one of the “cardinal” directions in groups of n pixels
until a significant shift in color hue is detected. The method involves traversing at most
width + height pixels. This thesis will refer to this method as By-Estimate blob expansion. The
alternative is thresholding the entire image, which involves performing an operation on each of
the width ∗ height pixels in the image, an approximate difference of two orders of magnitude.
This method will be referred to as the By-Threshold method.
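The scale of that difference is easy to check. Assuming an illustrative 640 × 480 camera image (a resolution chosen for illustration, not specified by the thesis):

```c
#include <assert.h>

/* Worst-case pixel visits for the two blob search strategies on a
 * w-by-h image (hypothetical helper names, for illustration only). */
long by_threshold_cost(long w, long h) { return w * h; }
long by_estimate_cost(long w, long h)  { return w + h; }
```

For 640 × 480, the by-threshold method touches 307,200 pixels while the by-estimate method touches at most 1,120, a factor of roughly 274, consistent with the two-orders-of-magnitude estimate.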
The disadvantage of using a fiducial marker like the color ring is that it requires a possibly
bulky attachment to the bottom of a micro-helicopter that has little lift capacity, significantly
damping the maximum speed of the helicopter. Furthermore, backgrounds with complicated
colors/patterns and varying lighting can induce many tracking failures including false-positives
and a failure to register the color squares. Thus, to be effective, the helicopters must be flown
in a fairly controlled, indoor environment with the understanding that the system will be quite
damped. Still, this method represents an effective alternative to the most general methodolo-
gies, which involve marking and finding approximately six key-points on the helicopter in each
frame, an approach that suffers from all the same perception problems but requires much more precision.
3.4.2 Vision-Based Helicopter Tracking
This section briefly discusses the core logic of tracking software in high-level terms. The
algorithm is designed to track a single Syma helicopter outfitted with a specially configured
fiducial marker between subsequent images from a stationary camera.
The fiducial marker should be a ring affixed to the bottom of the helicopter with an outer
surface area that consists of three solid colors, easily distinguished from one-another. One
color should be chosen as the front color and the nose of the helicopter should be aligned with
the center of that color’s span along the ring. This segment is designated color zero. Colors
one and two are the other segments of the ring enumerated counterclockwise around the ring.
There are two inputs to the tracking software. The first is the marker configuration
discussed above. It consists of the names of all three colors, a hue value representing each
color on the color-wheel, and a tolerance range about that center hue value. This tolerance
should be as small as possible while still maintaining good search results.
The second input to the system is a stream of raw images of a fixed size encoded in an
eight-bit hue, saturation, value format. In order to minimize its impact on other logic a
user might want to use, the tracking software explicitly does not take control of an image
stream. Instead, it is necessary for the end-user to update the tracking engine with each
frame, including performing any desired preprocessing and color space conversions.
The output of the system is a data container representing the believed position (in X,Y
pixel coordinates) of the center of the fiducial marker and the believed rotation of the heli-
copter (in radians, from 0 to 2π with a zero rotation corresponding to the helicopter directly
facing the camera). It is possible for the algorithm to fail to locate the helicopter in which case
a point with an invalid position of (-1, -1) and rotation of 0 will be returned. The tracking
engine keeps a minimal amount of state information to better predict helicopter position, but
it is up to the user to keep his/her own information for control purposes.
The general algorithm for helicopter tracking is given in algorithm 1.
Algorithm 1 General Fiducial Marker Search Approach
Require: HSV encoded digital image, marker color configuration
1: frame ← getCameraFrame
2: estimateBlobs ← byEstimateExpansion(frame, colors, history)
3: possiblePoses ← findPossiblePoses(estimateBlobs)
4: if possiblePoses is empty then
5:   estimateBlobs ← byThresholdExpansion(frame, colors)
6:   possiblePoses ← findPossiblePoses(estimateBlobs)
7: end if
8: bestPoint ← pickBestPoint(possiblePoses, history)
return bestPoint
The by-estimate method of blob expansion refers to the technique of looking for color
gradients surrounding a point suspected of being inside a color blob. The more general by-
threshold method uses binary thresholding and the color configuration to identify blobs.
Internally, the tracking system keeps a history of the helicopter positions and if the history
becomes too large, the oldest point is discarded for each new one. The first step of the
algorithm is to search for the helicopter where it was in the last frame. This is accomplished
by looking for a new color blob at the center of each old one with the same color. This method
also examines the edges of the marker composed of color blobs to ensure that emerging
colors as a result of helicopter rotation are found. If this procedure fails to identify a valid
combination of color blobs, then the image will be thresholded using the color configuration
and blob detection will be run on the resulting images.
3.4.3 Control System
The helicopter perception engine returns a believed position (or an invalid one if no candidate
could be found) for each frame it is fed. This calculation is assisted by the history of images
that has come before it, but only picks a value based on what it sees in the current image. This
is done in accordance with the philosophy that components should be as simple as possible
(but no simpler), and it also means that the stream of information can be momentarily subject
to noise and errors. It falls to a separate module to parse the stream of information into a
more reasoned belief about the helicopter position.
With respect to the reference implementation presented here, this module is referred to
as the “controller” application. This application takes as input the stream of information
from the tracking engine and outputs a signal suitable for bringing the helicopter to a target
position. This process works in three basic stages:
1. Each new point is used in combination with the previous belief and a general knowledge
of physics to generate a new belief of where the helicopter is. This is accomplished with
an exponential moving average on the input data stream: estimate = (ratio ∗ sample) +
(1 − ratio) ∗ old estimate, where ratio is the fraction of the old estimate to replace with
the new sample. An effective ratio can be chosen by analyzing the time characteristics
of the system. Certain variables, like rotation, can produce erroneous results with this
method and are handled separately. This is discussed in detail in section 5.2, the state
estimation portion of the controls chapter.
2. Armed with a new estimation of the helicopter position, the controller passes this in-
formation onto individual sub-modules responsible for calculating the output for each
of the helicopter’s control channels: throttle, pitch, and yaw. This affords each sub
module an opportunity to keep some form of internal state such as an integration of the
error in an efficient manner.
3. When called upon, each sub-module is responsible for calculating a signal value that will
bring its parameter under control. The exact method by which this is achieved is left
to each sub module to implement. The reference implementation is a PID control loop
with gain values determined by testing. By separating the action of updating the state
of the system from the actual calculation of the signal in the program interface, more
expensive algorithms can be broken between the two actions. Furthermore, the nature
of the system is that the rate at which information is received is different than the rate
at which signals can be issued. Most web cameras operate at a frame rate between 15
and 30 frames per second, while the Syma helicopter only accepts new commands at approximately 5
Hz. Separation of these tasks saves some calculations.
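Stage 1 amounts to a one-line filter update. A minimal sketch in C (the struct and function names are illustrative, not taken from the reference source):

```c
#include <assert.h>

/* Exponential moving average: blend each new sample into the running
 * estimate. ratio in (0, 1] is the weight given to the new sample. */
typedef struct { double estimate; int initialized; } ema_t;

double ema_update(ema_t *f, double sample, double ratio)
{
    if (!f->initialized) {      /* seed the filter with the first sample */
        f->estimate = sample;
        f->initialized = 1;
    } else {
        f->estimate = ratio * sample + (1.0 - ratio) * f->estimate;
    }
    return f->estimate;
}
```

Rotation is the exception noted above: naively averaging angles that wrap around 2π (for example 0.1 and 2π − 0.1) yields a value near π rather than near zero, which is why it is handled separately.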
Further details for the reference control system are available in the control section of the
report. For details on how to implement your own control system or sub-system, see the
Developer Manual in Appendix B.
3.4.4 Helicopter Signaling
The final major component of the system is the hardware/software communication system
that bridges the computer vision and control algorithms with the helicopter itself. This system
consists of a micro-controller and LED (Arduino), a USB cable, a software library facilitating
the communication between the micro-controller and host computer, and a program for the
embedded system to translate signals to IR pulses.
The Syma S107 works on an IR protocol that has been the subject of previous investigation
by enthusiasts on-line [54]. From this existing documentation and direct investigation into
the protocol with an IR receiver, the properties of the broadcast communication were found
to be:
1. transmission medium: 940 nm IR modulated at 38 kHz
2. a packet header is represented by 2 ms high, 2 ms low
3. a binary one is represented by 300 µs high, 700 µs low
4. a binary zero is represented by 300 µs high, 300 µs low
5. each signal is 32 bits long and sent in big-endian order
6. signal packets are sent every 120 ms
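Based on these measured properties, each element of the broadcast can be translated into a mark/space pair of durations. The following sketch models the timings in portable C (the names are illustrative; the actual embedded program drives an IR LED with these durations):

```c
#include <assert.h>
#include <stdint.h>

/* Mark/space durations in microseconds for one element of the Syma
 * signal, per the measured timings listed above. */
typedef struct { uint32_t high_us; uint32_t low_us; } ir_pulse_t;

static const ir_pulse_t HEADER = { 2000, 2000 };

ir_pulse_t encode_bit(int bit)
{
    ir_pulse_t p;
    p.high_us = 300;             /* both symbols start 300 us high */
    p.low_us = bit ? 700 : 300;  /* a binary one has the longer low period */
    return p;
}

/* Fill out[0..32]: header followed by the 32 packet bits, MSB first. */
void encode_packet(uint32_t packet, ir_pulse_t out[33])
{
    int i;
    out[0] = HEADER;
    for (i = 0; i < 32; i++)
        out[1 + i] = encode_bit((packet >> (31 - i)) & 1);
}
```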
Figure 3.4: Syma S107 Signal Indicating Full Throttle and Neutral Yaw, Pitch, Trim
The structure of each packet is shown in Figure 3.4. It is, in order:
1. header
2. yaw byte (representing decimal number from 0 - 127). Neutral at 64, and Full Right at
0.
3. pitch byte (0 - 127), Neutral at 64, and Full Forward at 0.
4. Throttle byte:
(a) If first bit is 1, the transmission is on band A
(b) If first bit is 0, the transmission is on band B
(c) 0 - 127 for band A, 128-255 for band B
5. trim byte (0 - 127)
The Syma protocol outlined above is used across several of the company’s devices, includ-
ing some platforms more complex than the Syma S107. As a simple model, the Syma S107
has only three channel inputs, and ignores the last (trim) byte. Trim is instead implemented
by adjusting the baseline value of the yaw parameter. Thus, for the purposes of this system,
a valid helicopter signal consists of three numbers ranging from 0 - 127 (Yaw, Pitch, Throttle),
each padded by a single zero, with a header and 8 bits of tail-end padding.
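The resulting payload can be modeled as a packing routine (a hypothetical helper assuming band A; the masking to seven bits provides the single-zero padding described above):

```c
#include <assert.h>
#include <stdint.h>

/* Pack yaw, pitch, throttle, and trim (each 0-127) into the 32-bit
 * big-endian Syma payload that follows the header. Band A is assumed,
 * so the throttle byte's top bit stays clear. */
uint32_t syma_pack(uint8_t yaw, uint8_t pitch, uint8_t throttle, uint8_t trim)
{
    return ((uint32_t)(yaw & 0x7F) << 24) |
           ((uint32_t)(pitch & 0x7F) << 16) |
           ((uint32_t)(throttle & 0x7F) << 8) |
           (uint32_t)(trim & 0x7F);
}
```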
In order to simplify the process of communicating with the micro-controller, the reference
communication system disallows the secondary transmission band. This frees the values 128-
255 for use in verifying signals. Thus, the communication protocol between host computer
and Arduino works as follows:
1. open a serial connection between host and device at 9600 bps, 1 stop bit, no parity
2. transmit yaw, pitch, and throttle in that order as bytes (ranging from 0 to 127)
3. transmit the checksum header signal, a byte with decimal value 128 (hex 80)
4. transmit the checksum byte, computed using the algorithm found in Listing 3.1.
The reference checksum algorithm is implemented with the following C code:
Listing 3.1: Reference Checksum Algorithm
uint8_t syma_compute_checksum(const uint8_t *signal)
{
    uint16_t checksum = 0;
    uint16_t i;
    for (i = 0; i < SYMA_SIGNAL_LENGTH; i++) {
        checksum += signal[i] << (SYMA_SIGNAL_LENGTH - 1 - i);
    }
    return checksum % 255;
}
The constant term SYMA SIGNAL LENGTH is defined elsewhere in the reference source
code to be the length, in bytes, of the signal to be sent to the Arduino for transmission. In
the reference implementation, this value is set to three (representing yaw, pitch, and throttle
respectively).
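Combining the steps above, a host-side sketch of the full five-byte serial message might look as follows (the build_message helper is hypothetical; the checksum itself matches Listing 3.1):

```c
#include <assert.h>
#include <stdint.h>

#define SYMA_SIGNAL_LENGTH 3   /* yaw, pitch, throttle */

/* Checksum from Listing 3.1. */
uint8_t syma_compute_checksum(const uint8_t *signal)
{
    uint16_t checksum = 0;
    uint16_t i;
    for (i = 0; i < SYMA_SIGNAL_LENGTH; i++)
        checksum += signal[i] << (SYMA_SIGNAL_LENGTH - 1 - i);
    return checksum % 255;
}

/* Build the 5-byte message sent over serial: the three control bytes,
 * the checksum header (0x80), then the checksum itself. */
void build_message(uint8_t yaw, uint8_t pitch, uint8_t throttle,
                   uint8_t out[5])
{
    out[0] = yaw;
    out[1] = pitch;
    out[2] = throttle;
    out[3] = 0x80;                       /* 'expect checksum' marker */
    out[4] = syma_compute_checksum(out); /* over the first three bytes */
}
```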
The Arduino’s role is to:
1. accept a serial connection at a prescribed rate
2. initialize the output signal to neutral yaw, neutral pitch, and no throttle.
3. maintain two buffers: one which contains the current signal, and the other which collects
bytes received over serial in a first-in-first-out manner.
4. flash the signal by working through the bytes in big endian order.
5. between signals (sent with a period of 120 ms), for every byte received:
(a) if the ‘expect checksum’ flag is set, compute the checksum of the current buffer and
compare against the incoming byte. If they match, copy the buffer to the current
signal buffer. Either way, disable the ‘expect checksum’ flag.
(b) if the ‘expect checksum’ byte is received (0x80), then set the corresponding flag.
(c) otherwise, push the new byte onto the buffer, shifting the other bytes down, and
discarding the oldest one.
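The byte-handling rules in step 5 form a small state machine. A portable C sketch, free of Arduino-specific calls so the logic can be tested on a host machine (names are illustrative, not from the reference firmware):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIG_LEN 3

typedef struct {
    uint8_t current[SIG_LEN];  /* signal flashed every 120 ms */
    uint8_t pending[SIG_LEN];  /* FIFO of the last SIG_LEN bytes seen */
    int expect_checksum;
} rx_state_t;

static uint8_t checksum(const uint8_t *s)
{
    uint16_t sum = 0;
    int i;
    for (i = 0; i < SIG_LEN; i++)
        sum += s[i] << (SIG_LEN - 1 - i);
    return sum % 255;
}

void rx_byte(rx_state_t *st, uint8_t b)
{
    if (st->expect_checksum) {
        if (checksum(st->pending) == b)      /* valid: accept the signal */
            memcpy(st->current, st->pending, SIG_LEN);
        st->expect_checksum = 0;             /* cleared either way */
    } else if (b == 0x80) {
        st->expect_checksum = 1;
    } else {
        /* shift the buffer down, discarding the oldest byte */
        memmove(st->pending, st->pending + 1, SIG_LEN - 1);
        st->pending[SIG_LEN - 1] = b;
    }
}
```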
This architecture allows signals to be streamed to the controller faster than the actual
control rate: Only the latest signal will be used when it’s time for a new transmission.
3.5 Summary
This chapter reviewed the reasoning behind the selection of the various pieces of hardware
and the major approaches to sensing and control. System architecture and software design
decisions were also briefly discussed. The next two chapters provide detailed reports on
the reference implementation of the visual tracking system, responsible for identifying fidu-
cial markers and reconstructing their pose, and the feedback control system, responsible for
bringing the helicopter’s state under control using information from the tracking system.
Chapter 4
The Tracking Component
This chapter presents a more thorough investigation of the vision component of the Syma
helicopter control system. From a system architecture standpoint, each component (e.g.,
tracking, signal calculation, or signal output) functions independently of the other systems.
It is the user application that ties different components together to provide control for a
given hardware platform. This thesis provides “reference” implementations for each of the
components, but interested individuals may provide their own implementations to change the
system behavior or extend control to new hardware. This model affords developers the ability
to treat their components as black boxes and implement whatever logic they see fit. The only
constraint is that each component must offer a set of specific services that define its interface.
The computer vision and tracking component has the simplest interface. It requires a
single track method which takes, for input, digital camera images with HSV color encoding
and information about the composition of the target fiducial marker. The structure of the
marker is described in section 3.4.1 and is provided to the system by the user at initialization.
If the system is able to identify the likely presence of the fiducial marker, then it outputs a
data structure which contains the following information:
1. the center of the helicopter marker in (x,y) pixel coordinates,
2. the width and height of the bounding rectangle of the marker in pixels,
3. the rotation of the marker from the center of the front color in radians (0 to 2π), and
4. the blob composition of the marker (colors and positions).
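This output maps naturally onto a plain data structure. A hedged sketch (the field names are illustrative rather than the reference source's):

```c
#include <assert.h>

#define MAX_BLOBS 3

typedef struct { int x, y, w, h, color; } blob_t;

/* Tracking result for one frame; x == -1 and y == -1 together signal
 * that no marker was found and the remaining fields are undefined. */
typedef struct {
    int x, y;            /* marker center in pixel coordinates */
    int width, height;   /* bounding rectangle in pixels */
    double rotation;     /* radians, 0 to 2*pi, 0 = facing the camera */
    blob_t blobs[MAX_BLOBS];
    int blob_count;
} pose_t;

int pose_is_valid(const pose_t *p)
{
    return p->x != -1 || p->y != -1;
}
```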
In the event of a failure to identify the marker, the tracker will return a data structure with
a position value of (-1, -1) and undefined values for the other fields. One by-product of defining
the interface in this manner is that the vision component does not take ownership of the video
stream, thereby allowing more flexibility for the user to architect a controller. A second by-
product is that the tracking system must be capable of operating without prior knowledge of
the helicopter position. This is a necessary property of the tracking component to be able to
recover from the effects of occlusion, a condition where the helicopter is temporarily obscured
in the camera image. On the other hand, the tracking component is free to keep some form
of internal state to improve the robustness of its operation (though it must be able to work
without it).
4.1 Reference Implementation
In addition to the explicit requirements set forth by the vision component’s interface, there
are some practical considerations that must also be addressed. First, the vision tracking
algorithms should run in real time with enough margin to allow users to also run their own
logic. For a camera running at thirty frames per second, this means that the vision algorithm
needs to complete within the 33 milliseconds before the next image arrives. The smaller the
required processing time, the lower the latency of the entire system, and the more responsive
the ultimate control will be.
Second, the Syma helicopter is a dynamic vehicle, making positive control very difficult
without reliable tracking. The algorithm should be fairly robust against background color
noise. Additionally, the helicopter itself accepts new control commands approximately every
120 to 200 milliseconds, or every four to six frames of a thirty-frames-per-second web camera.
If it does not receive a new command after approximately a second, the helicopter responds
by disabling throttle and falling out of the sky. To have a chance of recovery from such a
situation, the tracking algorithm needs to quickly identify the helicopter in the next frames,
even as the helicopter moves a large amount between images and becomes blurred due to its
motion.
A technical challenge to robust tracking across many platforms results from the fact that
many cameras have hardware or software controls, typically beyond the control of this system,
that automatically adjust properties like white-balance and auto-exposure. For example, if a
light turns off in the room where the camera is running, the camera may begin to increase
exposure time resulting in a frame-rate that is a fraction of the ideal rate and colors that are
significantly muddled. As much as possible, the system should be invariant to these changes.
As implemented, the tracking algorithm assumes that all input images are HSV encoded
and of a constant size. If these assumptions are met, Algorithm 1 in section 3.4.2 gives a
general overview of how the vision tracking system operates.
Before the tracking component can be used, however, it must be initialized. The initial-
ization takes as an argument the marker configuration, which tells the tracking software what
hue values constitute each color on the marker and where that color sits relative to the front
of the ring. Along with the core tracking software, this thesis also provides a tool for generating
these marker configurations. See Appendix A for more information.
Once configured, the user can begin feeding the algorithm images through the “track”
interface. The first step of the tracking algorithm, once given an RGB image, is to convert
that image’s color space to HSV. This step makes the following logic easier to express as
the color of an object is now a function of a single numerical value instead of three. More
importantly, the hue channel is largely invariant to shifts in illumination that result from
changes in the environment or properties of the camera.
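For reference, hue can be computed from 8-bit RGB with the standard conversion below (a generic formula; the reference implementation most likely delegates this to a library routine):

```c
#include <assert.h>

/* Hue in degrees [0, 360) from 8-bit RGB channels; returns 0 for grays. */
double rgb_to_hue(int r, int g, int b)
{
    int max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    int min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    double delta = max - min, hue;
    if (delta == 0.0)
        return 0.0;                          /* achromatic: hue undefined */
    if (max == r)
        hue = 60.0 * (g - b) / delta;
    else if (max == g)
        hue = 60.0 * (b - r) / delta + 120.0;
    else
        hue = 60.0 * (r - g) / delta + 240.0;
    return hue < 0.0 ? hue + 360.0 : hue;
}
```

Note that scaling all three channels by the same factor, as a dimming light does, leaves the hue unchanged; this is the invariance the paragraph above relies on.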
To optimize tracking performance in terms of both speed and accuracy, the vision compo-
nent keeps a minimal amount of state information pertaining to the history of the marker. If
the delay between images is sufficiently small relative to the helicopter’s motion, the tracker
can assume “spatial locality”, the condition where the helicopter marker is in approximately
the same position from one frame to the next. When the helicopter does move, it can also
be assumed that there exists locality in the marker velocity. The tracking engine remembers
recent positions and velocities, using them to compute an estimation of the location of the
marker before any searching is performed. An accurate estimate improves the chance of the
blob expansion algorithms working quickly.
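The extrapolation step amounts to a constant-velocity prediction from the two most recent positions (a minimal sketch with illustrative names):

```c
#include <assert.h>

typedef struct { double x, y; } point_t;

/* Predict the marker's next position from its two most recent
 * positions, assuming its inter-frame velocity is roughly constant. */
point_t predict_next(point_t prev, point_t cur)
{
    point_t next;
    next.x = cur.x + (cur.x - prev.x);  /* position + one frame of velocity */
    next.y = cur.y + (cur.y - prev.y);
    return next;
}
```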
There are two such blob expansion algorithms, the by-estimate method and the by-
threshold method. These methods, discussed in greater depth in the following sections, are
used together to provide tracking that is fast and robust. In general, the by-estimate method
attempts to identify a color blob by looking for color gradients above, below, left, and right of
some estimate. The by-threshold method conducts a blob search by binary thresholding the
entire image for each color on the marker and then detecting contiguous blobs. In practical
terms, the by-estimate method constitutes a local search with low computational cost, compared
to the by-threshold method, which is global and expensive.
A problem encountered by the by-estimate method is that a color point
in the current frame may no longer be inside the color in the next frame if the helicopter is
moving quickly. For both methods, complicated color environments increase the probability
of detecting false-positives or masking the presence of the helicopter marker. As much as
possible, these failure conditions need to be detected and appropriate actions taken.
To achieve fast yet robust tracking, the reference implementation uses a simple combina-
tion of the by-estimate and by-threshold methods. The by-estimate method is used whenever
the tracker can make a guess about the location of the marker. This guess is formed by
remembering a brief history of the position and motion of the helicopter, and using that
information to extrapolate forward in time. If no guess can be made or the guess fails to
yield a viable marker candidate, then the by-threshold method is used. Most of the time, the
marker will be quickly identified by the by-estimate method, but if this is the first frame, if the
helicopter has accelerated quickly, or if the helicopter was occluded in previous frames, then
the by-threshold method provides a safety net that allows tracking to recover. Whatever the
chosen algorithm, the result of the color blob search is a collection of “blob” objects: containers
holding properties such as a bounding rectangle, center position, and color.
Given a set of blobs, the next step is to see what permutation of those blobs forms a
valid marker. This process, also discussed in detail in a following section, involves looking for
permutations of color blobs that are spatially configured such that they satisfy the constraints
of the marker configuration. Properties of two blobs such as relative position and bounding
rectangle dimension ratios are examples of metrics used to determine if those blobs might
be a marker. This process is first run using the results of the by-estimate blob search and if
the results indicate no marker candidates, the process will be run again using the output of
the by-threshold blob search. Each time a blob pair is accepted as a candidate, the physical
pose associated with the properties of that candidate is also calculated. The output of the
candidate search stage is a collection of the pose data structures enumerated at the beginning
of this section.
The last computational stage of the vision component takes the list of possible helicopter
poses and chooses the most likely one to be the result of the tracking operation. Each
candidate is given a score based on its proximity to the expected helicopter pose and the
highest score, representing the closest match, is said to be “most likely”. If there are no
candidates or the best score is beneath a chosen minimum value, this component of the vision
system returns an invalid point. The return value is then added to the history of the tracker
where it may be used to form future guesses about the fiducial marker’s position.
This concludes the brief tour of the components of the vision system. What follows is a
more detailed discussion of how each operates.
Figure 4.1: A Tracked Helicopter Marker with Rotation Estimate in Degrees
4.2 By-Estimate Blob Detection
At the heart of the tracking component are the algorithms which perceive the fiducial marker
in a given camera image. In the reference implementation, this operation must run in real-
time and is used to control a highly dynamic system. Therefore, properties such as speed
and accuracy are paramount. If the search for blobs can be “windowed”, or constrained to
a subspace of the entire image, then both the computational complexity (time and space)
of the search and the likelihood of false positives can be mitigated. This is the philosophy
behind the by-estimate method of blob detection, the preferred method when there is prior
knowledge of the helicopter state, including position, velocity, and acceleration.
The internals of the by-estimate method could be implemented several ways. For example,
it could search a subregion of the input image using a thresholding method and the known color
configuration. This would accomplish the goal of restricting computational complexity and
improving accuracy, but there are further improvements that could be made. The reference
marker design is a ring with rectangular segments of color affixed to the outer surface area.
When this ring is projected onto a 2D camera image, the visible color segments will appear
to be approximate rectangles. There will be some divergence from this model based on the
physical properties of the camera lens and the orientation of the camera relative to the marker,
but for the given use case (a web camera looking approximately straight onto a marker at
a distance of a few feet) this divergence is small. Furthermore, the helicopter is stabilized in
a way that ensures that during normal operation it will not roll or pitch significantly, so the
boundary of each color rectangle will be aligned with the image axes. The dimensions of this
rectangle can then be defined by four points, one on each side.
The reference implementation of the by-estimate method attempts to quickly identify
these four points, and thus a color blob, by searching directly up, down, left, and right of the
estimate. This is accomplished by assuming that the estimate is inside a rectangular block
of pixels with similar hue values and that an edge of this block is characterized by a sudden
shift in hue. Explicitly, the algorithm for the by-estimate expansion of a blob is as follows:
1. Sample the color of the initial guess and save that information.
2. For each of the cardinal directions:
(a) Move d pixels in the target direction and sample the hue again. Take the difference
of this sample and the previous, and add that difference to a running average.
(b) If the running average of the hue change exceeds a threshold T , then mark this as
an edge of the rectangle.
(c) Otherwise, repeat the above steps until an edge is detected or the limits of the
image are met.
3. Using the four intersecting points, reconstruct the bounding rectangle and return it.

Figure 4.2: Result of the by-estimate method on Figure 4.1. The blue rectangles represent possible color blobs. The green dots represent the centers of color blobs in previous frames, while blue and red dots represent right and left possibilities respectively. The entire image has been converted to HSV color space.
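One direction of this scan might be implemented as follows (a sketch: the step d, threshold t, and the decaying average used here to interpret the "running average" of step 2(a) are all tunable assumptions, and the function name is illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Walk right along one row of a hue image in steps of d pixels, keeping
 * a decaying average of the hue change between successive samples.
 * Return the column where that average first exceeds threshold t, or
 * the last column if no edge is found before the image boundary. */
int scan_right(const unsigned char *hue_row, int width, int start,
               int d, double t)
{
    double avg = 0.0;          /* decaying average of recent hue changes */
    int x = start;
    int prev = hue_row[start];
    while (x + d < width) {
        x += d;
        avg = 0.5 * avg + 0.5 * abs(hue_row[x] - prev);
        if (avg > t)
            return x;          /* significant hue shift: treat as an edge */
        prev = hue_row[x];
    }
    return width - 1;          /* no edge before the image boundary */
}
```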
Compared to an operation on the entire image (e.g. thresholding) which takes W ∗ H
operations, this process takes at most W +H operations. The performance and accuracy of
this method can be tuned by adjusting the values of d, the pixel step between tests, and T ,
the hue change threshold. Other optimizations include testing saturation, in addition to hue,
to prevent white or black backgrounds from throwing off the results, and allowing each search
vector to hit multiple false points before it returns an edge. This prevents certain varieties
of noise, such as shadows or the helicopter tail, from causing an early return. Finally, it may
also be beneficial to enforce an absolute maximum deviation from the first hue sample to stop
slight gradients from throwing off tracking results.
This search technique is used to find helicopter markers as outlined in Algorithm 2.
Algorithm 2 By-estimate Marker Search
Require: previous state information, HSV image, color configuration
1: if no point found last frame then return empty blob set
2: end if
3: vector ← getEstimatedDisplacement(stateInformation)
4: returnBlobSet ← nil
5: for each blob in last pose do
6:   estimate ← blob.position + vector
7:   centerBlob ← by-estimate-expansion(estimate, image)
8:   rightEstimate ← blob.position + vector + sideSearchDisplacement(blob)
9:   rightBlob ← by-estimate-expansion(rightEstimate, image)
10:  leftEstimate ← blob.position + vector − sideSearchDisplacement(blob)
11:  leftBlob ← by-estimate-expansion(leftEstimate, image)
12:  if centerBlob.color ≈ blob.color then addBlob(returnBlobSet, centerBlob)
13:  end if
14:  if rightBlob.color ≈ blob.color.next then addBlob(returnBlobSet, rightBlob)
15:  end if
16:  if leftBlob.color ≈ blob.color.prev then addBlob(returnBlobSet, leftBlob)
17:  end if
18: end for
return returnBlobSet
The core operation of this algorithm begins by estimating the motion of the marker be-
tween frames. Each blob of the last marker should then be at the position it was last frame,
plus this new displacement. A search is therefore run at this new location to see if it matches
the color of the last blob. If it does, it is added to the returned set of blobs.
One special case that must be handled is that of a rotating marker. When the marker is
in motion, color blobs will go out of and into view of the camera. If this motion is continuous,
there will always be a stage where the blob is present in one frame and not in the next. When
this occurs, a new blob should be present on the far side of the marker from the disappearing
blob. To detect new colors as they emerge, for every estimate expansion of a previous blob
center, a point is expanded to the left and right of the previous blob. This has proven to work
reliably in picking up new colors as they come into view, but there are some side effects.
One such side effect is redundant blob expansion where each color in the marker is ex-
panded twice (once by itself and once by its neighbor). Given the expansion’s fast execution
time, this redundancy actually ends up being beneficial as it helps to guard against noise in
the image. The other side effect is that side expansions occasionally identify blobs of color
that do not belong to a marker. This makes the algorithm that extracts marker positions
work harder and can occasionally lead to false positives.
This algorithm returns a collection of blobs, sorted by color, that will then be checked
to see if some permutation can form a valid fiducial marker. See section 4.4, Marker
Candidate Detection, for more information. If this blob search algorithm fails to yield the
components of a marker, then that task falls to the by-threshold method.
4.3 By-Threshold Blob Detection
Complementing the local blob search algorithm described in the by-estimate section is a search
method that efficiently scans an entire image for color blobs without prior knowledge of the
marker’s location. This section describes the internal workings of such a global method,
described in this thesis as the by-threshold method.
As the name indicates, the reference implementation uses color thresholding to achieve
this goal. More information about the theory behind color thresholding can be found in
chapter 2, the Literature Review.
One of the inputs of the tracking system is information about the construction of the color
marker. This information includes a position, center hue value, and hue tolerances for each
color on the marker. Using a tool provided with this thesis, users can automatically generate
these configurations. The user selects a region of an image, and the tool calculates
statistical data about this region, including the mean and standard deviation of each channel.
By default, the center of a color is given the value of the sample mean and the tolerance is
set to plus or minus one standard deviation.
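As an illustrative sketch of this default rule (the structure and function names here are assumptions, not the tool's actual interface), the per-channel statistics might be computed as follows:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch (not the actual tool's code): derive a channel's
// center value and tolerance from the pixel values of a sampled region.
struct ChannelConfig {
    double center;     // set to the sample mean
    double tolerance;  // set to one standard deviation by default
};

ChannelConfig configureChannel(const std::vector<double>& samples) {
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();

    double variance = 0.0;
    for (double s : samples) variance += (s - mean) * (s - mean);
    variance /= samples.size();

    return ChannelConfig{mean, std::sqrt(variance)};
}
```

The user may widen or narrow the tolerance afterward if the default one-standard-deviation band passes too little or too much of the marker color.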
The by-threshold method works by performing a binary threshold operation using informa-
tion from all three channels of an HSV image, as well as an upper and lower threshold value.
This means that a positive value is given to a pixel that falls between an upper and lower
limit for each of its channels. This method is advantageous because it allows finer control
over which values pass through, including the removal of colors that do not have enough
saturation or value to be candidates for marker blobs. This is particularly useful for removing
very white or black pixels, which are characterized by a lack of saturation and value
respectively and are not associated with a particular hue. The threshold limits are set to the
color configuration's center value plus or minus the tolerance for each channel.
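The per-pixel test can be sketched as follows; the reference implementation uses OpenCV's equivalent range operation, and these names are illustrative:

```cpp
#include <cassert>

// Illustrative sketch of the per-pixel by-threshold test: a pixel passes
// only if every HSV channel falls within its lower and upper limits, so
// pixels with too little saturation or value are rejected regardless of hue.
struct Hsv { int h, s, v; };

bool passesThreshold(const Hsv& px, const Hsv& lower, const Hsv& upper) {
    return px.h >= lower.h && px.h <= upper.h &&
           px.s >= lower.s && px.s <= upper.s &&
           px.v >= lower.v && px.v <= upper.v;
}
```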
Because of these tests, the by-threshold method is slower in operation than the by-estimate
method. It must examine width ∗ height pixels (where width and height are the size in pixels
of the image being searched) and perform six branch statements on each one (two per channel
on three total channels) for each color on the marker. However, the results, given a good
color configuration, are much less prone to error than the by-estimate method because noise
or slight occlusion of a color blob will not deter the rest of the blob from being detected
properly.
Figure 4.3: Result of by-threshold method on figure 4.1. The blue rectangles represent possible color blobs. The entire image has been converted to HSV color space.
Thresholding the input image against the colors in the configuration reference produces
three binary images where positive values indicate the possible presence of a marker blob.
Still more processing must be done to reduce noise and extract information required by the
algorithm that selects valid permutations of blobs. Such additional information includes blob
center position and the bounding rectangle.
Algorithm 3 By-threshold Blob Search
Require: HSV image, color configuration
1: returnBlobs ← nil
2: for each color in color configuration do
3: if color ≈ red then
4: binaryImage ← redThreshold(image, color)
5: else
6: binaryImage ← threshold(image, color)
7: end if
8: binaryImage ← imageOpening(binaryImage)
9: contours ← detectContours(binaryImage)
10: contours ← sortBySize(contours)
11: rectangles ← calculateBoundingRectangles(contours)
12: appendList(blobs, returnBlobs)
13: end for
14: return returnBlobs
The algorithm by which the by-threshold method of blob expansion returns a list of blob
candidates is outlined in algorithm 3. The following section is a discussion of each step in this
process, including edge cases, optimizations, and other knowledge necessary to implement
this logic.
The first step in the by-threshold method is to conduct the threshold operation based on
information stored in the marker configuration. There exist special edge cases that stem from
the fact that the hue portion of the HSV color space forms a closed surface. Practically, this
means that the maximum and minimum hue values represent the same color. The distance
between the red hue at 179 degrees (using the 180 degree OpenCV color wheel) and the red
hue at 1 degree is 2 degrees, not 178 degrees, a fact that must be considered when calculating
the difference between colors. For threshold operations, this means that if the minimum value
is greater in magnitude than the maximum value, all of the values of the image need to be
rotated around the color wheel until the operation makes sense again. In algorithm 3, the
function redThreshold encodes this process.
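One illustrative way to express this circular comparison follows; the thesis's redThreshold rotates the image's hue values instead, so this sketch is an assumption about an equivalent check:

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative sketch of circular hue comparison on OpenCV's 180-degree
// wheel: distance is measured around the circle, so hues 179 and 1 are
// 2 degrees apart rather than 178.
int hueDistance(int a, int b) {
    int d = std::abs(a - b) % 180;
    return d > 90 ? 180 - d : d;
}

bool hueWithinTolerance(int hue, int center, int tolerance) {
    return hueDistance(hue, center) <= tolerance;
}
```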
The raw binary image produced by a threshold operation for a given color may contain a
great deal of noise, or positive values that result from small colored pieces of the environment,
lighting, reflections, or artifacts of the conversion from RGB to HSV color spaces. These
values often appear as specks of white on the binary image. Eliminating this noise results
in dramatically fewer spurious blob detections, which increases the performance and accuracy
of the tracking method. White noise in the image is handled by performing an image “opening”
on each binary image. This process involves first eroding the white values of the image to
reduce noise and then dilating the image to fill in gaps [17].
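A one-dimensional sketch may clarify the idea; the reference implementation applies OpenCV's two-dimensional morphology, and the structuring-element size here is an assumption:

```cpp
#include <cassert>
#include <vector>

// Illustrative 1-D sketch of a binary image "opening": erosion removes
// isolated positive specks, then dilation restores the bulk of any
// surviving runs. The 2-D equivalent is applied to each binary image.
using Row = std::vector<int>;

Row erode(const Row& in) {
    Row out(in.size(), 0);
    for (std::size_t i = 1; i + 1 < in.size(); ++i)
        out[i] = (in[i - 1] && in[i] && in[i + 1]) ? 1 : 0;
    return out;
}

Row dilate(const Row& in) {
    Row out(in.size(), 0);
    for (std::size_t i = 0; i < in.size(); ++i)
        if (in[i]) {
            if (i > 0) out[i - 1] = 1;
            out[i] = 1;
            if (i + 1 < in.size()) out[i + 1] = 1;
        }
    return out;
}

Row opening(const Row& in) { return dilate(erode(in)); }
```

Note how a lone speck is erased while a solid run of positive values survives with its extent intact.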
The next step of the algorithm is to separate and classify distinct blobs in each filtered,
binary image. This is accomplished by detecting the outer contours of each shape formed by
groups of positive values. OpenCV provides functionality for this task in its image processing
module. References for both the application programming interface and the underlying
algorithm may be found in the OpenCV documentation [16]. If the color configuration
is set appropriately and the background does not prominently feature one of the marker colors,
only a small number of distinct contours should be detected.
This is not always the case, however, and the next to last step of the by-threshold method
sorts the remaining contours by their enclosed areas. From this sorted set, all contours with
areas below a set threshold are discarded. By discarding excessively small contours in this
stage, the algorithm ensures that only prominent image features of a given color are considered
as candidates for blobs on a marker.
The final stage of the by-threshold method handles the bookkeeping necessary to pass
the contours onto the next stage. Each distinct contour is labeled with its associated color,
the centroid of the area it encloses, and the bounding rectangle of the entire contour. The
resulting “blobs” are collected and returned to be composed into markers in the next stage.
4.4 Marker Candidate Detection
With a few exceptions, the appearance of the fiducial marker in an image will manifest itself
as a pair of heterogeneous, colored rectangles. The by-estimate and by-threshold methods of
blob expansion and detection work on a single color at a time. Given a collection of color
blobs generated by one of the previous steps, the next task is to identify permutations of those
blobs that make sense in light of the geometry of the marker, the physics of the helicopter,
and the known color configuration. This task falls to the marker candidate detection method,
which this section describes in more detail.
This method makes several assumptions about the system to make its job more tractable.
Specifically, these are:
1. the fiducial marker is a perfect circle,
2. the three color segments are equal in dimensions and cover the entire surface of the ring,
3. the marker does not pitch significantly,
4. the marker does not roll significantly,
5. the camera is looking straight onto the marker, and
6. the entire marker is visible in the camera image.
The first two assumptions pertain to the construction of the marker. While it is unlikely that
either of these assumptions is entirely accurate, it is far easier to ensure that the markers
are well constructed than to attempt to program corrections into the system. Deviations
from these assumptions negatively affect the accuracy of calculated poses for the helicopter,
especially the rotation property. Programmatic corrections would require additional training
on the marker which increases the burden on the user for little in return. This system is
designed to give reasonable estimates of helicopter pose very quickly. It does not take the
place of special purpose equipment, such as motion capture systems, when high accuracy is required.
Assumptions three and four deal with the physics of the hardware system. As discussed in
the methodology section, the Syma helicopter is equipped with a stabilizing bar on the rotor
shaft and is further stabilized with an on-board gyroscope. As a result, these helicopters
do not visibly roll while properly operating. The helicopter can pitch visibly during normal
flight, but the angle of pitch is small. Given these properties, it is not unreasonable for the
vision system to make the assumptions it does regarding marker orientation. The benefits
are significant as many otherwise valid combinations of colors can be disregarded, leading to
a drop in both computational effort and false positives.
Due to the effects of perspective, a helicopter marker with the same pose in the physical
world will appear slightly differently in the top left corner of the image than in the bottom
right corner. The magnitude of this shift depends on intrinsic camera properties such as
field of view. This thesis does not assume a specific camera or calibration, so it is not generally
possible to calculate the magnitude. Therefore, when calculating rotation based on a perceived
marker, the pose estimator assumes that it is looking straight onto the helicopter.
The final assumption is made solely for the purpose of simplifying logic in the marker
detector. In practical terms, this assumption means that the marker system does not attempt
to create a pose guess based on partial color fragments and previous history. Instead, all of
the information used to define a marker must be present in the image. This has not proven
to be an issue in practical testing, and it dramatically cuts down on the number of possible
candidates, which reduces computational complexity and false positives.
Following immediately from these assumptions is a set of conditions for the composition
of blobs into a representation of the fiducial marker. Two color blobs belonging to the same
fiducial marker should have equivalent bounding rectangle height and y-position properties.
The width of a given blob will depend on its rotation, but the distance between the centers
of two blobs should not exceed the height of the blobs multiplied by the width to height
ratio of the marker. It is further assumed that, for the specific case of the three-color marker
presented here, there are no more than two colors visible at any given time. These two colors
must be present in the order prescribed by the color configuration.
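These conditions might be expressed as a pairing predicate along the following lines (field names and the slack parameter are illustrative assumptions, not the thesis's exact code):

```cpp
#include <cassert>
#include <cmath>

// Illustrative pairing predicate: same-marker blobs should share
// bounding-rectangle height and y-position, and their centers should not be
// separated by more than the blob height times the marker's width-to-height
// ratio. Width is not checked because it varies with rotation.
struct Blob {
    double cx, cy;  // bounding-rectangle center in pixels
    double h;       // bounding-rectangle height in pixels
};

bool compatiblePair(const Blob& a, const Blob& b,
                    double widthToHeightRatio, double slack) {
    if (std::abs(a.h - b.h) > slack * a.h) return false;   // equal heights
    if (std::abs(a.cy - b.cy) > slack * a.h) return false; // equal y-positions
    return std::abs(a.cx - b.cx) <= a.h * widthToHeightRatio;
}
```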
If the helicopter marker is perfectly circular, is divided evenly between three colors, and
the camera can perceive 180 degrees of the marker then there will be situations about the
0, 120, and 240 degree marks where three colors will be visible. Due to the curvature of the
marker, however, the two colors on the ends, which each constitute thirty degrees of the
marker, will be hardly visible: the farther away the marker is from the camera, the more difficult
the detection. The result of this is, for all practical purposes, that the entire marker can be
represented by one color at these special angles. Thus, the vision component makes a special
exception to its general rule requiring two colors to constitute a marker. If the marker history
indicates it is close to a special angle and there are no viable two-color candidates, the
marker finder will consider single-color candidates.
All viable fiducial marker combinations can be found by testing each possible permutation
of two unique color blobs against these conditions. The resulting operation has quadratic
(O(n²)) complexity. Computational complexity can be controlled by adjusting the parameters
of the by-estimate and by-threshold blob expansion methods to allow fewer blobs to pass
through. When a viable candidate is calculated, the blobs that compose it are passed on to
the pose estimation stage.
The pose estimation stage of the marker finder is responsible for transforming color
permutations into “track points”, each consisting of an estimate of position, rotation, the
bounding rectangle of the marker, and its composition, as discussed in the introduction to this chapter.
If a combination consists of a single color, then the corresponding track point inherits the
center position and bounding rectangle property from the blob it is formed from. Rotation
is equivalent to the rotation of the center of the blob color about the fiducial marker’s ring.
This means 0, 120, and 240 degrees are the only viable outputs for the default configuration.
Figure 4.4: The helicopter from figure 4.1 rotated to face directly at the camera, illustrating a marker configuration of one color. Note the color blob on the left of the marker is not detected.
If the combination is of multiple colors, then the bounding rectangle is formed by the
union of the bounding rectangles of the composing blobs and the center position is taken to
be the center of the resulting rectangle. Rotation poses a more complex challenge, but the
assumption that the camera is looking straight onto the helicopter makes the calculation more
straightforward. For the purposes of calculating rotation, the marker is modeled as a hemi-
sphere that is projected onto the axis of its diameter. Using the perfect circle assumption, the
magnitude of a rotation (in radians) to a known point from the center of the marker is given
by the following equation: magRotation = arcsin(2x), where the variable x represents the ratio
of the center-to-known-point distance to the overall length of the observed marker. In the
rotation calculation, the length of the marker and center point come from the bounding rect-
angle calculation. The “known point” is the junction x-coordinate of the boundary between
distinct color blobs. The rotation value that this boundary represents can be calculated from
the colors forming it and will take on values of 60, 180, and 300 degrees with the default
marker configuration.
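Under the stated assumptions, the rotation magnitude reduces to a short calculation (a sketch with illustrative names): a boundary point on a circle of observed length d, seen at pixel offset x·d from the marker center, has rotated arcsin(2x) radians from the facing direction.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the rotation-magnitude formula: a color boundary observed at
// pixel offset `offset` from the center of a marker of observed length
// `length` corresponds to a rotation of arcsin(2 * offset / length) radians
// under the straight-on viewing assumption.
double magRotation(double offset, double length) {
    double x = offset / length;  // center-to-known-point ratio
    return std::asin(2.0 * x);
}
```

A boundary at the marker center thus indicates no rotation away from its nominal angle, while a boundary at the marker's edge indicates a quarter turn.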
When this pose estimation has been performed for every candidate, the resulting estima-
tion collection is returned. The next, and last, stage of the vision component will select the
optimal candidate from this collection. This will represent the result of the entire tracking
operation. If no permutation of color blobs could be found to satisfy the constraints of the
algorithm, it is possible that an empty collection of estimations will result from the marker
finder.
4.5 Candidate Selection
The final stage of the vision component takes as input a collection of possible poses that have
been derived from color blobs found in the image. It also accepts the estimate of the position
of the marker calculated at the beginning of the entire vision process. This guess is a function
of the last known pose and an estimation of velocity. This stage must select the best pose
from the input collection to be the result of the entire calculation.
If no possible poses were calculated, the candidate selector immediately returns an invalid
point indicating failure as the result of its calculation. If given a single pose estimation, then
that estimate is returned immediately as the result of the calculation. If the algorithm is
given an invalid point as a guess, as might be the case if the tracking system has just started,
then the largest marker in terms of pixel area is chosen as the result. Otherwise, if there are
multiple helicopters in the frame or if the background of the image is particularly complex,
there may be multiple candidate points, and the “best” one must be chosen.
The case of multiple candidates is handled by iterating over the collection of poses, scoring
each pose with its “distance” from the guess, and retaining the point with the best score. The
scoring algorithm operates by calculating the percent differences between the candidate and
the guess for the properties of x-position, y-position, bounding rectangle dimensions, and
rotation. Each of these percent differences is normalized by the total number of scoring
properties, and their sum is subtracted from 1.0. Thus, a point with exactly the same properties as
the guess will score a perfect 1.0. The percent change is calculated by subtracting the guess
property from the candidate property and normalizing that amount by the maximum possible
change in the property (which is the size of the image frame for coordinate positions and 360
degrees for rotations), ensuring that no score can go below 0. The highest of these scores is
returned.
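An illustrative reading of this scoring rule, with the scored properties generalized to a vector (names are assumptions, not the thesis's exact code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch of candidate scoring: each property's absolute
// difference from the guess is normalized by its maximum possible change
// and by the number of properties; the summed penalty is subtracted from
// 1.0, so a candidate identical to the guess scores exactly 1.0.
double scoreCandidate(const std::vector<double>& candidate,
                      const std::vector<double>& guess,
                      const std::vector<double>& maxChange) {
    double penalty = 0.0;
    for (std::size_t i = 0; i < candidate.size(); ++i)
        penalty += std::abs(candidate[i] - guess[i]) /
                   (maxChange[i] * candidate.size());
    return 1.0 - penalty;
}
```

Because each difference can be at most its maximum possible change, each normalized term is at most 1/n and the total score stays in [0, 1].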
4.6 Closing Comments
This chapter has discussed the concept of the vision component of the helicopter control
system and how the reference implementation works. Please refer to Appendix A for more
information about how to programmatically interface with the logic described in this chapter
using C++ and OpenCV. The result of the track interface in the vision component is a best
guess of the pose of the helicopter in the camera frame. This pose estimation may be fed into
a feedback controller to automate flying of the helicopter. The reference implementation is
described in the next chapter.
Chapter 5
Feedback Control Component
This chapter presents a more thorough investigation into the design and implementation of
a feedback controller for the Syma helicopter control system. While this thesis provides
a “reference” implementation of this component, interested individuals may provide their
own implementations to change the system behavior or extend control to new hardware. A
compatible extension must provide a defined set of services that constitute the interface of
the feedback control component.
The feedback controller’s core responsibility is calculating what movements the controlled
system must execute to bring its pose to a specific value. For a Syma helicopter, these
movements are encoded into the digital signal that is broadcast from the control system
to the helicopter. The primary method of the feedback controller’s interface is the control
method which takes as inputs the current and desired pose of the system and produces a
digital signal meant for broadcast to the helicopter. This digital signal should help bring the
helicopter’s pose to the target state.
This thesis uses the terms “state” and “pose” to mean similar, but slightly different
things. “Pose” is defined as the helicopter’s position in space at a moment in time (x, y,
z, and rotation). “State” includes information from the pose and, additionally, properties of
the helicopter that are dependent on time, such as estimations of velocity and acceleration.
Effective control requires information about the state of the helicopter which must be derived
from individual poses. However, measured pose estimations may not always be accurate or
precise representations of the true pose of the helicopter. When the pose information deviates
from the true value, it falls on the feedback controller to limit deviation in its estimate of
state. Techniques for accomplishing this task are discussed in section 5.2 of this thesis, the
State Estimator.
For safety and ease of use, the controller should output a “shutdown” signal when it
does not have a positive idea of the controlled system’s pose. This prevents, for example, a
helicopter from flying before it is being tracked or flying away if the system has lost track of
it. That said, it is unlikely that the tracker will always find a marker even if it is present in
the camera view. The control system should not fail because a single frame was dropped, but
instead, should fail only if an amount of time deemed “unrecoverable” has passed without
new knowledge of the helicopter pose. No definitions for this value are given as it will depend
greatly on the system being controlled. For the Syma S107G reference implementation, this
value is taken to be one second. This is accomplished through the other required method of
the feedback controller interface, reset. The reset method should reset any internal state that
the feedback controller keeps and should zero the output of the controller. From a practical
standpoint, this feature allows for running multiple helicopter trials without resetting the
controlling program, prevents helicopters from flying out of control, and provides a general
mechanism to handle error conditions in user programs.
5.1 The Reference Implementation
A micro helicopter in flight is a very dynamic system and its behavior depends on many
factors that are difficult to control and equally difficult to ignore; factors such as air currents
and ground effects play large roles and a general control system must be robust to them.
The reference implementation of the feedback control system is not such a general controller.
Rather, it is a proof of concept showing that positive control of Syma helicopters is possible
for a noticeable amount of time. That said, the reference implementation also provides a
framework that allows users to build more complex control systems by supporting “hot-
swapping” of control logic. Here, “hot-swappable” refers to a trait of the controller that
allows portions of it to be changed, or swapped, while the system is running, or hot. This
feature is discussed in the Master Controller section of this chapter.
The reference implementation of the feedback controller has three subcomponents:
1. a state estimator, which has the job of converting a potentially noisy data stream into
a coherent estimate of the controlled system’s state,
2. a master control unit, which serves as the interface between user programs and under-
lying control logic, and
3. channel control units, which are responsible for individually calculating each system
control channel’s next input, together forming the underlying control logic.
In order to calculate an appropriate output signal, the controller must be aware of the
controlled system’s current pose. This pose is the first argument to the control method. The
interface does not stipulate where this pose estimation comes from. It may be the most recent
value from the tracker or it may come from some user defined, intermediate program. An
example of such a program would be a user defined state estimation system for combining
the results of multiple sensors.
The reference implementation of the control method, however, is designed to accept the
latest pose estimate from the vision component. The first thing this control method does is
use this new pose to update its belief about the state of the helicopter by calling the state
estimator. The reference tracker may return invalid poses to represent a failure to identify
the marker or it may return a noisy pose that does not represent the true pose of the system.
Reducing the sensitivity of the controller to these types of errors is the responsibility of the
state estimator. In addition to error mitigation, this subcomponent also estimates elements
of the state of the system that are derived from pose inputs such as derivative and integral
information.
In the step after state estimation, the input setpoint and the newly updated state estimate
are passed as arguments to a set of “channel controllers”. Each channel controller is respon-
sible for calculating the magnitude of one component of the digital signal to be transmitted
to the helicopter, or other controlled platform. For the Syma S107G, there are three channel
controllers, one each for yaw, pitch, and throttle. When all channel controllers have finished
their calculations, the result of each is assembled into an appropriately formatted signal suit-
able for transmission. This architecture helps keep control logic modular, allowing users to
change the channel controllers to better respond to the current system state as the helicopter
is in flight.
The reference controller is designed to effectively stabilize an airborne helicopter in a
camera frame. This means that the user of the program must start the entire control system
so that it is running and then manually insert the helicopter into the target scene, ideally away
from surfaces. The reason for this constraint is that it has proven very difficult to stabilize
the helicopter when it is operating in ground effect, a condition caused by the interference of
a surface with the airflow pattern of the rotor system. This issue was the primary impetus
behind the modular design of the master control unit. A controller specifically designed to
handle take off could be used when the system starts and then swapped for a controller meant
for free air flight.
An interesting characteristic of the Syma 107G helicopter, from a control standpoint, is
the discrepancy between the control and sensing rates. Control signals are broadcast with
a period of 120 to 200 milliseconds (approximately 8 to 5 Hz). New information from the
camera is available approximately every 33 milliseconds (30 Hz). The reference controller
operates by feeding information to the controller at 30 Hz, calculating an output, and sending
this output over serial to the Arduino responsible for signal transmission. The firmware of
the Arduino is designed to handle input at a faster rate than it outputs, so this discrepancy
is not an issue. Users looking to implement control for new hardware should be careful to
handle “flow control”, either by designing their logic to handle variable input/output rates or
by being careful to throttle output rates.
This concludes the brief tour of the components of the feedback control system. What
follows is a more detailed discussion of how each of these subcomponents operates.
5.2 State Estimation
Whenever the control interface is called, the feedback controller first uses any new pose
information to update its knowledge of the state of the controlled system. To facilitate this
task, each new pose passed as an argument to the control method is immediately given a
timestamp and the resulting structure is cached in memory. Derivatives of all position and
dimension attributes of state are calculated using a first order forward difference scheme.
Integrals are calculated using a running Riemann sum.
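A simplified sketch of these per-attribute updates follows (names are illustrative; the reference implementation additionally filters these values):

```cpp
#include <cassert>
#include <cmath>

// Simplified sketch of the per-attribute state updates: a first order
// forward difference for the derivative and a running Riemann sum for the
// integral, each driven by timestamped pose samples.
struct Channel {
    double value;
    double derivative;
    double integral;
    double lastTime;
};

Channel update(Channel c, double sample, double t) {
    double dt = t - c.lastTime;
    c.derivative = (sample - c.value) / dt;  // forward difference
    c.integral += sample * dt;               // running Riemann sum
    c.value = sample;
    c.lastTime = t;
    return c;
}
```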
It is also possible that the input to the system is a pose that indicates a failure to track
or a pose that does not represent the marker being controlled. The other task of the state
estimator is to turn the potentially noisy stream of information from the tracking component
into a more reliable estimate of the helicopter’s true state. The reference implementation
achieves this goal through two mechanisms: first, it removes invalid tracking information, and
second, it filters valid components of the input stream.
If an input pose is invalid, then the state estimator sets a flag and discards the invalid pose.
The flag indicates that tracking has possibly failed and a time is associated with the raising
of the flag. If subsequent pose inputs indicate that tracking has been reestablished, then the
flag is set back to false and operation continues as normal. If, however, more invalid poses are
input, the controller examines the difference between the time the invalid flag was set and the
current time. If this time value exceeds a threshold of one second, then the controller’s state
estimation is reset and any calls to it return a state indicating no information is available.
Valid tracking must be reestablished for a full second before the state estimator returns to
normal functioning. Changes in the validity of the state estimation are accompanied by a call
to the reset interface provided by the user.
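The timeout logic can be sketched as a small state machine; this illustration omits the symmetric one-second reestablishment delay, and the names are assumptions:

```cpp
#include <cassert>

// Sketch of the tracking-loss logic: the first invalid pose raises a flag
// and records the time; tracking is declared lost only if invalid input
// persists past the threshold (one second for the reference system).
struct LossDetector {
    bool flagged;
    double flaggedAt;
};

// Returns true when tracking should be deemed lost and the controller reset.
bool observe(LossDetector& d, bool poseValid, double now, double threshold) {
    if (poseValid) {
        d.flagged = false;  // tracking reestablished
        return false;
    }
    if (!d.flagged) {
        d.flagged = true;   // first invalid pose only raises the flag
        d.flaggedAt = now;
        return false;
    }
    return now - d.flaggedAt > threshold;
}
```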
The other mechanism for ensuring a reliable stream is filtering. There are a variety
of techniques for extracting improved state estimates from a stream of noisy information.
These techniques range from simple moving averages to statistical techniques built on top of
mathematical models of system dynamics. To keep things as simple as possible, the reference
implementation uses a low pass filter to guard against spikes in the input stream. On each
update of the state estimator, the new value of an attribute x of the system state is given
by the weighted average of x from the new pose estimate and x from the current state
estimate. The weight given to new pose samples can be decreased to provide better
filtering against noise or increased to reduce the response time of the filter relative to the
system. Note, however, that the Syma S107G is a fast moving system, and a filtering scheme
with a high time constant will be detrimental to effective control.
The filter implementation is simple. For most tracked attributes such as the x coordinate,
y coordinate, and the bounding rectangle dimensions, the process is as simple as estimate =
oldEstimate ∗ (1.0 − r) + newSample ∗ r, where r is the weight discussed earlier. Time dependent
calculations, such as derivatives, are also filtered in this manner, but with a different, smaller
weight to better suppress noise. Some other tracked attributes have properties that are not so
easily calculated or are not appropriate for control inputs in their raw form.
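The filter update itself can be written in a single line (a sketch; r is the new-sample weight):

```cpp
#include <cassert>
#include <cmath>

// Sketch of the low pass filter update used for most tracked attributes:
// smaller r filters more aggressively but responds more slowly.
double lowPass(double oldEstimate, double newSample, double r) {
    return oldEstimate * (1.0 - r) + newSample * r;
}
```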
The information returned by the tracking component is given in the camera’s reference
frame where position and distance values are encoded in units of pixels. Derived information,
such as velocity, is also expressed in terms of pixels. An equivalent pixel distance at two
different depths represents different physical distances due to the effects of perspective. To
be useful for control, values presented in the camera’s reference frame should be converted
to physical coordinates meaningful to the controlled system. Fortunately, the size of the
helicopter’s fiducial marker serves as a constant that can give a physical meaning to pixel
distances. These pixel differences are transformed to units of “marker heights” by dividing
them by the estimated pixel height of the marker. The distances can be further transformed
to any desired unit, but the reference marker design has a height of about one inch, allowing
for easy mental calculations. The height property of the marker is chosen because this mea-
surement tends to be less noisy than the width. The tracking system must recognize two blobs
to recover the entire width, but height requires only one. The validity of this transformation
is predicated on the same assumptions that much of the vision component relies on, including
an assumption that the camera image distortion is minimal and that the full marker height
is detected.
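A minimal illustration of this normalization follows. The function name is hypothetical and not part of the thesis source:

```cpp
#include <cassert>
#include <cmath>

// Pixel distances are made depth-independent by expressing them in
// units of "marker heights". Since the reference marker is about one
// inch tall, the result is roughly in inches.
double toMarkerHeights(double pixelDistance, double markerPixelHeight) {
    // The marker's pixel height shrinks with depth, so the same pixel
    // distance maps to a larger physical distance farther away.
    return pixelDistance / markerPixelHeight;
}
```

For example, a 60-pixel offset next to a marker 20 pixels tall is about three marker heights, but the same 60 pixels next to a 10-pixel marker is about six.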
Rotation is a more difficult problem to handle due to the closed nature of its domain
(i.e., the rotation value “wraps”). The tracking system reports a helicopter rotation from 0
radians to 2π radians. Any scheme that implements a form of averaging creates an edge-
case when dealing with rotation. If a helicopter’s counter clockwise rotation continues past
2π radians, the tracker will begin returning poses that have small rotation values (rotation
wrapped around). In this case, if the results are averaged, the new small values will pull the
state estimation of rotation down in value. A control system will interpret this as a sudden
angular acceleration in the clockwise direction, leading to a control value meant to correct it.
An elegant solution to this problem is to use a one-dimensional analog of a quaternion. In
practical terms, a quaternion is a four-dimensional vector, sometimes interpreted as a rotation
axis and rotation amount, often used to represent three-dimensional rotations without being
subject to “gimbal lock” [36]. In this thesis, one dimensional rotations are translated into
a two dimensional vector by calculating the x and y positions of the given rotation on the
unit circle. These unit circle coordinates are averaged in place of the real-value numbers
representing rotation. This two dimensional representation is not subject to the wrapping
problem. If desired, the real-valued rotation is recoverable by converting the coordinate pair
back to polar coordinates. As a side benefit, the magnitude, l, of the averaged vector is an
indicator of the transience of the rotation. Typically, this value will be approximately 1.0,
but a fast rotational motion will cause it to shorten.
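The unit-circle averaging scheme can be sketched as follows, assuming angles in radians in [0, 2π); the struct and function names are illustrative:

```cpp
#include <cassert>
#include <cmath>

// Wrap-free averaging of angles: each angle is mapped to a point on
// the unit circle, the points are averaged, and atan2 recovers the
// mean angle. The vector magnitude shrinks when the samples disagree
// (e.g., during fast rotation).
struct MeanAngle {
    double angle;      // recovered mean rotation in [0, 2*pi)
    double magnitude;  // ~1.0 for steady rotation, smaller when transient
};

MeanAngle averageAngles(const double* angles, int n) {
    const double TWO_PI = 6.283185307179586;
    double x = 0.0, y = 0.0;
    for (int i = 0; i < n; ++i) {
        x += std::cos(angles[i]);
        y += std::sin(angles[i]);
    }
    x /= n;
    y /= n;
    double a = std::atan2(y, x);
    if (a < 0.0) a += TWO_PI;  // map back into [0, 2*pi)
    return MeanAngle{a, std::sqrt(x * x + y * y)};
}
```

Averaging samples just below 2π and just above 0 this way yields a mean near 0, rather than the spurious value near π that naive averaging of the raw numbers would produce.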
When the state estimator has finished its job, the next step for the control method is to
call the master controller. Here, the estimated state will be used to calculate an appropriate
control signal for transmission to the helicopter.
5.3 Master Controller
As discussed in the introduction to this chapter, the reference feedback controller separates
logic for each input channel of the Syma helicopter (yaw, pitch, and throttle) and allows
users to change that logic while the system is running. This is done to simplify the process
of designing controllers which often naturally operate in different states. For example, the
process of helicopter takeoff has different physics than flight in free air and would likely be
controlled using a different technique.
The master control submodule oversees the logistical concerns required to make this imple-
mentation feasible. It takes ownership of the individual channel logic, automatically updates
the state estimate, and collates the output of each individual channel controller into a final
system output. In short, it implements all the necessary facilities to make control work prop-
erly. It also exposes interfaces that allow the user to change a controller by providing specific
program logic and specifying which channel that logic is responsible for. If no channel controller
is available for a given channel, then a call for output on that channel will return a neutral value by
default. For the Syma S107G helicopter, neutral values are 64, 64, and 0 for yaw, pitch, and
throttle respectively. Finally, the master controller supports querying state information from
the state estimator. This facilitates the construction of higher level logic, such as a waypoint
system, on top of the existing system.
The “hot-swappable” controllers are implemented in C++ using polymorphism and a base
class defining the submodule’s interface with pure virtual functions. The master controller also
handles any memory management related to the controllers it possesses. This implementation
is just a suggestion based on the general availability of polymorphism in modern computer
languages. Also, please note that the reference controller only supports control based on the
marker center position and rotation fields of the setpoint. Controlling the helicopter marker’s
height and width, equivalent to depth, is not yet supported.
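The polymorphic scheme might be sketched as follows. The class and method names are hypothetical simplifications of the interface described above, handling only the yaw channel and omitting the master controller's memory management:

```cpp
#include <cassert>

// Simplified state: marker center position and rotation.
struct SystemState { double x, y, rotation; };

// Abstract base class defining the channel controller interface with a
// pure virtual control method.
class ChannelController {
public:
    virtual ~ChannelController() {}
    // Map estimated state and setpoint to a channel output in [0, 127].
    virtual int control(const SystemState& state,
                        const SystemState& setpoint) = 0;
};

// Example user controller that always commands a fixed yaw value.
class FixedYaw : public ChannelController {
public:
    int control(const SystemState&, const SystemState&) { return 100; }
};

class MasterController {
public:
    MasterController() : yaw_(nullptr) {}
    // "Hot-swap": replace the yaw logic while the system runs.
    void setYawController(ChannelController* c) { yaw_ = c; }
    int yawOutput(const SystemState& s, const SystemState& sp) {
        return yaw_ ? yaw_->control(s, sp) : 64;  // 64 = neutral yaw
    }
private:
    ChannelController* yaw_;  // non-owning in this sketch
};
```

With no controller registered, yawOutput falls back to the neutral value of 64; registering a controller routes every subsequent call through the user's logic.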
5.4 Channel Controllers
The next to last step of the feedback controller’s control method is the stage where the value
of each signal channel is determined. For the Syma helicopter, there are three channels:
yaw, pitch, and throttle. Each of these channels is associated with a “channel controller”
that embodies a program logic for transforming a given system state to a controller output.
This transformation should accomplish the task of bringing the estimated system state to the
setpoint’s state.
Every call to control results in a subsequent call to each of the registered channel controllers,
which take the estimated system state and the desired setpoint as arguments. The order in
which these controllers are called is not defined, and users should not make assumptions
of this nature. The only other constraint on the channel controllers is that each must
always produce valid output for the channel for which it is responsible. On the Syma
helicopter, this means an integer value between 0 and 127.
This architecture was designed with the goal of maximizing the flexibility and extensibility
of the reference controller. A difficulty with this methodology is that it is more technically
complex because it relies on intermediate C++ topics, like inheritance and virtual functions.
This has the potential to put off would-be student-hackers; for this reason, an alternative,
simpler implementation is provided, one that offers the same reference logic without hot-
swapping but uses nothing more than simple object oriented code.
Perhaps more fundamentally, this model fails to truly acknowledge the coupled nature
of each channel on the state of the system. For example, pitch forward can induce lateral
motion and attempting to yaw can cause a loss of thrust (and subsequent drop in height). It
is assumed that this coupling is not significant enough to be explicitly modeled in the control
system, and that the logic of each controller should be insensitive to this. That said, each
control module has access to the full system state and can optionally choose to neutralize its
output while some other variable is brought under control. A perfect example of this is a
pitch controller which suppresses its output while yaw is not close enough to a setpoint.
All of the reference channel controllers implement proportional-integral-derivative (PID)
feedback controllers underneath. These reference controllers work with the classical form of
the PID equation, and base their derivative actions on the process variable. The individual
controllers differ in exactly how they calculate error and in the particular gain values, or
weights, given to each of the proportional, integral, and derivative terms. These gains are
preset to values selected after extensive testing, but users may elect to change them at will.
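A classical-form PID with derivative action on the process variable, as described above, can be sketched like this. The gains and names are illustrative, not the tuned thesis values:

```cpp
#include <cassert>
#include <cmath>

// Classical PID where the derivative term acts on the process variable
// rather than the error, which avoids "derivative kick" when the
// setpoint changes abruptly.
class Pid {
public:
    Pid(double kp, double ki, double kd)
        : kp_(kp), ki_(ki), kd_(kd), integral_(0.0),
          lastPv_(0.0), primed_(false) {}

    // pv is the measured process variable; dt is the step in seconds.
    double update(double setpoint, double pv, double dt) {
        double error = setpoint - pv;
        integral_ += error * dt;
        // No derivative contribution on the very first sample.
        double dPv = primed_ ? (pv - lastPv_) / dt : 0.0;
        lastPv_ = pv;
        primed_ = true;
        // Derivative on the PV enters with a negative sign: a rising
        // measurement opposes the control action.
        return kp_ * error + ki_ * integral_ - kd_ * dPv;
    }
private:
    double kp_, ki_, kd_, integral_, lastPv_;
    bool primed_;
};
```

For the Syma channels, the result of update would be added to the channel's neutral value (for example, 64 for yaw and pitch) before transmission.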
The following sections discuss the reference implementation for each of the three channel
controllers.
5.4.1 Yaw Control
The yaw channel controller’s primary task is to control the rotation of the helicopter. From the
perspective of the physics of the system, this is one of the easiest attributes to control because
the helicopter carries very little momentum through its yaw motion. However, rotation is also
one of the noisiest parameters to measure because it depends on reliable detection of many
factors. These characteristics suggest that only a degree of proportional control is necessary
for stable control of yaw. Positive control can be made more accurate and faster with integral
control, but only if measurements are fairly accurate.
Note that a yaw channel input of approximately 64 indicates no motion in the yaw direc-
tion. The results of the internal PID are added to this central value to determine the output
of the controller. This center or neutral value varies and the original controller implements
trim control by changing this value. Error for the yaw channel controller is calculated by
finding the signed, minimum path distance between two points on the rotation “circle”. See
figure X for further illustration. The reference yaw control is always active.
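The signed, minimum-path angular error can be computed as in the following sketch; this is a standard construction and not necessarily the thesis' exact code:

```cpp
#include <cassert>
#include <cmath>

// Signed, minimum-path distance between two angles on the rotation
// circle (radians). The result lies in (-pi, pi]; a positive value
// means the shortest correction is counter-clockwise.
double angularError(double setpoint, double measured) {
    const double TWO_PI = 6.283185307179586;
    double d = std::fmod(setpoint - measured, TWO_PI);
    if (d > TWO_PI / 2.0)   d -= TWO_PI;  // wrap long CCW paths
    if (d <= -TWO_PI / 2.0) d += TWO_PI;  // wrap long CW paths
    return d;
}
```

For instance, the error between a setpoint just above 0 and a measurement just below 2π is a small positive value, not a nearly full circle.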
5.4.2 Pitch Control
The pitch channel controller’s primary task is bringing the x-coordinate position of the heli-
copter under control. Unlike yaw, motion induced by helicopter pitch carries with it noticeable
momentum. As a result, the pitch controller has a relatively large gain applied to its deriva-
tive action in addition to proportional action. It is important to note that a pitch control
input of approximately 64 indicates no pitch motion to the helicopter. Thus, the output of a
PID control should be added to this natural center value.
Error for the pitch channel controller is defined to be the distance between the marker’s
center point and the setpoint normalized by the height of the helicopter marker (see the
discussion on coordinate transformations in the state estimator section). The reference im-
plementation of pitch control is suppressed when the helicopter's orientation is not close to
90 or 270 degrees, as these orientations place the helicopter's body perpendicular to the
camera's viewing axis, so pitch-induced motion stays in the x-y plane of the image.
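The suppression logic might look like the following sketch, where the 15 degree tolerance is an assumed value, not one taken from the thesis:

```cpp
#include <cassert>
#include <cmath>

// Gate for the pitch controller: output is held at the channel's
// neutral value unless the measured rotation is within a tolerance of
// 90 or 270 degrees, the orientations at which pitch-induced motion
// stays in the image plane.
int gatedPitchOutput(double rotationDeg, int pidOutput,
                     double toleranceDeg = 15.0) {
    const int NEUTRAL = 64;  // no-motion value for the pitch channel
    bool aligned = std::fabs(rotationDeg - 90.0) < toleranceDeg ||
                   std::fabs(rotationDeg - 270.0) < toleranceDeg;
    return aligned ? NEUTRAL + pidOutput : NEUTRAL;
}
```

This illustrates the pattern mentioned earlier: a channel controller can see the full state and choose to neutralize its own output until another variable (here, yaw) is brought close to its setpoint.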
5.4.3 Throttle Control
The throttle channel controller’s primary task is to bring the altitude of the helicopter under
control. This task is complicated by the nature of control inputs to the helicopter. For all
practical purposes, the state of the system inside the helicopter, including rotor speeds and
battery level, is a black box. The same control input at two different battery levels produces
noticeably different thrusts and the controller must do its best to account for this. After
approximately five minutes, the helicopter will no longer be able to remain airborne.
From a control standpoint, these characteristics indicate that the “neutral point”, or
reference value, at which the helicopter maintains steady altitude is an unknown value that
changes as a function of time. Other variables influence the reference value, including the
weight of any attached markers, the age of the battery, and any environmental effects such
as proximity to a surface, or air currents. The controller attempts to overcome these issues
through the use of a large gain on the integral term of the underlying PID control and a
reasonable guess at the fully charged throttle reference point (approximately 75).
Error is calculated for this controller by dividing the pixel difference between the set point
and the y-coordinate of the center of the marker by the height of the marker. Like the yaw
channel controller, the throttle channel control is always active.
5.5 Closing Comments
This chapter has discussed the concept of the feedback controller component of the helicopter
control system and how the reference implementation works. Please refer to Appendix A
for more information about how to programmatically interface to the logic described in this
chapter using C++ and OpenCV.
The result of the control interface in the feedback controller component is a signal suitable
for transmission to the controlled platform that will attempt to bring the system state to a
desired value. This signal can then be sent to the component responsible for broadcasting it
to the platform. The means by which the control signal is calculated can be altered at runtime
by a user's own program in the reference implementation. Again, see
Appendix A for more information. The next chapter presents a series of tests to characterize
the performance of both the tracking and feedback control components of the reference system.
Chapter 6
Tests and Results
This chapter presents a series of tests and results meant to quantify the performance of each
component of the control system developed in this thesis. The transmission component,
including the Arduino and accompanying electronics, is tested to quantify the performance
characteristic of range. The accuracy and reliability of pose estimations is the primary focus
of the evaluation of the vision component. Finally, the feedback controller is characterized by
examining the system’s behavior under control.
This chapter follows a test and then result format. Each test is presented, its goals stated,
and its setup explained. Following the exposition, the test results are presented and their
significance is discussed.
These discussions attempt to answer how the results relate to the stated thesis goal of
building a low cost (less than $50 USD) supplement to a laptop that will allow a STEM
student to experiment with aerial robotics in a classroom environment and then hack on it to
learn how it works. In addition, this chapter provides a comparison to a very similar thesis by
Currie [34], giving attention to the pros and cons of her approach compared to the approach
of this thesis. This discussion concludes with a critical analysis of the approach to Syma
helicopter control given by the reference controller.
6.1 Software Validation of Chosen Algorithms
This section documents efforts to profile the performance characteristics of the developed
thesis software under target operating conditions. Tracking software, in particular, needs to
run fast enough to leave sufficient computer resources for user programs on computer hardware
likely to be found in an educational environment.
6.1.1 Software Profiling
To get a sense of the runtime costs of the vision and feedback control components of the Syma
control system, the reference implementation was computationally profiled. It is important to
characterize the speed of the algorithms used as they will determine what computer hardware
can be realistically used and how much headroom is available for user programs. Time spent
analyzing images and calculating output signals also contributes to the latency of the system,
which in turn influences control. The common commercial web camera that the reference
implementation is designed for runs at 30 frames per second, or 33 milliseconds per frame.
The total processing time for a frame in the reference control system should be less than that.
A memory analysis tool, Valgrind, was used to profile memory usage. These tests help to
determine the suitability of the implementation presented in this thesis for general use. Profile
information will vary with hardware, operating system, and software versions, but general
trends can easily be discovered. The primary hardware for these tests was a Dell Precision
desktop, with Intel Core i7-3770 running at 3.40 GHz with 16 GB of system RAM, running
GNU/Linux (Ubuntu 13.10). The web camera used was a Logitech C270; this conforms to
the USB video device class allowing Linux, the test operating system, to control the camera
properties with a Video For Linux driver (V4L). Time profiling on Linux was conducted using
the operating system’s steady timers which are purpose made timers meant for measuring
durations, as opposed to points in time. Profiling was performed by measuring the time
period between the completion of an image fetch from the video source and the completion
of the helicopter tracking process. (Please note that while profiling data gives a sense of the
approximate run time of an algorithm, the results are highly dependent on system architecture
and any other computational loads.)
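The measurement approach can be sketched with C++'s std::chrono::steady_clock, the standard-library interface to the monotonic (steady) timers described above; the helper name is illustrative:

```cpp
#include <cassert>
#include <chrono>

// Time a unit of work with a steady (monotonic) clock. steady_clock is
// appropriate for durations; system_clock can jump if the wall clock
// is adjusted mid-measurement.
template <typename F>
long long measureMicroseconds(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
        stop - start).count();
}
```

In the profiling described here, the timed work would be the span between completing an image fetch and completing the tracking process for that frame.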
Results of the time profiling indicate that the vision component engenders the majority
of the computational effort of the control system. The first frame of the vision system takes
a hefty 40 milliseconds to process but after that, the worst case processing time (for a full
image running the by-threshold method) was found to be 6367 microseconds with a standard
deviation of 972 microseconds. These numbers were based on a video sample consisting of
793 frames, excluding the first, startup frame. The vast majority of this time is spent
thresholding the three channels. The control system, in contrast, only uses approximately 20
microseconds, on average, to calculate the next signal.
Figure 6.1: Graphical summary of most expensive functions during profiling. Note that tracking and color conversion is responsible for nearly 70% of the computational cost. The control algorithms were not expensive enough to be displayed.
Memory usage, per Valgrind’s analysis, was found to be approximately 35 MiB with
approximately 6000 bytes of leaked memory. This lost quantity does not appear to be a
function of time; instead, it appears to come from the GUI toolkit of OpenCV and not
this thesis' implementation. Memory usage can be significantly decreased by disabling the
GUI windows used for debugging.
6.2 Vision Component Tests
This section presents the tests used to validate the functionality of the vision control system.
Unless otherwise stated, these tests are conducted by first recording sample video and later
feeding that sample to the tracking algorithm as if it were occurring in real time. This prevents
the computationally expensive process of video encoding from unduly influencing the tests.
6.2.1 Rotation Detection Tests
This test assesses the accuracy of the rotation detection algorithm under good sensing con-
ditions. This was accomplished by placing the helicopter on a raised platform so that it was
approximately level to and directly in front of the sensing camera at a distance of approxi-
mately three feet with an unobstructed view. For comparison, a second test was run at six
feet. Ground truth was determined by using a protractor to carefully measure the rotation
of the marker relative to the camera plane. The corresponding measurement of rotation from
the vision component was similarly recorded. The helicopter’s rotation was measured in ten
degree increments, and the test was conducted live under conditions representative of the
system’s target operating environment. After the first semi-circle, the helicopter was flipped
in orientation to measure the back angles as the protractor only measured 180.0. This in-
cludes a fiducial marker used for testing that is not precisely constructed or calibrated, but
instead were made with the care and accuracy one might expect from middle school students.
A rotation measurement from the tracking component consists of the average and standard
deviation of three seconds of image frames.
Figure 6.2: Measured and Ground Truth Helicopter Rotation at 3 ft
Figure 6.2 depicts the results of the three foot trial. The maximum error encountered
at any orientation was approximately ten degrees and the maximum standard deviation was
1.64 degrees. The chart of error, shown in Figure 6.3, shows a fairly consistent negative error
across all measured angles with a mean of approximately five degrees. Here, a negative error
indicates that the rotation measurement is greater than the ground truth. An additional
Figure 6.3: Error in Degrees between Measured and Ground Truth Rotation at 3ft
interesting feature of the error chart is the presence of pronounced gradients near the “single
color” points of 0/360, 120, and 240 degrees.
The systematic error in the measurement of angle may be a product of several contributing
factors. The first factor is that the fiducial marker’s geometry may not be a perfect and evenly
segmented ring. By carefully measuring the length of each color arc on the fiducial marker
used in this test, the lengths of the 0 degree and 120 degree centered-color arcs were determined
to be approximately 3.875 inches, while the 240 degree arc was a half inch longer at 4.375
inches. Error when the long side is not present, from 0 to 90 degrees, appears to be less than
when it is.
Second, the helicopter itself appears to be mounted to the ring at a slight angle. Ground
truth was established relative to the nose of the helicopter, not the center of the marker itself.
Careful measurement of the marker shows that the difference between the two arc lengths
derived by dividing the front color into two segments at the nose’s position is almost 0.75
inches. This is significant given the approximately four inch length of the arcs.
The gradients found at the “single color” points are indicative of a problem in the design
of the three color fiducial marker. When the helicopter approaches one of these points, there
comes a period of time where, for all practical purposes, there is a single color in view. Before
the new color segment is large enough to detect, the vision component’s estimation of rotation
sticks to the angles 0, 120, and 240. At three feet, this “sticking” appears to occur for ten
degrees. At longer range, the issue becomes worse as the size of the color blobs becomes
smaller.
Figure 6.4: Measured and Ground Truth Helicopter Rotation at 6 ft
For this control system, three feet between the helicopter and the camera is a very close
range. At this helicopter depth, the size of the camera viewport is on the order of a couple
of feet, making it very easy for the helicopter to fly out of view. Six feet represents a more
realistic range for control. Figure 6.4 shows the results of the orientation test at six feet. This chart
displays the same general trends as the data taken at three feet. There is still a consistent
negative bias of approximately five degrees. However, the sticking points at 0, 120, and 240
degrees have grown to be twenty or thirty degrees wide.
As an instrument to facilitate a control system in an educational context, this system
achieves acceptable accuracy. Most IR transmitters powered by a micro-controller will not
broadcast much past six feet in an appreciable arc, so the issue of rotation sticking at longer
ranges is a practical non-issue. Furthermore, the irregular marker used in these tests demonstrates
that the reference vision component operates reasonably even without accurately built
hardware. This greatly increases the system’s practical applicability for young students in a
Figure 6.5: Error in Degrees between Measured and Ground Truth Rotation at 6 ft
classroom setting as they are unlikely to build their markers with great precision.
6.2.2 Speed of Tracking
This test seeks to understand the system’s limitations when dealing with a fast moving target.
A sheet of white wainscoting, 32 inches wide, was set up behind the test area. By measuring
the size of the same board in the camera image, a transformation can be developed from
pixels to inches. If the helicopter is then quickly translated across the board, the effects of
high velocity on tracking integrity can be measured. Horizontal measurements were taken by
projecting the helicopter across the board by hand. Vertical measurements were taken by
dropping the helicopter from the top of the camera frame to the bottom. The free-fall test
represents a close approximation of the maximum possible speed of the helicopter. As long as
the helicopter remains fairly close to the board, the pixel translations can be effectively made
into real world units. Video of each of these tests is piped into a special program that runs the
tracking algorithm and gathers related statistics including total frames, false negative frames,
maximum frame to frame velocity, and most consecutive false negatives.
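The calibration described above amounts to a simple proportional transformation; a sketch follows, with the 32 inch board width built in and with illustrative function names:

```cpp
#include <cassert>
#include <cmath>

// Convert camera pixels to inches using the known 32-inch board as a
// reference, then turn frame-to-frame pixel displacement into a
// velocity given the camera frame rate.
double pixelsToInches(double pixels, double boardPixelWidth) {
    return pixels * (32.0 / boardPixelWidth);
}

double velocityInchesPerSecond(double pixelDisplacement,
                               double boardPixelWidth, double fps) {
    return pixelsToInches(pixelDisplacement, boardPixelWidth) * fps;
}
```

For example, if the board spans 640 pixels, a 40-pixel frame-to-frame displacement at 30 frames per second corresponds to roughly 60 inches per second, provided the helicopter stays close to the board's plane.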
During the course of the test, the helicopter reached approximate maximum instantaneous
(frame to frame) velocities of 110 inches per second, horizontally, and 180 inches per second,
vertically. The chief difficulty with tracking at higher velocities is motion blur, a phenomenon
due to object displacement during the exposure of the image, even at 30 frames per second.
Figure 6.6 depicts a frame of the helicopter when it is at maximum velocity.
Figure 6.6: Demonstration of Helicopter Motion Blur at 30 Frames per Second
For the tests where the helicopter was introduced and tracked in the frame prior to being
translated, the helicopter was successfully registered in every test frame in which it was
present. In the case where the helicopter literally fell into the frame at high velocity, the
tracking was much less consistent. This appears to be due to the helicopter marker no longer
being approximately parallel with the x axis of the camera image and motion blur. The
rotation issue is a product of an early tracking software design decision, one intended to help
eliminate as many false positives as possible. The second issue appears to manifest itself
when the smaller color blob becomes too distorted to register: Without prior knowledge of
the helicopter position and velocity, the system will not take a single color blob to represent
the helicopter (again to reduce false positives).
6.2.3 General Tracking Performance
In general, the tracking component of the Syma control system does a good job estimating
the state of the helicopter. With bright, even lights and a background mostly constituted of
colors not on the fiducial marker, the system’s tracking performance at ranges less than eight
feet is seamless. Computational cost on a modern computer is also encouraging as the 6-8
millisecond processing time is far less than the maximum of 33 milliseconds. However, there
are still a number of issues.
The most obvious issue from testing is the tendency of the tracking system to “stick” to
single colors as the marker turns in a circle. This issue, compounded at range, reduces the
resolution of control that the system can offer. A related issue is a tendency for the system
to fail to track when the helicopter is in front of a color similar to one on its ring. The blob
expansion algorithms will see part of the background and the color blob on the ring as the
same, large object. At best it interprets this as the helicopter rotating. At worst, it drops
tracking all together.
These issues are a function of the information (or lack thereof) retrieved from the helicopter's
fiducial marker. Some improvement might be made by inferring the appearance of the
helicopter based on the history of its movement. Another viable solution would involve
modifying the physical properties of the fiducial marker such that it provides viable
information at any angle.
The by-estimate method also tends to produce inaccurate rotations at close range because
the expansion vectors slightly overstep the actual edges of the colors. This occurs because
of a fault-prevention mechanism in the algorithm which allows each vector several errors
before detecting an edge. Solving this issue will involve redesigning the expansion method.
Alterations of this nature and the changes to the fiducial marker mentioned above will be
discussed further in the future work section of the conclusions chapter.
6.3 Feedback Controller Tests
6.3.1 Feedback Control Analysis
In this test, the full control system is engaged to hover a Syma helicopter in the center of the
camera view. This test was performed in an open area with the camera approximately four
feet off the ground. The helicopter was inserted manually into the camera frame at a depth
of approximately six feet. The helicopter was allowed to hover until the system was no longer
able to control it.
Figure 6.7: Throttle and Elevation Error For Hovering Helicopter
Figure 6.7 depicts the throttle control output of the system and the normalized error in
elevation with respect to time. Figure 6.8 depicts the pitch control output of the system and
the normalized x axis error with respect to time. As discussed in the controls chapter, the
normalized error is the absolute pixel error at a given time divided by the marker height at
that same time. This transformation helps account for the effects of perspective on the 2D
image. Note that a decrease in the vertical axis in these charts corresponds to an upward
helicopter motion, following the convention in computer coordinate systems. Also note that
where the term “pitch control” is used in this chapter, it refers to the pitch component of the
signal input and not the actual Euler angle. The helicopter actually pitches very little during
Figure 6.8: Throttle and X Axis Error For Hovering Helicopter
a given run.
The result of this test is that the helicopter initially corrects its altitude and settles into
stable control after ten seconds. For approximately five and a half minutes, the helicopter’s
elevation gently oscillates with a period of approximately 1.25 seconds and an amplitude that
averaged under four inches. After five and a half minutes, the helicopter’s batteries deplete and
it can no longer maintain altitude. Syma’s specifications state that flight duration of a brand
new helicopter is seven to ten minutes [?]. Anecdotal evidence from extensive testing done in
this thesis seems to suggest this might be a bit optimistic. In addition, the helicopters used in
testing have all been charged many times, so a six minute battery life seems very reasonable.
One interesting feature of Figure 6.7 is the steady climb of the throttle required to stabilize
the helicopter at a constant elevation. When the run started, the helicopter was near fully
charged and hovered at an average throttle input of approximately 73 (on the 0 - 127 scale).
After five minutes, the average throttle required to hover had risen to approximately 103
suggesting that a steady increase of about one throttle point per ten seconds is required
to stabilize the helicopter in air. This rate is small enough that, for this case, the throttle
controller’s integral term took care of the required adjustments.
Another interesting observation is the coupled nature of the throttle and pitch control.
Figure 6.9: Normalized and Unnormalized Elevation Error
The oscillations seen in the elevation control, Figure 6.7, appear to be related to the corrections
in pitch. Oscillations on both channels appear to have approximately the same frequency and
phase shift. The helicopter is able to correct its elevation very quickly as this is simply a
matter of perturbing the main rotor speed. It is not able to correct pitch nearly as easily
because this requires changing the direction of the motor on the tail of the helicopter and
waiting for the helicopter to pitch forward or backward. This process of changing direction
can take a couple of seconds and this delay is evident in the stability of the pitch control
using a PD controller. Adding integral control significantly smooths out the entire system’s
performance.
With this pitch integral control, the helicopter maintained its altitude within two inches
of the setpoint for the majority of the run. More dramatically, the average oscillation am-
plitude along the x-axis was reduced to within four inches of the setpoint. Furthermore, the
oscillations on both control channels themselves are much less apparent and are likely caused
as much by perturbations in air currents and error in the tracking system as they are by
control errors. It is likely that further testing and tuning will lead to faster settling time and
perhaps more accurate control. The current tuning is accurate enough that it is obvious that
the helicopter is under positive control and trying to hover at a given point.
Figure 6.10: Throttle and Elevation Error For Hover Command with Integral Control
The helicopter can be manipulated in 2D space by adjusting the setpoint of the control
system. The system sometimes fails when the helicopter is given a change in setpoint with
a very large real distance change, especially downward. While trying to achieve the point,
the overshoot inherent in a feedback control system can cause the helicopter to either hit the
ground or fly out of frame.
For the purposes of this control system, these results are very encouraging. When the
tracking component can operate reliably, the control system is very stable and a helicopter
can be flown for the length of its battery life. For students, educators, and enthusiasts, this
system provides a base on which to tweak, prototype, and extend the vision and control
components. There is still work to do, especially in regard to issuing commands in three
dimensional space. Still, the existing control system implementation fills the requirements of
a simple educational system. It demonstrates positive control for extended periods of time
and allows the helicopter to hover at a chosen point with enough accuracy that a student
could realistically run a “mission” with the helicopter.
Figure 6.11: Throttle and X-Axis Error For Hover Command with Integral Control
6.4 Failure Modes
The ways in which a control system run was interrupted were, from most to least
common:
1. the helicopter flew outside of the range of the transmitting LED,
2. the helicopter’s battery ran out of charge,
3. the helicopter failed to get a good start, and
4. the tracking system lost a lock for too long.
The first issue is caused mostly by drift in the Z axis of the camera view, here defined
as into or out of the image plane. The reference implementation of the controller does not
yet actively try to control the depth of the helicopter. Implementing this feature will require
knowing camera intrinsics or an additional training step. Stability of the 2D system was
deemed a priority.
The helicopter tends to be very stable, but some disturbances in the environment or violent
corrections in the helicopter’s pitch can cause significant drift in the Z direction. Care must
be taken to point the transmitting LED at where the user expects the helicopter’s average
position for a run to be. Care must also be taken when issuing setpoints to the helicopter, as it
is possible to command the helicopter to fly right out of the range of the transmitter.
The next most common failure case is battery related. The capacity of the Syma’s battery
puts an effective limit of five to six minutes on any individual control run. It takes a further
half hour to charge the helicopter back up. After a few dozen runs, the battery life begins to
diminish significantly and the helicopter will no longer fly in free air without a replacement
battery. This is an issue for education systems using this technology, requiring replacement
batteries to be installed after a period of time.
Another common cause of failure is on the start. The reference control system does not
handle takeoff, but instead relies on the user to place the helicopter in the camera view.
This is an artifact of early testing which indicated that effective takeoffs were difficult to
execute without the helicopter drifting out of command range. The control system has no
notion of whether the helicopter is being held by a person, so integral and derivative actions
keep functioning. When the helicopter is finally released, the accumulated error due to these
actions can cause the helicopter to quickly fly up out of range or fall out of the sky. This is
another major section for future work.
The last failure mode is fairly rare with clean backgrounds, but can be an issue with
complicated ones. As noted in the discussion of the vision component’s performance, the
helicopter can fly into a region of an image that makes marker detection difficult. In cases like
these, tracking is often lost at least temporarily. The system responds by repeating previous
commands for a period of one second, and often that is enough to move the helicopter out of
the difficult region. Other times, however, it is not enough and the system tells the helicopter
to fall. In the latter cases, the resulting motion is often too unstable to effectively control.
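The one-second repeat behavior can be sketched as a small wrapper around the per-frame command loop. The class, names, and timeout constant below are illustrative assumptions, not the thesis code.

```python
REPEAT_TIMEOUT = 1.0  # seconds to repeat the last command before giving up

class CommandHolder:
    """Repeats the last good command for a short window after tracking is lost."""

    def __init__(self):
        self.last_command = None
        self.lost_since = None

    def step(self, detection, command, now):
        """Return the command to transmit this frame.

        detection is None when the tracker has lost its lock; now is the
        current time in seconds."""
        if detection is not None:
            self.lost_since = None
            self.last_command = command
            return command
        if self.lost_since is None:
            self.lost_since = now
        if now - self.lost_since <= REPEAT_TIMEOUT:
            # Hold course and hope the marker drifts out of the bad region.
            return self.last_command
        return None  # timed out: no safe command is available
```

Returning `None` after the timeout corresponds to the failure case described above, where the system can no longer issue meaningful corrections and the helicopter falls.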
6.5 Currie’s Thesis
Late in the process of work on this thesis, a closely related work by Sarah Currie at Rhodes
University in South Africa was discovered [34]. Currie, as part of her honors
bachelor degree, developed an “autopilot” for the Syma S107G helicopter using a Microsoft
Kinect and an Arduino for signal transmission. This section provides critical analysis of the
differences between Currie’s work and this one.
In many ways, the two works are very similar. Both use an external sensing model, the
software for both is built on OpenCV, both use markers to detect rotation, and both appear
to use PID feedback controllers to achieve control. It must be noted that Currie’s thesis is
vague in some respects, especially in regard to implementation details of the feedback control
system and the performance characteristics of the resulting system. Therefore, this section
restricts itself to commentary on the visual tracking methodology employed by Currie.
The most prominent difference between Currie’s approach and the one taken by this work
is the method by which the visual tracking component operates. Currie tracks the helicopter’s
pose using two LEDs, a red one on the nose of the helicopter and a white one on the tail.
The red LED is part of the standard Syma helicopter, except that it is modified to not blink.
The white LED is an external component and is wired onto the Syma’s controlling circuitry.
Currie then uses a Kinect depth camera to first identify the two LEDs in an image stream
(using the RGB component of each image), and next, uses the depth map of the camera to
find the depth of a point between the two LEDs. Rotation was approximately recovered by
multiplying the pixel distance between LEDs by the distance to the helicopter in millimeters
as measured by the Kinect. The resulting value was then divided by 10000 to produce the
final “orientation value” where a value of 6.5-7.0 represented a normal rotation to the camera.
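Currie's orientation arithmetic, as described above, can be reproduced in a few lines. The divisor of 10000 and the 6.5-7.0 "normal" band are taken from the text; the function names and the example numbers are ours.

```python
def currie_orientation(pixel_distance, depth_mm):
    """Approximate orientation value as Currie's thesis describes it:
    the pixel distance between the two LEDs scaled by the Kinect's
    depth reading in millimeters, divided by 10000."""
    return pixel_distance * depth_mm / 10000.0

def facing_camera(value, low=6.5, high=7.0):
    """True when the orientation value falls in the band Currie reports
    for a helicopter oriented normally toward the camera."""
    return low <= value <= high

# Example: LEDs 45 px apart at 1500 mm gives 45 * 1500 / 10000 = 6.75,
# inside the 6.5-7.0 band. As the helicopter yaws, the apparent pixel
# distance shrinks and the value drops out of the band.
```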
Currie achieves tracking by first thresholding an RGB image to find only the very bright
regions of the image which are assumed to be the LEDs. The binary image produced by the
thresholding operation is then run through an external blob tracking library, identifying each
blob and calculating its properties. The front and back LEDs are differentiated by sampling
the location of each blob in the original image and looking for those that are predominantly
red or blue (Currie notes that the white LED looks blue, likely due to white balance). To
improve the reliability of tracking, Currie applied the CamShift method, discussed in this
thesis’ literature review, on the body of the helicopter to calculate a window in
which to use the threshold operation.
Currie’s active marker setup is simple and elegant. It imposes a negligible penalty to
weight and is unlikely to disturb any of the normal dynamics of the helicopter it is attached
to. Furthermore, the active marker should work well under almost any light conditions,
including low or no light situations. This thesis’ passive approach can be very sensitive to
light levels with some tunings and is certainly not capable of working in low or no light
environments. Currie’s use of depth cameras gives another dimension that provides useful
information for vision tracking and helicopter pose estimation.
Currie’s methodology poses some difficulties for the purposes of an educational tool. It
requires modification of the helicopter’s internal circuitry, and comparatively expensive equip-
ment, such as a depth camera, to operate. The use of an active marker also means that the
marker must carry a power source or tap into an existing one on whatever platform is being
used. This limits the general usefulness of this method.
From a performance standpoint, Currie’s tracking methodology is significantly slower than
the approach of this thesis. Currie states that a single frame took approximately 28 mil-
liseconds to process on a modern desktop processor, largely due to the expensive CamShift
windowing operation. For comparison, the tracking process in this thesis takes approximately
six to eight milliseconds. Because of Currie’s use of brightness as the distinguishing factor for the
LEDs, bright backgrounds can produce false positives. Of course, this thesis’ approach fails
when color backgrounds are overly complex.
In general, Currie’s approach to sensing may be more robust for specific applications largely
due to the use of a Kinect depth camera. The approach of this thesis, however, is arguably more
available and more practical because it runs faster, requires less expensive hardware, and
requires no external power source.
6.6 Concluding Thoughts
This chapter has presented a series of tests to characterize the performance and behavior of
the entire Syma control system. The software developed in this thesis was profiled for time
and space usage. The vision system’s ability to detect rotation was characterized, as was the
ability of the tracking mechanism to operate with high marker velocities. To test the entire
system, the control system was used to hover a helicopter for approximately six minutes until
the helicopter’s battery was out of charge.
Chapter 7
Conclusion
This thesis has presented a control system composed of arts and craft supplies, a toy helicopter,
hobbyist electronics, and supporting software that, when combined with a laptop, allows one
to autonomously and programmatically fly a micro air vehicle for approximately $40 USD. The
system uses a colored Styrofoam ring, the surface of which is painted with three equally sized
color segments, affixed to the bottom of a Syma S107G helicopter combined with purpose
made blob tracking software to identify the position and pose of the helicopter in a stream of
images from a consumer web camera. In order to detect the helicopter reliably and quickly,
two complementary methods of blob tracking were composed to produce the final search
method. The first blob tracker uses an estimate of the helicopter’s position based on previous
values to estimate where the system will be and looks for it in that region of the image. The
second blob search method uses a simple but computationally expensive image thresholding
technique to globally search the image for blobs. Blobs are then combined to find possible
candidate points and a selection algorithm picks the most likely candidate as the final pose
estimation.
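The composition of the two search methods can be sketched as a simple fallback chain: try the cheap, estimate-driven local search first, and only run the expensive global threshold search when the local one fails. The function names and signatures here are hypothetical, not taken from the thesis code.

```python
def find_marker(image, predicted_region, local_search, global_search):
    """Two-stage blob search.

    local_search(image, region) looks only where the marker is expected
    and returns a (possibly empty) list of blobs; global_search(image)
    thresholds the whole frame. Falling back keeps the common case fast
    while preserving the ability to reacquire the marker anywhere."""
    if predicted_region is not None:
        blobs = local_search(image, predicted_region)
        if blobs:
            return blobs
    return global_search(image)
```

The same structure also covers the startup case: with no motion history, `predicted_region` is `None` and the global search runs unconditionally.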
Pose information from this tracking component is then fed into a set of PID controllers
which are used to stabilize and control the position of the system in real space. These feedback
controllers keep an estimation of the state of the system that is low pass filtered to provide
robustness to noise and also includes state data such as velocity and acceleration. Limited
three dimensional control is achieved by using the height of the color ring, or fiducial marker,
to normalize pixel errors into the physical coordinate system of the helicopter.
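The normalization step can be illustrated with a small helper. The marker height constant below is an assumed example value, not a measurement from the thesis.

```python
MARKER_HEIGHT_IN = 2.0  # assumed physical height of the color ring, in inches

def pixels_to_inches(pixel_error, marker_height_px):
    """Convert a pixel-space error into an approximate physical error
    using the marker's known height as a per-frame scale reference.

    As the helicopter moves away from the camera the marker shrinks in
    the image, so the same pixel error maps to a larger real distance."""
    inches_per_pixel = MARKER_HEIGHT_IN / marker_height_px
    return pixel_error * inches_per_pixel
```

For example, a 40 pixel error reads as roughly 4 inches when the marker appears 20 pixels tall, but roughly 8 inches when the helicopter has drifted back and the marker appears only 10 pixels tall.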
The resulting control system is capable of hovering the helicopter at a depth of eight feet
and a given (x,y) setpoint for the entire length of the helicopter’s battery life with oscillations
of no more than four inches on average. By manipulating the setpoint, the helicopter can be
programatically controlled in the camera view.
Existing micro air vehicles tend to be research focused and use expensive hardware platforms
and expensive sensing equipment. The contribution of this thesis is a very inexpensive
micro air vehicle system that is particularly suitable for STEM education and curious hobbyists.
The software is designed in a modular fashion and documented with the intent that students will
hack on it to learn more about its components and to extend control to new platforms.
7.1 Future Work
The control system developed as part of this thesis has demonstrated the ability to stabilize a
Syma S107G helicopter for a noticeable period of time, but there remain many areas where
improvements can be made to make the entire system faster, more reliable, easier to hack,
and less expensive.
7.1.1 New Hardware Platforms
The most obvious avenue for extending the work done in this thesis is to use the provided
framework to provide control for new hardware platforms. Other three channel IR toy he-
licopter models are excellent candidates because their physics are all very similar and only
the transmission layer would need significant modification. Extending control to other IR
helicopters would be an excellent exercise for students learning about digital signaling and
control systems.
RC platforms, especially small quad-rotor vehicles, would take more work to make operational,
but offer the benefit of an additional input channel with which to counteract lateral
motion. The transmission device would require the use of a radio transmitter and
knowledge of the quad-rotor’s control protocol. Fortunately, it appears that many small
quad-rotor designs share similar protocols, such as the FlySky protocol.
Departing from flying vehicles, an exciting use of the tracking system developed in this
thesis would be for use with ground vehicles. These vehicles could be remote controlled toys
or custom made robots, and because of their comparatively stable physics, they could likely be
controlled far more readily. A controller of this nature might be useful in a robot competition
for autonomously controlling a robot relative to a base station (by having the marker on the
robot and camera on the base station), or controlling a robot based on a known beacon point
(where the marker is on the beacon and camera on the robot).
7.1.2 Improved Marker Design
The vision based tracking component of the reference control system also has several obvious
avenues for improvement. These improvements are both hardware based, involving modification
of the fiducial marker that is attached to the tracked device, and software based, involving
changes to the nature of the software algorithms used to detect the marker.
One of the negative consequences of using the existing marker design is that it is fairly bulky
and easily destabilizes the delicate balance of micro-helicopters if it is not carefully attached.
For an educational product aimed at middle school children in a classroom, it may be an
unrealistic assumption that the style of Styrofoam rings used in this thesis will be consistently
mounted. An improved marker design would greatly improve the system from a controls
and usability standpoint. An improved marker would ideally be lighter, perhaps by
perforating the ring or constructing it from wires, and would have a mechanism for consistent
mounting. As an example, a new ring could be constructed with a wireframe and hardpoints
specifically measured to mount to a Syma S107G helicopter.
From a tracking robustness standpoint, the reference fiducial marker design might be
improved by adding a fourth color. This change would help to eliminate the “dead zones”
at 0, 120, and 240 degrees where only a single color is realistically detectable and improve
rotation estimation at a distance. Two or three colors should always be visible with this new
configuration. This would come at the cost of complexity in the software, however, where
four colors must be detected, analyzed, and combined to form marker candidates. It may also
be the case that, by making each color blob smaller, accuracy is negatively affected. Some
of these issues could be counteracted through improvements to the vision tracking software.
7.1.3 Tracking Algorithm Improvements
The vision tracking software, as is, also has a fair amount of room for improvement and
optimization. While the software was designed with good programming practices in mind,
it has not been highly optimized. Many of the tracking component’s operations, such as
blob expansion, do not modify any of their input data and do not rely on the results of
other operations. Thus, there is great opportunity for parallelizing many sections of the
existing code on platforms that support multithreading. On the other hand, the end result of
optimizations of this nature is often much more complicated source code. For an educational
system, the risks may well outweigh the rewards.
The application of a “windowing” function to the input image is a less hardware spe-
cific optimization with the possibility of improving both accuracy and speed in the tracking
software. This windowing function would extract from the input image a subregion representing
the most likely position of the helicopter, plus some buffer. This concept is distinct
from the by-estimate blob expansion method because it applies not only to color searching
but also to all pre and post-processing steps that must be done to an image. Only this “re-
gion of interest” would be subject to expensive operations like color space conversion, color
thresholding, smoothing, and contour detection.
These optimizations may be necessary to effectively run this system on high resolution
cameras. A user with a 1080p web camera has almost seven times as many pixels to process
as a user with a standard 640 by 480 pixel camera. This translates to a factor of seven increase
in computation time for many of the algorithms used in this thesis. Without significant
optimization, images from these cameras must be scaled down to keep the system running in
real time.
In the case of no prior history of the helicopter’s position, this region would be expanded
to match the dimensions of the image. Thus, the windowing operation would convert all of
the search techniques employed into local searches, possibly greatly improving performance.
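The behavior just described, including the full-image fallback when there is no history, can be sketched as a small geometry helper. The buffer size and signature are illustrative assumptions.

```python
def region_of_interest(center, marker_size, image_w, image_h, buffer_px=40):
    """Compute a clamped search window around the last known marker position.

    With no history (center is None) the window expands to the whole
    image, turning every downstream search into a local one without a
    special code path. Coordinates are clamped to the image bounds so
    the window is always valid near the frame edges."""
    if center is None:
        return (0, 0, image_w, image_h)
    cx, cy = center
    half = marker_size // 2 + buffer_px
    x0 = max(0, cx - half)
    y0 = max(0, cy - half)
    x1 = min(image_w, cx + half)
    y1 = min(image_h, cy + half)
    return (x0, y0, x1 - x0, y1 - y0)
```

Only the returned rectangle would then be subject to color space conversion, thresholding, smoothing, and contour detection, which is where the speedup comes from.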
The video controller’s implementation of the by-threshold method uses a rather primitive
technique of color thresholding. It performs two binary threshold operations on each channel
of the input image, an expensive technique. Moreover, this technique must perform more
operations to handle hues in the red region of the color circle because their values wrap around
the 0 to 180 degree boundary. As discussed in the literature review, histogram based tracking
using back projections has the potential to improve tracking speed and accuracy, and make the program
code simpler and easier to understand. This technique is no panacea, however, as it requires
its own set of post processing steps and only profiling can determine if it is truly a benefit.
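The red wraparound issue mentioned above can be made concrete with a small helper. OpenCV's 8-bit HSV representation does store hue in the range 0-179; the function itself is only a sketch, not the thesis implementation.

```python
HUE_MAX = 180  # OpenCV stores hue as 0-179 in 8-bit HSV images

def hue_in_range(hue, lo, hi):
    """True if hue lies within [lo, hi], treating the range as circular.

    For red, lo may exceed hi (e.g. 170..10). In image form this is why
    two threshold operations are needed: one for [lo, 179] and one for
    [0, hi], with the results combined."""
    hue %= HUE_MAX
    if lo <= hi:
        return lo <= hue <= hi
    return hue >= lo or hue <= hi
```

A histogram back projection sidesteps this entirely, since the hue histogram is circular by construction and no explicit range test is needed.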
7.1.4 Control Algorithm Improvements
There is great opportunity for future work in altering the reference control system to improve
control of Syma helicopters. A more intelligent method for state estimation that incorporates a
dynamic model of the controlled system, perhaps some variety of Kalman Filter, would benefit
control by allowing improved control during spotty detection. Most interesting, perhaps,
would be the design of a controller that attempts to control for helicopter movement along
the Z axis by manipulating rotation and pitch.
7.1.5 Signal Transmission Device Improvements
The existing solution for transmitting calculated digital signals to the helicopter involves
the use of an Arduino microcontroller, an LED, and a USB cable. This hardware costs
approximately $30 USD, largely due to the Arduino, which is more than the approximately $20 USD
price of the helicopter itself. Less expensive pre-made hardware exists that might fit the
requirements for ease of use and consistency. If this hardware will only be used in conjunction
with this control system, any device that saves cost and does not compromise performance is
very appealing.
7.1.6 Education Modules
Most of the future work identified by this thesis thus far has been some form of technical
improvement to the system. However, the real value of this work lies in the educational
opportunities that it provides. A variety of “educational modules” can be built on top of the
existing components.
For middle school age children, just running the entire control system will be a learning
exercise. Together with some of the tools developed as part of this thesis, such as a PID
tuner and data logger, students can be introduced to the concept of PID controllers. By
manipulating gain values and examining the resulting behavior of the helicopter with analytical
tools and their eyes, students can gain an intuitive understanding of control systems.
Extending control to new platforms is a promising exercise for more advanced students.
This task will challenge students to investigate and implement concepts related to digital
signal transmission. In the course of this investigation, students will likely learn not only
about digital signals, but the process of sniffing out and replicating signals. Armed with this
knowledge, students will be able to immediately interface with a huge variety of devices in
their daily lives including many air conditioners, televisions, and cameras.
There are also many opportunities for students to learn about foundational concepts in
computer vision and feedback control. A computer vision module might introduce students
to the concepts of digital images and their encodings, image thresholding, image filtering, and
ultimately, blob detection. Each concept could be introduced in a brief lesson and supported
with documentation and examples from OpenCV. A feedback control module could discuss
concepts such as error and numerical methods for integrals and derivatives.
Chapter 8
Bibliography
[1] Amazon Prime Air. http://www.amazon.com/b?node=8037720011.
[2] Amazon retail page for Parrot AR Drone 2.0 quadcopter. http://www.amazon.com/Parrot-AR-Drone-Quadricopter-Controlled-Android/dp/B007HZLLOK.
[3] Arduino. http://www.arduino.cc/.
[4] Ascending Technologies research price list. http://www.asctec.de/downloads/flyer/AscTec RESEARCH Pricelist.pdf.
[5] Asus Xtion PRO. http://www.asus.com/Multimedia/Xtion PRO/.
[6] Digispark USB development board. http://digistump.com/category.
[7] FIRST Robotics Competition. http://www.usfirst.org/.
[8] Helicopter Flying Handbook. http://www.faa.gov/regulations policies/handbooks manuals/aviation/helicopter flying handbook/media/hfh ch13.pdf.
[9] KISS Institute for Practical Robotics. http://www.kipr.org/.
[10] KISS Institute for Practical Robotics. http://www.kipr.org/hardware-software.
[11] LEGO Mindstorms. http://www.mindstorms.lego.com/.
[12] Linux UVC supported devices. http://www.ideasonboard.org/uvc/#devices.
[13] OKSDE Botball grant. http://ok.gov/sde/oksde-botball-grant.
[14] OpenCV. http://opencv.org/.
[15] OpenCV basic thresholding operations. http://docs.opencv.org/doc/tutorials/imgproc/threshold/threshold.html.
[16] OpenCV documentation on structural analysis and shape descriptors. http://docs.opencv.org/modules/imgproc/doc/structural analysis and shape descriptors.html.
[17] OpenCV morphology transformations. http://docs.opencv.org/doc/tutorials/imgproc/opening closing hats/opening closing hats.html.
[18] Outfilming film production and aerial cinematography. http://www.outfilming.com/.
[19] RCGroups.com Syma S107 helicopter discussion. http://www.rcgroups.com/forums/showthread.php?t=1176146.
[20] Stanford University autonomous helicopter. http://heli.stanford.edu/.
[21] Teensy USB development board. https://www.pjrc.com/teensy/.
[22] STEM-C partnerships: Computing education for the 21st century, December 2013. http://www.nsf.gov/publications/pub summ.jsp?WT.z pims id=503582&ods key=nsf14523.
[23] Amazon. Estes 4606 Proto X nano R/C quadcopter. http://www.amazon.com/Estes-Proto-Quadcopter-Colors-Black/dp/B00G924W98.
[24] Amazon. Syma S107/S107G R/C helicopter. http://www.amazon.com/Syma-S107-S107G-Helicopter-Colors/dp/8499000606.
[25] Pedram Azad. Visual Perception for Manipulation and Imitation in Humanoid Robots, volume 4 of Cognitive Systems Monographs. Springer, 2009.
[26] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF). Comput. Vis. Image Underst., 110(3):346–359, June 2008.
[27] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell., 24(4):509–522, April 2002.
[28] Gary Rost Bradski and Adrian Kaehler. Learning OpenCV. O'Reilly Media, Inc., first edition, 2008.
[29] Gary R. Bradski. Computer vision face tracking for use in a perceptual user interface, 1998.
[30] Andreas Breitenmoser, Laurent Kneip, and Roland Siegwart. A monocular vision-based system for 6D relative robot localization. Pages 79–85, 2011.
[31] John Canny. A computational approach to edge detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-8(6):679–698, Nov 1986.
[32] Alvaro Collet Romea and Siddhartha Srinivasa. Efficient multi-view object recognition and full pose estimation. In 2010 IEEE International Conference on Robotics and Automation (ICRA 2010), May 2010.
[33] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(5):603–619, May 2002.
[34] S. A. Currie. Auto-pilot: autonomous control of a remote controlled helicopter. Master's thesis, Rhodes University, 2012. http://www.cs.ru.ac.za/research/g09C0298/index.html.
[35] Daniel F. DeMenthon and Larry S. Davis. Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15:123–141, 1995.
[36] James Diebel. Representing attitude: Euler angles, unit quaternions, and rotation vectors, 2006.
[37] Gang Feng. A survey on analysis and design of model-based fuzzy control systems. Fuzzy Systems, IEEE Transactions on, 14(5):676–697, Oct 2006.
[38] Mike Field. FPGA heli, March 2012. http://hamsterworks.co.nz/mediawiki/index.php/FPGAheli.
[39] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, 2002.
[40] Robin Hewitt. How OpenCV's face tracker works. http://www.cognotics.com/opencv/servo 2007 series/part 3/index.html.
[41] S. Jayawardena, M. Hutter, and N. Brewer. A novel illumination-invariant loss for monocular 3D pose estimation. In Digital Image Computing Techniques and Applications (DICTA), 2011 International Conference on, pages 37–44, Dec 2011.
[42] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, November 2004.
[43] Masayoshi Matsuoka, Alan Chen, Surya P. N. Singh, Adam Coates, Y. Ng, and Sebastian Thrun. Autonomous helicopter tracking and localization using a self-surveying camera array. The International Journal of Robotics Research, 26, 2007.
[44] Nathan Michael, D. Mellinger, Q. Lindsey, and V. Kumar. The GRASP multiple micro-UAV testbed. Robotics Automation Magazine, IEEE, 17(3):56–65, Sept 2010.
[45] Microsoft. Microsoft Kinect. http://www.xbox.com/en-US/kinect.
[46] David Miller, Anne Wright, Randy Sargent, Rob Cohen, Teresa Hunt, and Y. Sargent. Attitude and position control using real-time color tracking, 1997.
[47] Katja Nummiaro, Esther Koller-Meier, and Luc Van Gool. Color features for tracking non-rigid objects. Special Issue on Visual Surveillance, Chinese Journal of Automation, 29:345–355, May 2003.
[48] Katsuhiko Ogata. Modern Control Engineering. Prentice Hall PTR, Upper Saddle River, NJ, USA, 4th edition, 2001.
[49] Rapporteur Planning Committee for the Convocation on Rising Above the Gathering Storm: Two Years Later, Thomas Arrison. Rising Above the Gathering Storm Two Years Later: Accelerating Progress Toward a Brighter Economic Future. Summary of a Convocation. The National Academies Press, 2009.
[50] Vidya Raju. Modeling and control of RC miniature coaxial. Master's thesis, ETH Zurich, 2011.
[51] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar. Vision-based state estimation for autonomous rotorcraft MAVs in complex environments. Pages 1758–1764, May 2013.
[52] Syma Toys. Syma X1 4 CH remote control quad copter. http://www.symatoys.com/product/show/1878.html.
[53] Tinne Tuytelaars and Krystian Mikolajczyk. Local invariant feature detectors: A survey. Found. Trends. Comput. Graph. Vis., 3(3):177–280, July 2008.
[54] Agustin Vergottini. Arduino helicopter infrared controller. Blogger, May 2011. http://www.avergottini.com/2011/05/arduino-helicopter-infrared-controller.html.
[55] Vicon. Bonita, affordable motion capture. http://www.vicon.com/System/Bonita.
[56] Vicon. Vicon MX Hardware System Reference, 1.4 edition, 2006.
[57] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM Comput. Surv., 38(4), December 2006.
[58] Quming Zhou and J. K. Aggarwal. Object tracking in an outdoor environment using fusion of features and cameras. Image Vision Comput., 24(11):1244–1255, 2006.
[59] Z. Zivkovic. Improved adaptive Gaussian mixture model for background subtraction. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 2, pages 28–31, Aug 2004.
Appendix A
User Manual
This appendix presents a brief overview of how to get started flying Syma helicopters using
the control system developed in this thesis. This guide is further broken down into the
following sections:
1. this introduction,
2. an installation guide,
3. an enumeration of the proposed software application programming interface (API),
and
4. a brief guide for those curious about diving deeper into the code.
A.1 Installation
The system developed in this thesis is composed of three individual components which must
be installed or constructed:
1. the visual marker for the helicopter,
2. the micro-controller based signal transmitter, and
3. the tracking software.
A.1.1 Tracking Software Compilation
The software compilation process and dependencies will vary with the operating system the
system is run on. Please refer to the README.md file that accompanies your distribution of
this software for detailed information about how to build the software from scratch. This project
depends on the OpenCV computer vision libraries and has been tested with version 2.4.
Refer to http://opencv.org/ for more information on obtaining this library. Finally, the
project also depends on a serial library to provide communication between the host
computer and the transmitting device. A library that is functional with Unix based systems
is provided. A Windows operating system version is in development.
Users may elect to forgo the process of building software themselves and use the KISS
Platform IDE, http://www.kipr.org/kiss-platform-windows. This IDE ships with OpenCV
and cross platform libraries for accomplishing various tasks, including serial communication.
Using this IDE, you must import the KISS project file included in your distribution of the
software and compile.
A.1.2 Micro-Controller Setup and Programming
The micro-controller used to implement the transmission device is the Arduino Uno. To
prepare the Arduino for use with this project, visit http://www.arduino.cc/ and download
the latest edition of the Arduino IDE. This IDE comes with libraries necessary to interact
with and program the Arduino over a USB connection. Once installed, follow the
instructions located in README.md file that accompanies the software to program the
Arduino. After the device is programmed, one may insert an IR LED into the appropriate
pin on the micro-controller to complete the transmitter setup.
A.1.3 Visual Marker Construction
The final stage of preparation involves constructing the fiducial marker to attach to the
helicopter. Required materials include: a Styrofoam cup with an outermost diameter of
approximately 3.5 inches, three different colors of construction paper (ideally far apart on
the color wheel, e.g., yellow, red, and blue), and hot-glue.
1. Remove the top, “lip” portion of the Styrofoam cup using scissors or a knife.
2. Measure the height and diameter of the ring, and calculate the approximate
circumference.
3. Place the construction paper in a single stack and on the top layer mark a rectangle
with the appropriate height and width to cover one third of the ring. It is better to
err on the side of too long.
4. Cut out all of the rectangles at once.
5. Carefully fold the color paper around the outside surface of the ring. Check to make
sure the dimensions are correct before proceeding.
6. If the lengths are too long, carefully trim all three of the colors at once until they fit
correctly.
7. Place a small strip of hot glue under the end of each color strip and affix them to the
marker.
8. Orient the ring in the correct manner for the desired color configuration.
9. Place the helicopter inside the ring and align it such that the front color’s middle
point is aligned with the nose of the helicopter.
10. Glue the marker to the helicopter at the nose and at the two supporting beams on the
rear of the helicopter.
A.1.4 Running the System
Once the software, transmitter, and color marker are properly installed and
configured, the system is ready to run. To begin, load the example application code and
compile it with the appropriate makefile or KISS IDE. The details of the API are given in
the following section.
Before starting the application, configure your operational space. The system should ideally
be used in a bright environment with a plain background of dissimilar color to the marker.
The next step is to train the system on the marker under these conditions. To do so, build
and run the “Syma Configure” application and follow the instructions displayed on the
console. This process will generate a configuration file which contains information about the
colors on your marker and the physical properties of your camera. After training, the
system is ready to run.
Place the camera on an elevated surface, such as a table, so that it is at a similar elevation to
the desired helicopter position. Place the transmitter under the starting position of the
helicopter and connect it to the host computer with a USB cable. When this is
accomplished, build and run the sample application.
1. When the application starts, it will begin searching for the helicopter marker.
2. Turn on the helicopter and hold it in the center of the camera image for a second. The
tracking system will begin issuing commands.
3. When the helicopter blades start spinning, let go of the helicopter.
4. The helicopter can then be manipulated by clicking on the camera image to change
the setpoint.
A.2 API
The software that powers the tracking engine, control system, and transmission device is
programmed in C++ using object-oriented programming techniques. Users who would like
to program against these interfaces directly should skip to the next section.
For educational purposes, the system also comes with a simpler, C based API for those
interested in getting started quickly. Listed below are the function prototypes for this API
and associated descriptions.
Listing A.1: Simple Control API
typedef struct {
int x;
int y;
int width;
int height;
float yaw;
} syma_state_t;
/*
Starts the tracking system.
configFile - the name of the configuration file produced by the config program
serialPort - the name of the serial port that the transmitter is connected to
useInternalCamera - if 1, the program takes ownership of the camera and must
be updated with syma_refresh(). If 0, the user is responsible for providing
images via the syma_refresh_by_image() function.
Returns - 1 on success, 0 on failure
*/
int syma_start(const char* configFile, const char* serialPort, int
useInternalCamera);
/*
Releases resources and closes any open connections
Returns - 1 on success, 0 on failure
*/
int syma_stop(void);
/*
Manually issue a command to the helicopter
pitch - full forward = 0, full backward = 127, neutral = 64
yaw - full right = 0, full left = 127, neutral = 64
throttle - min throttle = 0, max throttle = 127
Returns - 1 on success, 0 on failure
*/
int syma_send_signal(unsigned char pitch, unsigned char yaw, unsigned char throttle);
/*
Updates camera image, tracks, runs control logic, and issues command.
This pulls a camera image from the internal camera object. If compiled with
KISS, the camera object singleton from libkovan is used. If syma_start
was not called with useInternalCamera set to true, this method
immediately returns a failure.
Returns - 1 on success, 0 on failure
*/
int syma_refresh(void);
/*
Same as syma_refresh() but the user must specify a pointer to the image
data in a BGR888 format. Used with the syma_start() method with the
useInternalCamera flag set to false.
Returns - 1 on success, 0 on failure
*/
int syma_refresh_by_image(const unsigned char* data, int width, int height);
/*
Retrieves state information from last tracking update.
Returns - struct with info about helicopter state. Position is -1, -1 if
tracking failed.
*/
syma_state_t syma_get_state(void);
/*
Sets the target position of the helicopter
target - state struct with information about desired position of the
helicopter. May not actively control on all channels. See documentation.
Returns - 1 if the target was valid, 0 if not
*/
int syma_set_target(syma_state_t target);
/*
Set the pixel position target of the helicopter. Infers desired yaw to be 90
degrees from the camera.
x - x pixel position target
y - y pixel position target
Returns - 1 if the target was valid, 0 if not
*/
int syma_set_target_pos(int x, int y);
/*
Retrieves the width of the current image in pixels
Returns - value greater than zero on success, 0 on failure
*/
int syma_get_cam_width(void);
/*
Retrieves the height of the current image in pixels
Returns - value greater than zero on success, 0 on failure
*/
int syma_get_cam_height(void);
Using this API, the helicopter can be stabilized at a set yaw in the center of the image with
the following simple program.
#include <stdio.h>
#include "syma_simple.h"

int main(void)
{
    // Start the tracking library and check to make sure it
    // succeeded
    if (syma_start("confix.txt", "/dev/ttyUSB0", 1) == 0) {
        fprintf(stderr, "Failed to initialize library\n");
        return 1;
    }

    // Set initial target
    syma_set_target_pos(syma_get_cam_width()/2, syma_get_cam_height()/2);

    while (1) {
        // Pull camera image and track
        syma_refresh();

        // Get information about state
        syma_state_t state = syma_get_state();

        // Check to see if tracking was lost
        if (state.x < 0 || state.y < 0) {
            break;
        }
    }

    // Cleanup
    syma_stop();
    return 0;
}
A.3 Hacking
To jump into the inner workings of the existing library, please consult the following files:
SymaEngine.cpp/hpp These files define the logic behind the tracking system. This is an
excellent place to start if you are curious about modifying the tracking logic.
SymaEngineSearch.cpp defines the methods responsible for composing blobs and
estimating pose.
SymaController.cpp/hpp These files define the logic behind the master controller. The
files SymaPid.cpp/hpp implement the default logic for each of the PID controllers.
SymaStateEstimator.cpp/hpp These files define the logic for filtering the output of the
tracking system before being forwarded to the controllers.
SymaMatUtils.cpp/hpp and SymaBlobSearch.cpp/hpp define the helper utilities
that are used to directly manipulate and process images. The by-threshold and
by-estimate implementations are defined in these files, respectively.
SymaStrategy.hpp This file defines a virtual class, or interface, for control logic. If you
are interested in implementing your own control logic to work with the existing
interface, this is a great place to start.
These files constitute the core logic of the tracking and control system. Other files provide
support.