A Low Cost, Vision Based Micro Helicopter System for
Education and Control Experiments
Jonathan Meyer
April 2014
UNIVERSITY OF OKLAHOMA
GRADUATE COLLEGE
A LOW COST, VISION BASED MICRO HELICOPTER SYSTEM
FOR EDUCATION AND CONTROL EXPERIMENTS
A THESIS
SUBMITTED TO THE GRADUATE FACULTY
in partial fulfillment of the requirements for the
Degree of
MASTER OF SCIENCE
By
JONATHAN W. MEYER
Norman, Oklahoma
2014
A LOW COST, VISION BASED MICRO HELICOPTER SYSTEM
FOR EDUCATION AND CONTROL EXPERIMENTS
A THESIS APPROVED FOR THE
SCHOOL OF AEROSPACE AND MECHANICAL ENGINEERING
BY
Dr. David Miller, Chair
Dr. Zahed Siddique
Dr. Harold Stalford
© Copyright by JONATHAN W. MEYER 2014. All Rights Reserved.
DEDICATION
To my parents, Randy and Kathy. Thank you for all your support.
Acknowledgements
I would like to thank Dr. David Miller, my advisor and mentor, for his advice and infinite
patience. I would also like to thank my committee members, Dr. Harold Stalford and Dr.
Zahed Siddique.
To Braden McDorman and Nafas Zaman, thank you for your help in developing software
and interfacing with KISS IDE. Also of invaluable help were the many enthusiasts of computer
vision, electronics, and toy helicopters whose collective knowledge made this work possible.
Last, but not least, I want to thank my parents for their patience, understanding, and
support.
Contents
List of Figures

1 Introduction
  1.1 Statement of Problem
  1.2 Thesis Road Map
2 Literature Review
  2.1 Micro Air Vehicle Research
  2.2 Alternative Platforms
  2.3 Computer Vision and Object Tracking
  2.4 Pose Estimation
  2.5 Control Systems
  2.6 Concluding Thoughts
3 Methodology
  3.1 Requirements and Specifications
  3.2 Hardware Platform
  3.3 Software Platform
  3.4 Algorithms and Implementation
    3.4.1 Fiducial Markers
    3.4.2 Vision-Based Helicopter Tracking
    3.4.3 Control System
    3.4.4 Helicopter Signaling
  3.5 Summary
4 The Tracking Component
  4.1 Reference Implementation
  4.2 By-Estimate Blob Detection
  4.3 By-Threshold Blob Detection
  4.4 Marker Candidate Detection
  4.5 Candidate Selection
  4.6 Closing Comments
5 Feedback Control Component
  5.1 The Reference Implementation
  5.2 State Estimation
  5.3 Master Controller
  5.4 Channel Controllers
    5.4.1 Yaw Control
    5.4.2 Pitch Control
    5.4.3 Throttle Control
  5.5 Closing Comments
6 Tests and Results
  6.1 Software Validation of Chosen Algorithms
    6.1.1 Software Profiling
  6.2 Vision Component Tests
    6.2.1 Rotation Detection Tests
    6.2.2 Speed of Tracking
    6.2.3 General Tracking Performance
  6.3 Feedback Controller Tests
    6.3.1 Feedback Control Analysis
  6.4 Failure Modes
  6.5 Currie's Thesis
  6.6 Concluding Thoughts
7 Conclusion
  7.1 Future Work
    7.1.1 New Hardware Platforms
    7.1.2 Improved Marker Design
    7.1.3 Tracking Algorithm Improvements
    7.1.4 Control Algorithm Improvements
    7.1.5 Signal Transmission Device Improvements
    7.1.6 Education Modules
8 Bibliography
A User Manual
  A.1 Installation
    A.1.1 Tracking Software Compilation
    A.1.2 Micro-Controller Setup and Programming
    A.1.3 Visual Marker Construction
    A.1.4 Running the System
  A.2 API
  A.3 Hacking
List of Figures
1.1 Stock Syma S107G Toy Helicopter
2.1 Binary Thresholding Operation
2.2 A Histogram Tracking Operation to Detect a Human Face [40]
2.3 Background Subtraction Before and After
2.4 Example of Canny Edge Detector
2.5 Feature Matching Example
3.1 Arduino Uno R3 [3]
3.2 Reference 3-Color Fiducial Marker Constructed From Styrofoam
3.3 Fiducial Marker Attached to Syma Helicopter
3.4 Syma S107 Signal Indicating Full Throttle and Neutral Yaw, Pitch, Trim
4.1 A Tracked Helicopter Marker with Rotation Estimate in Degrees
4.2 Result of By-Estimate Method on Figure 4.1
4.3 Result of By-Threshold Method on Figure 4.1
4.4 Example of Single Color Fixation
6.1 Graphical Summary of Most Expensive Functions During Profiling
6.2 Measured and Ground Truth Helicopter Rotation at 3 ft
6.3 Error in Degrees between Measured and Ground Truth Rotation at 3 ft
6.4 Measured and Ground Truth Helicopter Rotation at 6 ft
6.5 Error in Degrees between Measured and Ground Truth Rotation at 6 ft
6.6 Demonstration of Helicopter Motion Blur at 30 Frames per Second
6.7 Throttle and Elevation Error for Hovering Helicopter
6.8 Throttle and X-Axis Error for Hovering Helicopter
6.9 Normalized and Unnormalized Elevation Error
6.10 Throttle and Elevation Error for Hover Command with Integral Control
6.11 Throttle and X-Axis Error for Hover Command with Integral Control
Abstract
Due to a push to improve K-12 STEM education using robotics, and the popularity and high
cost of autonomous flying robots, there exists a niche for a small, low cost, accessible, and
hackable drone platform designed for students and enthusiasts. This work presents a low cost
(<$50 USD) supplement to a laptop that allows a student to achieve closed-loop feedback control
of a Syma S107G micro helicopter. Helicopter tracking is achieved by detecting a color-based
fiducial marker, attached to the bottom of the helicopter, in a digital image captured from a
standard web-camera. A custom software system, written in C++ with OpenCV, detects the
presence of each color on the marker in an image and assembles these colors into an estimation
of the pose of the helicopter. A set of PID controllers then computes an appropriate signal for
the helicopter and relays it using an Arduino microcontroller and an IR LED. The resulting
system is capable of hovering a Syma helicopter, indoors, at a specific image coordinate and
rotation, to within a few inches and 20 degrees, until battery failure. The vision system runs
in real time at 30 Hz and tracks reliably with even lighting and no significant environmental
influences. The helicopter can be programmatically moved by altering the controller setpoint.
Also developed is a simple-to-use C interface to the control system and documentation for
underlying components. Most importantly, the resulting system lowers the barrier for students
and enthusiasts to explore concepts in robotics, such as computer vision and control systems,
on an aerial robot.
Chapter 1
Introduction
This thesis presents the design and implementation of a low cost, vision based micro helicopter
system for education and control experiments. In this chapter, the topic, motivation, and
organization of this thesis are briefly introduced. For students and instructors interested in
how to get set up quickly, please see Appendix A.
1.1 Statement of Problem
In 2007, the United States Congress commissioned a pre-eminent committee of educators,
scientists, and engineers to produce a report on how the United States can maintain its
position as a world leader in technology and science, and maintain a bright economic future
[49]. This committee’s number one recommendation was to vastly improve K-12 science
and mathematics education. As a part of this push, the National Science Foundation, the
Department of Education, state governments and many corporations have devoted money
and expertise toward improving STEM education with robotics programs [13][22].
Robotics naturally integrates many technical disciplines: the design and production of
a robot involves a wide range of skills, including science, mathematics, programming,
electronics, mechanics, signal processing, and computer science.
Furthermore, students working in teams to solve problems in robotics must also practice
essential life skills such as teamwork, design, problem solving, and resource management.
As a tool for teaching programming and computer science, robotics provides an exciting
opportunity for students to see the results of their learning come to life before them. One of
the primary challenges in teaching and learning programming is the practice’s fundamentally
abstract nature. Students may struggle to see the practical relevance of their learning, espe-
cially early in their education. On the other hand, one of programming’s greatest educational
aspects is its rapid feedback cycle for a creative endeavor. Robotics education offers a way
to make programming concepts less abstract without harming the creative, short-feedback
cycle. As an example, the programming concepts relating to loops and conditionals might be
reinforced by having a student implement navigation by sensory servoing.
Unfortunately, getting started in robotics often requires significant knowledge, money, and
hardware, presenting a high barrier to entry for students of the discipline. To counteract
this, various organizations have produced robotics kits that come with instructions and all of
the pieces required to get started. These kits may be divided into two major categories: those
meant for teams of students such as KIPR [9] and FIRST [7], and those meant for individual
students or enthusiasts. Often, these platforms share hardware, especially the computer
processors, as in the case of the Lego Mindstorm [11]. The Arduino [3] is an 8-bit micro-
controller platform that has enjoyed a surge of popularity among electronic hobbyists and
should also be considered when talking about robotics kits, though it requires other hardware
to be interesting. There exists a niche among individual educational robotics products for
something in the low cost market. In particular, there is a niche for flying robots, a domain
currently dominated by expensive research equipment and augmented reality video games.
Unmanned aerial vehicles, or drones, are one of the hot topics of the day and their use has
rapidly expanded beyond military operations to new domains like fast product shipping [1]
and aerial photography [18]. There is opportunity for a low cost, accessible, and hackable
“small scale” drone platform to make a difference in the educational robotics movement.
To that end, the goal of this thesis is the creation of a flying robot system suitable for use
in education that:
1. is inexpensive,
2. is programmable,
3. is easy for a single person to use,
4. is hackable,
5. provides closed loop control, and
6. works in a classroom environment.
Stated succinctly, this thesis demonstrates that it is possible to build a low cost (≈ $50
USD) supplement to a laptop, exhibiting these properties, that will allow a STEM student
to experiment with aerial robotics in a classroom environment. This thesis presents a
complete control system that satisfies this requirement using computer vision techniques with
a fiducial marker to exert positive control on a Syma S107G micro helicopter, seen in Figure
1.1. This system is divided into generic components that perform tasks of sensing/pose
estimation, control calculation, and signal transmission with reference implementations for
each one that work with the Syma S107G. These components can be swapped to provide
control for new hardware platforms.
Figure 1.1: Stock Syma S107G Toy Helicopter
1.2 Thesis Road Map
This thesis discusses the construction of a feedback control system for a toy helicopter in
terms of both philosophy/architecture and actual implementation. Those interested in get-
ting started with the software as quickly as possible may skip directly to Appendix A, the
User Manual. This appendix provides simple examples and documentation of the reference
implementation being used to control a Syma Helicopter. It also discusses the interfaces of
each component and is an excellent starting point for those looking to hack on this system or
extend control to a new device.
For those interested in the theory behind the vision and control components of the system,
chapter 2 provides a literature survey of fields relevant to the control of micro air vehicles and
object tracking through computer vision. Chapter 3 outlines the general methodology and
decision making process behind the hardware choices and each component of the reference
control system. In addition, this chapter discusses the signal transmission component of the
control system, responsible for digitally transmitting control signals over an IR interface.
Chapter 3 provides an excellent starting point for those looking to understand how the entire
system works under the hood. Chapters 4 and 5 go into detail about the precise logic behind
the operation of the vision/tracking system and the control/feedback systems respectively.
Chapter 6 presents tests and experiments that verify the functionality of different components
of the reference system. It also presents a discussion on the merits and shortcomings of the
reference system as well as suggestions for future improvements. Appendix A documents
the reference implementation's application programming interface as well as the necessary
mechanisms to begin developing new components that will fit in this system.
Chapter 2
Literature Review
The following section presents an investigation into existing options and methods for flying
robots. A complete, autonomous robot consists of a hardware platform, a means of sensing the
environment, and logic that integrates sensing with hardware (often expressed in software).
This review is organized into two sections. The first section examines existing micro air
vehicle research with respect to the project goals. Different system architectures, specifically
on-board versus external sensing, are considered; additionally, there is an investigation into
low-end consumer products such as toy helicopters and their viability as an autonomous
platform. The second section is an investigation into existing solutions for sensing and control,
including relevant computer vision methodologies.
2.1 Micro Air Vehicle Research
The GRASP Laboratory at the University of Pennsylvania is perhaps the most visible re-
search group developing autonomous micro-helicopter systems. Their research focuses on the
creation of robust control algorithms for flying robots, using purchased platforms ranging
from 750 grams in weight to more than 2 kilograms. This work has applications for the fields
of surveillance, precision farming, search and rescue, and more. Most relevant to this thesis
is GRASP’s multiple MAV test bed [44], and its primary flying platform, the off-the-shelf
ASCTEC Hummingbird MAV. For state estimation, the lab is equipped with a 20 camera
VICON motion capture system which operates at 375 Hz, measures positions to an incredible
accuracy of 20 micrometers, and maintains tracking even if all but one camera is occluded.
Each MAV is equipped with inertial measurement units (IMUs), electronic devices that mea-
sure velocity, orientation, and gravitational forces with a combination of accelerometers and
gyroscopes. These devices inform an internal control loop which in turn sets motor speeds at
600 Hz to maintain a desired heading and position. The motion capture system records MAV
positions at 100 Hz and relays global commands to individual units at the same rate. Each
Hummingbird then performs any further processing with a 600 MHz ARM processor running
a version of the Robot Operating System (ROS).
For many research groups, the ultimate goal is the production of MAVs that are fully
capable of operating autonomously in the real world under possible harsh conditions. Toward
this goal, these expensive and elaborate sensor mechanisms are a means to ensure that the
feedback and state estimation components of a tested control system are as precise as possible.
Other research groups focus on techniques for state estimation using only on-board sensing
with an aim of producing fully autonomous helicopters. [51] combines control algorithms
developed in the GRASP test bed with on-board sensing in the form of two wide-angle cameras
operating at 20 Hz and an IMU updating at 100 Hz. Other researchers at ETH Zurich have
produced systems that rely not on stereo vision or depth cameras, but on optical flow, a
technique that refers to the pattern of apparent motion in a camera scene when there is
relative motion between the camera and scene. Even so, these systems still rely on IMUs to
provide stability while using their more advanced techniques for pose estimation.
While inspiration may certainly be drawn from these pioneers in the field of micro air
vehicle control, the expensive hardware they employ and the technical expertise they require
puts them well outside the reach of most primary schools and enthusiasts. For example, the
Hummingbird MAV vehicle used by GRASP retails for approximately $5000 [4]. A Vicon
motion capture system costs tens of thousands or more [55]. Even less expensive sensor
systems, such as the depth cameras used on Kumar’s autonomous vehicles, cost at least a
hundred dollars per unit [45]. Low cost units such as the Parrot AR Drone and the crowd
funded R10 platforms typically cost no less than $200 and often much more when equipped
with all necessities for autonomous flight [2]. Schools and enthusiasts, however, do not have
the same set of requirements for rigor and accuracy that cutting edge research does. There
may be alternative platforms and sensing models better suited to their needs.
2.2 Alternative Platforms
Many of the previously mentioned platforms are notable for having their robots perform at
least some of the calculations required for controlled motion. This architecture is natural
for the development of a robust and autonomous system but there is a primary drawback.
For the purposes of educational research, the equipment needed for effective on-board sensing
tends to make the associated platform more complex, more expensive, and physically larger.
To mitigate these issues, many systems (such as those at the GRASP
lab) use a hybrid approach where high frequency information such as IMU data is handled
on-board and other jobs are performed off-board (e.g., expensive ones like path planning).
If an on-board sensing and control approach is not feasible because of weight limits, size
constraints, cost considerations, or design choice, then external sensing is required to
close the loop. A particularly relevant research group is the Automatic Control
Laboratory at ETH Zurich where small co-axial helicopters are controlled via external cameras
[50]. In particular, this setup used a single VICON depth camera tracking four optical markers
attached at cardinal directions on a small co-axial helicopter. The camera provides feedback
information about the system pose and an external controller interpreted that information
and issued appropriate commands. Students at Stanford have also constructed a tracking
system for a larger-scale flying helicopter that uses an array of three uncalibrated digital
cameras at known locations. These cameras relay images to a central computer that performs
background subtraction to identify the helicopter and triangulate its position at 30 Hz [43].
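Background subtraction of the kind used in that system can be sketched as simple frame differencing. The following minimal C++ fragment is illustrative only (it is not the Stanford implementation): it marks every pixel whose gray level changed by more than a threshold between a stored background frame and the current frame.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mark pixels that differ from the background frame by more than
// `thresh` gray levels. Output is a binary mask: 255 = motion, 0 = static.
std::vector<uint8_t> frameDifference(const std::vector<uint8_t>& background,
                                     const std::vector<uint8_t>& current,
                                     int thresh) {
    std::vector<uint8_t> mask(current.size(), 0);
    for (size_t i = 0; i < current.size(); ++i) {
        if (std::abs(int(current[i]) - int(background[i])) > thresh)
            mask[i] = 255;
    }
    return mask;
}
```

Real systems also update the background model over time to cope with illumination drift; this sketch assumes a fixed reference frame.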
The hardware on these external-sensing based systems tends to be simpler and cheaper
than the hardware on their on-board counterparts. This comes at the cost of generality; that
is, having to set up external sensors means that robots only operate in specific areas and
under conditions that allow for the platform to be effectively sensed. Furthermore, external
cameras and sensing equipment typically operate at lower frequencies than an on-board IMU
at similar price points, which can present difficulties in highly dynamic systems. If this is
not an issue, however, the limited range, low cost, and simplicity are appealing benefits for a
small, educational system meant for a classroom environment.
Hardware options for this sensing/control model are far more diverse. An entire world
of ‘toy’ helicopters that are inexpensive, simple, widely available, and robust is available.
These systems are typically divided along feature lines such as the control interface (radio vs.
infrared) and platform style (co-axial vs. quad-rotor). Radio controlled interfaces operate on a
radio spectrum that allows many units to be individually flown at a time, typically over longer
ranges than infrared, without line of sight, and allow operation both indoors and outdoors.
On the other hand, infrared interfaces implement different channels either in hardware by
changing the signal modulation or in the protocol of the signal itself. This means that if
multiple infrared hardware platforms are used at once, there may be interference. Infrared
also has shorter range and outside operation is limited due to the effects of the sun. For all
its disadvantages, infrared is cheaper and often easier to implement, typically requiring only
a single LED and resistor (the total cost of which is often less than a dollar) in addition to
a device to drive the LED. In certain cases, its narrow broadcast angle and limited range
can be advantageous if multiple devices accept the same input and only one should be controlled.
A micro air vehicle platform built from components has the luxury of choosing its means of
communication and control. If using an existing platform, then that platform should be chosen
with consideration for its communication protocol because it is usually difficult to change.
In the platform-style debate, quad-rotors are traditionally more stable, easier to tune and
repair, but are more expensive. Examples of viable off-the-shelf radio controlled helicopters
include the $40 Syma X1 [52] and the similarly priced Estes 4606 Proto X Nano [23]. On
the co-axial side, the Chinese brand Syma has inundated the market with a huge variety of
model helicopters. Their least expensive model, the $20 Syma S107, is the most popular;
this may be attributed to its inexpensiveness, robust construction, available replacement
parts, and friendliness toward hackers. In particular, at the beginning of this research, there
was already approximately 800 pages of discussion on RCGroups.com forums regarding the
Syma helicopter, much of it devoted to creating alternative controllers [19]. In May 2011, a
blogger named Agustin Vergottini published an article discussing his efforts to sniff out the
Syma protocol and create a simple controller for it [54]. Many other articles followed suit,
using more advanced equipment such as logic analyzers and field-programmable gate arrays
(FPGAs) to bring Syma controllers to greater sophistication [38].
Despite the interest, there did not appear to be significant effort in the generation of
automated control systems, merely alternative forms of remote control. Late in the process
of writing this thesis, very similar research by Currie [34] used an Arduino MCU, an IR LED, a Syma S107,
and a Microsoft Kinect depth camera to achieve an autopilot for this system. Currie’s work
with respect to this thesis will be discussed in greater detail in the Results section.
2.3 Computer Vision and Object Tracking
Whatever the system architecture for a flying robot, there exists a need for sensors to ‘close
the loop’ and provide feedback for control inputs. This section of the thesis focuses on a
particular class of methods for closing this loop. The choice of hardware for this task
must be guided by the physics of the system. For example, the sensor must be capable of
detecting the helicopter in the entire operational zone. If the helicopter can be constrained
to flying up and down, then a variety of range finding transducers from sonar to IR to laser
are viable. These sensors can be polled very quickly and return information that represents
physical units of distance with little or no processing. On the other hand, if the helicopter is
allowed to fly unconstrained, then it is unlikely that any sensor with a narrow field of view
will be useful for general control. The existing literature on the control of micro air vehicles
focuses almost entirely on the use of computer vision to solve this problem.
Digital cameras offer the flexibility, resolution, and wide sensing range that is required to
provide control for a dynamic system like a micro helicopter. The cameras themselves come
in many varieties. Much of the existing cutting edge research in camera based control systems
use motion capture cameras: purpose-built cameras that operate at hundreds of hertz and (most
frequently) use special IR reflective tags attached to the tracked medium to locate it [56].
In recent years, depth sensing cameras have become far more prevalent after being heavily
pushed by the game industry. These cameras, such as the Microsoft Kinect [45] or Asus Xtion
[5], provide a user with a stream of pixel images where each pixel is characterized by a red
value, a green value, a blue value, and a depth approximation. Other researchers have used the
simpler, commercially available cameras with which most modern computers come equipped.
While these cameras do not provide depth information, they are far more common and much
less expensive. Motion capture cameras are typically priced in the thousands of dollars;
Vicon’s own ‘affordable’ brand, the Bonita, starts at $12,500 USD for two cameras and simple
supporting hardware/software [55]. Depth cameras begin at approximately $100 [45]. Web
cameras, however, cost tens of dollars and more often than not are a sunk cost for users.
An important side note regarding camera selection is that chosen web cameras should ideally
support the USB Device Class Definition for Video Devices (UVC) standard so that generic
drivers can be used. A list of compatible devices for Linux systems, which should also work
on other systems, is maintained by the Linux UVC Development community [12]. Modern
laptops are typically equipped with cameras that are compliant with this standard.
The interpretation of information from these devices falls to the techniques of computer
vision which involves the acquisition, processing, analyzing, and understanding of images
often with the assistance of knowledge from the domains of physics, statistics, geometry, and
learning theory [39]. The field is a wide one, but of particular interest to this thesis is the
sub-domain of video tracking. A tracking algorithm for closed loop control of a helicopter
needs to perform the following tasks in real time:
1. identify target object in scene,
2. follow the object as it moves through the video sequence, and
3. estimate the state of the system (including positions, derivatives, etc.).
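As one concrete illustration of the third task, derivatives of the state can be obtained by finite differencing successive pose measurements. The C++ sketch below is illustrative only (the `Measurement` and `Velocity` types are invented for this example, not taken from the thesis software):

```cpp
// A 2D pose measurement taken at time t (seconds).
struct Measurement {
    double x, y;   // image coordinates (pixels)
    double t;      // timestamp in seconds
};

struct Velocity { double vx, vy; };

// Finite-difference velocity estimate from two successive measurements.
Velocity estimateVelocity(const Measurement& prev, const Measurement& curr) {
    double dt = curr.t - prev.t;
    if (dt <= 0.0) return {0.0, 0.0};  // guard against bad timestamps
    return {(curr.x - prev.x) / dt, (curr.y - prev.y) / dt};
}
```

In practice such raw differences are noisy at camera frame rates, so estimators typically smooth or filter them before feeding a controller.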
Common challenges to all vision algorithms are the effects of variations in illumination,
occlusion of objects, and background noise. Illumination variation refers to a scenario where
a significant change in brightness occurs in an image stream which may affect the color and
appearance of objects in a scene over time. Such disturbances may result from changes in the
environment or changes to the camera’s internal properties such as a change in exposure time.
Occlusion of objects refers to a case where the tracked object is at least partially obscured
by some other feature of the image such as if a helicopter were to fly behind a chair in the
middle of an image. Background noise refers to features or changes in the environment (not
the tracked object) that lead to difficulties in tracking the object. Examples include extra
objects or colors similar to the tracked target, or excessive background motion. Low light
conditions can also cause excessive noise in images [57].
There is no general solution to object recognition in computer vision; instead, there exists
a huge collection of algorithms that work well for certain domains and restrictions and poorly
for others. What follows is a brief survey of some popular methods for object recognition and
tracking using two-dimensional RGB-encoded-color digital images.
Perhaps the simplest technique for recognition and tracking is to use the color information
encoded into images directly by way of thresholding [28]. This is a binary operation that asks
whether a given pixel is above a given value: if it is, that pixel is updated with a designated
value indicating a positive; otherwise, it is given a designated value indicating a negative. A
popular convention in computer vision is to set a pixel to the greatest meaningful value for the
image type to represent “true” and to the minimum meaningful value to represent “false”:
thresh(pixel) = maxVal,  if pixel > thresh
                minVal,  if pixel ≤ thresh
Two threshold operations can be combined to determine if a pixel’s value is in a given
range. One operator tests the lower bound of the range and the second tests the upper value.
The results of both operations are combined with a ‘logical and’ operation to produce the
final result. Range thresholding is typically not implemented in this manner because of speed
concerns, but this abstraction is useful for mathematically understanding range thresholds.
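To make the two operations concrete, the following is a minimal C++ sketch; the pixel values and cutoffs are arbitrary illustrations, and `threshold`, `rangeThreshold`, and `rangeThresholdImage` are illustrative names rather than the API of any particular library:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Single binary threshold: maxVal if the pixel exceeds the cutoff, minVal otherwise.
uint8_t threshold(uint8_t pixel, uint8_t cutoff, uint8_t maxVal = 255, uint8_t minVal = 0) {
    return pixel > cutoff ? maxVal : minVal;
}

// Range threshold: the 'logical and' of a lower-bound test and an upper-bound test.
uint8_t rangeThreshold(uint8_t pixel, uint8_t lo, uint8_t hi) {
    bool aboveLower = pixel >= lo;  // first threshold operation
    bool belowUpper = pixel <= hi;  // second threshold operation
    return (aboveLower && belowUpper) ? 255 : 0;
}

// Applying the range threshold to every pixel yields a binary image.
std::vector<uint8_t> rangeThresholdImage(const std::vector<uint8_t>& img,
                                         uint8_t lo, uint8_t hi) {
    std::vector<uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) out[i] = rangeThreshold(img[i], lo, hi);
    return out;
}
```

A production implementation would fuse both bound tests into one pass, as noted above; the separation here is purely for exposition.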
The result of a threshold operation is a binary image, a black and white image where
each pixel has been classified as within tolerances or not. Binary images greatly reduce the
complexity of an image, can be achieved through many methods, and as such are typically
used as the inputs for all sorts of vision algorithms, including contour detection. They can
also be used together with a priori knowledge of the object that is being tracked to detect
that object. This kind of registration requires a model of the object being tracked, especially
if that model’s appearance in an image is a function of its pose. This could be accomplished
by looking for a certain shape or silhouette, for example. The simplest case is tracking a
brightly colored ball in a scene composed of starkly different colors. A range based threshold
can be applied to the scene with a range corresponding to the approximate color of the ball
plus or minus some tolerance. The resulting binary image could be searched for a circular
white patch that would represent the 2D projection of the sphere onto the image [15].
A constraint of this method is that the object will need to be brightly colored and in
sufficient lighting for that color to be detected. This method is particularly sensitive to
noise in the forms of lighting changes, shadows, and background color. There are some
pre-processing stages that can help alleviate this issue, including converting RGB images to
the Hue-Saturation-Value (HSV) color space, where the color of a pixel is represented by a
single value on the color wheel. This value, the hue, is fairly invariant to lighting changes.
Color equalization can also help produce stark contrast in the image and make colors more
detectable.

Figure 2.1: Binary Thresholding Operation
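The hue computation behind such a conversion can be sketched as follows. This is a simplified standalone version of the standard RGB-to-HSV hue formula (a library such as OpenCV provides the full conversion); it illustrates why scaling all three channels by a common brightness factor leaves the hue unchanged:

```cpp
#include <algorithm>
#include <cmath>

// Hue of an RGB pixel, in degrees [0, 360). Hue depends only on the ratios
// between the channels, so multiplying all three channels by a common
// brightness factor leaves it unchanged.
double rgbToHue(double r, double g, double b) {
    double mx = std::max({r, g, b});
    double mn = std::min({r, g, b});
    double c = mx - mn;                   // chroma
    if (c == 0.0) return 0.0;             // gray pixel: hue undefined, report 0
    double h;
    if (mx == r)      h = std::fmod((g - b) / c, 6.0);
    else if (mx == g) h = (b - r) / c + 2.0;
    else              h = (r - g) / c + 4.0;
    h *= 60.0;
    return h < 0.0 ? h + 360.0 : h;
}
```

For example, a bright red pixel (255, 0, 0) and a dim red pixel (100, 0, 0) both map to a hue of 0 degrees, which is the lighting invariance exploited above.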
An alternate means of color tracking uses histograms to perform color based object detec-
tion. Histogram tracking involves computing the color histogram of the target object(s) from
a sample image. This reference histogram can cover one or multiple channels. The bins of the
histogram can then be used as a probability look-up table for future images. Each pixel of a
new image can be classified into the bin it belongs to and the relative height of that bin can be
used as a probability estimation. This resulting probability map can be used in combination
with a variety of cluster finding algorithms to match the center of the target object. Such al-
gorithms include MEANSHIFT and CAMSHIFT [33]. An advantage over pure blob tracking
is that this histogram can represent non-rigid objects that have more than one predominant
color. This method is slightly more computationally expensive than basic thresholding but
is more robust and allows for the estimation of confidences. Intel has successfully used this
method to track human faces on consumer web cams in real time since 1998 [29].
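A minimal sketch of this back-projection idea over a single hue channel follows; the bin count is an arbitrary illustrative choice, and real implementations (such as OpenCV's histogram back-projection used by CAMSHIFT) operate over full images and possibly multiple channels:

```cpp
#include <algorithm>
#include <array>
#include <vector>

constexpr int kBins = 12;  // 30-degree hue bins (the bin count is an arbitrary choice)

int hueBin(double hueDeg) {
    int bin = static_cast<int>(hueDeg / (360.0 / kBins));
    return std::min(kBins - 1, bin);
}

// Build a reference histogram from sample hues taken from the target object,
// normalized so the tallest bin has height 1.0.
std::array<double, kBins> buildHistogram(const std::vector<double>& sampleHues) {
    std::array<double, kBins> hist{};
    for (double h : sampleHues) hist[hueBin(h)] += 1.0;
    double mx = *std::max_element(hist.begin(), hist.end());
    if (mx > 0.0)
        for (double& v : hist) v /= mx;
    return hist;
}

// Back-projection: the probability that a new pixel belongs to the target is
// taken as the relative height of the histogram bin its hue falls into.
double pixelProbability(const std::array<double, kBins>& hist, double hueDeg) {
    return hist[hueBin(hueDeg)];
}
```

The per-pixel probabilities form the probability map that a cluster finder such as MEANSHIFT then searches for the densest region.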
Another popular method for object detection and tracking is background subtraction.
This method takes as input a reference image and a second image to search, and it produces
a gray-scale image where pixel values indicate the magnitude of that pixel's change (since the
reference). Background subtraction works by analyzing the difference between a reference
image of a scene before any objects are added and a second image taken after the object has
been added.

(a) Histogram of Skin Color (b) Probability Map From Histogram
Figure 2.2: A Histogram Tracking Operation to Detect a Human Face [40]

The algorithm assumes that the parts of the images that change from the first
to the second image are regions of interest that might contain the object to be tracked. In
effect, a third image, gray-scaled and the same size as the first two, is created where each pixel's
value is equal to the difference between the corresponding pixels in the first and second images.
The resulting gray-scale image could then be further processed, such as by thresholding with
some small value that would reduce noise without disrupting object detection [57]. This
method works best when the background is very static and of sufficiently different color from
the tracked object for there to be a meaningful difference. Movement in the background can
obscure object detection or give false positives. To counteract this, some ‘adaptive’ methods
apply a low-pass filter to the input images and produce a background representation that
morphs over time allowing for more robust detection [59].
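Both the basic difference operation and the adaptive variant can be sketched in a few lines; `alpha`, the low-pass filter constant, is a tuning parameter chosen here purely for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Basic background subtraction: the absolute per-pixel difference between a
// reference (background) frame and a new frame. Large values mark pixels that
// changed since the reference and may belong to the tracked object.
std::vector<uint8_t> frameDifference(const std::vector<uint8_t>& background,
                                     const std::vector<uint8_t>& frame) {
    std::vector<uint8_t> diff(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        int d = static_cast<int>(frame[i]) - static_cast<int>(background[i]);
        diff[i] = static_cast<uint8_t>(d < 0 ? -d : d);
    }
    return diff;
}

// Adaptive variant: a first-order low-pass filter slowly morphs the stored
// background toward each new frame, so gradual scene changes are absorbed
// while fast-moving objects still register as differences.
void updateBackground(std::vector<double>& background,
                      const std::vector<uint8_t>& frame, double alpha) {
    for (std::size_t i = 0; i < frame.size(); ++i)
        background[i] = (1.0 - alpha) * background[i] + alpha * frame[i];
}
```

A small `alpha` makes the background adapt slowly (robust to moving objects, slow to absorb lighting changes); a large `alpha` does the opposite.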
Another general approach to extracting significant information from images is to look not
at the color values directly but at the change in those values across the image. Gradient
detection algorithms can be used to provide a representation of the outline or shape of objects
in the scene. There are many approaches to detecting edges in an image [25]. Some, such
as the Sobel operator, approximate the magnitude of the first derivative of an image channel
and the results are then thresholded to retain only the edge pixels.

Figure 2.3: Background Subtraction Before and After

Other techniques involve some local averaging, approximation of noise, and higher level
derivatives. Examples of these
operations are the Canny edge detector and the Laplacian of Gaussian method. They are
generally more accurate than the Sobel operator in that they produce thinner lines, are
more robust to noise, and avoid detecting lines multiple times but are more computationally
expensive [31].
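As an illustration of the first kind of operator, the following sketch evaluates the 3x3 Sobel gradient magnitude at a single interior pixel; border handling, which a full implementation must address, is omitted for brevity:

```cpp
#include <cmath>
#include <vector>

// 3x3 Sobel operator: approximate the horizontal and vertical first
// derivatives of a grayscale image at interior pixel (x, y) and return the
// gradient magnitude. Pixels on the image border are not handled here.
double sobelMagnitude(const std::vector<std::vector<int>>& img, int x, int y) {
    const int gxK[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};   // d/dx kernel
    const int gyK[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};   // d/dy kernel
    int gx = 0, gy = 0;
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i) {
            gx += gxK[j + 1][i + 1] * img[y + j][x + i];
            gy += gyK[j + 1][i + 1] * img[y + j][x + i];
        }
    return std::sqrt(static_cast<double>(gx) * gx + static_cast<double>(gy) * gy);
}
```

Thresholding this magnitude over the whole image retains only the edge pixels, as described above.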
The output of such an edge detector is a binary image in which positive values mark likely
edges. The resulting contours can be compared to a database of known shapes using shape
contexts and shape context distance [27]. The best matching reference shape (that with the lowest
as the classification of the sampled shape. These algorithms take advantage of the observation
that edges in images are largely unaffected by changes in light and color.
An important and more general tracking approach, which draws on many of the image
processing techniques above, is feature tracking.

(a) Source Image (b) Output of Canny Edge Detector
Figure 2.4: Example of Canny Edge Detector

[53] defines a local feature to be
an image pattern that differs from its immediate neighborhood as a result of a change in one
or more properties of the image. Commonly considered image properties are intensity, color,
and texture, but a feature can also be a corner, edge, or patch of pixels. Once identified as
a candidate, a feature is analyzed by an algorithm that converts the feature to a descriptor
which can be used for matching later. These features should ideally be detectable even after
changes in image scale, noise, illumination, and even some rotation. Feature tracking is
commonly used by computer vision algorithms to stitch several overlapping photos of a scene
into a single, larger photo.
Two images of the same scene or object can be correlated with this technique even if
the images were taken from slightly different perspectives. This is accomplished by running
the same feature detecting algorithm on each image independently. The resulting feature
descriptors of each image can be compared with one another and pairs of descriptors that
refer to the same feature identified. By analyzing the change in position of the same feature
sets between images, it is possible to estimate change in camera orientation relative to the
object between the images. If the feature detecting algorithm finds the same features in two
images despite a change in the object's pose or image properties such as brightness, it is said
to be invariant to those changes. State of the art algorithms like David Lowe's SIFT [42]
and Bay and Tuytelaars' SURF [26] are examples of such detectors and can match
descriptors of an object through up to 60 degrees of out-of-plane rotation. These algorithms are
fairly complex to implement and can be computationally expensive. As a side note, many
feature tracking methods including the two mentioned here are patent protected and not
available by default in most open source computer-vision algorithm distributions.
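While the descriptor computations themselves are involved, the matching step between two images is comparatively simple. The sketch below pairs a query descriptor with its nearest neighbour and applies the ratio test proposed in the SIFT paper [42], rejecting ambiguous matches; the two-element descriptors are tiny stand-ins for real 64- or 128-element SURF/SIFT vectors, and the function names are illustrative:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two feature descriptors of equal length.
double descriptorDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Match a query descriptor against the descriptors of a second image.
// Following the ratio test, the nearest neighbour is accepted only if it is
// clearly closer than the second-nearest; otherwise -1 (no reliable match).
int matchDescriptor(const std::vector<double>& query,
                    const std::vector<std::vector<double>>& candidates,
                    double ratio = 0.75) {
    int best = -1;
    double d1 = 1e300, d2 = 1e300;  // best and second-best distances so far
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double d = descriptorDistance(query, candidates[i]);
        if (d < d1) { d2 = d1; d1 = d; best = static_cast<int>(i); }
        else if (d < d2) { d2 = d; }
    }
    return (best >= 0 && d1 < ratio * d2) ? best : -1;
}
```

The ratio test discards features whose two best candidates are nearly equidistant, which is exactly the situation that produces false correspondences between images.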
The above algorithms are not mutually exclusive and for some applications robustness
may be improved by applying a combination of methods. Zhou and Aggarwal found that
applying both color histogram tracking and adaptive background subtraction gave them
the best chance to detect and classify people, cars, motorcycles, and groups of people moving
through space [58]. It is also common for tracking information to be incorporated into some
statistical algorithm to make more reasoned classifications and improve robustness. To classify
objects and resolve problems associated with occlusion, Zhou and Aggarwal fused frame by
frame tracking information with an extended Kalman filter. Other researchers at ETH Zurich
have incorporated histogram based tracking with a particle filter to reliably track non-rigid
colored objects [47].
Figure 2.5: An example of a reference image, left, being matched to a complex scene using features
2.4 Pose Estimation
In some cases, it may be possible to estimate the 3D pose of an object from its 2D image in
addition to detecting that object's presence. The literature concerned with pose estimation
is very large and a full survey is beyond the scope of this thesis, but an excellent overview
may be found in [25]. Frequently, the workings of both object tracking and pose estimation
systems will be tightly coupled. Real time tracking often involves some form of ‘model’ of the
object being tracked (e.g. a specific set of colors, edges, features). If there is an understanding
of how that model varies with pose, it can be used to attempt to classify the pose of the same
object in some given image.
The nature of the model is one way to divide the field of algorithms that estimate pose
[41]. Some algorithms store 2D representations of an object and look for the presence and
orientation of a specific face of a 3D object. An example might be a reference image with
corresponding feature points pre-calculated that would then be compared to an input to see
if a match occurs, and if so, a partial orientation can be obtained. An extension of this
method is to create a composite model from a set of 2D models representing different views
of the same 3D object. By correlating the input image with the best of the different views, a
better estimate of pose can be obtained; however, this requires extensive training with images
from many views. [32] represents the state of the art in this field where an object model is
learned from the descriptors gleaned from a set of images and can be used to classify groups
of objects for the purposes of robot grasping.
A more general approach is to use a 3D computer model, such as one that could be
developed in a CAD program. This model can be projected onto an arbitrary 2D surface
by the computer and correspondences can then be drawn between that projection and the
reference image. [41] uses such a method to precisely predict the pose of automobiles using
complex reference models in a way that is largely invariant to lighting. Despite being one
of the fastest methods available, a precise fit still took about one minute to compute. A
great variety of other methods have been developed under the topic of 2D-3D pose estimation
using models of different types.
A subset of these algorithms drops the requirement to work with a general model and
instead elects to use specially designed ‘landmarks’ which have known geometry and are easily
segmented from a reference image. What these algorithms lose in generality, they make up for
in speed: recovering 6D position information (x, y, z, roll, pitch, yaw) in real time. Appearing
as early as 1995, the POSIT algorithm [35] has been used to track a precisely modeled object
with at least four identifiable key points with known geometry relative to the model in all six
dimensions in real time. Relevant examples of systems with this philosophy have been used by [46]
and [30]. [46] developed a system for real-time position and attitude control of space robots
using a geometric landmark detected with color vision at 60 Hz. The landmark consisted of
three color markers positioned at (1, 0), (-1, 0), and (1, 0) meters in a plane. Each frame
produced by the controlled robot is color segmented to identify the markers present in the
frame. By examining the order and relative position of the markers in the segmented image,
6D position relative to the marker can be calculated. Particularly interesting to this thesis
was the fact that this was accomplished with an uncalibrated camera: the angle calculations
were accomplished by examining relative lengths. [30] implemented a similar scheme to control
MAVs using a four marker ground-based landmark. The markers themselves were attached
to ground robots and came in a variety of types, including IR, active color LEDs, and passive
colored markers. This system also uses visual segmentation to detect markers in the frame
and then solves the Perspective-Three-Point problem (P3P) using a closed-form solution also
developed at ETH.
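The geometric core of these landmark methods is similar-triangles reasoning. The sketch below illustrates two of the underlying relationships; a pinhole model with a known focal length (in pixels) is assumed purely for illustration, whereas systems such as [46] deliberately work with relative lengths to avoid calibration, and the function names are hypothetical:

```cpp
#include <cmath>

// Pinhole similar triangles: an object of known physical size that spans
// `pixelSize` pixels in the image lies at depth f * realSize / pixelSize,
// where f is the focal length expressed in pixels.
double depthFromSize(double focalPx, double realSize, double pixelSize) {
    return focalPx * realSize / pixelSize;
}

// Relative-length reasoning: a segment of known length appears foreshortened
// by cos(yaw) when rotated out of the image plane, so the ratio of apparent
// to maximum apparent length recovers |yaw|. Note the quadrant ambiguity:
// a single length ratio cannot distinguish +yaw from -yaw.
double yawFromForeshortening(double apparentLen, double maxLen) {
    return std::acos(apparentLen / maxLen);  // radians
}
```

Landmark systems with multiple markers resolve the sign ambiguity by comparing the order and relative positions of the markers, as the systems above do.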
2.5 Control Systems
A state measurement or pose estimation component is only one piece of a functioning control
loop. Also required is some logic that will take the output of the pose estimation and use
that information to bring the pose to the desired value. The parameter being controlled, the
pose of a helicopter, is referred to as the process variable. The desired value of the process
variable is the setpoint of the controller. The controller itself is technically called a feedback
controller because it adjusts its output based on information it receives about the current state
of the system. Error is the difference between the setpoint and the process variable at a given
moment in time.
A classical approach to implementing this feedback logic, for systems that are or can be
modeled as linear systems, is the proportional-integral-derivative (or PID) controller. This type
of controller sets the input to the controlled system based on a measurement of the current
error (proportional), the accumulated error in the past (integral), and the predicted error
in the future (derivative). This has historically been the method by which feedback control
was developed for systems without dynamic models [48]. Another popular form of control is
a technique called fuzzy control, which more closely resembles the decision making and
inference process that people follow when controlling systems manually [37]. Information
about control systems, and PIDs in particular, can be found in many references, including
Modern Control Engineering [48]. For relatively simple systems and simple tasks, classic
control techniques can be tuned to work well. For some very complex systems and tasks,
it is extremely difficult if not impossible to develop a classical control system for the job.
For this class of problems, the techniques of the field of machine learning have shown some
success. For example, researchers at Stanford have employed machine learning to teach a
model helicopter to do incredible acrobatic maneuvers such as inverted takeoffs and flips in
place [20].
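A minimal discrete-time PID controller of the kind described above can be sketched as follows; the gains are placeholders that would have to be tuned for any particular system:

```cpp
// Minimal discrete PID controller: the output is a weighted sum of the
// current error (proportional), its running accumulation (integral), and its
// most recent rate of change (derivative). The gains kp, ki, kd must be
// tuned for the particular system being controlled.
struct Pid {
    double kp, ki, kd;
    double integral = 0.0;
    double prevError = 0.0;

    // error = setpoint - process variable; dt = time since the last update.
    double update(double error, double dt) {
        integral += error * dt;
        double derivative = (error - prevError) / dt;
        prevError = error;
        return kp * error + ki * integral + kd * derivative;
    }
};
```

On each control cycle, the measured error and elapsed time are fed to `update`, and the result is applied as the actuator command.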
2.6 Concluding Thoughts
The preceding sections presented an investigation into existing options and methods for
developing a flying micro robot. Topics discussed included system architectures and sensing
methodologies, along with a brief survey of popular computer vision techniques for object
recognition and pose estimation and a brief look at control methodologies. The next chapter
lays out the logic behind the design decisions made for this thesis in light of the research
presented in this chapter.
Chapter 3
Methodology
3.1 Requirements and Specifications
In the introduction, the goal of this thesis was defined as the creation of a flying
robot system that:
1. is inexpensive,
2. is programmable,
3. is easy for a single person to use,
4. is hackable,
5. provides closed loop control, and
6. works in a classroom environment.
Stated more qualitatively, the goal of this project is the creation of a system that enables
the programmatic control of micro-helicopters. The envisioned target audience for this work
is young students, ages 13+, learning to program in organized classrooms. Educators and
students are often pressed for time and resources, so low costs and ease of use are important
qualities. Cost considerations should include not just the initial cost of the flying platform,
the controller, and other supporting items, but also long term cost factors like maintenance
and durability. Ease of use refers to the ability of individual students to get started on a
helicopter control project with a minimum of help.
A secondary audience for this system is more advanced programmers, students, enthu-
siasts, and hackers that are curious about computer vision, control systems, and generally
extending the capabilities of machines around them. To this demographic, the quality of soft-
ware architecture, availability of hardware, and documentation are particularly important.
If the software is designed in such a way that components are self-contained and the entire
system is assembled from a well documented set of those components, then it becomes far
easier for a curious individual to effect the changes that he/she is interested in implementing.
This is what is meant by ‘hackable’.
The classroom setting imposes several more restrictions. Components of the system need
to be fairly robust, they must be replaceable, and they must be safe. Furthermore, the control
system should ideally work with multiple systems running simultaneously in a single room
without interference.
A working control system would be defined as the combination of software running on
a user’s computer, a helicopter, and any extra hardware that enables positive control of the
helicopter platform for an easily observed amount of time (many seconds). This system should
ideally be able to move the helicopter with stability between different 3D points and withstand
minor disturbances such as small air currents or momentary loss of vision.
Target conditions for operation, defined as those under which the system performs best,
include a large indoor room with bright, even lighting and a consistent color palette. A single
helicopter and its communication should be able to be effectively constrained to a five to ten
foot bubble to allow for the simultaneous operation of several helicopters in the same room.
Above all, this thesis and associated tools should provide an opportunity to learn about
computer vision, control systems, programming, and robotics.
3.2 Hardware Platform
The Chinese brand Syma is a popular manufacturer of a range of toy helicopters that are
inexpensive (with the cheapest models starting around $17 USD), rugged, and widely available,
as are replacement parts [24]. Besides the brand's low cost and durability, the Syma S107
helicopter, in particular, has a few distinct advantages.
Each helicopter is controlled via an infrared light (IR) interface which, as discussed in
the previous section, is inexpensive, easy to use, and because of its limited (and sometimes
narrow) range, appealing for use in a possibly crowded room. As an added advantage, the
same hardware and techniques used to issue commands to the helicopter may offer an excel-
lent opportunity to discuss digital communication protocols and create remote controls for a
variety of household items such as air conditioners, cameras, televisions, and more.
Perhaps the greatest advantage of the Syma over similar offerings is the large user
community that it enjoys. The helicopters are, at least comparatively, ubiquitous and
many hobbyists have already investigated control schemes. As part of this community, many
hackers have worked on various means to control and modify these helicopters. This is an
invaluable resource for working with and debugging issues with the helicopters.
There are two primary disadvantages to Syma helicopters as platforms for educational
robotics exercises. First, they are light and have very little lift capacity (a matter of grams),
and second, they have no direct way to control roll. Unlike full-scale helicopters, which are
equipped with swash plates that can tilt the main rotors, the Syma helicopter is stabilized by
a bar attached to the main rotor shaft and a gyroscope mounted to the controlling electronics.
This presents a distinct challenge to effectively controlling these helicopters in tight spaces.
Still, the Syma's widespread availability and low cost make it the platform of choice for this
thesis. Note that the techniques used in this thesis should, with appropriate adaptations,
transfer to other micro air vehicles.
One half of the equation necessary to create a working control system is feedback. Given
that the fundamental requirements set forth for this educational system include ‘low-cost’
and ‘simple’, and that Syma brand helicopters and their ilk come with no available on-board
telemetry, it falls entirely to some external device to close the loop. This thesis thus adopts
the external sensing model discussed in the literature review.
It is a requirement of this thesis that some form of personal computer be used to program
the helicopters, and given this requirement, the web cameras that come built into most laptops
are prime candidates for closing the sensory loop. A camera image of a helicopter contains, in
theory, all that is necessary to control that helicopter: position, rotation, distance. Extensive
examples and software APIs already exist for collecting and processing computer images.
These algorithms are also an excellent source of additional educational opportunities.
The disadvantage of cameras, especially USB web cameras built into laptops, is that their
specifications are difficult to control. Laptops feature widely ranging models of cameras with
widely ranging properties. Some of these properties are linked with the physical characteristics
of the camera such as field of view and focal length. Others relate to the hardware and
controlling firmware such as, but not limited to, resolution, exposure timing, white balance,
saturation, and brightness. Some cameras conform to a uniform standard that is a subset
of the USB protocol [12], but just as many do not. Furthermore, programmatically controlling
these properties is difficult because the interface to do so is operating system dependent and
the physical meaning of values given to each of these parameters itself varies from camera
to camera. A brightness of 50 might mean something different on camera A versus camera
B. This puts a burden on the user to control their own camera, if possible, and otherwise
dictates that any computer vision algorithm used should be fairly robust to variations in
camera properties.
Other options for sensing include adding on-board telemetry, or using a more complicated
vision setup such as a motion tracking system or a depth camera such as a Microsoft Kinect
or Asus Xtion. The difficulty with on board sensing is that the helicopters themselves have
a very small payload, and there is little room or capacity for additional hardware. Replacing
the existing hardware would be a large project unto itself. Furthermore, adding components
adds expense and there’s little chance that students or educators will have the tools and know-
how to perform the modifications themselves. A similar constraint prevents the use of more
complicated vision systems; the cost and expertise required to run them will erect a barrier
to entrance that few will overcome. Thus, this thesis adopts the simple two dimensional web
camera as its sensing instrument.
The final piece of the puzzle is the connecting element between the sensing and logic
system, and the helicopters themselves. For this purpose an Arduino Uno, a single IR LED,
and a USB cable were used. The Arduino Uno is a hobbyist micro-controller platform built
using an 8 bit Atmel AVR 328p processor running at 16 MHz [3]. It is shipped with a library
of software that effectively abstracts much of the low level knowledge about programming
and hardware that is required to program micro-controllers. Much like the Syma helicopters,
the Arduino platform’s real advantage is the extensive user base and the wealth of knowledge
that accompanies it. Arduinos represent the single most popular unified micro-controller
solution, and as such both the hardware and supporting software libraries are widely available.
Any Arduinos purchased for this project in an educational setting could be re-purposed for
any multitude of other projects.
Figure 3.1: Arduino Uno R3 [3]
That said, the Arduino Uno retails for $30 USD, has limited processing power, and does not
include the necessary USB cable (an expensive additional purchase) or an LED (an inexpensive
additional purchase), making the Arduino Uno quite expensive for its components. Still, there
exist many alternative options, but few match the level of polish and consistency that Arduinos
feature. For example, Arduinos come ready to plug-and-play. They have pre-soldered female
headers that allow resistors and LEDs to be plugged directly in. Other platforms require some
amount of soldering. The Arduino libraries work on a wide range of AVR platforms, however,
meaning that anything developed for the actual Arduino should be straightforward to port
to alternate platforms. Possible alternatives include offerings from Digispark [6], retailing for
under $10, and the Teensy, a more powerful processor and platform for less than $20 USD
[21]. While these options are appealing, they both require some soldering and have some
incompatibilities with existing Arduino libraries. The development of an improved signal
transmission device is left as future work.
3.3 Software Platform
For the software interface to camera hardware, this project uses the open-source and cross-
platform computer vision library OpenCV [14]. Started in 1999 by the Intel Corporation,
OpenCV is one of the oldest and most mature libraries in computer vision and is usually the
first choice for academic research. Interfacing with hardware attached to a computer requires
inter-operation with that computer's operating system, and each OS is different. The
advantage of OpenCV, aside from its wealth of optimized algorithms for all common computer
vision operations, is that it handles hardware interfacing for the programmer. While adding
dependencies to a project should always be done with caution, replicating the functionality
available in OpenCV would be a huge project in and of itself. OpenCV's native languages are
C and C++, and this thesis follows suit. The C++ language allows for programming
with high level concepts while maintaining minimal computational overhead during runtime.
For a real-time video processing system such as this thesis, this is a necessary trait. Note that
OpenCV bindings are available for most popular languages.
Any software developed as a part of this thesis will itself be free and open-source. For users
who are familiar with the process of building C/C++ programs using traditional command
line techniques, makefiles are available. For those who desire a more hassle free installation,
this thesis’ software is designed to work with the KISS Institute for Practical Robotics’ KISS
Platform Editor [10]. This integrated development environment ships with cross platform
software libraries to access OpenCV and serial communication, alleviating the need for a user
to build and install these him/herself.
3.4 Algorithms and Implementation
This section discusses the chosen system architecture, tracking methodology, and control
scheme. It also discusses more logistical concerns such as the Syma helicopter signal trans-
mission and the layer between the control application and transmission. Please note that the
term “transmission”, in this thesis, is used to indicate the process by which a digital signal is
broadcast to a receiving helicopter using infrared light pulses. It does not refer to mechanical
linkage between the motors and rotors on-board the helicopter which is a black box with
respect to this work.
3.4.1 Fiducial Markers
The following section discusses the means by which a helicopter’s pose information is extracted
from a camera image of that helicopter. This thesis’ literature review, section 2.4, discusses a
class of 2D-3D pose estimation algorithms, designed to run in real time, that use landmarks,
or purpose designed markers that are simple to detect, to measure a camera's position against.
This thesis inverts this system by fixing the position of the camera and allowing the marker
to move. The remainder of this section outlines the specifications for this marker.
An effective marker should be easy to detect under a variety of conditions and provide
unambiguous information about the helicopter’s pose. It should also exert minimal influence
on the dynamics of the helicopter during flight. The most natural marker for the helicopter,
exerting no influence on dynamics and requiring no additional materials, is the helicopter
itself. That is, an image of the helicopter should encode enough information about the
helicopter to extract its pose without any additional hardware. There are significant issues,
however, that make this difficult to implement.
One such difficulty is ambiguity in orientation measurements. Given a binary image where
positive pixel values indicate the presence of the helicopter, the position of the helicopter can
be taken as the mean position of the positive values. The height of the helicopter, or the
difference between the highest and lowest positive pixel position, provides an estimation for
the helicopter’s distance from the camera. Rotation of the helicopter (with a zero rotation
defined as facing the camera) is correlated with the height to width ratio of the bounding
box of the helicopter. The bounding box is defined as the smallest rectangle that entirely
encompasses the positive pixels in a binary image.
The height to width ratio is maximized when the helicopter is facing toward or directly
away from the camera. Conversely, it is minimized when the helicopter orientation is perpen-
dicular to the camera. If these minimum and maximum ratios are known, then a measured
ratio can be transformed to rotation through interpolation. The quadrant that the rotation
is in, however, is not clear. For example, a rotation of thirty degrees counterclockwise from
zero would have the same ratio as thirty degrees clockwise.
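The interpolation step might be sketched as follows. A linear mapping from ratio to angle is assumed here for simplicity (the true relationship is trigonometric), the function name is illustrative, and the quadrant ambiguity shows up directly in that only the magnitude of the rotation can be returned:

```cpp
// Interpolate a yaw magnitude (degrees) from the bounding-box height/width
// ratio, given the ratio measured when the helicopter faces the camera
// (maxRatio) and when it is perpendicular to it (minRatio). A linear mapping
// is assumed for simplicity; the true relationship is trigonometric. Only
// the magnitude is recoverable: +30 and -30 degrees yield the same ratio,
// which is exactly the quadrant ambiguity described above.
double yawMagnitudeDeg(double ratio, double minRatio, double maxRatio) {
    double t = (maxRatio - ratio) / (maxRatio - minRatio);  // 0 facing, 1 perpendicular
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return 90.0 * t;
}
```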
Affixing some form of marker to each side of the helicopter, or painting the left and right
sides different colors provides a way to help resolve this ambiguity. However, this technique has
some issues from a practical standpoint. First, it requires modification of the helicopter itself,
which may make it more difficult to use. Second, the helicopter’s tail becomes very difficult
to detect, leading to ambiguous situations. Is a small width to height ratio the product of a
helicopter facing the camera or a failure to detect the tail? Using a single visual feature per
side gives no way to have confidence in the values being measured. An easier solution would
be to affix some form of marker which, regardless of orientation, always has two easy to detect
features with a known relationship.
Modifying the helicopter with some visual landmark, referred to from here on as a fiducial
marker, provides such a solution. Inspiration for the design of this marker can be drawn from
real-world navigation lights, or colored lights mounted to many boats and aircraft used to
indicate position and heading under low light conditions. Traditionally, the left (port) side of a
craft displays a red light and the right (starboard) side displays a green light [8]. Active markers such as
these would require either batteries or modification of the helicopter’s existing wiring. A
reasonable constraint on the system is operation in good lighting, so a passive colored marker
is a suitable solution.
Figure 3.2: Reference 3-Color Fiducial Marker Constructed From Styrofoam
The passive fiducial marker settled on is a circular ring (seen in Figure 3.2) attached to,
or in place of, the landing gear of the Syma helicopter (Figure 3.3). The surface of this ring
is divided into three equally sized rectangular areas and each area is painted with a unique
color. The colors are selected such that they are approximately equidistant on a color wheel.
No matter the orientation of the ring, it keeps the same rectangular profile in a camera image.
Furthermore, the combination and ratio of colors allows for the identification of any rotation
of the helicopter without ambiguity, provided the helicopter cannot roll. The fact that
this marker design has equal height across its width makes it more robust to noise than a
shape similar to a sphere, resulting in more accurate estimations of depth.
Figure 3.3: Fiducial Marker Attached to Syma Helicopter
Finally, if the helicopter does not roll or pitch significantly, the
two dimensional projection of each color element on the ring is
itself a rectangle of equal height. Any point inside one of these rectangles should contain the
same color hue. These properties confer computational benefits: For one, only a single method
is necessary to determine the position and rotation of a helicopter in an image (as opposed
to multiple methods depending on relative orientation to the image plane). Additionally,
each color can be represented by a rectangle on the image plane as opposed to a potentially
complicated or noisy contour of individual points.
If one is aware of a point p inside of a rectangular color element r, then the dimensions of
r can be found simply by looking for a color gradient above, below, left, and right of p. Such
a gradient can be found by traversing in one of the “cardinal” directions in groups of n pixels
until a significant shift in color hue is detected. The method involves traversing at most
width + height pixels. This thesis will refer to this method as By-Estimate blob expansion. The
alternative is thresholding the entire image, which involves performing an operation on each of
the width ∗ height pixels in the image, an approximate difference of two orders of magnitude.
This method will be referred to as the By-Threshold method.
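The scale of that difference is easy to check. Assuming an illustrative 640 × 480 camera image (a resolution chosen for illustration, not specified by the thesis):

```c
#include <assert.h>

/* Worst-case pixel visits for the two blob search strategies on a
 * w-by-h image (hypothetical helper names, for illustration only). */
long by_threshold_cost(long w, long h) { return w * h; }
long by_estimate_cost(long w, long h)  { return w + h; }
```

For 640 × 480, the by-threshold method touches 307,200 pixels while the by-estimate method touches at most 1,120, a factor of roughly 274, consistent with the two-orders-of-magnitude estimate.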
The disadvantage of using a fiducial marker like the color ring is that it requires a possibly
bulky attachment to the bottom of a micro-helicopter that has little lift capacity, significantly
damping the maximum speed of the helicopter. Furthermore, backgrounds with complicated
colors/patterns and varying lighting can induce many tracking failures including false-positives
and a failure to register the color squares. Thus, to be effective, the helicopters must be flown
in a fairly controlled, indoor environment with the understanding that the system will be quite
damped. Still, this method represents an effective alternative to the most general methodolo-
gies, which involve marking and finding approximately six key-points on the helicopter in each
frame, an approach that suffers from all the same perception problems but requires much more precision.
3.4.2 Vision-Based Helicopter Tracking
This section briefly discusses the core logic of tracking software in high-level terms. The
algorithm is designed to track a single Syma helicopter outfitted with a specially configured
fiducial marker between subsequent images from a stationary camera.
The fiducial marker should be a ring affixed to the bottom of the helicopter with an outer
surface area that consists of three solid colors, easily distinguished from one-another. One
color should be chosen as the front color and the nose of the helicopter should be aligned with
the center of that color’s span along the ring. This segment is designated color zero. Colors
one and two are the other segments of the ring enumerated counterclockwise around the ring.
There are two inputs to the tracking software. The first is the marker configuration
discussed above. It consists of the names of all three colors, a hue value representing each
color on the color-wheel, and a tolerance range about that center hue value. This tolerance
should be as small as possible while still maintaining good search results.
The second input to the system is a stream of raw images of a fixed size encoded in an
eight-bit hue, saturation, value format. In order to minimize its impact on other logic a
user might want to use, the tracking software explicitly does not take control of an image
stream. Instead, it is necessary for the end-user to update the tracking engine with each
frame, including performing any desired preprocessing and color space conversions.
The output of the system is a data container representing the believed position (in X,Y
pixel coordinates) of the center of the fiducial marker and the believed rotation of the heli-
copter (in radians, from 0 to 2π with a zero rotation corresponding to the helicopter directly
facing the camera). It is possible for the algorithm to fail to locate the helicopter in which case
a point with an invalid position of (-1, -1) and rotation of 0 will be returned. The tracking
engine keeps a minimal amount of state information to better predict helicopter position, but
it is up to the user to keep his/her own information for control purposes.
The general algorithm for helicopter tracking is given in algorithm 1.
Algorithm 1 General Fiducial Marker Search Approach
Require: HSV encoded digital image, marker color configuration
1: frame ← getCameraFrame
2: estimateBlobs ← byEstimateExpansion(frame, colors, history)
3: possiblePoses ← findPossiblePoses(estimateBlobs)
4: if possiblePoses is empty then
5:   estimateBlobs ← byThresholdExpansion(frame, colors)
6:   possiblePoses ← findPossiblePoses(estimateBlobs)
7: end if
8: bestPoint ← pickBestPoint(possiblePoses, history)
return bestPoint
The by-estimate method of blob expansion refers to the technique of looking for color
gradients surrounding a point suspected of being inside a color blob. The more general by-
threshold method uses binary thresholding and the color configuration to identify blobs.
Internally, the tracking system keeps a history of the helicopter positions and if the history
becomes too large, the oldest point is discarded for each new one. The first step of the
algorithm is to search for the helicopter where it was in the last frame. This is accomplished
by looking for a new color blob at the center of each old one with the same color. This method
also examines the edges of the marker composed of color blobs to ensure that emerging
colors as a result of helicopter rotation are found. If this procedure fails to identify a valid
combination of color blobs, then the image will be thresholded using the color configuration
and blob detection will be run on the resulting images.
3.4.3 Control System
The helicopter perception engine returns a believed position (or an invalid one if no candidate
could be found) for each frame it is fed. This calculation is assisted by the history of images
that has come before it, but only picks a value based on what it sees in the current image. This
is done in accordance with the philosophy that components should be as simple as possible
(but no simpler), and it also means that the stream of information can be momentarily subject
to noise and errors. It falls to a separate module to parse the stream of information into a
more reasoned belief about the helicopter position.
With respect to the reference implementation presented here, this module is referred to
as the “controller” application. This application takes as input the stream of information
from the tracking engine and outputs a signal suitable for bringing the helicopter to a target
position. This process works in three basic stages:
1. Each new point is used in combination with the previous belief and a general knowledge
of physics to generate a new belief of where the helicopter is. This is accomplished with
an exponential moving average on the input data stream: estimate = (ratio ∗ sample) +
(1 − ratio) ∗ old estimate, where ratio is the fraction of the old estimate to replace with
the new sample. An effective ratio can be chosen by analyzing the time characteristics
of the system. Certain variables, like rotation, can produce erroneous results with this
method and are handled separately. This is discussed in detail in section 5.2, the state
estimation portion of the controls chapter.
2. Armed with a new estimation of the helicopter position, the controller passes this in-
formation onto individual sub-modules responsible for calculating the output for each
of the helicopter’s control channels: throttle, pitch, and yaw. This affords each sub
module an opportunity to keep some form of internal state such as an integration of the
error in an efficient manner.
3. When called upon, each sub-module is responsible for calculating a signal value that will
bring its parameter under control. The exact method by which this is achieved is left
to each sub module to implement. The reference implementation is a PID control loop
with gain values determined by testing. By separating the action of updating the state
of the system from the actual calculation of the signal in the program interface, more
expensive algorithms can be broken between the two actions. Furthermore, the nature
of the system is that the rate at which information is received is different than the rate
at which signals can be issued. Most web cameras operate at a frame rate between 15
and 30 frames per second, while the Syma helicopter only accepts new commands at approximately 5
Hz. Separation of these tasks saves some calculations.
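Stage 1 amounts to a one-line filter update. A minimal sketch in C (the struct and function names are illustrative, not taken from the reference source):

```c
#include <assert.h>

/* Exponential moving average: blend each new sample into the running
 * estimate. ratio in (0, 1] is the weight given to the new sample. */
typedef struct { double estimate; int initialized; } ema_t;

double ema_update(ema_t *f, double sample, double ratio)
{
    if (!f->initialized) {      /* seed the filter with the first sample */
        f->estimate = sample;
        f->initialized = 1;
    } else {
        f->estimate = ratio * sample + (1.0 - ratio) * f->estimate;
    }
    return f->estimate;
}
```

Rotation is the exception noted above: naively averaging angles that wrap around 2π (for example 0.1 and 2π − 0.1) yields a value near π rather than near zero, which is why it is handled separately.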
Further details for the reference control system are available in the control section of the
report. For details on how to implement your own control system or sub-system, see the
Developer Manual in Appendix B.
3.4.4 Helicopter Signaling
The final major component of the system is the hardware/software communication system
that bridges the computer vision and control algorithms with the helicopter itself. This system
consists of a micro-controller and LED (Arduino), a USB cable, a software library facilitating
the communication between the micro-controller and host computer, and a program for the
embedded system to translate signals to IR pulses.
The Syma S107 works on an IR protocol that has been the subject of previous investigation
by enthusiasts on-line [54]. From this existing documentation and direct investigation into
the protocol with an IR receiver, the properties of the broadcast communication were found
to be:
1. transmission medium: 940 nm IR modulated at 38 kHz
2. a packet header is represented by 2 ms high, 2 ms low
3. a binary one is represented by 300 µs high, 700 µs low
4. a binary zero is represented by 300 µs high, 300 µs low
5. each signal is 32 bits long and sent in big-endian order
6. signal packets are sent every 120 ms
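Based on these measured properties, each element of the broadcast can be translated into a mark/space pair of durations. The following sketch models the timings in portable C (the names are illustrative; the actual embedded program drives an IR LED with these durations):

```c
#include <assert.h>
#include <stdint.h>

/* Mark/space durations in microseconds for one element of the Syma
 * signal, per the measured timings listed above. */
typedef struct { uint32_t high_us; uint32_t low_us; } ir_pulse_t;

static const ir_pulse_t HEADER = { 2000, 2000 };

ir_pulse_t encode_bit(int bit)
{
    ir_pulse_t p;
    p.high_us = 300;             /* both symbols start 300 us high */
    p.low_us = bit ? 700 : 300;  /* a binary one has the longer low period */
    return p;
}

/* Fill out[0..32]: header followed by the 32 packet bits, MSB first. */
void encode_packet(uint32_t packet, ir_pulse_t out[33])
{
    int i;
    out[0] = HEADER;
    for (i = 0; i < 32; i++)
        out[1 + i] = encode_bit((packet >> (31 - i)) & 1);
}
```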
Figure 3.4: Syma S107 Signal Indicating Full Throttle and Neutral Yaw, Pitch, Trim
The structure of each packet is shown in Figure 3.4. It is, in order:
1. header
2. yaw byte (representing decimal number from 0 - 127). Neutral at 64, and Full Right at
0.
3. pitch byte (0 - 127), Neutral at 64, and Full Forward at 0.
4. Throttle byte:
(a) If first bit is 1, the transmission is on band A
(b) If first bit is 0, the transmission is on band B
(c) 0 - 127 for band A, 128-255 for band B
5. trim byte (0 - 127)
The Syma protocol outlined above is used across several of the company’s devices, includ-
ing some platforms more complex than the Syma S107. As a simple model, the Syma S107
has only three channel inputs, and ignores the last (trim) byte. Trim is instead implemented
by adjusting the baseline value of the yaw parameter. Thus, for the purposes of this system,
a valid helicopter signal consists of three numbers ranging from 0 - 127 (Yaw, Pitch, Throttle),
each padded by a single zero, with a header and 8 bits of tail-end padding.
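The resulting payload can be modeled as a packing routine (a hypothetical helper assuming band A; the masking to seven bits provides the single-zero padding described above):

```c
#include <assert.h>
#include <stdint.h>

/* Pack yaw, pitch, throttle, and trim (each 0-127) into the 32-bit
 * big-endian Syma payload that follows the header. Band A is assumed,
 * so the throttle byte's top bit stays clear. */
uint32_t syma_pack(uint8_t yaw, uint8_t pitch, uint8_t throttle, uint8_t trim)
{
    return ((uint32_t)(yaw & 0x7F) << 24) |
           ((uint32_t)(pitch & 0x7F) << 16) |
           ((uint32_t)(throttle & 0x7F) << 8) |
           (uint32_t)(trim & 0x7F);
}
```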
In order to simplify the process of communicating with the micro-controller, the reference
communication system disallows the secondary transmission band. This frees the values 128-
255 for use in verifying signals. Thus, the communication protocol between host computer
and Arduino works as follows:
1. open a serial connection between host and device at 9600 bps, 1 stop bit, no parity
2. transmit yaw, pitch, and throttle in that order as bytes (ranging from 0 to 127)
3. transmit the checksum header signal, a byte with decimal value 128 (hex 80)
4. transmit the checksum byte, computed using the algorithm found in Listing 3.1.
The reference checksum algorithm is implemented with the following C code:
Listing 3.1: Reference Checksum Algorithm
uint8_t syma_compute_checksum(const uint8_t *signal)
{
    uint16_t checksum = 0;
    uint16_t i;
    for (i = 0; i < SYMA_SIGNAL_LENGTH; i++) {
        checksum += signal[i] << (SYMA_SIGNAL_LENGTH - 1 - i);
    }
    return checksum % 255;
}
The constant term SYMA SIGNAL LENGTH is defined elsewhere in the reference source
code to be the length, in bytes, of the signal to be sent to the Arduino for transmission. In
the reference implementation, this value is set to three (representing yaw, pitch, and throttle
respectively).
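Combining the steps above, a host-side sketch of the full five-byte serial message might look as follows (the build_message helper is hypothetical; the checksum itself matches Listing 3.1):

```c
#include <assert.h>
#include <stdint.h>

#define SYMA_SIGNAL_LENGTH 3   /* yaw, pitch, throttle */

/* Checksum from Listing 3.1. */
uint8_t syma_compute_checksum(const uint8_t *signal)
{
    uint16_t checksum = 0;
    uint16_t i;
    for (i = 0; i < SYMA_SIGNAL_LENGTH; i++)
        checksum += signal[i] << (SYMA_SIGNAL_LENGTH - 1 - i);
    return checksum % 255;
}

/* Build the 5-byte message sent over serial: the three control bytes,
 * the checksum header (0x80), then the checksum itself. */
void build_message(uint8_t yaw, uint8_t pitch, uint8_t throttle,
                   uint8_t out[5])
{
    out[0] = yaw;
    out[1] = pitch;
    out[2] = throttle;
    out[3] = 0x80;                       /* 'expect checksum' marker */
    out[4] = syma_compute_checksum(out); /* over the first three bytes */
}
```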
The Arduino’s role is to:
1. accept a serial connection at a prescribed rate
2. initialize the output signal to neutral yaw, neutral pitch, and no throttle.
3. maintain two buffers: one which contains the current signal, and the other which collects
bytes received over serial in a first-in-first-out manner.
4. flash the signal by working through the bytes in big endian order.
5. between signals (sent with a period of 120 ms), for every byte received:
(a) if the ‘expect checksum’ flag is set, compute the checksum of the current buffer and
compare against the incoming byte. If they match, copy the buffer to the current
signal buffer. Either way, disable the ‘expect checksum’ flag.
(b) if the ‘expect checksum’ byte is received (0x80), then set the corresponding flag.
(c) otherwise, push the new byte onto the buffer, shifting the other bytes down, and
discarding the oldest one.
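The byte-handling rules in step 5 form a small state machine. A portable C sketch, free of Arduino-specific calls so the logic can be tested on a host machine (names are illustrative, not from the reference firmware):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIG_LEN 3

typedef struct {
    uint8_t current[SIG_LEN];  /* signal flashed every 120 ms */
    uint8_t pending[SIG_LEN];  /* FIFO of the last SIG_LEN bytes seen */
    int expect_checksum;
} rx_state_t;

static uint8_t checksum(const uint8_t *s)
{
    uint16_t sum = 0;
    int i;
    for (i = 0; i < SIG_LEN; i++)
        sum += s[i] << (SIG_LEN - 1 - i);
    return sum % 255;
}

void rx_byte(rx_state_t *st, uint8_t b)
{
    if (st->expect_checksum) {
        if (checksum(st->pending) == b)      /* valid: accept the signal */
            memcpy(st->current, st->pending, SIG_LEN);
        st->expect_checksum = 0;             /* cleared either way */
    } else if (b == 0x80) {
        st->expect_checksum = 1;
    } else {
        /* shift the buffer down, discarding the oldest byte */
        memmove(st->pending, st->pending + 1, SIG_LEN - 1);
        st->pending[SIG_LEN - 1] = b;
    }
}
```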
This architecture allows signals to be streamed to the controller faster than the actual
control rate: Only the latest signal will be used when it’s time for a new transmission.
3.5 Summary
This chapter reviewed the reasoning behind the selection of the various pieces of hardware
and the major approaches to sensing and control. System architecture and software design
decisions were also briefly discussed. The next two chapters provide detailed reports on
the reference implementation of the visual tracking system, responsible for identifying fidu-
cial markers and reconstructing their pose, and the feedback control system, responsible for
bringing the helicopter’s state under control using information from the tracking system.
Chapter 4
The Tracking Component
This chapter presents a more thorough investigation of the vision component of the Syma
helicopter control system. From a system architecture standpoint, each component (e.g.,
tracking, signal calculation, or signal output) functions independently of the other systems.
It is the user application that ties different components together to provide control for a
given hardware platform. This thesis provides “reference” implementations for each of the
components, but interested individuals may provide their own implementations to change the
system behavior or extend control to new hardware. This model affords developers the ability
to treat their components as black boxes and implement whatever logic they see fit. The only
constraint is that each component must offer a set of specific services that define its interface.
The computer vision and tracking component has the simplest interface. It requires a
single track method which takes, for input, digital camera images with HSV color encoding
and information about the composition of the target fiducial marker. The structure of the
marker is described in section 3.4.1 and is provided to the system by the user at initialization.
If the system is able to identify the likely presence of the fiducial marker, then it outputs a
data structure which contains the following information:
1. the center of the helicopter marker in (x,y) pixel coordinates,
2. the width and height of the bounding rectangle of the marker in pixels,
3. the rotation of the marker from the center of the front color in radians (0 to 2π), and
4. the blob composition of the marker (colors and positions).
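This output maps naturally onto a plain data structure. A hedged sketch (the field names are illustrative rather than the reference source's):

```c
#include <assert.h>

#define MAX_BLOBS 3

typedef struct { int x, y, w, h, color; } blob_t;

/* Tracking result for one frame; x == -1 and y == -1 together signal
 * that no marker was found and the remaining fields are undefined. */
typedef struct {
    int x, y;            /* marker center in pixel coordinates */
    int width, height;   /* bounding rectangle in pixels */
    double rotation;     /* radians, 0 to 2*pi, 0 = facing the camera */
    blob_t blobs[MAX_BLOBS];
    int blob_count;
} pose_t;

int pose_is_valid(const pose_t *p)
{
    return p->x != -1 || p->y != -1;
}
```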
In the event of a failure to identify the marker, the tracker will return a data structure with
a position value of (-1, -1) and undefined values for the other fields. One by-product of defining
the interface in this manner is that the vision component does not take ownership of the video
stream, thereby allowing more flexibility for the user to architect a controller. A second by-
product is that the tracking system must be capable of operating without prior knowledge of
the helicopter position. This is a necessary property of the tracking component to be able to
recover from the effects of occlusion, a condition where the helicopter is temporarily obscured
in the camera image. On the other hand, the tracking component is free to keep some form
of internal state to improve the robustness of its operation (though it must be able to work
without it).
4.1 Reference Implementation
In addition to the explicit requirements set forth by the vision component’s interface, there
are some practical considerations that must also be addressed. First, the vision tracking
algorithms should run in real time with enough margin to allow users to also run their own
logic. For a camera running at thirty frames per second, this means that the vision algorithm
needs to complete within the 33 milliseconds before the next image arrives. The smaller the
required processing time, the lower the latency of the entire system, and the more responsive
the ultimate control will be.
Second, the Syma helicopter is a dynamic vehicle, making positive control very difficult
without reliable tracking. The algorithm should be fairly robust against background color
noise. Additionally, the helicopter itself accepts new control commands approximately every
120 to 200 milliseconds, or every four to six frames of a thirty-frames-per-second web camera.
If it does not receive a new command after approximately a second, the helicopter responds
by disabling throttle and falling out of the sky. To have a chance of recovery from such a
situation, the tracking algorithm needs to quickly identify the helicopter in the next frames,
even as the helicopter moves a large amount between images and becomes blurred due to its
motion.
A technical challenge to robust tracking across many platforms results from the fact that
many cameras have hardware or software controls, typically beyond the control of this system,
that automatically adjust properties like white-balance and auto-exposure. For example, if a
light turns off in the room where the camera is running, the camera may begin to increase
exposure time resulting in a frame-rate that is a fraction of the ideal rate and colors that are
significantly muddled. As much as possible, the system should be invariant to these changes.
As implemented, the tracking algorithm assumes that all input images are HSV encoded
and of a constant size. If these assumptions are met, Algorithm 1 in section 3.4.2 gives a
general overview of how the vision tracking system operates.
Before the tracking component can be used, however, it must be initialized. The initial-
ization takes as an argument the marker configuration, which tells the tracking software what
hue values constitute each color on the marker and where that color sits relative to the front
of the ring. Along with the core tracking software, this thesis also provides a tool for generating
these marker configurations. See Appendix A for more information.
Once configured, the user can begin feeding the algorithm images through the “track”
interface. The first step of the tracking algorithm, once given an RGB image, is to convert
that image’s color space to HSV. This step makes the following logic easier to express as
the color of an object is now a function of a single numerical value instead of three. More
importantly, the hue channel is largely invariant to shifts in illumination that result from
changes in the environment or properties of the camera.
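For reference, hue can be computed from 8-bit RGB with the standard conversion below (a generic formula; the reference implementation most likely delegates this to a library routine):

```c
#include <assert.h>

/* Hue in degrees [0, 360) from 8-bit RGB channels; returns 0 for grays. */
double rgb_to_hue(int r, int g, int b)
{
    int max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    int min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    double delta = max - min, hue;
    if (delta == 0.0)
        return 0.0;                          /* achromatic: hue undefined */
    if (max == r)
        hue = 60.0 * (g - b) / delta;
    else if (max == g)
        hue = 60.0 * (b - r) / delta + 120.0;
    else
        hue = 60.0 * (r - g) / delta + 240.0;
    return hue < 0.0 ? hue + 360.0 : hue;
}
```

Note that scaling all three channels by the same factor, as a dimming light does, leaves the hue unchanged; this is the invariance the paragraph above relies on.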
To optimize tracking performance in terms of both speed and accuracy, the vision compo-
nent keeps a minimal amount of state information pertaining to the history of the marker. If
the delay between images is sufficiently small relative to the helicopter’s motion, the tracker
can assume “spatial locality”, the condition where the helicopter marker is in approximately
the same position from one frame to the next. When the helicopter does move, it can also
be assumed that there exists locality in the marker velocity. The tracking engine remembers
recent positions and velocities, using them to compute an estimation of the location of the
marker before any searching is performed. An accurate estimate improves the chance of the
blob expansion algorithms working quickly.
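The extrapolation step amounts to a constant-velocity prediction from the two most recent positions (a minimal sketch with illustrative names):

```c
#include <assert.h>

typedef struct { double x, y; } point_t;

/* Predict the marker's next position from its two most recent
 * positions, assuming its inter-frame velocity is roughly constant. */
point_t predict_next(point_t prev, point_t cur)
{
    point_t next;
    next.x = cur.x + (cur.x - prev.x);  /* position + one frame of velocity */
    next.y = cur.y + (cur.y - prev.y);
    return next;
}
```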
There are two such blob expansion algorithms, the by-estimate method and the by-
threshold method. These methods, discussed in greater depth in the following sections, are
used together to provide tracking that is fast and robust. In general, the by-estimate method
attempts to identify a color blob by looking for color gradients above, below, left, and right of
some estimate. The by-threshold method conducts a blob search by binary thresholding the
entire image for each color on the marker and then detecting contiguous blobs. In practical
terms, the by-estimate method constitutes a local search with low computational cost, compared
to the by-threshold method, which is global and expensive.
A problem encountered by the by-estimate method is that a color point
in the current frame may no longer be inside the color in the next frame if the helicopter is
moving quickly. For both methods, complicated color environments increase the probability
of detecting false-positives or masking the presence of the helicopter marker. As much as
possible, these failure conditions need to be detected and appropriate actions taken.
To achieve fast yet robust tracking, the reference implementation uses a simple combina-
tion of the by-estimate and by-threshold methods. The by-estimate method is used whenever
the tracker can make a guess about the location of the marker. This guess is formed by
remembering a brief history of the position and motion of the helicopter, and using that
information to extrapolate forward in time. If no guess can be made or the guess fails to
yield a viable marker candidate, then the by-threshold method is used. Most of the time, the
marker will be quickly identified by the by-estimate method, but if this is the first frame, if the
helicopter has accelerated quickly, or if the helicopter was occluded in previous frames, then
the by-threshold method provides a safety net that allows tracking to recover. Whatever the
chosen algorithm, the result of the color blob search is a collection of “blob” objects: containers
holding properties such as a bounding rectangle, center position, and color.
Given a set of blobs, the next step is to see what permutation of those blobs forms a
valid marker. This process, also discussed in detail in a following section, involves looking for
permutations of color blobs that are spatially configured such that they satisfy the constraints
of the marker configuration. Properties of two blobs such as relative position and bounding
rectangle dimension ratios are examples of metrics used to determine if those blobs might
be a marker. This process is first run using the results of the by-estimate blob search and if
the results indicate no marker candidates, the process will be run again using the output of
the by-threshold blob search. Each time a blob pair is accepted as a candidate, the physical
pose associated with the properties of that candidate is also calculated. The output of the
candidate search stage is a collection of the pose data structures enumerated at the beginning
of this section.
The last computational stage of the vision component takes the list of possible helicopter
poses and chooses the most likely one to be the result of the tracking operation. Each
candidate is given a score based on its proximity to the expected helicopter pose and the
highest score, representing the closest match, is said to be “most likely”. If there are no
candidates or the best score is beneath a chosen minimum value, this component of the vision
system returns an invalid point. The return value is then added to the history of the tracker
where it may be used to form future guesses about the fiducial marker’s position.
This concludes the brief tour of the components of the vision system. What follows is a
more detailed discussion of how each operates.
Figure 4.1: A Tracked Helicopter Marker with Rotation Estimate in Degrees
4.2 By-Estimate Blob Detection
At the heart of the tracking component are the algorithms which perceive the fiducial marker
in a given camera image. In the reference implementation, this operation must run in real-
time and is used to control a highly dynamic system. Therefore, properties such as speed
and accuracy are paramount. If the search for blobs can be “windowed”, or constrained to
a subspace of the entire image, then both the computational complexity (time and space)
of the search and the likelihood of false positives can be mitigated. This is the philosophy
behind the by-estimate method of blob detection, the preferred method when there is prior
knowledge of the helicopter state, including position, velocity, and acceleration.
The internals of the by-estimate method could be implemented several ways. For example,
it could search a subregion of the input image using a thresholding method and the known color
configuration. This would accomplish the goal of restricting computational complexity and
improving accuracy, but there are further improvements that could be made. The reference
marker design is a ring with rectangular segments of color affixed to the outer surface area.
When this ring is projected onto a 2D camera image, the visible color segments will appear
to be approximate rectangles. There will be some divergence from this model based on the
physical properties of the camera lens and the orientation of the camera relative to the marker,
but for the given use case (a web camera looking approximately straight onto a marker at
a distance of a few feet) this divergence is small. Furthermore, the helicopter is stabilized in
a way that ensures that during normal operation it will not roll or pitch significantly, so the
boundary of each color rectangle will be aligned with the image axes. The dimensions of this
rectangle can then be defined by four points, one on each side.
The reference implementation of the by-estimate method attempts to quickly identify
these four points, and thus a color blob, by searching directly up, down, left, and right of the
estimate. This is accomplished by assuming that the estimate is inside a rectangular block
of pixels with similar hue values and that an edge of this block is characterized by a sudden
shift in hue. Explicitly, the algorithm for the by-estimate expansion of a blob is as follows:
1. Sample the color of the initial guess and save that information.
2. For each of the cardinal directions:
(a) Move d pixels in the target direction and sample the hue again. Take the difference
of this sample and the previous, and add that difference to a running average.
(b) If the running average of the hue change exceeds a threshold T , then mark this as
an edge of the rectangle.
(c) Otherwise, repeat the above steps until an edge is detected or the limits of the
image are met.
3. Using the four intersecting points, reconstruct the bounding rectangle and return it.

Figure 4.2: Result of the by-estimate method on Figure 4.1. The blue rectangles represent possible color blobs. The green dots represent the centers of color blobs in previous frames, while blue and red dots represent right and left possibilities respectively. The entire image has been converted to HSV color space.
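One direction of this scan might be implemented as follows (a sketch: the step d, threshold t, and the decaying average used here to interpret the "running average" of step 2(a) are all tunable assumptions, and the function name is illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Walk right along one row of a hue image in steps of d pixels, keeping
 * a decaying average of the hue change between successive samples.
 * Return the column where that average first exceeds threshold t, or
 * the last column if no edge is found before the image boundary. */
int scan_right(const unsigned char *hue_row, int width, int start,
               int d, double t)
{
    double avg = 0.0;          /* decaying average of recent hue changes */
    int x = start;
    int prev = hue_row[start];
    while (x + d < width) {
        x += d;
        avg = 0.5 * avg + 0.5 * abs(hue_row[x] - prev);
        if (avg > t)
            return x;          /* significant hue shift: treat as an edge */
        prev = hue_row[x];
    }
    return width - 1;          /* no edge before the image boundary */
}
```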
Compared to an operation on the entire image (e.g. thresholding) which takes W ∗ H
operations, this process takes at most W +H operations. The performance and accuracy of
this method can be tuned by adjusting the values of d, the pixel step between tests, and T ,
the hue change threshold. Other optimizations include testing saturation, in addition to hue,
to prevent white or black backgrounds from throwing off the results, and allowing each search
vector to hit multiple false points before it returns an edge. This prevents certain varieties
of noise, such as shadows or the helicopter tail, from causing an early return. Finally, it may
also be beneficial to enforce an absolute maximum deviation from the first hue sample to stop
slight gradients from throwing off tracking results.
This search technique is used to find helicopter markers as outlined in Algorithm 2.
Algorithm 2 By-estimate Marker Search
Require: previous state information, HSV image, color configuration
1: if no point found last frame then return empty blob set
2: end if
3: vector ← getEstimatedDisplacement(stateInformation)
4: returnBlobSet ← nil
5: for each blob in last pose do
6:   estimate ← blob.position + vector
7:   centerBlob ← by-estimate-expansion(estimate, image)
8:   rightEstimate ← blob.position + vector + sideSearchDisplacement(blob)
9:   rightBlob ← by-estimate-expansion(rightEstimate, image)
10:  leftEstimate ← blob.position + vector − sideSearchDisplacement(blob)
11:  leftBlob ← by-estimate-expansion(leftEstimate, image)
12:  if centerBlob.color ≈ blob.color then addBlob(returnBlobSet, centerBlob)
13:  end if
14:  if rightBlob.color ≈ blob.color.next then addBlob(returnBlobSet, rightBlob)
15:  end if
16:  if leftBlob.color ≈ blob.color.prev then addBlob(returnBlobSet, leftBlob)
17:  end if
18: end for
return returnBlobSet
The core operation of this algorithm begins by estimating the motion of the marker be-
tween frames. Each blob of the last marker should then be at the position it was last frame,
plus this new displacement. A search is therefore run at this new location to see if it matches
the color of the last blob. If it does, it is added to the returned set of blobs.
One special case that must be handled is that of a rotating marker. When the marker is
in motion, color blobs will go out of and into view of the camera. If this motion is continuous,
there will always be a stage where the blob is present in one frame and not in the next. When
this occurs, a new blob should be present on the far side of the marker from the disappearing
blob. To detect new colors as they emerge, for every estimate expansion of a previous blob
center, a point is expanded to the left and right of the previous blob. This has proven to work
reliably in picking up new colors as they come into view, but there are some side effects.
One such side effect is redundant blob expansion where each color in the marker is ex-
panded twice (once by itself and once by its neighbor). Given the expansion’s fast execution
time, this redundancy actually ends up being beneficial as it helps to guard against noise in
the image. The other side effect is that side expansions occasionally identify blobs of color
that do not belong to a marker. This makes the algorithm that extracts marker positions
work harder and can occasionally lead to false positives.
This algorithm returns a collection of blobs, sorted by color, that will then be checked
to see if some permutation can form a valid fiducial marker. See section 4.4, Marker
Candidate Detection, for more information. If this blob search algorithm fails to yield the
components of a marker, then that task falls to the by-threshold method.
4.3 By-Threshold Blob Detection
Complementing the local blob search algorithm described in the by-estimate section is a search
method that efficiently scans an entire image for color blobs without prior knowledge of the
marker’s location. This section describes the internal workings of such a global method,
described in this thesis as the by-threshold method.
As the name indicates, the reference implementation uses color thresholding to achieve
this goal. More information about the theory behind color thresholding can be found in
chapter 2, the Literature Review.
One of the inputs of the tracking system is information about the construction of the color
marker. This information includes a position, center hue value, and hue tolerances for each
color on the marker. Using a tool provided with this thesis, users can automatically generate
these configurations. The user selects a region of an image, and the tool calculates
statistical data about this region, including the mean and standard deviation of each channel.
By default, the center of a color is given the value of the sample mean and the tolerance is
set to plus or minus one standard deviation.
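As an illustrative sketch of this default rule (the structure and function names here are assumptions, not the tool's actual interface), the per-channel statistics might be computed as follows:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch (not the actual tool's code): derive a channel's
// center value and tolerance from the pixel values of a sampled region.
struct ChannelConfig {
    double center;     // set to the sample mean
    double tolerance;  // set to one standard deviation by default
};

ChannelConfig configureChannel(const std::vector<double>& samples) {
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();

    double variance = 0.0;
    for (double s : samples) variance += (s - mean) * (s - mean);
    variance /= samples.size();

    return ChannelConfig{mean, std::sqrt(variance)};
}
```

The user may widen or narrow the tolerance afterward if the default one-standard-deviation band passes too little or too much of the marker color.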
The by-threshold method works by performing a binary threshold operation using informa-
tion from all three channels of an HSV image, as well as an upper and lower threshold value.
This means that a positive value is given to a pixel that falls between an upper and lower
limit for each of its channels. This method is advantageous because it allows finer control
over which values pass through, including the removal of colors that do not have enough
saturation or value to be candidates for marker blobs. This is particularly useful for removing
very white or black pixels, which are characterized by a lack of saturation and value
respectively and are not associated with a particular hue. The threshold limits are set to the
color configuration's center value plus or minus the tolerance for each channel.
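The per-pixel test can be sketched as follows; the reference implementation uses OpenCV's equivalent range operation, and these names are illustrative:

```cpp
#include <cassert>

// Illustrative sketch of the per-pixel by-threshold test: a pixel passes
// only if every HSV channel falls within its lower and upper limits, so
// pixels with too little saturation or value are rejected regardless of hue.
struct Hsv { int h, s, v; };

bool passesThreshold(const Hsv& px, const Hsv& lower, const Hsv& upper) {
    return px.h >= lower.h && px.h <= upper.h &&
           px.s >= lower.s && px.s <= upper.s &&
           px.v >= lower.v && px.v <= upper.v;
}
```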
Because of these tests, the by-threshold method is slower in operation than the by-estimate
method. It must examine width ∗ height pixels (where width and height are the size in pixels
of the image being searched) and perform six branch statements on each one (two per channel
on three total channels) for each color on the marker. However, the results, given a good
color configuration, are much less prone to error than the by-estimate method because noise
or slight occlusion of a color blob will not deter the rest of the blob from being detected
properly.
Figure 4.3: Result of by-threshold method on figure 4.1. The blue rectangles represent possible color blobs. The entire image has been converted to HSV color space.
Thresholding the input image against the colors in the configuration reference produces
three binary images where positive values indicate the possible presence of a marker blob.
Still more processing must be done to reduce noise and extract information required by the
algorithm that selects valid permutations of blobs. Such additional information includes blob
center position and the bounding rectangle.
Algorithm 3 By-threshold Blob Search
Require: HSV image, color configuration
1: returnBlobs ← nil
2: for each color in color configuration do
3: if color ≈ red then
4: binaryImage ← redThreshold(image, color)
5: else
6: binaryImage ← threshold(image, color)
7: end if
8: binaryImage ← imageOpening(binaryImage)
9: contours ← detectContours(binaryImage)
10: contours ← sortBySize(contours)
11: rectangles ← calculateBoundingRectangles(contours)
12: appendList(blobs, returnBlobs)
13: end for
14: return returnBlobs
The algorithm by which the by-threshold method of blob expansion returns a list of blob
candidates is outlined in algorithm 3. The following section is a discussion of each step in this
process, including edge cases, optimizations, and other knowledge necessary to implement
this logic.
The first step in the by-threshold method is to conduct the threshold operation based on
information stored in the marker configuration. There exist special edge cases that stem from
the fact that the hue portion of the HSV color space forms a closed surface. Practically, this
means that the maximum and minimum hue values represent the same color. The distance
between the red hue at 179 degrees (using the 180 degree OpenCV color wheel) and the red
hue at 1 degree is 2 degrees, not 178 degrees, a fact that must be considered when calculating
the difference between colors. For threshold operations, this means that if the minimum value
is greater in magnitude than the maximum value, all of the values of the image need to be
rotated around the color wheel until the operation makes sense again. In algorithm 3, the
function redThreshold encodes this process.
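One illustrative way to express this circular comparison follows; the thesis's redThreshold rotates the image's hue values instead, so this sketch is an assumption about an equivalent check:

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative sketch of circular hue comparison on OpenCV's 180-degree
// wheel: distance is measured around the circle, so hues 179 and 1 are
// 2 degrees apart rather than 178.
int hueDistance(int a, int b) {
    int d = std::abs(a - b) % 180;
    return d > 90 ? 180 - d : d;
}

bool hueWithinTolerance(int hue, int center, int tolerance) {
    return hueDistance(hue, center) <= tolerance;
}
```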
The raw binary image produced by a threshold operation for a given color may contain a
great deal of noise, or positive values that result from small colored pieces of the environment,
lighting, reflections, or artifacts of the conversion from RGB to HSV color spaces. These
values often appear as specks of white on the binary image. Eliminating this noise results
in dramatically fewer spurious blob detections, which increases the performance and accuracy
of the tracking method. White noise in the image is handled by performing an image “opening”
on each binary image. This process involves first eroding the white values of the image to
reduce noise and then dilating the image to fill in gaps [17].
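A one-dimensional sketch may clarify the idea; the reference implementation applies OpenCV's two-dimensional morphology, and the structuring-element size here is an assumption:

```cpp
#include <cassert>
#include <vector>

// Illustrative 1-D sketch of a binary image "opening": erosion removes
// isolated positive specks, then dilation restores the bulk of any
// surviving runs. The 2-D equivalent is applied to each binary image.
using Row = std::vector<int>;

Row erode(const Row& in) {
    Row out(in.size(), 0);
    for (std::size_t i = 1; i + 1 < in.size(); ++i)
        out[i] = (in[i - 1] && in[i] && in[i + 1]) ? 1 : 0;
    return out;
}

Row dilate(const Row& in) {
    Row out(in.size(), 0);
    for (std::size_t i = 0; i < in.size(); ++i)
        if (in[i]) {
            if (i > 0) out[i - 1] = 1;
            out[i] = 1;
            if (i + 1 < in.size()) out[i + 1] = 1;
        }
    return out;
}

Row opening(const Row& in) { return dilate(erode(in)); }
```

Note how a lone speck is erased while a solid run of positive values survives with its extent intact.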
The next step of the algorithm is to separate and classify distinct blobs in each filtered,
binary image. This is accomplished by detecting the outer contours of each shape formed by
groups of positive values. OpenCV provides functionality for this task in its image processing
module. References for both the application programming interface and the underlying
algorithm may be found in the OpenCV documentation [16]. If the color configuration
is set appropriately and the background does not prominently feature one of the marker colors,
only a small number of distinct contours should be detected.
This is not always the case, however, and the next to last step of the by-threshold method
sorts the remaining contours by their enclosed areas. From this sorted set, all contours with
areas below a set threshold are discarded. By discarding excessively small contours in this
stage, the algorithm ensures that only prominent image features of a given color are considered
as candidates for blobs on a marker.
The final stage of the by-threshold method handles the bookkeeping necessary to pass
the contours onto the next stage. Each distinct contour is labeled with its associated color,
the centroid of the area it encloses, and the bounding rectangle of the entire contour. The
resulting “blobs” are collected and returned to be composed into markers in the next stage.
4.4 Marker Candidate Detection
With a few exceptions, the appearance of the fiducial marker in an image will manifest itself
as a pair of heterogeneous, colored rectangles. The by-estimate and by-threshold methods of
blob expansion and detection work on a single color at a time. Given a collection of color
blobs generated by one of the previous steps, the next task is to identify permutations of those
blobs that make sense in light of the geometry of the marker, the physics of the helicopter,
and the known color configuration. This task falls to the marker candidate detection method,
which this section describes in more detail.
This method makes several assumptions about the system to make its job more tractable.
Specifically, these are:
1. the fiducial marker is a perfect circle,
2. the three color segments are equal in dimensions and cover the entire surface of the ring,
3. the marker does not pitch significantly,
4. the marker does not roll significantly,
5. the camera is looking straight onto the marker, and
6. the entire marker is visible in the camera image.
The first two assumptions pertain to the construction of the marker. While it is unlikely that
either of these assumptions is entirely accurate, it is far easier to ensure that the markers
are well constructed than to attempt to program corrections into the system. Deviations
from these assumptions negatively affect the accuracy of calculated poses for the helicopter,
especially the rotation property. Programmatic corrections would require additional training
on the marker which increases the burden on the user for little in return. This system is
designed to give reasonable estimates of helicopter pose very quickly. It does not take the
place of special purpose equipment, such as motion capture systems, when high accuracy is required.
Assumptions three and four deal with the physics of the hardware system. As discussed in
the methodology section, the Syma helicopter is equipped with a stabilizing bar on the rotor
shaft and is further stabilized with an on-board gyroscope. As a result, these helicopters
do not visibly roll while properly operating. The helicopter can pitch visibly during normal
flight, but the angle of pitch is small. Given these properties, it is not unreasonable for the
vision system to make the assumptions it does regarding marker orientation. The benefits
are significant as many otherwise valid combinations of colors can be disregarded, leading to
a drop in both computational effort and false positives.
Due to the effects of perspective, a helicopter marker with the same pose in the physical
world will appear slightly differently in the top left corner of the image than in the bottom
right corner. The magnitude of this shift depends on intrinsic camera properties such as
field of view. This thesis does not assume a specific camera or calibration, so it is not generally
possible to calculate the magnitude. Therefore, when calculating rotation based on a perceived
marker, the pose estimator assumes that it is looking straight onto the helicopter.
The final assumption is made solely for the purpose of simplifying logic in the marker
detector. In practical terms, this assumption means that the marker system does not attempt
to create a pose guess based on partial color fragments and previous history. Instead, all of
the information used to define a marker must be present in the image. This has not proven
to be an issue in practical testing, and it dramatically cuts down on the number of possible
candidates, which reduces computational complexity and false positives.
Following immediately from these assumptions is a set of conditions for the composition
of blobs into a representation of the fiducial marker. Two color blobs belonging to the same
fiducial marker should have equivalent bounding rectangle height and y-position properties.
The width of a given blob will depend on its rotation, but the distance between the centers
of two blobs should not exceed the height of the blobs multiplied by the width to height
ratio of the marker. It is further assumed that, for the specific case of the three-color marker
presented here, there are no more than two colors visible at any given time. These two colors
must be present in the order prescribed by the color configuration.
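These conditions might be expressed as a pairing predicate along the following lines (field names and the slack parameter are illustrative assumptions, not the thesis's exact code):

```cpp
#include <cassert>
#include <cmath>

// Illustrative pairing predicate: same-marker blobs should share
// bounding-rectangle height and y-position, and their centers should not be
// separated by more than the blob height times the marker's width-to-height
// ratio. Width is not checked because it varies with rotation.
struct Blob {
    double cx, cy;  // bounding-rectangle center in pixels
    double h;       // bounding-rectangle height in pixels
};

bool compatiblePair(const Blob& a, const Blob& b,
                    double widthToHeightRatio, double slack) {
    if (std::abs(a.h - b.h) > slack * a.h) return false;   // equal heights
    if (std::abs(a.cy - b.cy) > slack * a.h) return false; // equal y-positions
    return std::abs(a.cx - b.cx) <= a.h * widthToHeightRatio;
}
```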
If the helicopter marker is perfectly circular, is divided evenly between three colors, and
the camera can perceive 180 degrees of the marker then there will be situations about the
0, 120, and 240 degree marks where three colors will be visible. Due to the curvature of the
marker, however, the two colors on the ends, which each constitute thirty degrees of the
marker, will be hardly visible: the farther away the marker is from the camera, the more difficult
the detection. The result of this is, for all practical purposes, that the entire marker can be
represented by one color at these special angles. Thus, the vision component makes a special
exception to its general rule requiring two colors to constitute a marker. If the marker history
indicates it is close to a special angle and there are no viable two-color candidates, the
marker finder will consider single-color candidates.
All viable fiducial marker combinations can be found by testing each possible permutation
of two unique color blobs against these conditions. The resulting operation has quadratic
(O(n²)) complexity. Computational complexity can be controlled by adjusting the parameters
of the by-estimate and by-threshold blob expansion methods to allow fewer blobs to pass
through. When a viable candidate is calculated, the blobs that compose it are passed on to
the pose estimation stage.
The pose estimation stage of the marker finder is responsible for transforming color
permutations into “track points”, each consisting of an estimate of position, rotation, the
bounding rectangle of the marker, and its composition, as discussed in the introduction to this chapter.
If a combination consists of a single color, then the corresponding track point inherits the
center position and bounding rectangle property from the blob it is formed from. Rotation
is equivalent to the rotation of the center of the blob color about the fiducial marker’s ring.
This means 0, 120, and 240 degrees are the only viable outputs for the default configuration.
Figure 4.4: The helicopter from figure 4.1 rotated to face directly at the camera, illustrating a marker configuration of one color. Note the color blob on the left of the marker is not detected.
If the combination is of multiple colors, then the bounding rectangle is formed by the
union of the bounding rectangles of the composing blobs and the center position is taken to
be the center of the resulting rectangle. Rotation poses a more complex challenge, but the
assumption that the camera is looking straight onto the helicopter makes the calculation more
straightforward. For the purposes of calculating rotation, the marker is modeled as a hemi-
sphere that is projected onto the axis of its diameter. Using the perfect circle assumption, the
magnitude of a rotation (in radians) to a known point from the center of the marker is given
by the following equation: magRotation = arcsin(2x), where the variable x represents the ratio
of the center-to-known-point distance to the overall length of the observed marker. In the
rotation calculation, the length of the marker and center point come from the bounding rect-
angle calculation. The “known point” is the junction x-coordinate of the boundary between
distinct color blobs. The rotation value that this boundary represents can be calculated from
the colors forming it and will take on values of 60, 180, and 300 degrees with the default
marker configuration.
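Under the stated assumptions, the rotation magnitude reduces to a short calculation (a sketch with illustrative names): a boundary point on a circle of observed length d, seen at pixel offset x·d from the marker center, has rotated arcsin(2x) radians from the facing direction.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the rotation-magnitude formula: a color boundary observed at
// pixel offset `offset` from the center of a marker of observed length
// `length` corresponds to a rotation of arcsin(2 * offset / length) radians
// under the straight-on viewing assumption.
double magRotation(double offset, double length) {
    double x = offset / length;  // center-to-known-point ratio
    return std::asin(2.0 * x);
}
```

A boundary at the marker center thus indicates no rotation away from its nominal angle, while a boundary at the marker's edge indicates a quarter turn.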
When this pose estimation has been performed for every candidate, the resulting estima-
tion collection is returned. The next, and last, stage of the vision component will select the
optimal candidate from this collection. This will represent the result of the entire tracking
operation. If no permutation of color blobs could be found to satisfy the constraints of the
algorithm, it is possible that an empty collection of estimations will result from the marker
finder.
4.5 Candidate Selection
The final stage of the vision component takes as input a collection of possible poses that have
been derived from color blobs found in the image. It also accepts the estimate of the position
of the marker calculated at the beginning of the entire vision process. This guess is a function
of the last known pose and an estimation of velocity. This stage must select the best pose
from the input collection to be the result of the entire calculation.
If no possible poses were calculated, the candidate selector immediately returns an invalid
point indicating failure as the result of its calculation. If given a single pose estimation, then
that estimate is returned immediately as the result of the calculation. If the algorithm is
given an invalid point as a guess, as might be the case if the tracking system has just started,
then the largest marker in terms of pixel area is chosen as the result. Otherwise, if there are
multiple helicopters in the frame or if the background of the image is particularly complex,
there may be multiple candidate points, and the “best” one must be chosen.
The case of multiple candidates is handled by iterating over the collection of poses, scoring
each pose with its “distance” from the guess, and retaining the point with the best score. The
scoring algorithm operates by calculating the percent differences between the candidate and
the guess for the properties of x-position, y-position, bounding rectangle dimensions, and
rotation. Each of these percent differences is normalized by the total number of scoring
properties, and their sum is subtracted from 1.0. Thus, a point with exactly the same properties as
the guess will score a perfect 1.0. The percent change is calculated by subtracting the guess
property from the candidate property and normalizing that amount by the maximum possible
change in the property (which is the size of the image frame for coordinate positions and 360
degrees for rotations), ensuring that no score can go below 0. The highest of these scores is
returned.
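An illustrative reading of this scoring rule, with the scored properties generalized to a vector (names are assumptions, not the thesis's exact code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch of candidate scoring: each property's absolute
// difference from the guess is normalized by its maximum possible change
// and by the number of properties; the summed penalty is subtracted from
// 1.0, so a candidate identical to the guess scores exactly 1.0.
double scoreCandidate(const std::vector<double>& candidate,
                      const std::vector<double>& guess,
                      const std::vector<double>& maxChange) {
    double penalty = 0.0;
    for (std::size_t i = 0; i < candidate.size(); ++i)
        penalty += std::abs(candidate[i] - guess[i]) /
                   (maxChange[i] * candidate.size());
    return 1.0 - penalty;
}
```

Because each difference can be at most its maximum possible change, each normalized term is at most 1/n and the total score stays in [0, 1].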
4.6 Closing Comments
This chapter has discussed the concept of the vision component of the helicopter control
system and how the reference implementation works. Please refer to Appendix A for more
information about how to programmatically interface with the logic described in this chapter
using C++ and OpenCV. The result of the track interface in the vision component is a best
guess of the pose of the helicopter in the camera frame. This pose estimation may be fed into
a feedback controller to automate flying of the helicopter. The reference implementation is
described in the next chapter.
Chapter 5
Feedback Control Component
This chapter presents a more thorough investigation into the design and implementation of
a feedback controller for the Syma helicopter control system. While this thesis provides
a “reference” implementation of this component, interested individuals may provide their
own implementations to change the system behavior or extend control to new hardware. A
compatible extension must provide a defined set of services that constitute the interface of
the feedback control component.
The feedback controller’s core responsibility is calculating what movements the controlled
system must execute to bring its pose to a specific value. For a Syma helicopter, these
movements are encoded into the digital signal that is broadcast from the control system
to the helicopter. The primary method of the feedback controller’s interface is the control
method which takes as inputs the current and desired pose of the system and produces a
digital signal meant for broadcast to the helicopter. This digital signal should help bring the
helicopter’s pose to the target state.
This thesis uses the terms “state” and “pose” to mean similar, but slightly different
things. “Pose” is defined as the helicopter’s position in space at a moment in time (x, y,
z, and rotation). “State” includes information from the pose and, additionally, properties of
the helicopter that are dependent on time, such as estimations of velocity and acceleration.
Effective control requires information about the state of the helicopter which must be derived
from individual poses. However, measured pose estimations may not always be accurate or
precise representations of the true pose of the helicopter. When the pose information deviates
from the true value, it falls on the feedback controller to limit deviation in its estimate of
state. Techniques for accomplishing this task are discussed in section 5.2 of this thesis, the
State Estimator.
For safety and ease of use, the controller should output a “shutdown” signal when it
does not have a positive idea of the controlled system’s pose. This prevents, for example, a
helicopter from flying before it is being tracked or flying away if the system has lost track of
it. That said, it is unlikely that the tracker will always find a marker even if it is present in
the camera view. The control system should not fail because a single frame was dropped, but
instead, should fail only if an amount of time deemed “unrecoverable” has passed without
new knowledge of the helicopter pose. No definitions for this value are given as it will depend
greatly on the system being controlled. For the Syma S107G reference implementation, this
value is taken to be one second. This is accomplished through the other required method of
the feedback controller interface, reset. The reset method should reset any internal state that
the feedback controller keeps and should zero the output of the controller. From a practical
standpoint, this feature allows for running multiple helicopter trials without resetting the
controlling program, prevents helicopters from flying out of control, and provides a general
mechanism to handle error conditions in user programs.
5.1 The Reference Implementation
A micro helicopter in flight is a very dynamic system and its behavior depends on many
factors that are difficult to control and equally difficult to ignore; factors such as air currents
and ground effects play large roles and a general control system must be robust to them.
The reference implementation of the feedback control system is not such a general controller.
Rather, it is a proof of concept showing that positive control of Syma helicopters is possible
for a noticeable amount of time. That said, the reference implementation also provides a
framework that allows users to build more complex control systems by supporting “hot-
swapping” of control logic. Here, “hot-swappable” refers to a trait of the controller that
allows portions of it to be changed, or swapped, while the system is running, or hot. This
feature is discussed in the Master Controller section of this chapter.
The reference implementation of the feedback controller has three subcomponents:
1. a state estimator, which has the job of converting a potentially noisy data stream into
a coherent estimate of the controlled system’s state,
2. a master control unit, which serves as the interface between user programs and under-
lying control logic, and
3. channel control units, which are responsible for individually calculating each system
control channel’s next input, together forming the underlying control logic.
In order to calculate an appropriate output signal, the controller must be aware of the
controlled system’s current pose. This pose is the first argument to the control method. The
interface does not stipulate where this pose estimation comes from. It may be the most recent
value from the tracker or it may come from some user defined, intermediate program. An
example of such a program would be a user defined state estimation system for combining
the results of multiple sensors.
The reference implementation of the control method, however, is designed to accept the
latest pose estimate from the vision component. The first thing this control method does is
use this new pose to update its belief about the state of the helicopter by calling the state
estimator. The reference tracker may return invalid poses to represent a failure to identify
the marker or it may return a noisy pose that does not represent the true pose of the system.
Reducing the sensitivity of the controller to these types of errors is the responsibility of the
state estimator. In addition to error mitigation, this subcomponent also estimates elements
of the state of the system that are derived from pose inputs such as derivative and integral
information.
In the step after state estimation, the input setpoint and the newly updated state estimate
are passed as arguments to a set of “channel controllers”. Each channel controller is respon-
sible for calculating the magnitude of one component of the digital signal to be transmitted
to the helicopter, or other controlled platform. For the Syma S107G, there are three channel
controllers, one each for yaw, pitch, and throttle. When all channel controllers have finished
their calculations, the result of each is assembled into an appropriately formatted signal suit-
able for transmission. This architecture helps keep control logic modular, allowing users to
change the channel controllers to better respond to the current system state as the helicopter
is in flight.
The reference controller is designed to effectively stabilize an airborne helicopter in a
camera frame. This means that the user of the program must start the entire control system
so that it is running and then manually insert the helicopter into the target scene, ideally away
from surfaces. The reason for this constraint is that it has proven very difficult to stabilize
the helicopter when it is operating in ground effect, a condition caused by the interference of
a surface with the airflow pattern of the rotor system. This issue was the primary impetus
behind the modular design of the master control unit. A controller specifically designed to
handle take off could be used when the system starts and then swapped for a controller meant
for free air flight.
An interesting characteristic of the Syma 107G helicopter, from a control standpoint, is
the discrepancy between the control and sensing rates. Control signals are broadcast with
a period of 120 to 200 milliseconds (approximately 8 to 5 Hz). New information from the
camera is available approximately every 33 milliseconds (30 Hz). The reference controller
operates by feeding information to the controller at 30 Hz, calculating an output, and sending
this output over serial to the Arduino responsible for signal transmission. The firmware of
the Arduino is designed to handle input at a faster rate than it outputs, so this discrepancy
is not an issue. Users looking to implement control for new hardware should be careful to
handle “flow control”, either by designing their logic to handle variable input/output rates or
by being careful to throttle output rates.
This concludes the brief tour of the components of the feedback control system. What
follows is a more detailed discussion of how each of these subcomponents operates.
5.2 State Estimation
Whenever the control interface is called, the feedback controller first uses any new pose
information to update its knowledge of the state of the controlled system. To facilitate this
task, each new pose passed as an argument to the control method is immediately given a
timestamp and the resulting structure is cached in memory. Derivatives of all position and
dimension attributes of state are calculated using a first order forward difference scheme.
Integrals are calculated using a running Riemann sum.
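A simplified sketch of these per-attribute updates follows (names are illustrative; the reference implementation additionally filters these values):

```cpp
#include <cassert>
#include <cmath>

// Simplified sketch of the per-attribute state updates: a first order
// forward difference for the derivative and a running Riemann sum for the
// integral, each driven by timestamped pose samples.
struct Channel {
    double value;
    double derivative;
    double integral;
    double lastTime;
};

Channel update(Channel c, double sample, double t) {
    double dt = t - c.lastTime;
    c.derivative = (sample - c.value) / dt;  // forward difference
    c.integral += sample * dt;               // running Riemann sum
    c.value = sample;
    c.lastTime = t;
    return c;
}
```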
It is also possible that the input to the system is a pose that indicates a failure to track
or a pose that does not represent the marker being controlled. The other task of the state
estimator is to turn the potentially noisy stream of information from the tracking component
into a more reliable estimate of the helicopter’s true state. The reference implementation
achieves this goal through two mechanisms: first, it removes invalid tracking information, and
second, it filters valid components of the input stream.
If an input pose is invalid, then the state estimator sets a flag and discards the invalid pose.
The flag indicates that tracking has possibly failed and a time is associated with the raising
of the flag. If subsequent pose inputs indicate that tracking has been reestablished, then the
flag is set back to false and operation continues as normal. If, however, more invalid poses are
input, the controller examines the difference between the time the invalid flag was set and the
current time. If this time value exceeds a threshold of one second, then the controller’s state
estimation is reset and any calls to it return a state indicating no information is available.
Valid tracking must be reestablished for a full second before the state estimator returns to
normal functioning. Changes in the validity of the state estimation are accompanied by a call
to the reset interface provided by the user.
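The timeout logic can be sketched as a small state machine; this illustration omits the symmetric one-second reestablishment delay, and the names are assumptions:

```cpp
#include <cassert>

// Sketch of the tracking-loss logic: the first invalid pose raises a flag
// and records the time; tracking is declared lost only if invalid input
// persists past the threshold (one second for the reference system).
struct LossDetector {
    bool flagged;
    double flaggedAt;
};

// Returns true when tracking should be deemed lost and the controller reset.
bool observe(LossDetector& d, bool poseValid, double now, double threshold) {
    if (poseValid) {
        d.flagged = false;  // tracking reestablished
        return false;
    }
    if (!d.flagged) {
        d.flagged = true;   // first invalid pose only raises the flag
        d.flaggedAt = now;
        return false;
    }
    return now - d.flaggedAt > threshold;
}
```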
The other mechanism for ensuring a reliable stream is filtering. There are a variety
of techniques for extracting improved state estimates from a stream of noisy information.
These techniques range from simple moving averages to statistical techniques built on top of
mathematical models of system dynamics. To keep things as simple as possible, the reference
implementation uses a low pass filter to guard against spikes in the input stream. On each
update of the state estimator, the new value of an attribute x of the system state is given
by the weighted average of x from the new pose estimate and x from the current state
estimate. The weight given to new pose samples can be decreased to provide better
filtering against noise or increased to reduce the response time of the filter relative to the
system. Note, however, that the Syma S107G is a fast moving system, and a filtering scheme
with a high time constant will be detrimental to effective control.
The filter implementation is simple. For most tracked attributes such as the x coordinate,
y coordinate, and the bounding rectangle dimensions, the process is as simple as estimate =
oldEstimate ∗ (1.0 − r) + newSample ∗ r, where r is the weight discussed earlier. Time dependent
calculations, such as derivatives, are also filtered in this manner, but with a different, smaller
weight to better suppress noise. Some other tracked attributes have properties that are not so
easily calculated or are not appropriate for control inputs in their raw form.
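The filter update itself can be written in a single line (a sketch; r is the new-sample weight):

```cpp
#include <cassert>
#include <cmath>

// Sketch of the low pass filter update used for most tracked attributes:
// smaller r filters more aggressively but responds more slowly.
double lowPass(double oldEstimate, double newSample, double r) {
    return oldEstimate * (1.0 - r) + newSample * r;
}
```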
The information returned by the tracking component is given in the camera’s reference
frame where position and distance values are encoded in units of pixels. Derived information,
such as velocity, is also expressed in terms of pixels. An equivalent pixel distance at two
different depths represents different physical distances due to the effects of perspective. To
be useful for control, values presented in the camera’s reference frame should be converted
to physical coordinates meaningful to the controlled system. Fortunately, the size of the
helicopter’s fiducial marker serves as a constant that can give a physical meaning to pixel
distances. These pixel differences are transformed to units of “marker heights” by dividing
them by the estimated pixel height of the marker. The distances can be further transformed
to any desired unit, but the reference marker design has a height of about one inch, allowing
for easy mental calculations. The height property of the marker is chosen because this mea-
surement tends to be less noisy than the width. The tracking system must recognize two blobs
to recover the entire width, but height requires only one. The validity of this transformation
is predicated on the same assumptions that much of the vision component relies on, including
an assumption that the camera image distortion is minimal and that the full marker height
is detected.
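A minimal illustration of this normalization follows. The function name is hypothetical and not part of the thesis source:

```cpp
#include <cassert>
#include <cmath>

// Pixel distances are made depth-independent by expressing them in
// units of "marker heights". Since the reference marker is about one
// inch tall, the result is roughly in inches.
double toMarkerHeights(double pixelDistance, double markerPixelHeight) {
    // The marker's pixel height shrinks with depth, so the same pixel
    // distance maps to a larger physical distance farther away.
    return pixelDistance / markerPixelHeight;
}
```

For example, a 60-pixel offset next to a marker 20 pixels tall is about three marker heights, but the same 60 pixels next to a 10-pixel marker is about six.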
Rotation is a more difficult problem to handle due to the closed nature of its domain
(i.e., the rotation value “wraps”). The tracking system reports a helicopter rotation from 0
radians to 2π radians. Any scheme that implements a form of averaging creates an edge-
case when dealing with rotation. If a helicopter’s counter clockwise rotation continues past
2π radians, the tracker will begin returning poses that have small rotation values (rotation
wrapped around). In this case, if the results are averaged, the new small values will pull the
state estimation of rotation down in value. A control system will interpret this as a sudden
angular acceleration in the clockwise direction, leading to a control value meant to correct it.
An elegant solution to this problem is to use a one-dimensional analog of a quaternion. In
practical terms, a quaternion is a four-dimensional vector, sometimes interpreted as a rotation
axis and rotation amount, often used to represent three-dimensional rotations without being
subject to “gimbal lock” [36]. In this thesis, one dimensional rotations are translated into
a two dimensional vector by calculating the x and y positions of the given rotation on the
unit circle. These unit circle coordinates are averaged in place of the real-value numbers
representing rotation. This two dimensional representation is not subject to the wrapping
problem. If desired, the real-valued rotation is recoverable by converting the coordinate pair
back to polar coordinates. As a side benefit, the magnitude, l, of the averaged vector is an
indicator of the transience of the rotation. Typically, this value will be approximately 1.0,
but a fast rotational motion will cause it to shorten.
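The unit-circle averaging scheme can be sketched as follows, assuming angles in radians in [0, 2π); the struct and function names are illustrative:

```cpp
#include <cassert>
#include <cmath>

// Wrap-free averaging of angles: each angle is mapped to a point on
// the unit circle, the points are averaged, and atan2 recovers the
// mean angle. The vector magnitude shrinks when the samples disagree
// (e.g., during fast rotation).
struct MeanAngle {
    double angle;      // recovered mean rotation in [0, 2*pi)
    double magnitude;  // ~1.0 for steady rotation, smaller when transient
};

MeanAngle averageAngles(const double* angles, int n) {
    const double TWO_PI = 6.283185307179586;
    double x = 0.0, y = 0.0;
    for (int i = 0; i < n; ++i) {
        x += std::cos(angles[i]);
        y += std::sin(angles[i]);
    }
    x /= n;
    y /= n;
    double a = std::atan2(y, x);
    if (a < 0.0) a += TWO_PI;  // map back into [0, 2*pi)
    return MeanAngle{a, std::sqrt(x * x + y * y)};
}
```

Averaging samples just below 2π and just above 0 this way yields a mean near 0, rather than the spurious value near π that naive averaging of the raw numbers would produce.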
When the state estimator has finished its job, the next step for the control method is to
call the master controller. Here, the estimated state will be used to calculate an appropriate
control signal for transmission to the helicopter.
5.3 Master Controller
As discussed in the introduction to this chapter, the reference feedback controller separates
logic for each input channel of the Syma helicopter (yaw, pitch, and throttle) and allows
users to change that logic while the system is running. This is done to simplify the process
of designing controllers which often naturally operate in different states. For example, the
process of helicopter takeoff has different physics than flight in free air and would likely be
controlled using a different technique.
The master control submodule oversees the logistical concerns required to make this imple-
mentation feasible. It takes ownership of the individual channel logic, automatically updates
the state estimate, and collates the output of each individual channel controller into a final
system output. In short, it implements all the necessary facilities to make control work prop-
erly. It also exposes interfaces that allow the user to change a controller by providing specific
program logic and specifying which channel that logic is responsible for. If no channel controller
is available for a given channel, then a call for output on that channel will return a neutral value by
default. For the Syma S107G helicopter, neutral values are 64, 64, and 0 for yaw, pitch, and
throttle respectively. Finally, the master controller supports querying state information from
the state estimator. This facilitates the construction of higher level logic, such as a waypoint
system, on top of the existing system.
The “hot-swappable” controllers are implemented in C++ using polymorphism and a base
class defining the submodule’s interface with pure virtual functions. The master controller also
handles any memory management related to the controllers it possesses. This implementation
is just a suggestion based on the general availability of polymorphism in modern computer
languages. Also, please note that the reference controller only supports control based on the
marker center position and rotation fields of the setpoint. Controlling the helicopter marker’s
height and width, equivalent to depth, is not yet supported.
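The polymorphic scheme might be sketched as follows. The class and method names are hypothetical simplifications of the interface described above, handling only the yaw channel and omitting the master controller's memory management:

```cpp
#include <cassert>

// Simplified state: marker center position and rotation.
struct SystemState { double x, y, rotation; };

// Abstract base class defining the channel controller interface with a
// pure virtual control method.
class ChannelController {
public:
    virtual ~ChannelController() {}
    // Map estimated state and setpoint to a channel output in [0, 127].
    virtual int control(const SystemState& state,
                        const SystemState& setpoint) = 0;
};

// Example user controller that always commands a fixed yaw value.
class FixedYaw : public ChannelController {
public:
    int control(const SystemState&, const SystemState&) { return 100; }
};

class MasterController {
public:
    MasterController() : yaw_(nullptr) {}
    // "Hot-swap": replace the yaw logic while the system runs.
    void setYawController(ChannelController* c) { yaw_ = c; }
    int yawOutput(const SystemState& s, const SystemState& sp) {
        return yaw_ ? yaw_->control(s, sp) : 64;  // 64 = neutral yaw
    }
private:
    ChannelController* yaw_;  // non-owning in this sketch
};
```

With no controller registered, yawOutput falls back to the neutral value of 64; registering a controller routes every subsequent call through the user's logic.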
5.4 Channel Controllers
The next to last step of the feedback controller’s control method is the stage where the value
of each signal channel is determined. For the Syma helicopter, there are three channels:
yaw, pitch, and throttle. Each of these channels is associated with a “channel controller”
that embodies a program logic for transforming a given system state to a controller output.
This transformation should accomplish the task of bringing the estimated system state to the
setpoint’s state.
Every call to control results in a subsequent call to each of the registered channel controllers,
which take the estimated system state and the desired setpoint as arguments. The order in
which these controllers are called is not defined, and users should not make assumptions
of this nature. The only other constraint on the channel controllers is that each must
always produce valid output for the channel for which it is responsible. On the Syma
helicopter, this means an integer value between 0 and 127.
This architecture was designed with the goal of maximizing the flexibility and extensibility
of the reference controller. A difficulty with this methodology is that it is more technically
complex because it relies on intermediate C++ topics, like inheritance and virtual functions.
This has the potential to put off would-be student-hackers; for this reason, an alternative,
simpler implementation is provided, one that offers the same reference logic without hot-
swapping but uses nothing more than simple object oriented code.
Perhaps more fundamentally, this model fails to truly acknowledge the coupled nature
of each channel on the state of the system. For example, pitch forward can induce lateral
motion and attempting to yaw can cause a loss of thrust (and subsequent drop in height). It
is assumed that this coupling is not significant enough to be explicitly modeled in the control
system, and that the logic of each controller should be insensitive to this. That said, each
control module has access to the full system state and can optionally choose to neutralize its
output while some other variable is brought under control. A perfect example of this is a
pitch controller which suppresses its output while yaw is not close enough to a setpoint.
All of the reference channel controllers implement proportional-integral-derivative (PID)
feedback controllers underneath. These reference controllers work with the classical form of
the PID equation, and base their derivative actions on the process variable. The individual
controllers differ in exactly how they calculate error and in the particular gain values, or
weights, given to each of the proportional, integral, and derivative terms. These gains are
preset to values selected after extensive testing, but users may elect to change them at will.
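A classical-form PID with derivative action on the process variable, as described above, can be sketched like this. The gains and names are illustrative, not the tuned thesis values:

```cpp
#include <cassert>
#include <cmath>

// Classical PID where the derivative term acts on the process variable
// rather than the error, which avoids "derivative kick" when the
// setpoint changes abruptly.
class Pid {
public:
    Pid(double kp, double ki, double kd)
        : kp_(kp), ki_(ki), kd_(kd), integral_(0.0),
          lastPv_(0.0), primed_(false) {}

    // pv is the measured process variable; dt is the step in seconds.
    double update(double setpoint, double pv, double dt) {
        double error = setpoint - pv;
        integral_ += error * dt;
        // No derivative contribution on the very first sample.
        double dPv = primed_ ? (pv - lastPv_) / dt : 0.0;
        lastPv_ = pv;
        primed_ = true;
        // Derivative on the PV enters with a negative sign: a rising
        // measurement opposes the control action.
        return kp_ * error + ki_ * integral_ - kd_ * dPv;
    }
private:
    double kp_, ki_, kd_, integral_, lastPv_;
    bool primed_;
};
```

For the Syma channels, the result of update would be added to the channel's neutral value (for example, 64 for yaw and pitch) before transmission.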
The following sections discuss the reference implementation for each of the three channel
controllers.
5.4.1 Yaw Control
The yaw channel controller’s primary task is to control the rotation of the helicopter. From the
perspective of the physics of the system, this is one of the easiest attributes to control because
the helicopter carries very little momentum through its yaw motion. However, rotation is also
one of the noisiest parameters to measure because it depends on reliable detection of many
factors. These characteristics suggest that only a degree of proportional control is necessary
for stable control of yaw. Positive control can be made more accurate and faster with integral
control, but only if measurements are fairly accurate.
Note that a yaw channel input of approximately 64 indicates no motion in the yaw direc-
tion. The results of the internal PID are added to this central value to determine the output
of the controller. This center or neutral value varies and the original controller implements
trim control by changing this value. Error for the yaw channel controller is calculated by
finding the signed, minimum path distance between two points on the rotation “circle”. See
figure X for further illustration. The reference yaw control is always active.
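The signed, minimum-path angular error can be computed as in the following sketch; this is a standard construction and not necessarily the thesis' exact code:

```cpp
#include <cassert>
#include <cmath>

// Signed, minimum-path distance between two angles on the rotation
// circle (radians). The result lies in (-pi, pi]; a positive value
// means the shortest correction is counter-clockwise.
double angularError(double setpoint, double measured) {
    const double TWO_PI = 6.283185307179586;
    double d = std::fmod(setpoint - measured, TWO_PI);
    if (d > TWO_PI / 2.0)   d -= TWO_PI;  // wrap long CCW paths
    if (d <= -TWO_PI / 2.0) d += TWO_PI;  // wrap long CW paths
    return d;
}
```

For instance, the error between a setpoint just above 0 and a measurement just below 2π is a small positive value, not a nearly full circle.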
5.4.2 Pitch Control
The pitch channel controller’s primary task is bringing the x-coordinate position of the heli-
copter under control. Unlike yaw, motion induced by helicopter pitch carries with it noticeable
momentum. As a result, the pitch controller has a relatively large gain applied to its deriva-
tive action in addition to proportional action. It is important to note that a pitch control
input of approximately 64 indicates no pitch motion to the helicopter. Thus, the output of a
PID control should be added to this natural center value.
Error for the pitch channel controller is defined to be the distance between the marker’s
center point and the setpoint normalized by the height of the helicopter marker (see the
discussion on coordinate transformations in the state estimator section). The reference im-
plementation of pitch control is suppressed when the helicopter's orientation is not close to
90 or 270 degrees, as these orientations place the helicopter's body perpendicular to the
camera's viewing axis, so pitch-induced motion stays in the x-y plane of the image.
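The suppression logic might look like the following sketch, where the 15 degree tolerance is an assumed value, not one taken from the thesis:

```cpp
#include <cassert>
#include <cmath>

// Gate for the pitch controller: output is held at the channel's
// neutral value unless the measured rotation is within a tolerance of
// 90 or 270 degrees, the orientations at which pitch-induced motion
// stays in the image plane.
int gatedPitchOutput(double rotationDeg, int pidOutput,
                     double toleranceDeg = 15.0) {
    const int NEUTRAL = 64;  // no-motion value for the pitch channel
    bool aligned = std::fabs(rotationDeg - 90.0) < toleranceDeg ||
                   std::fabs(rotationDeg - 270.0) < toleranceDeg;
    return aligned ? NEUTRAL + pidOutput : NEUTRAL;
}
```

This illustrates the pattern mentioned earlier: a channel controller can see the full state and choose to neutralize its own output until another variable (here, yaw) is brought close to its setpoint.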
5.4.3 Throttle Control
The throttle channel controller’s primary task is to bring the altitude of the helicopter under
control. This task is complicated by the nature of control inputs to the helicopter. For all
practical purposes, the state of the system inside the helicopter, including rotor speeds and
battery level, is a black box. The same control input at two different battery levels produces
noticeably different thrusts and the controller must do its best to account for this. After
approximately five minutes, the helicopter will no longer be able to remain airborne.
From a control standpoint, these characteristics indicate that the “neutral point”, or
reference value, at which the helicopter maintains steady altitude is an unknown value that
changes as a function of time. Other variables influence the reference value, including the
weight of any attached markers, the age of the battery, and any environmental effects such
as proximity to a surface, or air currents. The controller attempts to overcome these issues
through the use of a large gain on the integral term of the underlying PID control and a
reasonable guess at the fully charged throttle reference point (approximately 75).
Error is calculated for this controller by dividing the pixel difference between the set point
and the y-coordinate of the center of the marker by the height of the marker. Like the yaw
channel controller, the throttle channel control is always active.
5.5 Closing Comments
This chapter has discussed the concept of the feedback controller component of the helicopter
control system and how the reference implementation works. Please refer to Appendix A
for more information about how to programmatically interface to the logic described in this
chapter using C++ and OpenCV.
The result of the control interface in the feedback controller component is a signal suitable
for transmission to the controlled platform that will attempt to bring the system state to a
desired value. This signal can then be sent to the component responsible for broadcasting it
to the platform. The means by which the control signal is calculated can be altered at runtime
by a user's own program in the reference implementation. Again, see
Appendix A for more information. The next chapter presents a series of tests to characterize
the performance of both the tracking and feedback control components of the reference system.
Chapter 6
Tests and Results
This chapter presents a series of tests and results meant to quantify the performance of each
component of the control system developed in this thesis. The transmission component,
including the Arduino and accompanying electronics, is tested to quantify the performance
characteristic of range. The accuracy and reliability of pose estimations is the primary focus
of the evaluation of the vision component. Finally, the feedback controller is characterized by
examining the system’s behavior under control.
This chapter follows a test and then result format. Each test is presented, its goals stated,
and its setup explained. Following the exposition, the test results are presented and their
significance is discussed.
These discussions attempt to answer how the results relate to the stated thesis goal of
building a low cost (less than $50 USD) supplement to a laptop that will allow a STEM
student to experiment with aerial robotics in a classroom environment and then hack on it to
learn how it works. In addition, this chapter provides a comparison to a very similar thesis by
Currie [34], giving attention to the pros and cons of her approach compared to the approach
of this thesis. This discussion concludes with a critical analysis of the approach to Syma
helicopter control given by the reference controller.
6.1 Software Validation of Chosen Algorithms
This section documents efforts to profile the performance characteristics of the developed
thesis software under target operating conditions. Tracking software, in particular, needs to
run fast enough to leave sufficient computer resources for user programs on computer hardware
likely to be found in an educational environment.
6.1.1 Software Profiling
To get a sense of the runtime costs of the vision and feedback control components of the Syma
control system, the reference implementation was computationally profiled. It is important to
characterize the speed of the algorithms used as they will determine what computer hardware
can be realistically used and how much headroom is available for user programs. Time spent
analyzing images and calculating output signals also contributes to the latency of the system,
which in turn influences control. The common commercial web camera that the reference
implementation is designed for runs at 30 frames per second, or 33 milliseconds per frame.
The total processing time for a frame in the reference control system should be less than that.
A memory analysis tool, Valgrind, was used to profile memory usage. These tests help to
determine the suitability of the implementation presented in this thesis for general use. Profile
information will vary with hardware, operating system, and software versions, but general
trends can easily be discovered. The primary hardware for these tests was a Dell Precision
desktop, with Intel Core i7-3770 running at 3.40 GHz with 16 GB of system RAM, running
GNU/Linux (Ubuntu 13.10). The web camera used was a Logitech C270; this conforms to
the USB video device class allowing Linux, the test operating system, to control the camera
properties with a Video For Linux driver (V4L). Time profiling on Linux was conducted using
the operating system’s steady timers which are purpose made timers meant for measuring
durations, as opposed to points in time. Profiling was performed by measuring the time
period between the completion of an image fetch from the video source and the completion
of the helicopter tracking process. (Please note that while profiling data gives a sense of the
approximate run time of an algorithm, the results are highly dependent on system architecture
and any other computational loads.)
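The measurement approach can be sketched with C++'s std::chrono::steady_clock, the standard-library interface to the monotonic (steady) timers described above; the helper name is illustrative:

```cpp
#include <cassert>
#include <chrono>

// Time a unit of work with a steady (monotonic) clock. steady_clock is
// appropriate for durations; system_clock can jump if the wall clock
// is adjusted mid-measurement.
template <typename F>
long long measureMicroseconds(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
        stop - start).count();
}
```

In the profiling described here, the timed work would be the span between completing an image fetch and completing the tracking process for that frame.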
Results of the time profiling indicate that the vision component engenders the majority
of the computational effort of the control system. The first frame of the vision system takes
a hefty 40 milliseconds to process but after that, the worst case processing time (for a full
image running the by-threshold method) was found to be 6367 microseconds with a standard
deviation of 972 microseconds. These numbers were based on a video sample consisting of
793 frames, excluding the first, startup frame. The vast majority of this time is spent
thresholding the three channels. The control system, in contrast, only uses approximately 20
microseconds, on average, to calculate the next signal.
Figure 6.1: Graphical summary of most expensive functions during profiling. Note that tracking and color conversion is responsible for nearly 70% of the computational cost. The control algorithms were not expensive enough to be displayed.
Memory usage, per Valgrind’s analysis, was found to be approximately 35 MiB with
approximately 6000 bytes of leaked memory. This lost quantity does not appear to be a
function of time; instead, it appears to come from the GUI toolkit of OpenCV and not
this thesis' implementation. Memory usage can be significantly decreased by disabling the
GUI windows used for debugging.
6.2 Vision Component Tests
This section presents the tests used to validate the functionality of the vision control system.
Unless otherwise stated, these tests are conducted by first recording sample video and later
feeding that sample to the tracking algorithm as if it were occurring in real time. This prevents
the computationally expensive process of video encoding from unduly influencing the tests.
6.2.1 Rotation Detection Tests
This test assesses the accuracy of the rotation detection algorithm under good sensing con-
ditions. This was accomplished by placing the helicopter on a raised platform so that it was
approximately level to and directly in front of the sensing camera at a distance of approxi-
mately three feet with an unobstructed view. For comparison, a second test was run at six
feet. Ground truth was determined by using a protractor to carefully measure the rotation
of the marker relative to the camera plane. The corresponding measurement of rotation from
the vision component was similarly recorded. The helicopter’s rotation was measured in ten
degree increments, and the test was conducted live under conditions representative of the
system’s target operating environment. After the first semi-circle, the helicopter was flipped
in orientation to measure the back angles as the protractor only measured 180.0. This in-
cludes a fiducial marker used for testing that is not precisely constructed or calibrated, but
instead were made with the care and accuracy one might expect from middle school students.
A rotation measurement from the tracking component consists of the average and standard
deviation of three seconds of image frames.
Figure 6.2: Measured and Ground Truth Helicopter Rotation at 3 ft
Figure 6.2 depicts the results of the three foot trial. The maximum error encountered
at any orientation was approximately ten degrees and the maximum standard deviation was
1.64 degrees. The chart of error, shown in Figure 6.3, shows a fairly consistent negative error
across all measured angles with a mean of approximately five degrees. Here, a negative error
indicates that the rotation measurement is greater than the ground truth. An additional
Figure 6.3: Error in Degrees between Measured and Ground Truth Rotation at 3ft
interesting feature of the error chart is the presence of pronounced gradients near the “single
color” points of 0/360, 120, and 240 degrees.
The systematic error in the measurement of angle may be a product of several contributing
factors. The first factor is that the fiducial marker’s geometry may not be a perfect and evenly
segmented ring. By carefully measuring the length of each color arc on the fiducial marker
used in this test, the lengths of the 0 degree and 120 degree centered-color arcs were determined
to be approximately 3.875 inches, while the 240 degree arc was a half inch longer at 4.375
inches. Error when the long side is not present, from 0 to 90 degrees, appears to be less than
when it is.
Second, the helicopter itself appears to be mounted to the ring at a slight angle. Ground
truth was established relative to the nose of the helicopter, not the center of the marker itself.
Careful measurement of the marker shows that the difference between the two arc lengths
derived by dividing the front color into two segments at the nose’s position is almost 0.75
inches. This is significant given the approximately four inch length of the arcs.
The gradients found at the “single color” points are indicative of a problem in the design
of the three color fiducial marker. When the helicopter approaches one of these points, there
comes a period of time where, for all practical purposes, there is a single color in view. Before
the new color segment is large enough to detect, the vision component’s estimation of rotation
sticks to the angles 0, 120, and 240. At three feet, this “sticking” appears to occur for ten
degrees. At longer range, the issue becomes worse as the size of the color blobs becomes
smaller.
Figure 6.4: Measured and Ground Truth Helicopter Rotation at 6 ft
For this control system, three feet between the helicopter and the camera is a very close
range. At this helicopter depth, the size of the camera viewport is on the order of a couple
of feet, making it very easy for the helicopter to fly out of view. Six feet represents a more
realistic range for control. Figure 6.4 shows the results of the orientation test at six feet. This chart
displays the same general trends as the data taken at three feet. There is still a consistent
negative bias of approximately five degrees. However, the sticking points at 0, 120, and 240
degrees have grown to be twenty or thirty degrees wide.
As an instrument to facilitate a control system in an educational context, this system
achieves acceptable accuracy. Most IR transmitters powered by a micro-controller will not
broadcast much past six feet in an appreciable arc, so the issue of rotation sticking at longer
ranges is a practical non-issue. Furthermore, the irregular marker used in these tests demonstrates
that the reference vision component operates reasonably even without accurately built
hardware. This greatly increases the system’s practical applicability for young students in a
Figure 6.5: Error in Degrees between Measured and Ground Truth Rotation at 6 ft
classroom setting as they are unlikely to build their markers with great precision.
6.2.2 Speed of Tracking
This test seeks to understand the system’s limitations when dealing with a fast moving target.
A sheet of white wainscoting, 32 inches wide, was set up behind the test area. By measuring
the size of the same board in the camera image, a transformation can be developed from
pixels to inches. If the helicopter is then quickly translated across the board, the effects of
high velocity on tracking integrity can be measured. Horizontal measurements were taken by
projecting the helicopter across the board by hand. Vertical measurements were taken by
dropping the helicopter from the top of the camera frame to the bottom. The free-fall test
represents a close approximation of the maximum possible speed of the helicopter. As long as
the helicopter remains fairly close to the board, the pixel translations can be effectively made
into real world units. Video of each of these tests is piped into a special program that runs the
tracking algorithm and gathers related statistics including total frames, false negative frames,
maximum frame to frame velocity, and most consecutive false negatives.
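The calibration described above amounts to a simple proportional transformation; a sketch follows, with the 32 inch board width built in and with illustrative function names:

```cpp
#include <cassert>
#include <cmath>

// Convert camera pixels to inches using the known 32-inch board as a
// reference, then turn frame-to-frame pixel displacement into a
// velocity given the camera frame rate.
double pixelsToInches(double pixels, double boardPixelWidth) {
    return pixels * (32.0 / boardPixelWidth);
}

double velocityInchesPerSecond(double pixelDisplacement,
                               double boardPixelWidth, double fps) {
    return pixelsToInches(pixelDisplacement, boardPixelWidth) * fps;
}
```

For example, if the board spans 640 pixels, a 40-pixel frame-to-frame displacement at 30 frames per second corresponds to roughly 60 inches per second, provided the helicopter stays close to the board's plane.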
During the course of the test, the helicopter reached approximate maximum instantaneous
(frame to frame) velocities of 110 inches per second, horizontally, and 180 inches per second,
vertically. The chief difficulty with tracking at higher velocities is motion blur, a phenomenon
due to object displacement during the exposure of the image, even at 30 frames per second.
Figure 6.6 depicts a frame of the helicopter when it is at maximum velocity.
Figure 6.6: Demonstration of Helicopter Motion Blur at 30 Frames per Second
For the tests where the helicopter was introduced and tracked in the frame prior to being
translated, the helicopter was successfully registered in every test frame in which it was
present. In the case where the helicopter literally fell into the frame at high velocity, the
tracking was much less consistent. This appears to be due to the helicopter marker no longer
being approximately parallel with the x axis of the camera image and motion blur. The
rotation issue is a product of an early tracking software design decision, one intended to help
eliminate as many false positives as possible. The second issue appears to manifest itself
when the smaller color blob becomes too distorted to register: Without prior knowledge of
the helicopter position and velocity, the system will not take a single color blob to represent
the helicopter (again to reduce false positives).
6.2.3 General Tracking Performance
In general, the tracking component of the Syma control system does a good job estimating
the state of the helicopter. With bright, even lights and a background mostly constituted of
colors not on the fiducial marker, the system’s tracking performance at ranges less than eight
feet is seamless. Computational cost on a modern computer is also encouraging as the 6-8
millisecond processing time is far less than the maximum of 33 milliseconds. However, there
are still a number of issues.
The most obvious issue from testing is the tendency of the tracking system to “stick” to
single colors as the marker turns in a circle. This issue, compounded at range, reduces the
resolution of control that the system can offer. A related issue is a tendency for the system
to fail to track when the helicopter is in front of a color similar to one on its ring. The blob
expansion algorithms will see part of the background and the color blob on the ring as the
same, large object. At best it interprets this as the helicopter rotating. At worst, it drops
tracking all together.
These issues are a function of the information (or lack thereof) retrieved from the helicopter's
fiducial marker. Some improvement might be made by inferring the appearance of the
helicopter based on the history of its movement. Another viable solution would involve
modifying the physical properties of the fiducial marker such that it provides viable
information at any angle.
The by-estimate method also tends to produce inaccurate rotations at close range because
the expansion vectors slightly overstep the actual edges of the colors. This occurs because
of a fault-prevention mechanism in the algorithm which allows each vector several errors
before detecting an edge. Solving this issue will involve redesigning the expansion method.
Alterations of this nature and the changes to the fiducial marker mentioned above will be
discussed further in the future work section of the conclusions chapter.
6.3 Feedback Controller Tests
6.3.1 Feedback Control Analysis
In this test, the full control system is engaged to hover a Syma helicopter in the center of the
camera view. This test was performed in an open area with the camera approximately four
feet off the ground. The helicopter was inserted manually into the camera frame at a depth
of approximately six feet. The helicopter was allowed to hover until the system was no longer
able to control it.
Figure 6.7: Throttle and Elevation Error For Hovering Helicopter
Figure 6.7 depicts the throttle control output of the system and the normalized error in
elevation with respect to time. Figure 6.8 depicts the pitch control output of the system and
the normalized x axis error with respect to time. As discussed in the controls chapter, the
normalized error is the absolute pixel error at a given time divided by the marker height at
that same time. This transformation helps account for the effects of perspective on the 2D
image. Note that a decrease in the vertical axis in these charts corresponds to an upward
helicopter motion, following the convention in computer coordinate systems. Also note that
where the term “pitch control” is used in this chapter, it refers to the pitch component of the
signal input and not the actual Euler angle. The helicopter actually pitches very little during
Figure 6.8: Throttle and X Axis Error For Hovering Helicopter
a given run.
The result of this test is that the helicopter initially corrects its altitude and settles into
stable control after ten seconds. For approximately five and a half minutes, the helicopter’s
elevation gently oscillates with a period of approximately 1.25 seconds and an amplitude that
averaged under four inches. After five and a half minutes, the helicopter’s batteries deplete and
it can no longer maintain altitude. Syma’s specifications state that flight duration of a brand
new helicopter is seven to ten minutes [?]. Anecdotal evidence from extensive testing done in
this thesis seems to suggest this might be a bit optimistic. In addition, the helicopters used in
testing have all been charged many times, so a six minute battery life seems very reasonable.
One interesting feature of Figure 6.7 is the steady climb of the throttle required to stabilize
the helicopter at a constant elevation. When the run started, the helicopter was near fully
charged and hovered at an average throttle input of approximately 73 (on the 0 - 127 scale).
After five minutes, the average throttle required to hover had risen to approximately 103
suggesting that a steady increase of about one throttle point per ten seconds is required
to stabilize the helicopter in air. This rate is small enough that, for this case, the throttle
controller’s integral term took care of the required adjustments.
Another interesting observation is the coupled nature of the throttle and pitch control.
Figure 6.9: Normalized and Unnormalized Elevation Error
The oscillations seen in the elevation control, Figure 6.7, appear to be related to the corrections
in pitch. Oscillations on both channels appear to have approximately the same frequency and
phase shift. The helicopter is able to correct its elevation very quickly as this is simply a
matter of perturbing the main rotor speed. It is not able to correct pitch nearly as easily
because this requires changing the direction of the motor on the tail of the helicopter and
waiting for the helicopter to pitch forward or backward. This process of changing direction
can take a couple of seconds and this delay is evident in the stability of the pitch control
using a PD controller. Adding integral control significantly smooths out the entire system’s
performance.
With this pitch integral control, the helicopter maintained its altitude within two inches
of the setpoint for the majority of the run. More dramatically, the average oscillation am-
plitude along the x-axis was reduced to within four inches of the setpoint. Furthermore, the
oscillations on both control channels themselves are much less apparent and are likely caused
as much by perturbations in air currents and error in the tracking system as they are by
control errors. It is likely that further testing and tuning will lead to faster settling time and
perhaps more accurate control. The current tuning is accurate enough that it is obvious that
the helicopter is under positive control and trying to hover at a given point.
Figure 6.10: Throttle and Elevation Error For Hover Command with Integral Control
The helicopter can be manipulated in 2D space by adjusting the setpoint of the control
system. The system sometimes fails when the helicopter is given a change in setpoint with
a very large real distance change, especially downward. While trying to achieve the point,
the overshoot inherent in a feedback control system can cause the helicopter to either hit the
ground or fly out of frame.
For the purposes of this control system, these results are very encouraging. When the
tracking component can operate reliably, the control system is very stable and a helicopter
can be flown for the length of its battery life. For students, educators, and enthusiasts, this
system provides a base on which to tweak, prototype, and extend the vision and control
components. There is still work to do, especially in regard to issuing commands in three
dimensional space. Still, the existing control system implementation fills the requirements of
a simple educational system. It demonstrates positive control for extended periods of time
and allows the helicopter to hover at a chosen point with enough accuracy that a student
could realistically run a “mission” with the helicopter.
Figure 6.11: Throttle and X-Axis Error For Hover Command with Integral Control
6.4 Failure Modes
The ways in which a control system run was interrupted were, from most to least
common:
1. the helicopter flew outside of the range of the transmitting LED,
2. the helicopter’s battery ran out of charge,
3. the helicopter failed to get a good start, and
4. the tracking system lost a lock for too long.
The first issue is caused mostly by drift in the Z axis of the camera view, here defined
as into or out of the image plane. The reference implementation of the controller does not
yet actively try to control the depth of the helicopter. Implementing this feature will require
knowing camera intrinsics or an additional training step. Stability of the 2D system was
deemed a priority.
The helicopter tends to be very stable, but some disturbances in the environment or violent
corrections in the helicopter’s pitch can cause significant drift in the Z direction. Care must
be taken to point the transmitting LED at where the user expects the helicopter’s average
position for a run to be. Care must also be taken when issuing setpoints to the helicopter, as it
is possible to command the helicopter to fly right out of the range of the transmitter.
The next most common failure case is battery related. The capacity of the Syma’s battery
puts an effective limit of five to six minutes on any individual control run. It takes a further
half hour to charge the helicopter back up. After a few dozen runs, the battery life begins to
diminish significantly and the helicopter will no longer fly in free air without a replacement
battery. This is an issue for education systems using this technology, requiring replacement
batteries to be installed after a period of time.
Another common cause of failure is on the start. The reference control system does not
handle takeoff, but instead relies on the user to place the helicopter in the camera view.
This is an artifact of early testing which indicated that effective takeoffs were difficult to
execute without the helicopter drifting out of command range. The control system has no
notion of whether the helicopter is being held by a person, so integral and derivative actions
keep functioning. When the helicopter is finally released, the accumulated error due to these
actions can cause the helicopter to quickly fly up out of range or fall out of the sky. This is
another major section for future work.
The last failure mode is fairly rare with clean backgrounds, but can be an issue with
complicated ones. As noted in the discussion of the vision component’s performance, the
helicopter can fly into a region of an image that makes marker detection difficult. In cases like
these, tracking is often lost at least temporarily. The system responds by repeating previous
commands for a period of one second, and often that is enough to move the helicopter out of
the difficult region. Other times, however, it is not enough and the system tells the helicopter
to fall. In the latter cases, the resulting motion is often too unstable to effectively control.
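The one-second repeat behavior can be sketched as a small wrapper around the per-frame command loop. The class, names, and timeout constant below are illustrative assumptions, not the thesis code.

```python
REPEAT_TIMEOUT = 1.0  # seconds to repeat the last command before giving up

class CommandHolder:
    """Repeats the last good command for a short window after tracking is lost."""

    def __init__(self):
        self.last_command = None
        self.lost_since = None

    def step(self, detection, command, now):
        """Return the command to transmit this frame.

        detection is None when the tracker has lost its lock; now is the
        current time in seconds."""
        if detection is not None:
            self.lost_since = None
            self.last_command = command
            return command
        if self.lost_since is None:
            self.lost_since = now
        if now - self.lost_since <= REPEAT_TIMEOUT:
            # Hold course and hope the marker drifts out of the bad region.
            return self.last_command
        return None  # timed out: no safe command is available
```

Returning `None` after the timeout corresponds to the failure case described above, where the system can no longer issue meaningful corrections and the helicopter falls.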
6.5 Currie’s Thesis
Late in the process of work on this thesis, a closely related work by Sarah Currie at Rhodes
University in South Africa was discovered [34]. Currie, as part of her honors
bachelor degree, developed an “autopilot” for the Syma S107G helicopter using a Microsoft
Kinect and an Arduino for signal transmission. This section provides critical analysis of the
differences between Currie’s work and this one.
In many ways, the two works are very similar. Both use an external sensing model, the
software for both is built on OpenCV, both use markers to detect rotation, and both appear
to use PID feedback controllers to achieve control. It must be noted that Currie’s thesis is
vague in some respects, especially in regard to implementation details of the feedback control
system and the performance characteristics of the resulting system. Therefore, this section
restricts itself to commentary on the visual tracking methodology employed by Currie.
The most prominent difference between Currie’s approach and the one taken by this work
is the method by which the visual tracking component operates. Currie tracks the helicopter’s
pose using two LEDs, a red one on the nose of the helicopter and a white one on the tail.
The red LED is part of the standard Syma helicopter, except that it is modified to not blink.
The white LED is an external component and is wired onto the Syma’s controlling circuitry.
Currie then uses a Kinect depth camera to first identify the two LEDs in an image stream
(using the RGB component of each image), and next, uses the depth map of the camera to
find the depth of a point between the two LEDs. Rotation was approximately recovered by
multiplying the pixel distance between LEDs by the distance to the helicopter in millimeters
as measured by the Kinect. The resulting value was then divided by 10000 to produce the
final “orientation value” where a value of 6.5-7.0 represented a normal rotation to the camera.
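Currie's orientation arithmetic, as described above, can be reproduced in a few lines. The divisor of 10000 and the 6.5-7.0 "normal" band are taken from the text; the function names and the example numbers are ours.

```python
def currie_orientation(pixel_distance, depth_mm):
    """Approximate orientation value as Currie's thesis describes it:
    the pixel distance between the two LEDs scaled by the Kinect's
    depth reading in millimeters, divided by 10000."""
    return pixel_distance * depth_mm / 10000.0

def facing_camera(value, low=6.5, high=7.0):
    """True when the orientation value falls in the band Currie reports
    for a helicopter oriented normally toward the camera."""
    return low <= value <= high

# Example: LEDs 45 px apart at 1500 mm gives 45 * 1500 / 10000 = 6.75,
# inside the 6.5-7.0 band. As the helicopter yaws, the apparent pixel
# distance shrinks and the value drops out of the band.
```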
Currie achieves tracking by first thresholding an RGB image to find only the very bright
regions of the image which are assumed to be the LEDs. The binary image produced by the
thresholding operation is then run through an external blob tracking library, identifying each
blob and calculating its properties. The front and back LEDs are differentiated by sampling
the location of each blob in the original image and looking for those that are predominantly
red or blue (Currie notes that the white LED looks blue, likely due to white balance). To
improve the reliability of tracking, Currie applied the CamShift method, discussed in this
thesis’ literature review, on the body of the helicopter to calculate a window in
which to use the threshold operation.
Currie’s active marker setup is simple and elegant. It imposes a negligible penalty to
weight and is unlikely to disturb any of the normal dynamics of the helicopter it is attached
to. Furthermore, the active marker should work well under almost any light conditions,
including low or no light situations. This thesis’ passive approach can be very sensitive to
light levels with some tunings and is certainly not capable of working in low or no light
environments. Currie’s use of depth cameras gives another dimension that provides useful
information for vision tracking and helicopter pose estimation.
Currie’s methodology poses some difficulties for the purposes of an educational tool. It
requires modification of the helicopter’s internal circuitry, and comparatively expensive equip-
ment, such as a depth camera, to operate. The use of an active marker also means that the
marker must carry a power source or tap into an existing one on whatever platform is being
used. This limits the general usefulness of this method.
From a performance standpoint, Currie’s tracking methodology is significantly slower than
the approach of this thesis. Currie states that a single frame took approximately 28 mil-
liseconds to process on a modern desktop processor, largely due to the expensive CamShift
windowing operation. For comparison, the tracking process in this thesis takes approximately
six to eight milliseconds. Because of Currie’s use of brightness as the distinguishing factor for the
LEDs, bright backgrounds can produce false positives. Of course, this thesis’ approach fails
when color backgrounds are overly complex.
In general, Currie’s approach to sensing may be more robust for specific applications largely
due to the use of a Kinect depth camera. The approach of this thesis, however, is arguably more
available and more practical because it runs faster, requires less expensive hardware, and
requires no external power source.
6.6 Concluding Thoughts
This chapter has presented a series of tests to characterize the performance and behavior of
the entire Syma control system. The software developed in this thesis was profiled for time
and space usage. The vision system’s ability to detect rotation was characterized, as was the
ability of the tracking mechanism to operate with high marker velocities. To test the entire
system, the control system was used to hover a helicopter for approximately six minutes until
the helicopter’s battery was out of charge.
Chapter 7
Conclusion
This thesis has presented a control system composed of arts and craft supplies, a toy helicopter,
hobbyist electronics, and supporting software that, when combined with a laptop, allows one
to autonomously and programmatically fly a micro air vehicle for approximately $40 USD. The
system uses a colored Styrofoam ring, the surface of which is painted with three equally sized
color segments, affixed to the bottom of a Syma S107G helicopter combined with purpose
made blob tracking software to identify the position and pose of the helicopter in a stream of
images from a consumer web camera. In order to detect the helicopter reliably and quickly,
two complementary methods of blob tracking were composed to produce the final search
method. The first blob tracker uses an estimate of the helicopter’s position based on previous
values to estimate where the system will be and looks for it in that region of the image. The
second blob search method uses a simple but computationally expensive image thresholding
technique to globally search the image for blobs. Blobs are then combined to find possible
candidate points and a selection algorithm picks the most likely candidate as the final pose
estimation.
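The composition of the two search methods can be sketched as a simple fallback chain: try the cheap, estimate-driven local search first, and only run the expensive global threshold search when the local one fails. The function names and signatures here are hypothetical, not taken from the thesis code.

```python
def find_marker(image, predicted_region, local_search, global_search):
    """Two-stage blob search.

    local_search(image, region) looks only where the marker is expected
    and returns a (possibly empty) list of blobs; global_search(image)
    thresholds the whole frame. Falling back keeps the common case fast
    while preserving the ability to reacquire the marker anywhere."""
    if predicted_region is not None:
        blobs = local_search(image, predicted_region)
        if blobs:
            return blobs
    return global_search(image)
```

The same structure also covers the startup case: with no motion history, `predicted_region` is `None` and the global search runs unconditionally.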
Pose information from this tracking component is then fed into a set of PID controllers
which are used to stabilize and control the position of the system in real space. These feedback
controllers keep an estimation of the state of the system that is low pass filtered to provide
robustness to noise and also includes state data such as velocity and acceleration. Limited
three dimensional control is achieved by using the height of the color ring, or fiducial marker,
to normalize pixel errors into the physical coordinate system of the helicopter.
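The normalization step can be illustrated with a small helper. The marker height constant below is an assumed example value, not a measurement from the thesis.

```python
MARKER_HEIGHT_IN = 2.0  # assumed physical height of the color ring, in inches

def pixels_to_inches(pixel_error, marker_height_px):
    """Convert a pixel-space error into an approximate physical error
    using the marker's known height as a per-frame scale reference.

    As the helicopter moves away from the camera the marker shrinks in
    the image, so the same pixel error maps to a larger real distance."""
    inches_per_pixel = MARKER_HEIGHT_IN / marker_height_px
    return pixel_error * inches_per_pixel
```

For example, a 40 pixel error reads as roughly 4 inches when the marker appears 20 pixels tall, but roughly 8 inches when the helicopter has drifted back and the marker appears only 10 pixels tall.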
The resulting control system is capable of hovering the helicopter at a depth of eight feet
and a given (x,y) setpoint for the entire length of the helicopter’s battery life with oscillations
of no more than four inches on average. By manipulating the setpoint, the helicopter can be
programatically controlled in the camera view.
Existing micro air vehicles tend to be research focused and use expensive hardware platforms
and expensive sensing equipment. The contribution of this thesis is a very inexpensive
micro air vehicle system that is particularly suitable for STEM education and curious hobbyists.
The software is designed in a modular fashion and documented with the intent that students will
hack on it to learn more about its components and to extend control to new platforms.
7.1 Future Work
The control system developed as part of this thesis has demonstrated the ability to stabilize a
Syma S107G helicopter for a noticeable period of time, but there remain many areas where
improvements can be made to make the entire system faster, more reliable, easier to hack,
and less expensive.
7.1.1 New Hardware Platforms
The most obvious avenue for extending the work done in this thesis is to use the provided
framework to provide control for new hardware platforms. Other three channel IR toy he-
licopter models are excellent candidates because their physics are all very similar and only
the transmission layer would need significant modification. Extending control to other IR
helicopters would be an excellent exercise for students learning about digital signaling and
control systems.
RC platforms, especially small quad-rotor vehicles, would take more work to make operational,
but offer the benefit of an additional input channel with which to counteract lateral
motion. The transmission device would require the use of a radio transmitter and
knowledge of the quad-rotor’s control protocol. Fortunately, it appears that many small
quad-rotor designs share similar protocols, such as the FlySky protocol.
Departing from flying vehicles, an exciting use of the tracking system developed in this
thesis would be for use with ground vehicles. These vehicles could be remote controlled toys
or custom made robots, and because of their comparatively stable physics, they could likely be
controlled far more readily. A controller of this nature might be useful in a robot competition
for autonomously controlling a robot relative to a base station (by having the marker on the
robot and camera on the base station), or controlling a robot based on a known beacon point
(where the marker is on the beacon and camera on the robot).
7.1.2 Improved Marker Design
The vision based tracking component of the reference control system also has several obvious
avenues for improvement. These improvements are both hardware based, involving modification
of the fiducial marker that is attached to the tracked device, and software based, involving
changes to the nature of the software algorithms used to detect the marker.
One of the negative consequences of using the existing marker design is that it is fairly bulky
and easily destabilizes the delicate balance of micro-helicopters if it is not carefully attached.
For an educational product aimed at middle school children in a classroom, it may be an
unrealistic assumption that the style of Styrofoam rings used in this thesis will be consistently
mounted. An improved marker design would greatly improve the system from a controls
and usability standpoint. An improved marker would ideally be lighter, perhaps by
perforating the ring or constructing it from wires, and would have a mechanism for consistent
mounting. As an example, a new ring could be constructed with a wireframe and hardpoints
specifically measured to mount to a Syma S107G helicopter.
From a tracking robustness standpoint, the reference fiducial marker design might be
improved by adding a fourth color. This change would help to eliminate the “dead zones”
at 0, 120, and 240 degrees where only a single color is realistically detectable and improve
rotation estimation at a distance. Two or three colors should always be visible with this new
configuration. This would come at the cost of complexity in the software, however, where
four colors must be detected, analyzed, and combined to form marker candidates. It may also
be the case that, by making each color blob smaller, accuracy is negatively affected. Some
of these issues could be counteracted through improvements to the vision tracking software.
7.1.3 Tracking Algorithm Improvements
The vision tracking software, as is, also has a fair amount of room for improvement and
optimization. While the software was designed with good programming practices in mind,
it has not been highly optimized. Many of the tracking component’s operations, such as
blob expansion, do not modify any of their input data and do not rely on the results of
other operations. Thus, there is great opportunity for parallelizing many sections of the
existing code on platforms that support multithreading. On the other hand, the end result of
optimizations of this nature is often much more complicated source code. For an educational
system, the risks may well outweigh the rewards.
The application of a “windowing” function to the input image is a less hardware spe-
cific optimization with the possibility of improving both accuracy and speed in the tracking
software. This windowing function would extract from the input image a subregion representing
the most likely position of the helicopter, plus some buffer. This concept is distinct
from the by-estimate blob expansion method because it applies not only to color searching
but also to all pre and post-processing steps that must be done to an image. Only this “re-
gion of interest” would be subject to expensive operations like color space conversion, color
thresholding, smoothing, and contour detection.
These optimizations may be necessary to effectively run this system on high resolution
cameras. A user with a 1080p web camera has almost seven times as many pixels to process
as a user with a standard 640 by 480 pixel camera. This translates to a factor of seven increase
in computation time for many of the algorithms used in this thesis. Without significant
optimization, images from these cameras must be scaled down to keep the system running in
real time.
In the case of no prior history of the helicopter’s position, this region would be expanded
to match the dimensions of the image. Thus, the windowing operation would convert all of
the search techniques employed into local searches, possibly greatly improving performance.
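The behavior just described, including the full-image fallback when there is no history, can be sketched as a small geometry helper. The buffer size and signature are illustrative assumptions.

```python
def region_of_interest(center, marker_size, image_w, image_h, buffer_px=40):
    """Compute a clamped search window around the last known marker position.

    With no history (center is None) the window expands to the whole
    image, turning every downstream search into a local one without a
    special code path. Coordinates are clamped to the image bounds so
    the window is always valid near the frame edges."""
    if center is None:
        return (0, 0, image_w, image_h)
    cx, cy = center
    half = marker_size // 2 + buffer_px
    x0 = max(0, cx - half)
    y0 = max(0, cy - half)
    x1 = min(image_w, cx + half)
    y1 = min(image_h, cy + half)
    return (x0, y0, x1 - x0, y1 - y0)
```

Only the returned rectangle would then be subject to color space conversion, thresholding, smoothing, and contour detection, which is where the speedup comes from.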
The video controller’s implementation of the by-threshold method uses a rather primitive
technique of color thresholding. It performs two binary threshold operations on each channel
of the input image, an expensive technique. Moreover, this technique must perform more
operations to handle hues in the red region of the color circle because their values wrap around
the 0 to 180 degree boundary. As discussed in the literature review, histogram based tracking
using back projections has the potential to improve tracking speed and accuracy, and make the program
code simpler and easier to understand. This technique is no panacea, however, as it requires
its own set of post processing steps and only profiling can determine if it is truly a benefit.
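The red wraparound issue mentioned above can be made concrete with a small helper. OpenCV's 8-bit HSV representation does store hue in the range 0-179; the function itself is only a sketch, not the thesis implementation.

```python
HUE_MAX = 180  # OpenCV stores hue as 0-179 in 8-bit HSV images

def hue_in_range(hue, lo, hi):
    """True if hue lies within [lo, hi], treating the range as circular.

    For red, lo may exceed hi (e.g. 170..10). In image form this is why
    two threshold operations are needed: one for [lo, 179] and one for
    [0, hi], with the results combined."""
    hue %= HUE_MAX
    if lo <= hi:
        return lo <= hue <= hi
    return hue >= lo or hue <= hi
```

A histogram back projection sidesteps this entirely, since the hue histogram is circular by construction and no explicit range test is needed.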
7.1.4 Control Algorithm Improvements
There is great opportunity for future work in altering the reference control system to improve
control of Syma helicopters. A more intelligent method for state estimation that incorporates a
dynamic model of the controlled system, perhaps some variety of Kalman Filter, would benefit
control by allowing improved control during spotty detection. Most interesting, perhaps,
would be the design of a controller that attempts to control for helicopter movement along
the Z axis by manipulating rotation and pitch.
7.1.5 Signal Transmission Device Improvements
The existing solution for transmitting calculated digital signals to the helicopter involves
the use of an Arduino microcontroller, an LED, and a USB cable. This hardware costs
approximately $30 USD, largely due to the Arduino, which is more than the approximately $20 USD
price of the helicopter itself. Less expensive pre-made hardware exists that might fit the
requirements for ease of use and consistency. If this hardware will only be used in conjunction
with this control system, any device that saves cost and does not compromise performance is
very appealing.
7.1.6 Education Modules
Most of the future work identified by this thesis thus far has been some form of technical
improvement to the system. However, the real value of this work lies in the educational
opportunities that it provides. A variety of “educational modules” can be built on top of the
existing components.
For middle school age children, just running the entire control system will be a learning
exercise. Together with some of the tools developed as part of this thesis, such as a PID
tuner and data logger, students can be introduced to the concept of PID controllers. By
manipulating gain values and examining the resulting behavior of the helicopter with analytical
tools and their eyes, students can gain an intuitive understanding of control systems.
Extending control to new platforms is a promising exercise for more advanced students.
This task will challenge students to investigate and implement concepts related to digital
signal transmission. In the course of this investigation, students will likely learn not only
about digital signals, but the process of sniffing out and replicating signals. Armed with this
knowledge, students will be able to immediately interface with a huge variety of devices in
their daily lives including many air conditioners, televisions, and cameras.
There are also many opportunities for students to learn about foundational concepts in
computer vision and feedback control. A computer vision module might introduce students
to the concepts of digital images and their encodings, image thresholding, image filtering, and
ultimately, blob detection. Each concept could be introduced in a brief lesson and supported
with documentation and examples from OpenCV. A feedback control module could discuss
concepts such as error and numerical methods for integrals and derivatives.
Chapter 8
Bibliography
[1] Amazon Prime Air. http://www.amazon.com/b?node=8037720011.
[2] Amazon retail page for Parrot AR Drone 2.0 quadcopter. http://www.amazon.com/Parrot-AR-Drone-Quadricopter-Controlled-Android/dp/B007HZLLOK.
[3] Arduino. http://www.arduino.cc/.
[4] Ascending Technologies research price list. http://www.asctec.de/downloads/flyer/AscTec RESEARCH Pricelist.pdf.
[5] Asus Xtion PRO. http://www.asus.com/Multimedia/Xtion PRO/.
[6] Digispark USB development board. http://digistump.com/category.
[7] FIRST Robotics Competition. http://www.usfirst.org/.
[8] Helicopter Flying Handbook. http://www.faa.gov/regulations policies/handbooks manuals/aviation/helicopter flying handbook/media/hfh ch13.pdf.
[9] KISS Institute for Practical Robotics. http://www.kipr.org/.
[10] KISS Institute for Practical Robotics. http://www.kipr.org/hardware-software.
[11] LEGO Mindstorms. http://www.mindstorms.lego.com/.
[12] Linux UVC supported devices. http://www.ideasonboard.org/uvc/#devices.
[13] OKSDE Botball grant. http://ok.gov/sde/oksde-botball-grant.
[14] OpenCV. http://opencv.org/.
[15] OpenCV basic thresholding operations. http://docs.opencv.org/doc/tutorials/imgproc/threshold/threshold.html.
[16] OpenCV documentation on structural analysis and shape descriptors. http://docs.opencv.org/modules/imgproc/doc/structural analysis and shape descriptors.html.
[17] OpenCV morphology transformations. http://docs.opencv.org/doc/tutorials/imgproc/opening closing hats/opening closing hats.html.
[18] Outfilming film production and aerial cinematography. http://www.outfilming.com/.
[19] RCGroups.com Syma S107 helicopter discussion. http://www.rcgroups.com/forums/showthread.php?t=1176146.
[20] Stanford University autonomous helicopter. http://heli.stanford.edu/.
[21] Teensy USB development board. https://www.pjrc.com/teensy/.
[22] STEM-C partnerships: Computing education for the 21st century, December 2013. http://www.nsf.gov/publications/pub summ.jsp?WT.z pims id=503582&ods key=nsf14523.
[23] Amazon. Estes 4606 Proto X nano R/C quadcopter. http://www.amazon.com/Estes-Proto-Quadcopter-Colors-Black/dp/B00G924W98.
[24] Amazon. Syma S107/S107G R/C helicopter. http://www.amazon.com/Syma-S107-S107G-Helicopter-Colors/dp/8499000606.
[25] Pedram Azad. Visual Perception for Manipulation and Imitation in Humanoid Robots, volume 4 of Cognitive Systems Monographs. Springer, 2009.
[26] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF). Comput. Vis. Image Underst., 110(3):346–359, June 2008.
[27] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell., 24(4):509–522, April 2002.
[28] Gary Rost Bradski and Adrian Kaehler. Learning OpenCV. O'Reilly Media, Inc., first edition, 2008.
[29] Gary R. Bradski. Computer vision face tracking for use in a perceptual user interface, 1998.
[30] Andreas Breitenmoser, Laurent Kneip, and Roland Siegwart. A monocular vision-based system for 6D relative robot localization. Pages 79–85, 2011.
[31] John Canny. A computational approach to edge detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-8(6):679–698, Nov 1986.
[32] Alvaro Collet Romea and Siddhartha Srinivasa. Efficient multi-view object recognition and full pose estimation. In 2010 IEEE International Conference on Robotics and Automation (ICRA 2010), May 2010.
[33] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(5):603–619, May 2002.
[34] S. A. Currie. Auto-pilot: autonomous control of a remote controlled helicopter. Master's thesis, Rhodes University, 2012. http://www.cs.ru.ac.za/research/g09C0298/index.html.
[35] Daniel F. DeMenthon and Larry S. Davis. Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15:123–141, 1995.
[36] James Diebel. Representing attitude: Euler angles, unit quaternions, and rotation vectors, 2006.
[37] Gang Feng. A survey on analysis and design of model-based fuzzy control systems. Fuzzy Systems, IEEE Transactions on, 14(5):676–697, Oct 2006.
[38] Mike Field. FPGA heli, March 2012. http://hamsterworks.co.nz/mediawiki/index.php/FPGAheli.
[39] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, 2002.
[40] Robin Hewitt. How OpenCV's face tracker works. http://www.cognotics.com/opencv/servo 2007 series/part 3/index.html.
[41] S. Jayawardena, M. Hutter, and N. Brewer. A novel illumination-invariant loss for monocular 3D pose estimation. In Digital Image Computing Techniques and Applications (DICTA), 2011 International Conference on, pages 37–44, Dec 2011.
[42] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, November 2004.
[43] Masayoshi Matsuoka, Alan Chen, Surya P. N. Singh, Adam Coates, Y. Ng, and Sebastian Thrun. Autonomous helicopter tracking and localization using a self-surveying camera array. The International Journal of Robotics Research, 26, 2007.
[44] Nathan Michael, D. Mellinger, Q. Lindsey, and V. Kumar. The GRASP multiple micro-UAV testbed. Robotics Automation Magazine, IEEE, 17(3):56–65, Sept 2010.
[45] Microsoft. Microsoft Kinect. http://www.xbox.com/en-US/kinect.
[46] David Miller, Anne Wright, Randy Sargent, Rob Cohen, Teresa Hunt, and Y. Sargent. Attitude and position control using real-time color tracking, 1997.
[47] Katja Nummiaro, Esther Koller-Meier, and Luc Van Gool. Color features for tracking non-rigid objects. Special Issue on Visual Surveillance, Chinese Journal of Automation, 29:345–355, May 2003.
[48] Katsuhiko Ogata. Modern Control Engineering. Prentice Hall PTR, Upper Saddle River, NJ, USA, 4th edition, 2001.
[49] Rapporteur Planning Committee for the Convocation on Rising Above the Gathering Storm: Two Years Later, Thomas Arrison. Rising Above the Gathering Storm Two Years Later: Accelerating Progress Toward a Brighter Economic Future. Summary of a Convocation. The National Academies Press, 2009.
[50] Vidya Raju. Modeling and control of RC miniature coaxial. Master's thesis, ETH Zurich, 2011.
[51] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar. Vision-based state estimation for autonomous rotorcraft MAVs in complex environments. Pages 1758–1764, May 2013.
[52] Syma Toys. Syma X1 4 CH remote control quad copter. http://www.symatoys.com/product/show/1878.html.
[53] Tinne Tuytelaars and Krystian Mikolajczyk. Local invariant feature detectors: A survey. Found. Trends. Comput. Graph. Vis., 3(3):177–280, July 2008.
[54] Agustin Vergottini. Arduino helicopter infrared controller. Blogger, May 2011. http://www.avergottini.com/2011/05/arduino-helicopter-infrared-controller.html.
[55] Vicon. Bonita, affordable motion capture. http://www.vicon.com/System/Bonita.
[56] Vicon. Vicon MX Hardware System Reference, 1.4 edition, 2006.
[57] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM Comput. Surv., 38(4), December 2006.
[58] Quming Zhou and J. K. Aggarwal. Object tracking in an outdoor environment using fusion of features and cameras. Image Vision Comput., 24(11):1244–1255, 2006.
[59] Z. Zivkovic. Improved adaptive Gaussian mixture model for background subtraction. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 2, pages 28–31, Aug 2004.
Appendix A
User Manual
This appendix presents a brief overview of how to get started flying Syma helicopters using
the control system developed in this thesis. This guide is further broken down into the
following sections:
1. this introduction,
2. an installation guide,
3. an enumeration of the proposed software application programming interface (API),
and
4. a brief guide for those curious about diving deeper into the code.
A.1 Installation
The system developed in this thesis is composed of three individual components which must
be installed or constructed:
1. the visual marker for the helicopter,
2. the micro-controller based signal transmitter, and
3. the tracking software.
A.1.1 Tracking Software Compilation
The software compilation process and dependencies will vary with the operating system the
system is run on. Please refer to the README.md file that accompanies your distribution of
this software for detailed information about how to build the software from scratch. This project
depends on the OpenCV computer vision libraries and has been tested with version 2.4.
Refer to http://opencv.org/ for more information on obtaining this library. Finally, the
project also depends on a serial library to provide communication between the host
computer and the transmitting device. A library that is functional with Unix based systems
is provided. A Windows operating system version is in development.
Users may elect to forgo the process of building software themselves and use the KISS
Platform IDE, http://www.kipr.org/kiss-platform-windows. This IDE ships with OpenCV
and cross platform libraries for accomplishing various tasks, including serial communication.
Using this IDE, you must import the KISS project file included in your distribution of the
software and compile.
A.1.2 Micro-Controller Setup and Programming
The micro-controller used to implement the transmission device is the Arduino Uno. To
prepare the Arduino for use with this project, visit http://www.arduino.cc/ and download
the latest edition of the Arduino IDE. This IDE comes with libraries necessary to interact
with and program the Arduino over a USB connection. Once installed, follow the
instructions located in README.md file that accompanies the software to program the
Arduino. After the device is programmed, one may insert an IR LED into the appropriate
pin on the micro-controller to complete the transmitter setup.
A.1.3 Visual Marker Construction
The final stage of preparation involves constructing the fiducial marker to attach to the
helicopter. Required materials include: a Styrofoam cup with an outermost diameter of
approximately 3.5 inches, three different colors of construction paper (ideally far apart on
the color wheel, e.g., yellow, red, and blue), and hot-glue.
1. Remove the top, “lip” portion of the Styrofoam cup using scissors or a knife.
2. Measure the height and diameter of the ring, and calculate the approximate
circumference.
3. Place the construction paper in a single stack and on the top layer mark a rectangle
with the appropriate height and width to cover one third of the ring. It is better to
err on the side of too long.
4. Cut out all of the rectangles at once.
5. Carefully fold the color paper around the outside surface of the ring. Check to make
sure the dimensions are correct before proceeding.
6. If the lengths are too long, carefully trim all three of the colors at once until they fit
correctly.
7. Place a small strip of hot glue under the end of each color strip and affix them to the
marker.
8. Orient the ring in the correct manner for the desired color configuration.
9. Place the helicopter inside the ring and align it such that the front color’s middle
point is aligned with the nose of the helicopter.
10. Glue the marker to the helicopter at the nose and at the two supporting beams on the
rear of the helicopter.
A.1.4 Running the System
Once the software, transmitter, and color marker are properly installed and
configured, the system is ready to run. To begin, load the example application code and
compile it with the appropriate makefile or KISS IDE. The details of the API are given in
the following section.
Before starting the application, configure your operational space. The system should ideally
be used in a bright environment with a plain background of dissimilar color to the marker.
The next step is to train the system on the marker under these conditions. To do so, build
and run the “Syma Configure” application and follow the instructions displayed on the
console. This process will generate a configuration file which contains information about the
colors on your marker and the physical properties of your camera. After training, the
system is ready to run.
Place the camera on an elevated surface, such as a table, so that it is at a similar elevation to
the desired helicopter position. Place the transmitter under the starting position of the
helicopter and connect it to the host computer with a USB cable. When this is
accomplished, build and run the sample application.
1. When the application starts, it will begin searching for the helicopter marker.
2. Turn on the helicopter and hold it in the center of the camera image for a second. The
tracking system will begin issuing commands.
3. When the helicopter blades start spinning, let go of the helicopter.
4. The helicopter can then be manipulated by clicking on the camera image to change
the setpoint.
A.2 API
The software that powers the tracking engine, control system, and transmission device is
programmed in C++ using object-oriented programming techniques. Users who would like
to program against these interfaces directly should skip to the next section.
For educational purposes, the system also comes with a simpler, C based API for those
interested in getting started quickly. Listed below are the function prototypes for this API
and associated descriptions.
Listing A.1: Simple Control API
typedef struct {
int x;
int y;
int width;
int height;
float yaw;
} syma_state_t;
/*
Starts the tracking system.
configFile - the name of the configuration file produced by the config program
serialPort - the name of the serial port that the transmitter is connected to
useInternalCamera - if 1, the program takes ownership of the camera and must
be updated with syma_refresh(). If 0, the user is responsible for providing
images via the syma_refresh_by_image() function.
Returns - 1 on success, 0 on failure
*/
int syma_start(const char* configFile, const char* serialPort, int
useInternalCamera);
/*
Releases resources and closes any open connections
Returns - 1 on success, 0 on failure
*/
int syma_stop(void);
/*
Manually issue a command to the helicopter
pitch - full forward = 0, full backward = 127, neutral = 64
yaw - full right = 0, full left = 127, neutral = 64
throttle - min throttle = 0, max throttle = 127
Returns - 1 on success, 0 on failure
*/
int syma_send_signal(unsigned char pitch, unsigned char yaw, unsigned char throttle);
/*
Updates camera image, tracks, runs control logic, and issues command.
This pulls a camera image from the internal camera object. If compiled with
KISS, the camera object singleton from libkovan is used. If syma_start
was not called with useInternalCamera set to true, this method
immediately returns a failure.
Returns - 1 on success, 0 on failure
*/
int syma_refresh(void);
/*
Same as syma_refresh() but the user must specify a pointer to the image
data in a BGR888 format. Used with the syma_start() method with the
useInternalCamera flag set to false.
Returns - 1 on success, 0 on failure
*/
int syma_refresh_by_image(const unsigned char* data, int width, int height);
/*
Retrieves state information from last tracking update.
Returns - struct with info about helicopter state. Position is -1, -1 if
tracking failed.
*/
syma_state_t syma_get_state(void);
/*
Sets the target position of the helicopter
target - state struct with information about desired position of the
helicopter. May not actively control on all channels. See documentation.
Returns - 1 if the target was valid, 0 if not
*/
int syma_set_target(syma_state_t target);
/*
Set the pixel position target of the helicopter. Infers desired yaw to be 90
degrees from the camera.
x - x pixel position target
y - y pixel position target
Returns - 1 if the target was valid, 0 if not
*/
int syma_set_target_pos(int x, int y);
/*
Retrieves the width of the current image in pixels
Returns - value greater than zero on success, 0 on failure
*/
int syma_get_cam_width(void);
/*
Retrieves the height of the current image in pixels
Returns - value greater than zero on success, 0 on failure
*/
int syma_get_cam_height(void);
Using this API, the helicopter can be stabilized at a set yaw in the center of the image with
the following simple program.
#include <stdio.h>
#include "syma_simple.h"

int main(void)
{
    // Start the tracking library and check to make sure it
    // succeeded
    if (syma_start("confix.txt", "/dev/ttyUSB0", 1) == 0) {
        fprintf(stderr, "Failed to initialize library\n");
        return 1;
    }

    // Set initial target
    syma_set_target_pos(syma_get_cam_width()/2, syma_get_cam_height()/2);

    while (1) {
        // Pull camera image and track
        syma_refresh();

        // Get information about state
        syma_state_t state = syma_get_state();

        // Check to see if tracking was lost
        if (state.x < 0 || state.y < 0) {
            break;
        }
    }

    // Cleanup
    syma_stop();
    return 0;
}
A.3 Hacking
To jump into the inner workings of the existing library, please consult the following files:
SymaEngine.cpp/hpp These files define the logic behind the tracking system. This is an
excellent place to start if you are curious about modifying the tracking logic.
SymaEngineSearch.cpp defines the methods responsible for composing blobs and
estimating pose.
SymaController.cpp/hpp These files define the logic behind the master controller. The
files SymaPid.cpp/hpp implement the default logic for each of the PID controllers.
SymaStateEstimator.cpp/hpp These files define the logic for filtering the output of the
tracking system before being forwarded to the controllers.
SymaMatUtils.cpp/hpp and SymaBlobSearch.cpp/hpp define the helper utilities
that are used to directly manipulate and process images. The by-threshold and
by-estimate implementations are defined in these files, respectively.
SymaStrategy.hpp This file defines a virtual class, or interface, for control logic. If you
are interested in implementing your own control logic to work with the existing
interface, this is a great place to start.
These files constitute the core logic of the tracking and control system. Other files provide
support.