computer vision system - arxiv

25
A Vision-Based Control Method for Autonomous Landing of Vertical Flight Aircraft On a Moving Platform Without Using GPS Bochan Lee Graduate Research Assistant Texas A&M University College Station, TX, USA Vishnu Saj Graduate Student Texas A&M University College Station, TX, USA Moble Benedict Assistant Professor Texas A&M University College Station, TX, USA Dileep Kalathil Assistant Professor Texas A&M University College Station, TX, USA ABSTRACT The paper discusses a novel vision-based estimation and control approach to enable fully autonomous tracking and landing of vertical take-off and landing (VTOL) capable unmanned aerial vehicles (UAVs) on moving platforms with- out relying on a GPS signal. A unique feature of the present method is that it accomplishes this task without tracking the landing pad itself; however, by utilizing a standardized visual cue installed normal to the landing pad and parallel to the pilot’s/vehicle’s line of sight. A computer vision system using a single monocular camera is developed to detect the visual cue and then accurately estimate the heading of the UAV and its relative distances in all three directions to the landing pad. Through comparison with a Vicon-based motion capture system, the capability of the present vision system to measure distances in real-time within an accuracy of less than a centimeter and heading within a degree with the right visual cue, is demonstrated. A gain-scheduled proportional integral derivative (PID) control system is integrated with the vision system and then implemented on a quad-rotor-UAV dynamic model in a realistic simulation program called Gazebo. Extensive simulations are conducted to demonstrate the ability of the controller to achieve robust tracking and landing on platforms moving in arbitrary trajectories. Repeated flight tests, using both stationary and moving platforms are successfully conducted with less than 5 centimeters of landing error. NOTATION A Camera intrinsic matrix c u Corner position-column c v Corner position-row f x Focal length of camera in horizontal direction f y Focal length of camera in vertical direction K D Derivative gain K I Integral gain K P Proportional gain R Rotation matrix R basic Basic rotation matrix R modi f ied Modified rotation matrix s Scaling factor t Translation vector u Image pixel position-column u 0 Center of image-column v Image pixel position-row v 0 Center of image-row X Sideward relative distance, meter Y Vertical relative distance, meter Z Forward relative distance, meter α Relative yaw angle, rad β Relative pitch angle, rad γ Relative roll angle, rad θ Aircraft pitch angle, deg ρ u Parameter to convert pixels to SI units-column ρ v Parameter to convert pixels to SI units-row φ Aircraft roll angle, deg ψ Aircraft yaw angle, deg f (u, v) Image gradient INTRODUCTION Fully autonomous approach and landing of vertical take- off and landing (VTOL) capable unmanned aerial vehicles (UAVs) on moving platforms without relying on GPS is an extremely challenging task. In order to accomplish this, the flight controller has to first perceive the motion of the plat- form using machine vision, then find the relative distances, velocities, and heading of the aircraft with respect to the plat- form and then control the aircraft so that it can keep up with the platform motion and finally execute a touch-down. Such a maneuver is demanding even for human pilots and therefore, its automation is of significant interest whether the aircraft is manned or unmanned. Out of all the possible moving platforms, the most difficult one to land on is a small ship at sea due to the limited land- 1 arXiv:2008.05699v3 [cs.RO] 17 Aug 2020

Upload: others

Post on 03-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COMPUTER VISION SYSTEM - arXiv

A Vision-Based Control Method for Autonomous Landing ofVertical Flight Aircraft On a Moving Platform Without Using GPS

Bochan LeeGraduate Research Assistant

Texas A&M UniversityCollege Station, TX, USA

Vishnu SajGraduate Student

Texas A&M UniversityCollege Station, TX, USA

Moble BenedictAssistant Professor

Texas A&M UniversityCollege Station, TX, USA

Dileep KalathilAssistant Professor

Texas A&M UniversityCollege Station, TX, USA

ABSTRACTThe paper discusses a novel vision-based estimation and control approach to enable fully autonomous tracking andlanding of vertical take-off and landing (VTOL) capable unmanned aerial vehicles (UAVs) on moving platforms with-out relying on a GPS signal. A unique feature of the present method is that it accomplishes this task without trackingthe landing pad itself; however, by utilizing a standardized visual cue installed normal to the landing pad and parallelto the pilot’s/vehicle’s line of sight. A computer vision system using a single monocular camera is developed to detectthe visual cue and then accurately estimate the heading of the UAV and its relative distances in all three directions tothe landing pad. Through comparison with a Vicon-based motion capture system, the capability of the present visionsystem to measure distances in real-time within an accuracy of less than a centimeter and heading within a degreewith the right visual cue, is demonstrated. A gain-scheduled proportional integral derivative (PID) control system isintegrated with the vision system and then implemented on a quad-rotor-UAV dynamic model in a realistic simulationprogram called Gazebo. Extensive simulations are conducted to demonstrate the ability of the controller to achieverobust tracking and landing on platforms moving in arbitrary trajectories. Repeated flight tests, using both stationaryand moving platforms are successfully conducted with less than 5 centimeters of landing error.

NOTATION

A Camera intrinsic matrixcu Corner position-columncv Corner position-rowfx Focal length of camera in horizontal directionfy Focal length of camera in vertical directionKD Derivative gainKI Integral gainKP Proportional gainR Rotation matrixRbasic Basic rotation matrixRmodi f ied Modified rotation matrixs Scaling factort Translation vectoru Image pixel position-columnu0 Center of image-columnv Image pixel position-rowv0 Center of image-rowX Sideward relative distance, meterY Vertical relative distance, meterZ Forward relative distance, meterα Relative yaw angle, radβ Relative pitch angle, rad

γ Relative roll angle, radθ Aircraft pitch angle, degρu Parameter to convert pixels to SI units-columnρv Parameter to convert pixels to SI units-rowφ Aircraft roll angle, degψ Aircraft yaw angle, deg∇ f (u,v) Image gradient

INTRODUCTION

Fully autonomous approach and landing of vertical take-off and landing (VTOL) capable unmanned aerial vehicles(UAVs) on moving platforms without relying on GPS is anextremely challenging task. In order to accomplish this, theflight controller has to first perceive the motion of the plat-form using machine vision, then find the relative distances,velocities, and heading of the aircraft with respect to the plat-form and then control the aircraft so that it can keep up withthe platform motion and finally execute a touch-down. Such amaneuver is demanding even for human pilots and therefore,its automation is of significant interest whether the aircraft ismanned or unmanned.

Out of all the possible moving platforms, the most difficultone to land on is a small ship at sea due to the limited land-

1

arX

iv:2

008.

0569

9v3

[cs

.RO

] 1

7 A

ug 2

020

Page 2: COMPUTER VISION SYSTEM - arXiv

ing space, six degrees of freedom ship deck motions, limitedvisual references for pilots, and lack of alternative landingspots as shown in Video (Ref. 1). Despite these challenges,the Navy helicopter pilots have managed to successfully landon ships for decades. Therefore, it is imperative to understandand gain inspiration from the the Navy helicopter ship land-ing procedure, which is discussed in Refs. (Refs. 2, 3). Thepresent study began by asking two key questions: (1) what isthe secret behind the success of Navy helicopter pilots when itcomes to ship landing? and (2) can some of the key ideas fromthis procedure be applied to VTOL UAVs landing on movingplatforms?

Contrary to intuition, Navy helicopter pilots land their aircraftwithout visually focusing on the landing spot. In fact, Navypilots are trained not to follow ship deck motions for two mainreasons. First, spatial disorientation can occur when a pilothas no fixed, visible horizon to refer to, which is a criticalelement for maintaining a proper sense of helicopter attitudeindependent of ship motions. Second, constantly changing thehelicopter pose to match ship deck motions can trigger unsta-ble helicopter dynamics introducing serious potential hazards.Therefore, a pilot tries to control the helicopter in a stablemanner independent of the ship’s roll and pitch motions, andthen lands vertically. Vertical landing on a moving ship deckis tested and proven to be safe within the operating limits. It isalso recommended to land quickly in order to prevent the shipdeck from impacting one of the helicopter landing gear skidsor wheels causing a rollover.

If the pilot is not looking at the ship deck while landing, thenwhat is the visual cue that he/she is referring to? The key vi-sual aid that helps pilots to land safely on ships is a ”horizonreference bar” shown in Fig. 1, which is gyro-stabilized toindicate a perfect horizon independent of ship motions and iswidely used in most modern Navies (Refs. 4, 5). Pilots there-fore control the helicopter by referencing the horizon bar in-stead of responding to ship motions.

Figure 1. Horizon Reference Bar

The present study adopts this proven ship landing method andautomates it to land on moving platforms. First, a cameraplays the role of pilot’s eyes, and a computer vision systemprocesses the visual information from images. Then, a con-trol system uses the visual data to manipulate the UAV simi-lar to a pilot controlling a helicopter. This vision-based con-trol system can either be image- or position-based, depend-

ing on how the visual data is used. The image-based methoddirectly uses visual information obtained from a 2D cameraview while position-based control first estimates the pose ofan object by transforming the data from a 2D camera planeinto 3D space. Previous efforts to land a UAV on a movingplatform have used image-based control to center the landingplatform in a downward-facing 2D camera view (Ref. 6), andposition-based control to minimize the relative distance fromthe landing platform in 3D (Ref. 7). Despite both techniquesshowing limited feasibility for UAV applications, these ap-proaches cannot be extended to general landing cases becausethey use external infrastructure such as a Vicon motion cap-ture system (Ref. 8) to feed states into the control loop (Ref. 6)or infrared(IR)-LEDs on the landing platform and onboard IRcamera (Ref. 7). There have been other studies which alsoachieved autonomous UAV landing on a moving platform byusing a proportional derivative (PD) controller in the presenceof time delays (Ref. 9), and a linear quadratic regulator (LQR)control system in real-time (Ref. 10). However, along with theUAV, they also control the moving platform to be placed at afavorable position for UAV landing, which cannot be appliedto a general case of an independently moving platform.

For a practical vision-based control solution, one shouldachieve the landing task with common sensors available onUAVs and without manipulating the landing platform mo-tions. In recent years, there have been several studies thatsought for a general solution by combining the informationfrom the UAV’s inertial measurement unit (IMU) with the es-timated pose from captured images (Ref. 11) or combiningUAV’s IMU information with measured distances from a dis-tance sensor (Ref. 12). There have also been other studieswhich utilize either red, green and blue (RGB) color informa-tion of a landing platform (Ref. 13) or an optical-flow basedcontrol method which uses relative velocity and proximity in-formation (Ref. 14). Although these vision-based methodsachieved autonomous UAV landing on platforms undergoingsimple motions by using minimum sensors such as a singleonboard camera and IMU, all of them visually tracked thelanding platform, unlike a helicopter ship landing procedurewhere the pilots do not look at the ship deck. An approach thatinvolves tracking the motion of the landing spot is not idealfor the VTOL UAV landing because (1) in order to track thelanding spot, the UAV needs to approach from a high altitude,which might not be desirable or even possible in many ap-plications, (2) in scenarios where the platform is undergoingcomplex motions, for example, a small ship in moderately-high sea states or a Humvee traversing rugged terrain, a so-phisticated vision system is required to capture the large, un-predictable platform motions, and (3) even if the platform mo-tion can be tracked precisely, there is always a potential haz-ard caused by the control system forcing the UAV to matchthe complex platform movements.

Conversely, this paper presents a vision-based autonomouslanding solution for VTOL UAVs which overcomes thesedrawbacks by implementing the Navy helicopter ship landingprocedure by tracking a gyro-stabilized horizon reference barrather than the moving platform itself. In the present study,

2

Page 3: COMPUTER VISION SYSTEM - arXiv

a single front-facing camera tracks a standardized visual cue(analogous to a horizon bar) that is located beyond the land-ing spot and installed parallel to the approach course. Asshown in Fig. 2, referring to the visual cue allows the UAVto approach horizontally similar to the helicopter ship land-ing procedure. Therefore, its operating range is greater in thehorizontal direction and it can be used for both approach andlanding phase. However, the operating range for a platform-motion-tracking approach is limited to the vertical space, andtherefore, it can be used only for the final landing phase andrequires another method for approach. Therefore, the methodpresented in this study is fundamentally different than previ-ous UAV landing techniques, and is applicable for all types ofVTOL aircraft.

Figure 2. Comparison of Previous Platform MotionTracking with Present Visual Cue Tracking Method

For the present study, instead of a standard horizon bar, acheckerboard pattern is used as a visual cue since it is eas-ily distinguishable from its surroundings and also offers re-dundancy with multiple corners that can be detected. Gain-scheduled proportional integral derivative (PID) controllersare configured to command roll, pitch, yaw, and throttle tothe UAV. As the UAV advances to the landing spot, multiplelayers of controllers are activated according to the relative dis-tance from the landing platform because that allows the UAVto approach at a higher speed when it is farther away and thenslow-down and carefully adjust its pose when it reaches closeproximity of the landing spot and eventually executes land-ing maneuver similar to how a remote helicopter pilot woulddo. In addition, the robust autonomous flight characteristicsare even more enhanced by configuring different flight modes,which include a scanning mode to search for the visual cue incase of detection failure, a tracking mode that is activated dur-ing normal flight conditions, and safety mode for respondingappropriately to unexpected situations.

To demonstrate the feasibility of this novel method, a system-atic approach has been taken from development to validation.The outline of the paper is as follows: (1) computer visionsystem development and validation using the Vicon motioncapture system (Ref. 8), (2) control system architecture and

the configured flight modes, (3) results from the Gazebo sim-ulation (Ref. 15) conducted for various scenarios, (4) flight-testing results from autonomous landings on both stationaryand moving platforms, and (5) the conclusions inferred fromthis study.

COMPUTER VISION SYSTEMVision-based control has made its mark in autonomous navi-gation systems with the advancement of computer vision algo-rithms. There have been many studies in this field to developmethodologies for detecting and extracting desired visual fea-tures from the surrounding environment (Refs. 16–18). Theextracted data are used for various purposes such as poseestimation, object recognition, reconstruction, and so forth.Among multiple methods, the projective transformation tech-niques (Ref. 19) are closely referenced for computing relativeposition and orientation.In this study, a computer vision system is developed to pro-vide accurate position and orientation information. The con-tributions of the present study in computer vision include im-provement of a previous detection method (Ref. 19), integra-tion of the detection and estimation methods, and validationof the entire vision system using the Vicon motion capturesystem. The Parrot Anafi quad-rotor UAV and the checker-board-based visual cue which have been used throughout thestudy are shown in Fig. 3.

Figure 3. UAV with Gimbal Camera and Visual Cue

The onboard front-facing camera has pitch and roll mechani-cal gimbals, which compensate for the pitch and roll motionof the UAV and keeps the camera level with the horizon. Thus,images captured by the camera do not experience the effectsof the pitch and roll motions. This reduces the complexitycaused by rotational motions. However, the effect of yaw isreflected in the images, which is utilized to extract the headingof the UAV with respect to the visual cue. The maximum res-olution of the camera is 4k ultra-high definition, UHD (4096x 2160) and the higher resolution helps in increasing the de-tection range. The size of squares on the checkerboard is theother factor that affects the detection range. As the squaresize increases, the effective range for detection also increases.However, there is a trade-off between the maximum effectiverange and the closest proximity from the visual cue since thecamera has to be located farther away for the visual cue de-tection in case of a bigger square size. The effective range for

3

Page 4: COMPUTER VISION SYSTEM - arXiv

each case is found through experiments and are provided inTable 1.

Table 1. Valid Range for Varying Resolution & Square Size

Resolution 80 mm Square 120 mm SquareqHD(540p) 13 m 15.5 mHD(720p) 14 m 17.5 mFHD(1080p) 15 m 19 m4K UHD(2160p) 17 m 25 m

In the present study, the square size is chosen to be 120 mmsince it can be detected accurately from distances as high as 25meters, and the square size is small enough to stay within thecamera view even when the visual cue is only 1 meter away.Even though higher resolution increases the effective range, italso increases the data size to process. Therefore, HD 720p(1280 x 720) resolution is selected considering the latency inlivestreaming and image processing for a relatively high band-width integrated vision-based feedback control system as wellas a good effective range. In the following sections, detaileddiscussions of the detection and estimation methods are pro-vided along with its validation.

Detection

Detection is one of the most important aspects in the presentcomputer vision system. The computation of the UAV’s rela-tive positions and heading angle depends heavily on how ac-curately the visual cue is detected. Note that for the presentapproach to work, the visual cue does not have to be of anyspecial shape or size as long as its dimensions are known andit can be distinguished from its surroundings. Hence, a num-ber of different visual cues are tested and representative casesare shown in Fig. 4.

Figure 4. Available Representative Visual Cues

Visual cues that vary in shape, size, and the number of cor-ners are tested to determine an appropriate one for a givenenvironment. Initially, a simple green colored bar is tested asa visual cue. It is basically a scaled-down version of the hori-zon reference bar used for ship landing. The green bar is de-tected using an HSV (hue, saturation, value) filter (Ref. 20).Once the green bar is separated out from the surroundings,the positions of corners are detected using the Harris cornermethod (Ref. 16). However, the direct application of the HSVfiltering method is not ideal since the HSV filter cannot guar-antee clear and perfect capture of the entire bar portion from

the background in a dynamic situation. Once the boundaryof the bar is not distinctive, the Harris corner algorithm alsoyields false corners which results in an inaccurate estimationof the UAV’s position and heading. In order to solve this is-sue, a new algorithm is developed to enclose the green bar ina bounding rectangle with sharp corners.

Despite the feasibility of the green bar, a visual cue that hasmore distinctive features and redundancy is considered in or-der to achieve robust VTOL UAV tracking and landing. Theredundancy can be enhanced by having more corners. Hence,the checkerboard is selected as the visual cue for this study be-cause its geometry exhibits local image features such as edges,lines, and corners which are greatly helpful in detection.

In addition, in order to increase the accuracy in detection,Forstner corner detection method, which is a more advanceddetection method than the Harris corner detection, is adoptedfor the present computer vision system (Ref. 21). TheForstner corner detection increases the accuracy by sub-pixelrefinement process. It is based on the fact that an ideal corneris a single point that tangent lines of the object cross perpen-dicular to each other as shown in Fig. 5.

Figure 5. Forstner Corner Detection Method

As shown in Fig. 5, the pixel information around a corneris not perfectly clear. Therefore, an approximation processfor defining the corner position (cu,cv) is required and is ex-pressed in Eq. (1).

(cu, cv) = argmincu,cv

∑u,v∈N

([∇ f (u,v)]T (u− cu,v− cv))2 (1)

The image gradient ∇ f (u,v) at the image pixel position (u,v)is perpendicular to line from (u,v) to corner position (cu,cv).In order to obtain (cu,cv), a least square estimation in a smallwindow is used. The projection of line from (cu,cv) to (u,v)on to the tangent line at (u,v) is required to be maximizedin the given window. In other words, the projection of im-age gradient on to the line segment connecting (cu,cv) and(u,v) has to be minimized for all (u,v) inside the window N.This method is implemented by using the existing OpenCVcheckerboard detector (Ref. 22).

4

Page 5: COMPUTER VISION SYSTEM - arXiv

This method accurately detects the corners, which improvesthe estimation quality; however, the detected corners are notrecognized in any specific order. Since the order is not as-signed, it can possibly experience unintended changes in theorder, depending on the captured image. This poses an is-sue to establish a correct coordinate system since it yieldsthe correct direction of axes only when the order of cornersis recognized from the top left to bottom right. Hence, thepresent detection method is improved to resolve this issue byassigning the proper order of corners considering the checker-board geometry and is then applied to simulations and flighttests. Using this method, different numbers of corners are alsotested along with varying resolutions and square sizes as a vi-sual cue selection process. The final selection is 9 corners forredundancy as well as considering the entire size of a visualcue with 120-millimeter squares.

Estimation

The estimation is based on a single camera calibration methodusing a planar object (Refs. 23, 24). The geometric relationbetween a camera and a planar object is shown in Fig. 6.

Figure 6. Geometric Relation between Image and Object

A conventional pinhole camera model is used to derive the ge-ometric relation. Consider a three-dimensional object (visualcue) in the space captured by a camera. A 3-D coordinate sys-tem can be defined with respect to the object and there is a 2-Dcoordinate system associated with the image frame. (X ,Y,Z)is a point on the object described in the visual cue body-fixedframe and (u,v) is a corresponding point on the image frame(pixel position in the image). The relationship between theimage coordinates and the body-fixed coordinates in matrixform is described in Eq. (2).

s

uv1

= A[

R t01x3 1

]−1

XYZ1

(2)

It is derived in a homogeneous coordinate system and con-verted to a cartesian coordinate system. (u,v) is an image

coordinate system and s is a scaling factor that comes fromtransforming the homogeneous coordinates to cartesian coor-dinates. R is a 3 x 3 rotation matrix and t is a 3 x 1 translationvector. R matrix has the information about the camera orienta-tion with respect to the visual cue and t gives the informationof how far the camera is from the particular (X ,Y,Z) point inthe visual cue body-fixed frame. O1x3 is 1 x 3 zero matrix. Thematrix which includes R, t, O1x3, and 1 is called the cameraextrinsic matrix, which varies from image to image. The ma-trix A is called the camera intrinsic matrix and its componentsare shown in Eq. (3).

A =

1/ρu 0 u00 1/ρv v00 0 1

fx 0 0 00 fy 0 00 0 1 0

(3)

1/ρu and 1/ρv are parameters that scale the image coordi-nates, which are in pixels to values in the International Systemof Units (SI). u0 and v0 are the center position in the imageplane. The first matrix shifts the center of the image plane tothe top left corner point. The second matrix contains the focallength information fx and fy, which are the focal lengths ofthe camera in the x and y directions, respectively. The productof two matrices forms the camera intrinsic matrix, which de-pends on the particular camera. This unique camera intrinsicmatrix can be found using a camera calibration (Ref. 25).

By solving Eq.(2), rotation and the translation vectors are ob-tained. In computer vision, the estimation of R and t are calledpose estimation. There are different algorithms to determinethe pose of the camera. A common class of algorithms calledthe Perspective-n-Point (PnP) algorithm is applied. Given aset of n 3-D coordinates of an object and its corresponding2-D projections on the image, this algorithm can estimate thepose of the camera. There are 6 degrees of freedom (DOF)for a camera, which are 3 DOF in rotation (roll, pitch, yaw)and 3 DOF in translation (X ,Y,Z). A minimum of 3 pointsare required to find a solution, but the solution is not unique.There should be a minimum of 4 points to obtain a unique so-lution; however, it can be more reliable and redundant whenthere are more points. An iterative method is used for thePnP algorithm since it is robust for objects which consist of aplanar surface. The iterative method is based on Levenberg-Marquardt optimization (Refs. 26, 27). In this method, thefunction finds such a pose that minimizes re-projection error,which is the sum of squared distances between the observedimage points (u,v) and projected object points (X ,Y,Z). Bydefault, the iterative algorithm sets the initial value of rotationand translation as zero and then updates during each iteration.

The OpenCV RANSAC method (Refs. 19, 28) is used to finda rough estimation for the extrinsic matrix. The RANSACmethod also identifies the outliers and removes them duringthe calculation. The initial guess and inliers are fed into theiterative algorithm to have a more accurate estimation. TheOpenCV solvePnP algorithm (Ref. 25) returns a rotation vec-tor and the translation vector. The rotation vector can be con-verted to the rotation matrix using the Rodrigues function.Thus, once the rotation matrix (R) and translation vector (t)

5

Page 6: COMPUTER VISION SYSTEM - arXiv

are obtained, −R−1t gives the relative distances in 3D fromthe camera to the origin of the visual cue body-fixed frame.

The present estimation method is modified from the existingalgorithm to take advantage of the gimbal camera. Since thegimbal corrects for the roll and pitch motion of a UAV, the im-ages only reflect the yaw motion (heading angle) with respectto the visual cue. Thus, roll and pitch angles computed bythe image can be regarded as camera noise and gimbal correc-tion errors. Therefore, the roll and pitch angles are excludedfrom the position estimation. The basic and modified rotationmatrix are specified in Eq. (4) and (5), respectively.

Rbasic =

cαcβ cαsβ sγ− sαcγ cαsβcγ + sαsγ

sαcβ sαsβ sγ− cαcγ sαsβcγ + cαsγ

−sβ cβ sγ cβcγ

(4)

Rmodi f ied =

cosα −sinα 0sinα −cosα 0

0 0 1

(5)

α , β , and γ represent yaw, pitch, and roll angles, respec-tively. Each row of the R matrix contributes to their respective(X ,Y,Z) coordinates. By setting pitch and roll angle to zero,the modified rotation matrix is obtained from the basic rota-tion matrix. The yaw angle is taken into consideration becauseit is required to set a UAV approaching course by maintaininga designated heading angle. The third row of the modifiedrotation matrix shows that the Z-coordinate is independent ofthe yaw angle α . Hence, the yaw angle has no contributionsin the estimation of forward relative distance (Z-coordinate)between the UAV and the visual cue. However, the sidewardrelative distance (X-coordinate) and vertical relative distance(Y -coordinate) are heavily dependent on the yaw angle.

It was apparent from multiple experiments that the yaw an-gle estimation becomes noisy as the camera gets farther awayfrom the visual cue. Since the estimation is based on thenumber of visual cue pixels in the image, the changes in pix-els with distance results in the noise. Despite this sensitivityissue, yaw estimation still shows a reasonable trend withinthe range that the forward relative distance is accurately esti-mated. To utilize the yaw estimation trend, instead of directlytaking the noisy yaw angle estimation, a moving average filteris configured and the results are shown in Fig. 7.

The red line denotes the yaw angle estimation results by thebasic algorithm and the blue line denotes the results after themoving average filter is applied. The moving average is cal-culated using the history of the previous estimation data. Alower and upper bound are defined for the next yaw angle es-timation value with respect to the current average. Wheneverthe estimated yaw angle is out of the defined range, it rejectsthat value and takes an average value. In this way, the mov-ing averaging filter provides stable yaw estimation by track-ing a trend in a further distance, and the accuracy naturallyincreases as it gets closer to the visual cue due to having morepixel data in the image.

Figure 7. Effect of Moving Average Filter on Yaw

In conclusion, the present estimation method is developed tosatisfy the practical requirements for the UAV tracking andlanding maneuver. When the UAV flies to the landing padfrom a further distance, having a continuous smooth estima-tion for the current position is more important than the accu-racy as long as the estimation is within a reasonable bound-ary. If the vision system provides fluctuating estimations tothe control system, the UAV will move in an arbitrary man-ner in response to the fluctuations. When the UAV is close tothe landing pad, the accuracy is crucial and the vision systemprovides smooth and precise estimations with the help of themoving average filter and more pixel data in the image.

Validation

The results obtained from the developed computer vision sys-tem are validated through position and attitude measurementsusing a Vicon motion capture system shown in Fig. 8.

Figure 8. Experimental Setup in Vicon System

Vicon is a motion capture system widely used in the entertain-ment industry to track the motion of people and other objectseven up to sub-millimeter resolution. The Vicon system usedfor this study can capture images at 120Hz using eight 16-megapixel digital video cameras. Camera-mounted strobes il-luminate small, retro-reflective markers, which are identifiedand processed by the imagers. The Vicon cameras track re-flective markers on both the UAV and the visual cue to obtainprecise position and orientation data, which is then used as

6

Page 7: COMPUTER VISION SYSTEM - arXiv

the ground truth for this study to validate the computer visionsystem. The comparisons of the forward relative distance andsideward relative distance are shown in Fig. 9a and 9b, re-spectively.

Figure 9a. Validation of Forward Distance Estimation

Figure 9b. Validation of Sideward Distance Estimation

The blue, red, and green lines denote the Vicon measure-ments, baseline algorithm estimations, and estimations fromthe present improved algorithm, respectively. As seen fromthe figure, the baseline and the improved algorithms have nodifference when it comes to the forward relative distance esti-mation, and both these methods can obtain the same level ofaccuracy as the Vicon measurements. The sideward distanceestimation by the baseline algorithm has fluctuations; how-ever, the improved algorithm shows smooth results due to themoving average filter. The maximum error in sideward dis-tance estimation is 1 centimeter, when compared to the Viconresult. The comparisons of the vertical relative distance andrelative yaw angle are as shown in Fig. 10a and 10a, respec-tively.

In both vertical relative distance and relative yaw angle com-

Figure 10a. Validation of Vertical Distance Estimation

Figure 10b. Validation of Yaw Angle Estimation

parison, the improved algorithm yields smooth estimation re-sults. The maximum error in the vertical relative distance isbelow 1 centimeter and the maximum error in the relative yawangle is 1 degree. Through the Vicon experiments, the abilityof the present computer vision system to detect the visual cueand precisely estimate the position and orientation is demon-strated.

FLIGHT CONTROL SYSTEM

The objective is to develop a flight control system that willenable any type of VTOL aircraft to land precisely on mov-ing platforms. The present vision-based controller will formthe outer-loop of a cascaded feedback control system whereit is assumed that the inner-loop will handle the basic attitudestability of the aircraft as well as generates the required pitch,roll, and yaw rates as demanded by the outer-loop. There-fore, the inner loop will be aircraft dependent. In this study,a quadcopter with fixed pitch propellers is selected and theinner-loop controls the rotating speed of each propeller. Pro-portional integral derivative (PID) controllers with scheduledgains are designed to achieve a realistic flight dynamic be-

7

Page 8: COMPUTER VISION SYSTEM - arXiv

havior as well as precise tracking and landing of the VTOLUAV. The standard form of the PID controller is representedin Eq. (6).

u(t) = KPe(t)+KI

∫τ

0e(τ)dτ +KD

ddt

e(t) (6)

e(t) denotes an error between a setpoint and relative posi-tion data with respect to the visual cue. Since the positionsand heading angle are computed from the vision system, dis-turbances and noises are already included. The update rateof the outer loop of the closed-loop feedback control systemis 0.1 seconds including the detection, estimation, and live-streaming through the computer vision system. KP, KI , andKD gains are tuned based on the effect of each parameter asshown in Table 2 (Ref. 29).

Table 2. Effect of KP, KI , and KD

Characteristic Increase KP Increase KI Increase KD

Rise Time Decrease Small SmallDecrease Decrease

Overshoot Increase Increase Decrease

Settling Time Small Increase DecreaseIncrease

Steady-state Decrease Large MinorError Decrease Change

Stability Degrade Degrade Improve

The different sets of gains are scheduled based on relative po-sitioning data to control the behavior of a UAV as desired.It allows the UAV to fly at high-speed when it is relativelyfar away from the visual cue and then slow down the relative

speed for cautious tracking when it gets closer to the land-ing point. The flight mode selector is also configured in or-der to adapt the UAV movements to the flight conditions andenhance safety. The states are the relative positions in threedimensions and the heading angle of the UAV with respectto the visual cue obtained from the computer vision system.The control inputs are pitch, roll, yaw, and throttle commands.Each control magnitude ranges from -100 to 100 as a percent-age of the total input. The outputs are UAV attitude and rawimages from the camera. The UAV attitude consists of rollangle φ , pitch angle θ , and yaw angle ψ .

Control Architecture

The vision-based PID control system with scheduled gains isstructured as shown in Fig. 11. Setpoints are the desired rel-ative positions and heading angle of the UAV. The mode se-lector automatically chooses a proper flight mode based onthe flight conditions. Depending on the selected flight mode,the control inputs are obtained through the gain-scheduledPID controllers or from the pre-assigned values. The parame-ters that are fed back include the estimated UAV position andheading angle obtained from the computer vision system.

Flight Modes

A robust flight control system should be able to operate un-der various flight conditions. Thus, flight modes are config-ured to fly the UAV in an appropriate manner given the sce-nario. They consist of the mode selector and gain-scheduledPID controllers in order to obtain the proper control inputs de-pending on the scenario. Three different modes are developedto enhance safety and tracking capability as follows.

1. Scanning mode: In the case of a visual cue detectionfailure, the scanning mode is activated anytime duringthe flight to search for the visual cue. Once this modeis engaged, the UAV starts yawing at a constant rate by

Figure 11. Vision-based Control System Architecture

8

Page 9: COMPUTER VISION SYSTEM - arXiv

commanding the pre-assigned control inputs instead ofrelying on the gain-scheduled PID controllers. It contin-ues to yaw until it detects the visual cue and one suc-cessful detection disengages this mode instantly. Thisallows the UAV to track the visual cue irrespective ofits initial heading orientation as long as the visual cue iswithin the detection range. During the flight, if the UAVloses the visual cue from the camera view, it stops mov-ing and then scans for the cue in order to get back ontrack. Hence, the scanning mode enhances safety as wellas increases operational range and robustness.

2. Tracking mode: This mode is activated once the vi-sual cue is properly detected. Based on the relative dis-tance and heading angle, different gain-scheduled PIDcontrollers are engaged as shown in Fig. 12.

First, three different pitch controllers are configuredbased on the forward relative distance. Level 1 pitchPID controller gains are scheduled to track the visual cueat the maximum speed when the distance to the landingspot is more than 6 meters. Level 2 pitch PID controllergains are scheduled to track the visual cue at moderatespeeds when the distance range is between 6 and 1.5meters. The controller is tuned to prevent overshootsand oscillations. Level 3 pitch PID controller gains arescheduled to track the visual cue within a range of 1.5meters. It focuses on minimizing steady-state error forprecise tracking and mimics the cautious final approachmaneuver of a helicopter. Second, two different roll con-trollers are configured based on the sideward relative dis-tance. Level 1 roll controller gains are scheduled to trackthe visual cue when the sideward distance is more than1.5 meters and is tuned to achieve fast-tracking, no over-shoots, and no oscillations. Level 2 roll controller gainsare scheduled to track the visual cue precisely within therange of 1.5 meters focusing on steady-state error mini-mization. Third, two different yaw controllers are config-ured based on the relative heading (yaw) angle to main-tain the desired approach angle. In order to prevent thevisual cue from getting out of the camera view by com-manding large pitch and roll motions with excessive yaw

angle, it gives priority to the yaw correction by reducingthe magnitude of pitch and roll control inputs. Level 1yaw controller is designed to slow down the pitch and rollmotion as well as correct the yaw angle quickly when therelative yaw angle is more than 15 degrees. Level 2 yawcontroller is designed to correct the yaw angle preciselywithout slowing down the pitch and roll motion when therelative yaw angle is less than 15 degrees. Lastly, onethrottle PID controller is configured to control the heavemotion based on the vertical relative distance. Consid-ering the vertical displacement that the UAV can havewithin the camera view, one set of PID gains is enoughto handle its heave motion.

3. Safety mode: This mode is activated to enhance safetyby responding appropriately to three different situationsshown in Fig. 13 and the safety mode supersedes anyother mode.First, a backward (nose-up) pitch is applied instantly toprevent a collision when crossing the landing pad center-line instead of relying on the pitch PID controller. Sec-ond, the directional approach is designed to control theUAV when estimated relative positions are not reliable.Particularly, there exists a distance range where the for-ward distance estimation is accurate enough but the side-ward distance and relative yaw angle estimation are notaccurate due to a small number of pixels. In this case, itdirectly uses pixel position data in a 2D camera planeto control roll motion instead of estimating a positionfrom an image. It does not correct for yaw angle butthe roll is commanded if the visual cue is located in thearea which is more than 15 percent offset from the centerof the image in order to make sure the visual cue is notlost from the camera view. Third, an immediate land-ing is attempted to reduce the time for the final verticalmovement by commanding throttle down instantly whenthe UAV enters into the landing threshold.

SIMULATIONSThe objective is to verify every aspect of the vision-basedflight control system under different scenarios and to demon-

Figure 12. Composition of Gain-scheduled PID Controllers

9

Page 10: COMPUTER VISION SYSTEM - arXiv

Figure 13. Schematic of Safety Mode

strate that extreme accuracy can be achieved by adopting thisnovel method of tracking a visual cue installed parallel to aUAV approach course instead of tracking a landing platform.

In the present study, the entire coding is done in the Pythonprogramming language and a realistic robot simulation pro-gram, Gazebo (Ref. 15), is used to simulate the physical andvisual surroundings of a UAV as shown in Fig. 14 and Video(Ref. 30).

The simulated UAV model has the same dimensions, massmoments of inertia, gimbal camera, and flight mechanism asthe physical UAV, Parrot Anafi. The simulation also imple-ments a moving vehicle that carries the visual cue and landingplatform as shown in Fig. 14. As determined by vision sys-tem experiments, the visual cue has a checkerboard patternconsists of 4 x 4 squares with a side length of 120 mm each,and the landing platform size is 3 x 3 ft. These characteristicsremain the same throughout simulations and flight tests.

The simulations are conducted in the same way as the actualflight tests including live-streaming, real-time vision process-ing, and feedback control loop. First, the simulated UAV isconnected via WIFI to the base station computer. Once theUAV takes off, the images from the simulated camera are

streamed to the computer. The images are processed one byone to obtain the relative position and heading information,which is utilized by the feedback controller running on thecomputer to generate the control inputs. Then, the computedcontrol inputs are sent back to the UAV via the WIFI connec-tion. This one cycle from streaming to sending commandsback to the UAV takes 0.1 seconds. This simulated system,once verified, can be directly applied to a real flight test bysimply switching an IP address from simulated UAV to phys-ical UAV. The following simulation results show the trackingcapability and landing accuracy in detail.

Landing Accuracy

To demonstrate the tracking and landing accuracy of thepresent vertical-visual-cue based method, multiple simula-tions are conducted with different landing thresholds andvarying the speed of the mov ing platform. The final land-ing locations from 100 independent simulations are plotted inFig. 15.

The green line depicts the landing pad and each mark denotesthe landing point for different speeds of the moving platform.Note that landing anywhere within the 25 x 25 cm area can

Figure 14. Simulated UAV and Moving Platform in Gazebo Simulation Program

10

Page 11: COMPUTER VISION SYSTEM - arXiv

Figure 15. Landing Points with Landing Threshold of 25 x 25cm(left), 15 x 15cm(center), and 5 x 5 cm(right)

be regarded as a safe landing considering the UAV size. Thepresent vision-based control system achieves precise landingwithin a 5 x 5 cm landing threshold for all platform speeds.When the platform is moving at 8.5 m/s, it requires the UAVto fly close to its maximum speed but it still can make accuratelandings. The dispersion of landing points in the figure can beattributed to the assigned landing threshold. The time historyof relative distance for each landing case is shown in Fig. 16.

Each line denotes the relative distance between the UAV andlanding pad center at different platform speeds. By the timethe UAV starts tracking after take-off, the platform is alreadyaccelerating forward up to its designated speed. Thus, the rel-ative distance increases at the beginning of the tracking phase.Also, it takes more time to land as the landing threshold ismade smaller. The average time to achieve a landing thresh-old in each case is specified in Table 3.

Table 3. Average Time to Achieve Landing Threshold

Vehicle Speed 5x5cm 15x15cm 25x25cm0.5m/s 15.3s 4.6s 3.3s2.0m/s 25.6s 7.0s 6.8s4.0m/s 26.9s 13.8s 10.3s6.0m/s 28.6s 16.4s 16.2s8.5m/s 35.2s 23.0s 22.9s

In conclusion, the landing accuracy depends on an assignedlanding threshold. The present system can achieve a pre-cise landing, however, there is a trade-off between the timeand landing accuracy. Thus, the landing threshold should beselected based on the priority of a mission, which could beaccuracy or time. In the following simulations, the mediumlanding threshold of 15 x 15 cm is applied.

Tracking Capability

To examine the tracking capability, simulations are conductedby varying the speed and the course of the moving platform.The four cases discussed in this paper include the platformmoving forward in a straight line at speeds 0.5 m/s and 8.5m/s as well as the platform moving in an S-pattern and a cir-cular pattern. The trajectories, final landing points, set-pointtracking data in time, and time history of control inputs areanalyzed for each case.

1. Case-1: Platform moving in a straight line at 0.5 m/s:The platform is programmed to move forward at 0.5 m/sspeed and the trajectories viewed from the top and sideare shown in Figs. 17a and 17b, respectively.

The solid blue line denotes the UAV trajectory whereasthe dashed red line denotes the platform trajectory from

Figure 16. Relative Distance with Landing Threshold of 25 x 25cm(left), 15 x 15cm(center), and 5 x 5 cm(right)

11

Page 12: COMPUTER VISION SYSTEM - arXiv

Figure 17a. Trajectories in Top View (Case-1)

Figure 17b. Trajectories in Side View (Case-1)

take-off to landing. The initial position of the UAV is onthe ground at a sideward distance of 1.5 meters from theplatform. The UAV takes off and then tracks the visualcue until satisfying the landing condition. Once the UAVenters the 15 x 15 cm landing space from the pad center,it lands quickly.

The forward relative distance and pitch command inputin time are shown in Figs. 18a and 18b, respectively.The blue and red solid lines denote the time history ofthe forward relative distance and pitch command input,respectively. The pitch command input ranges from -100 to 100 as percentage input where -100 is the maxi-mum pitch backward and 100 is the maximum pitch for-ward. Initially, the distance increases since the UAV hasto pick up speed from 0 m/s speed while the platformis already moving forward at 0.5 m/s. Due to the dif-ferent levels of gain-scheduled PID controllers activatedbased on the relative distance, the tracking rate changesfrom fast-tracking at a large distance to precise trackingin the proximity of the landing pad. It takes 4.4 secondsto reach the landing threshold of 15 cm from the landingpad center and the final landing deviation is 2 cm.

The sideward relative distance and roll command inputin time are shown in Figs. 19a and 19b, respectively.

Figure 18a. Forward Relative Distance (Case-1)

Figure 18b. Pitch Command Input (Case-1)

The blue and red solid lines denote the time history ofsideward relative distance and roll command input, re-spectively. The roll command input ranges from -100 to100 as percentage input where -100 is maximum roll leftand 100 is maximum roll right. For an initial 1 second,the roll is commanded based on the visual cue position in

a 2D camera view, which is the directional approach ofsafety mode. It prevents responding to the noisy side-

Figure 19a. Sideward Relative Distance (Case-1)

12

Page 13: COMPUTER VISION SYSTEM - arXiv

Figure 19b. Roll Command Input (Case-1)

ward distance estimates that happen while the movingaverage method is not activated. It takes 10 previous data(1 second) to activate the moving average method. Thefinal landing deviation in the sideward direction is 11 cm.

The vertical relative distance and throttle command inputin time are shown in Figs. 20a and 20b, respectively.

Figure 20a. Vertical Relative Distance (Case-1)

The blue and red solid lines denote the time history ofvertical relative distance and throttle command input, re-

Figure 20b. Throttle Command Input (Case-1)

spectively. The zero setpoint means 15 cm above thelanding pad. The throttle command input ranges from-100 to 100 as percentage input where -100 is maximumthrottle down and 100 is maximum throttle up. At 4.4seconds, the immediate landing is conducted by apply-ing the maximum throttle down command.

The relative yaw angle and yaw command input in timeare shown in Figs. 21a and 21b, respectively. The blueand red solid lines denote the time history of relativeyaw angle and yaw command input, respectively. Thezero setpoint means that a heading angle is set to zerofor making a parallel approach. The yaw command in-put ranges from -100 to 100 as percentage input where-100 is the maximum yaw left and 100 is the maximumyaw right. As in the case of the roll, it takes one secondfor the moving average method to be active. Therefore,instead of responding to initial fluctuating yaw angle es-timations, the directional approach of safety mode com-mands zero yaw for the first one second. Then, it startsregulating the yaw angle by the yaw PID controller oncethe moving average method is active.

Figure 21a. Relative Yaw Angle (Case-1)

2. Case-2: Platform moving in a straight line at 8.5 m/s:The platform is programmed to move forward at 8.5 m/s,

Figure 21b. Yaw Command Input (Case-1)

13

Page 14: COMPUTER VISION SYSTEM - arXiv

which requires the UAV to fly at its maximum speed dur-ing tracking. Therefore, it occasionally experiences con-trol saturation; however, it still makes a successful ap-proach and landing. The trajectories viewed from the topand side are shown in Figs. 22a and 22b, respectively.The solid blue line denotes the UAV trajectory whereasthe dashed red line denotes the platform trajectory fromtake-off to landing. The overshoots and undershoots ob-served in the trajectories are due to the control satura-tion when the UAV flies with its maximum pitch forwardcommand which limits other commands.

Figure 22a. Trajectories in Top View (Case-2)

Figure 22b. Trajectories in Side View (Case-2)

The forward relative distance and pitch command inputin time are shown in Figs. 23a and 23b, respectively. Upto 7.5 seconds, the relative distance increases since theplatform is moving faster than the UAV. When the rel-ative distance is further than 6 meters, the level 1 pitchcontroller yields maximum forward pitch for high-speedtracking. Since UAV control inputs are achieved throughpropeller speed variation, the maximum forward pitchthat requires the highest rotating speed for the two rearpropellers affects the capability to yield the other controlinputs as desired. Level 2 and 3 pitch controllers are ac-tivated at distances of 6 and 1.5 meters, respectively. Ittakes 23 seconds to reach the landing threshold of 15 cm

from the pad center and the final landing deviation in theforward direction is 12 cm.

Figure 23a. Forward Relative Distance (Case-2)

Figure 23b. Pitch Command Input (Case-2)

The sideward relative distance and roll command inputin time are shown in Figs. 24a and 24b, respectively.

Initially, the directional approach of safety mode is acti-vated to command roll with a magnitude of 10 based onthe visual cue position in a 2D camera view, and level

Figure 24a. Sideward Relative Distance (Case-2)

14

Page 15: COMPUTER VISION SYSTEM - arXiv

Figure 24b. Roll Command Input (Case-2)

2 roll controller takes over the control after 1.1 seconds.Between 4 and 17 seconds, it experiences limited controlauthority caused by maximum pitch forward commands.Thus, the relative distance is fully regulated after 17 sec-onds. The final sideward landing deviation is 11 cm.

The vertical relative distance and throttle command inputin time are shown in Figs. 25a and 25b, respectively.

Figure 25a. Vertical Relative Distance (Case-2)

Figure 25b. Throttle Command Input (Case-2)

It shows the time history of vertical relative distance withrespect to the setpoint which is 15 cm above the landing

pad as well as throttle command input after take-off. At23 seconds, the immediate landing is executed by apply-ing the maximum throttle down command.

The relative yaw angle and yaw command input in timeare shown in Figs. 26a and 26b, respectively. After thedirectional approach of safety mode commands zero yawfor an initial 1 second, it starts regulating the yaw an-gle by the yaw PID controller with the moving averagemethod using 10 previous data which allows stable track-ing. However, it experiences limited control authoritydue to maximum pitch forward commands between 4 and17 seconds. Therefore, the yaw angle is fully regulatedafter 17 seconds.

Figure 26a. Relative Yaw Angle (Case-2)

Figure 26b. Yaw Command Input (Case-2)

3. Case-3: Platform moving in S-pattern: The platformis programmed to move in an S-pattern at a constantspeed of 1 m/s and the trajectories viewed from the topand side have minimal overshoot as shown in Figs. 27aand 27b, respectively.

15

Page 16: COMPUTER VISION SYSTEM - arXiv

Figure 27a. Trajectories in Top View (Case-3)

Figure 27b. Trajectories in Side View (Case-3)

The forward relative distance and pitch command inputin time are shown in Figs. 28a and 28b, respectively.The pitch controller is switched from level 2 to 3 uponreaching the distance of 1.5 meters at 3.7 seconds andthe final forward landing deviation is 9 cm.

Figure 28a. Forward Relative Distance (Case-3)

Figure 28b. Pitch Command Input (Case-3)

The sideward relative distance and roll command inputin time are shown in Figs. 29a and 29b, respectively.The directional approach of safety mode is activated foran initial 1 second followed by level 1 roll controller. At2.5 seconds, the level 2 controller is activated for precisetracking and the sideward landing deviation is 13 cm.

Figure 29a. Sideward Relative Distance (Case-3)

Figure 29b. Roll Command Input (Case-3)

16

Page 17: COMPUTER VISION SYSTEM - arXiv

The vertical relative distance and throttle command inputin time are shown in Figs. 30a and 30b, respectively.The zero setpoint is 15 cm above the landing pad and itcommands maximum throttle down to land immediatelyat 7.8 seconds where the landing conditions are satisfied.

Figure 30a. Vertical Relative Distance (Case-3)

Figure 30b. Throttle Command Input (Case-3)

The relative yaw angle and yaw command input in timeare shown in Figs. 31a and 31b, respectively.

Figure 31a. Relative Yaw Angle (Case-3)

Figure 31b. Yaw Command Input (Case-3)

Due to the platform changing its course, the yaw angle isrelatively big. However, the level 2 yaw controller reg-ulates the yaw angle effectively. Since the yaw angle isnot one of the landing conditions, the UAV lands with ayaw angle of 1.7 degrees.

4. Case-4: Platform moving in circular pattern: Theplatform is programmed to move in a circular pattern ata speed of 1 m/s and the trajectories viewed from the topand side are shown in Figs. 32a and 32b, respectively.

Figure 32a. Trajectories in Top View (Case-4)

Figure 32b. Trajectories in Side View (Case-4)

17

Page 18: COMPUTER VISION SYSTEM - arXiv

Since the platform turns more than 90 degrees, the tra-jectory in the side view overlaps as the platform comesback. The entire trajectory shows that the UAV tracks thecircling platform with minimal overshoot.

The forward relative distance and pitch command inputin time are shown in Figs. 33a and 33b, respectively.The level 2 pitch controller operates until 3.8 secondsfollowed by level 3 controller. While tracking, the safetymode is intermittently activated to prevent a potentialcollision with platform structure, which commands -10pitch input if the UAV crosses the pad center. It landswith a forward deviation of 8 cm.

Figure 33a. Forward Relative Distance (Case-4)

Figure 33b. Pitch Command Input (Case-4)

The sideward relative distance and roll command input intime are shown in Figs. 34a and 34b, respectively. Thedirectional approach of safety mode is activated for aninitial 1 second followed by level 1 controller. At 2.5 sec-onds, the level 2 controller is activated for precise track-ing and it takes a relatively long time to regulate becausethe vehicle moves in a way to increase sideward relativedistance. The final landing deviation in the sideward di-rection is 14 cm.

Figure 34a. Sideward Relative Distance (Case-4)

Figure 34b. Roll Command Input (Case-4)

The vertical relative distance and throttle command inputin time are shown in Figs. 35a and 35b, respectively. Thezero setpoint means 15 cm above the landing pad and itcommands the maximum throttle down to land immedi-ately at 15.7 seconds at which the landing conditions aresatisfied.

Figure 35a. Vertical Relative Distance (Case-4)

18

Page 19: COMPUTER VISION SYSTEM - arXiv

Figure 35b. Throttle Command Input (Case-4)

The relative yaw angle and yaw command input in timeare shown in Figs. 36a and 36b, respectively. The direc-tional approach of safety mode commands zero yaw foran initial 1 second. Due to the circular trajectory of theplatform, the yaw angle is relatively large, however, keptless than 15 degrees. Thus, the level 2 yaw controllerwith the moving average method is activated to regulatethe yaw angle. Since the yaw angle is not one of the land-ing conditions, the UAV lands with having a yaw angleof 1.8 degrees at 15.7 seconds.

Figure 36a. Relative Yaw Angle (Case-4)

Figure 36b. Yaw Command Input (Case-4)

FLIGHT TESTSFlight tests are conducted to demonstrate the ability of the de-veloped vision-based autonomous VTOL UAV landing con-troller to achieve precise landing in practice. The experimen-tal setup for both fixed and moving platform landings is shownin Fig. 37. The visual cue and landing pad dimensions are thesame as in simulations. Flights are performed indoor whereGPS signal is not available and shown in Videos: Fixed Plat-form Landing (Ref. 31), Moving Platform Landing (Ref. 32).

Figure 37. Indoor Flight Experimental Setup

The schematic of the configured vision-based control systemis shown in Fig. 38.

Figure 38. Schematic of Flight Testing Process

The UAV is controlled by a Python script that runs on an ex-ternal base-station computer. The external computer commu-nicates with the UAV through the WIFI connection. The UAVtransmits raw images captured by the onboard camera to thecomputer in real-time. Then, the computer processes imagesto detect the visual cue and estimate the relative positions andheading angle. The image resolution used is 1280 x 720p andthis affects the effective range for detection and estimation.Once the relative positions and heading angle are computed,the feedback control loop yields the corresponding UAV com-mand inputs which are roll, pitch, throttle, and yaw. The com-mands are sent back to the UAV and then the embedded inner-loop autopilot controls the rotating speed of each propeller to

19

Page 20: COMPUTER VISION SYSTEM - arXiv

achieve the commanded inputs. One cycle of this whole pro-cess takes 0.1 seconds, which is proven to be fast enough tocontrol the UAV behavior.

Flight tests are progressively conducted from a stationary plat-form landing to a moving platform landing. It requires cam-era calibration for the real camera sensor since the simulatedcamera is designed to only mimic the dimensions and gim-bal movement, but not the same sensor quality. Consideringflight conditions, it is designed to take 40 previous yaw datafor the moving average method, which can reduce yaw fluctu-ations. Since the update rate is 0.1 seconds, it takes 4 secondsfor the moving average method to be in effect. The numberof yaw data for the moving average method can be chosenproperly according to the flight conditions; for example, thesimulations took 10 previous yaw data for the moving aver-age method. Even though realistic simulations for the sameflight cases have been done, the flight tests still require morePID gain tuning. The specific gain values obtained from thesimulations are used as initial gains for flight tests, and the fi-nal gains are determined via multiple flight tests. The controlstrategy is designed in a way that the UAV stops streamingand makes a quick landing once it achieves the desired land-ing condition in actual flights. The following results show therelative positions, heading angle, and control inputs while thestreaming is operational, which is from after take-off to be-fore landing. More than 100 flight tests are conducted and therepresentative results are presented.

Landing on a stationary platform

Multiple tests are performed with various initial positions andheading angles. As a representative challenging case of sta-tionary platform landing, 45 degrees of initial yaw angle caseis presented. The final landing point is shown in Fig. 39. It isdesigned to land when the UAV enters the landing thresholdwhich is 15 x 15 cm from the landing pad center. The finallanding deviation from the center is 5 cm.

Figure 39. Final Landing Point (stationary platform)

The time history of forward relative distance and pitch com-mand input are shown in Figs. 40a and 40b, respectively.

Figure 40a. Forward Relative Distance (stationary plat-form)

Figure 40b. Pitch Command Input (stationary platform)

The time history of sideward relative distance and roll com-mand input during tracking are shown in Figs. 41a and 41b,respectively. Up to 3.4 seconds, the magnitudes of pitch androll commands are small due to the activation of the level 1yaw controller which gives priority to yaw correction. It takes6.7 seconds to satisfy the landing conditions.

Figure 41a. Sideward Relative Distance (stationary plat-form)

20

Page 21: COMPUTER VISION SYSTEM - arXiv

Figure 41b. Roll Command Input (stationary platform)

The time history of vertical relative distance and throttle com-mand input during tracking are shown in Figs. 42a and 42b,respectively.

Figure 42a. Vertical Relative Distance (stationary plat-form)

Figure 42b. Throttle Command Input (stationary plat-form)

The time history of relative yaw angle and yaw command in-put during tracking are shown in Figs. 43a and 43b, respec-

tively. Initially, the UAV has 45 degrees of heading angle,which is near the maximum that can be assigned without los-ing the visual cue in the camera view. Due to the higher imagequality of the real camera, the relative position and headingestimates in real experiments experience fewer fluctuationscompared to the simulations. Thus, it commands yaw fromthe beginning instead of applying the safety mode for the di-rectional approach, which had to be activated in the simula-tions. Until 3.4 seconds, the level 1 yaw controller correctsyaw and also reduces the magnitudes of pitch and roll com-mands not to lose the visual cue in the camera view when theyaw angle is large. After 4.1 seconds, the estimation and con-trol commands become smoother due to the moving averagemethod. By the time the UAV satisfies the landing conditions,the vertical distance is regulated to the visual cue height andthe yaw angle is decreased to zero degrees.

Figure 43a. Relative Yaw Angle (stationary platform)

Figure 43b. Yaw Command Input (stationary platform)

Landing on a moving platform

Multiple tests are performed while the platform is slowlymoving forward and the results from a representative case arepresented in detail. The landing deviation is 4 cm as shown inFig. 44.

21

Page 22: COMPUTER VISION SYSTEM - arXiv

Figure 44. Final Landing Point (moving platform)

The time history of forward relative distance and pitch com-mand input during tracking are shown in Figs. 45a and 45b,respectively.

Figure 45a. Forward Relative Distance (moving platform)

Figure 45b. Pitch Command Input (moving platform)

The time history of sideward relative distance and roll com-mand input during tracking are shown in Figs. 46a and 46b,respectively.

Figure 46a. Sideward Relative Distance (moving plat-form)

Figure 46b. Roll Command Input (moving platform)

The time history of vertical relative distance and throttle com-mand input during tracking are shown in Figs. 47a and 47b,respectively. The forward distance increases due to the plat-form already moving in the beginning, and sideward distanceis kept small throughout the flight. The setpoint for the ver-tical relative distance is the visual cue height on the movingplatform and it is almost the same height when the UAV startsstreaming right after take-off. Thus, it maintains the heightuntil landing. The landing conditions are satisfied at 5.5 sec-onds, it stops streaming and executes landing.

Figure 47a. Vertical Relative Distance (moving platform)

22

Page 23: COMPUTER VISION SYSTEM - arXiv

Figure 47b. Throttle Command Input (moving platform)

The relative yaw angle and yaw command input during track-ing are shown in Fig. 48a and 48b, respectively. Since theinitial heading angle of the UAV is intended to be parallel tothe visual cue, it has small yaws during flight. The fluctua-tions that exist until 4.1 seconds are at the noise level which is± 1 degree and become smooth after 4.1 seconds as the mov-ing average method is activated. When the UAV satisfies thelanding condition at 5.5 seconds, the final yaw angle is lessthan 1 degree.

Figure 48a. Relative Yaw Angle (moving platform)

Figure 48b. Yaw Command Input (moving platform)

SUMMARY AND CONCLUSIONS

The study focuses on the development and validation (viasimulations and flight testing) of a novel method inspired bymanned helicopter ship landing procedure for autonomoustracking and landing of VTOL UAVs on moving platformswithout relying on a GPS signal. The main novelty of thismethod is that instead of looking at the landing spot, it uti-lizes a standardized visual cue to track, approach and land onthe moving platform analogous to a pilot landing a helicopteron a moving ship by referencing the horizon bar (a visual aid).

To implement this method a computer vision system is de-veloped, which enables long-range detection and precise poseestimation by incorporating improved Forstner corner detec-tion and PnP estimation algorithm with a moving average fil-ter. With the use of a checkerboard type visual cue formedby 120mm squares, its maximum detection range is 25 me-ters. Within this range, by correlation with a Vicon-basedmotion capture system, it is demonstrated that the presentvision system has sub-centimeter position accuracy and sub-degree heading angle accuracy. The vision-based control sys-tem is configured with gain-scheduled PID controllers alongwith different flight modes, which allows a smart, situation-adaptive control approach to the landing maneuver, rather thanblindly moving to the landing pad. Based on the flight con-ditions and the relative distance to the landing platform, thecontroller will choose from the following modes: high-speedgross tracking, precise tracking at low speeds, scanning forthe visual cue, collision prevention, or immediate landing.The different flight modes are verified through the simulationsunder various flight scenarios including the landing platformmoving at high speeds, as well as following circular and S-shaped trajectories. The simulation results confirmed that theproposed visual cue tracking landing method is able to achievesafe, accurate, and robust landings repeatedly. The factor thataffects landing accuracy is the assigned landing threshold andthere is a trade-off between the accuracy and time. Even in ascenario where the UAV had to fly at its maximum speed, itlanded with an accuracy of 4 cm when a landing threshold wasset to 5 x 5 cm from the pad center. The transition from thesimulation to flight testing required only a few changes suchas camera calibration and minor adjustment in gains. Hun-dreds of flight tests were safely conducted, which includedboth stationary and moving platform autonomous landingswith a landing error of less than 5 cm.

The results presented in this paper demonstrate conclusivelythe feasibility of this novel autonomous approach and landingstrategy. In addition, this method offers additional freedomin the selection of the approach trajectory and height sinceit involves tracking a visual cue installed parallel to the air-craft’s view, which could be useful in real-life applications.The next step in this study is to improve the robustness of thismethod under uncertainties by bringing in the experience anddecision-making abilities of a human pilot via machine learn-ing approach.

23

Page 24: COMPUTER VISION SYSTEM - arXiv

Author contact: Bochan Lee [email protected] Saj [email protected] Benedict [email protected] Kalathil [email protected]

REFERENCES

1. Ruptly, “Helicopter Ship Landing at Rough Sea,”https://youtu.be/lDISL-jsF-Q?t=60,2016 (accessed June 30, 2020).

2. Lumsden, B., Wilkinson, C., and Padfield, G., “Chal-lenges at the helicopter-ship dynamic interface,” , 1998.

3. Colwell, J., “Maritime helicopter ship motion criteria-Challenges for operational guidance,” Challenges forOperational Guidance-NATO RTO Systems Conceptsand Integration Panel SCI-120. Berlin, Germany, 2002.

4. Stingl, A. L., “Vtol aircraft flight system,” US Patent3,487,553, January 6 1970.

5. “Helicopter Operations from Ships Other Than AircraftCarriers(HOSTAC),” Vol. I, 2017.

6. Lee, D., Ryan, T., and Kim, H. J., “Autonomous landingof a VTOL UAV on a moving platform using image-based visual servoing,” 2012 IEEE international confer-ence on robotics and automation, 2012.

7. Wenzel, K. E., Masselli, A., and Zell, A., “Automatictake off, tracking and landing of a miniature UAV on amoving carrier vehicle,” Journal of intelligent & roboticsystems, Vol. 61, (1-4), 2011, pp. 221–238.

8. “Vicon motion capture system,” http://www.vicon.com/.

9. Daly, J. M., Ma, Y., and Waslander, S. L., “Coordinatedlanding of a quadrotor on a skid-steered ground vehi-cle in the presence of time delays,” Autonomous Robots,Vol. 38, (2), 2015, pp. 179–191.

10. Ghamry, K. A., Dong, Y., Kamel, M. A., and Zhang,Y., “Real-time autonomous take-off, tracking and land-ing of UAV on a moving UGV platform,” 2016 24thMediterranean conference on control and automation(MED), 2016.

11. Araar, O., Aouf, N., and Vitanov, I., “Vision basedautonomous landing of multirotor UAV on movingplatform,” Journal of Intelligent & Robotic Systems,Vol. 85, (2), 2017, pp. 369–384.

12. Falanga, D., Zanchettin, A., Simovic, A., Delmerico,J., and Scaramuzza, D., “Vision-based autonomousquadrotor landing on a moving platform,” 2017 IEEEInternational Symposium on Safety, Security and Res-cue Robotics (SSRR), 2017.

13. Bi, Y., and Duan, H., “Implementation of autonomousvisual tracking and landing for a low-cost quadrotor,”Optik-International Journal for Light and Electron Op-tics, Vol. 124, (18), 2013, pp. 3296–3300.

14. Herisse, B., Hamel, T., Mahony, R., and Russotto, F.-X.,“Landing a VTOL unmanned aerial vehicle on a mov-ing platform using optical flow,” IEEE Transactions onrobotics, Vol. 28, (1), 2011, pp. 77–89.

15. “Gazebo simulation program,” http://gazebosim.org/.

16. Harris, C., and Stephens, M., “A combined corner andedge detector,” International Journal of Computer Vi-sion, 1988. DOI: 10.1023/A:1007963824710

17. Lindeberg, T., “Edge detection and ridge de-tection with automatic scale selection,” Inter-national Journal of Computer Vision, 1988.DOI: 10.1023/A:1008097225773

18. Ethan Rublee, K. K., Vincent Rabaud, and Bradski,G., “ORB: An efficient alternative to SIFT or SURF,”International Conference on Computer Vision, 2011.DOI: 10.1109/ICCV.2011.6126544

19. Hartley, R., and Zisserman, A., Multiple View Geometryin Computer Vision, Cambridge University Press, 2003.

20. Cheng, H. D., Jiang, X. H., Sun, Y., and Wang, J. L.,“Color image segmentation: Advances and prospects,”Pattern Recognition, Vol. 34, 2001, pp. 2259–2281.

21. Forstner, W., and Gulch, E., “A Fast Operator for De-tection and Precise Location of Distict Point, Cornersand Centres of Circular Features,” Proceedings of the IS-PRS Conference on Fast Processing of PhotogrammetricData, 1987.

22. Bradski, G., “The OpenCV Library,” Dr Dobbs Jour-nal of Software Tools, 2000. DOI: 10.1111/0023-8333.50.s1.10.

23. Zhang, Z., “Flexible Camera Calibration By Viewing aPlane From Unknown Orientations,” International Con-ference on Computer Vision, 1999.

24. Nozomu Araki, Y. K., Takao Sato, and Ishigaki, H.,“Vehicle’s orientation measurement method by singlecamera image using known shaped planar object,” Inter-national Journal of Innovative Computing, Informationand Control, 2011.

25. OPenCV, “Camera Calibration and 3D Reconstruction,”https://docs.opencv.org/master/d9/d0c/group__calib3d.html, 2020 (accessedJune 30, 2020).

26. Levenberg, K., “A Method for the Solution of CertainNon-Linear Problems in Least Squares,” Quarterly ofApplied Mathematics, 1944. DOI: 10.1090/qam/10666

24

Page 25: COMPUTER VISION SYSTEM - arXiv

27. Marquardt, D. W., “An Algorithm for Least-Squares Es-timation of Nonlinear Parameters,” SIAM Journal onApplied Mathematics, 1963. DOI: 10.1137/0111030

28. Bolles, M. A. F. . R. C., “Random Sample Consen-sus: A Paradigm for Model Fitting with Applicationsto Image Analysis and Automated Cartography,” , 1981.DOI: 10.1145/358669.358692

29. Ang, K. H., Chong, G., and Li, Y., “PID control systemanalysis, design, and technology,” IEEE transactions oncontrol systems technology, Vol. 13, (4), 2005, pp. 559–576.

30. Lee, B., “Vision-based VTOL UAV Landings on aMoving Platform in Gazebo Simulation,” https://youtu.be/8kAeVzGJyN8, 2020 (accessed June 30,2020).

31. Lee, B., “Flight Testing of Vision-based VTOL UAVLandings on a Fixed Platform,” https://youtu.be/w0dzVwBzFGk, 2020 (accessed June 30, 2020).

32. Lee, B., “Flight Testing of Vision-based VTOL UAVLandings on a Moving Platform,” https://youtu.be/IboT80OR1T8, 2020 (accessed June 30, 2020).

25