

Flight Control Using an Artificial Neural Network

Gordon Wyeth and Gregg Buskey
Computer Science and Electrical Engineering
University of Queensland
St Lucia, Queensland, 4072, Australia

Jonathan Roberts
CSIRO Manufacturing Science & Technology
P.O. Box 883, Kenmore, 4069, Australia

Abstract

This paper details the developments to date of an unmanned air vehicle (UAV) based on a standard size 60 model helicopter. The design goal is to have the helicopter achieve stable hover with the aid of an INS and monocular vision. The focus of this paper is on the development of an artificial neural network that makes use of this data to generate hover commands, which are used to directly manipulate the flight servos. Current results show that a mapping can be learnt for the training data, but generalisation to unseen data has been poor with current training sets.

1 Introduction

This paper describes the preliminary aspects of the design of an adaptive autonomous helicopter control system, for which the overall design goal is to achieve stable hover under most conditions. The system is equipped with an onboard monocular camera, inertial navigation system, global positioning system, ultrasonic proximity sensor (used for altitudes less than around 1m), and a controller computer. At present, the system's stimulus is the INS alone; essentially, the UAV is flying blind.

Previous works by Shakernia et al. [2] and R. Miller and O. Amidi [1] in the area of autonomous air vehicles have taken the approach of hard coding the flight algorithms into an onboard computer that is responsible for integrating flight strategy, making short term adjustments to disturbances, and maintaining actuator control. The approach described herein differs in that the computer will learn to fly through observation and experience. Since the flight algorithms are not coded directly, the onboard computer is required to learn how to mimic the actions of the trainer. Furthermore, the computer needs to be able to respond to situations previously unseen in the training data. Given the characteristic non-linearity of helicopter flight dynamics, this journey into unfamiliar control space can result from any one of a number of external factors. Attempts to use traditional control methods for the construction of inverse control laws for helicopter plants will inevitably encounter problems when faced with the time-variant, non-linear nature of these systems. Scaled versions suffer even further problems with regard to stability, given that their rotor inertia to frame inertia ratio is of the order of six times greater than that of their full size equivalents. The necessity for adaptation, coupled with the need to be able to deal with these non-ideal dynamics, makes this problem particularly suited to neural networks and/or fuzzy logic controllers.


Autonomous helicopters have recently been employed in mapping arctic craters to be used in NASA extra-terrestrial training simulations [1]. This system was developed at Carnegie Mellon University, exhibiting full take-off, landing, and flight path tracking. Further work has been carried out at Berkeley, in which a UAV with a new method of translational and rotational movement estimation, using a four image point estimation alignment algorithm, is soon to be implemented [2].

1.1 Approach Outline

The works described in [1] and [2] have involved using fuzzy controllers, or genetic algorithms (or combinations thereof), to implement an inverse control law for primitive movements, which is then used by a tactical planning computer (among other flight layers) to maintain flight control. This work investigates the process of having an unmanned air vehicle learn not only how to map its internal aeronautic structure to its desired movement, but also to gain an internal representation, through observation, of the task required. Essentially, it uses direct mapping of sensor inputs to actuator control via an artificial neural network, such that the UAV is completely reactive. A feed forward temporal network using the back propagation training regime is employed to learn the INS to actuator control relationship (see Figure 1).

Figure 1 - Autonomous Flight System

Temporal networks, in which some information about the past also serves as input stimulus (sometimes termed partially recurrent networks), have been successfully used in applications in which some form of non-Markov process is incorporated; speech reproduction [3] is one such example. While it is true that the network structure and learning algorithm outlined above are not considered to be found in biological organisms, they are nonetheless a useful tool for taking the design of a UAV one step closer to a biological parallel.

2 Hardware

The system uses a JR Ergo standard size 60 model helicopter powered by a petrol driven two stroke engine, with a main rotor span of 1535mm and a gross weight of 4.8kg. The onboard system is a combination of conventional and high-tech flight equipment. Conventionally, the helicopter is equipped with a traditional radio transmitter and receiver and traditional flight control servos. Unlike most model helicopters, the Ergo is also equipped with a flight computer based on a custom dual HC12 board mounted in the nose of the aircraft, and a control computer (PC104 Plus based AMD K6-2 300MHz) slung beneath the fuselage in an effort to offset the helicopter center of inertia by as little as possible.

The flight computer reads the radio receiver signals, which it can send to the control computer for logging. It regenerates the control signals from the original, or as instructed by the control computer, to manipulate the servos in autonomous flight. Individual servo channels can be switched during flight to facilitate forms of partial autonomous operation. There are five servos controlling aileron, elevator, rudder, collective and throttle. Note that collective and throttle share the same linkage and hence really need only one control signal between them.

The control computer is also equipped with a Crossbow INS (DMU-VG) that provides 6 DOF information incorporating 3 translational and 3 rotational components. The device uses micro-electro-mechanical-systems (MEMS) technology, allowing for silicon based tilt sensors and solid state gyros to provide a smaller (475 grams) alternative to their traditionally cumbersome mechanical counterparts. The INS unit also utilises DSP to extract absolute pitch and roll information from the 6 rate sensors, accurate to within ±1°.

This paper investigates the possibility of training a system to correlate the INS with the control signals generated by an experienced pilot, as shown in Figure 2.

Figure 2 - System Training Setup


The control computer is interfaced to a number of other devices not used in this paper. A single monocular CCD camera is available through a Sensoray frame grabber capable of up to 50 frames per second. The control computer is also equipped with an ethernet card allowing offline communication with the base computers. The helicopter is also equipped with GPS capabilities, although the intention is to rely on this as little as possible, given the terrain dependent degree of reliability experienced in [1].

One hardware module not yet implemented is a closed loop controller allowing precise manipulation of the engine RPM via the throttle servo. This is deemed necessary, based on the throttle to lift variations from flight to flight. This variation is a consequent disadvantage of using a two stroke engine; the advantage being a higher power to weight ratio.

3 System Architecture

During autonomous flight, the neural network uses the information supplied by the INS about the helicopter's attitude to generate an appropriate set of servo responses. The INS information is periodically sampled and sent via the RS-232 link to the control computer. This data is applied to the inputs of the neural network, which is resident on the control computer. The network outputs are then sent to the flight computer, which in turn sends appropriate pulse width modulation signals to the helicopter servos (aileron, elevator, rudder and collective). The network is trained to generate the correct response to a given helicopter attitude using the back-propagation supervised learning algorithm. During this training phase, both the INS information and the servo inputs generated by the pilot trainer are sampled and logged back to the control computer. These input/output sets are then used to train the neural net on examples of viable responses to a variety of hover situations.

3.1 Data Preprocessing

Before the data is presented to the network, however, a certain degree of pre-processing is required. Firstly, it should be noted that the sampling rates of the INS and servo information differ by around 20%, with sampling periods of 30ms and 22ms respectively. Naturally, the two need to be aligned to a common sampling frequency before meaningful training sets can be constructed. Using DSP interpolation to upsample both signals to a common 1kHz sampling frequency is not an option when using accelerometers. This is because acceleration is a second derivative term, which is often noisy and hence plagued by high frequency components. Therefore, there is no guarantee that the data will satisfy the Nyquist criterion for successful signal upsampling, which states that the highest spectrum frequency must be less than half the original sampling frequency. It was therefore decided to treat the INS sampling rate as the system sampling rate, and simply match each INS sample to the nearest following servo sample. The INS was chosen to be common simply because it ultimately determines the state update rate of the system during autonomous flight.
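As a sketch of the alignment step just described, the following assumes the logged INS and servo streams are available as timestamp-sorted lists of (time, value) pairs; the function and variable names are illustrative only, not those of the actual logging software.

import bisect

def align_to_ins(ins_samples, servo_samples):
    """Match each INS sample to the nearest following servo sample.

    ins_samples, servo_samples: lists of (timestamp_seconds, value) tuples,
    sorted by timestamp. The INS period (~30 ms) is treated as the system
    sampling period; no interpolation or upsampling is performed.
    """
    servo_times = [t for t, _ in servo_samples]
    pairs = []
    for t_ins, ins_value in ins_samples:
        idx = bisect.bisect_left(servo_times, t_ins)   # first servo sample at or after t_ins
        if idx < len(servo_samples):
            pairs.append((ins_value, servo_samples[idx][1]))
    return pairs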

The next stage of the pre-processing is incorporated to make the task of learning, and of making generalizations about, the input to output mapping more manageable for the net. Each of the eight input and four output scalars is expanded into a 10 point vector, calculated such that each takes the form of a non-normalized Gaussian distribution. The result is the formation of a nearest neighbour association between like valued input and output sets. More specifically, each vector bin is assigned a scalar quantization. The value v_i of each bin for a given scalar input x is decided by the numerical distance of its quantization value x_i from the scalar, as follows:

    v_i = exp( -(x_i - x)^2 / (2 σ^2) )

In essence, each input and output looks like a sliding Gaussfunction. This concept is illustrated in Figure 3.

Figure 3 - Scalar Converts to Gaussian Vector (shown for scalar values of 2 and 3)
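A sketch of this encoding, assuming the ten bin quantization values for a given signal have already been chosen (their selection is described below) and using the σ of 0.84 discussed next:

import numpy as np

def gaussian_encode(scalar, bin_values, sigma=0.84):
    """Expand a scalar into a 10-point, non-normalised Gaussian vector.

    bin_values: the 10 quantization values assigned to the vector bins.
    Each bin takes a value determined by the distance of its quantization
    value from the presented scalar, so the vector looks like a Gaussian
    'bump' centred on the scalar (see Figure 3).
    """
    bins = np.asarray(bin_values, dtype=float)
    return np.exp(-((bins - scalar) ** 2) / (2.0 * sigma ** 2))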

The decision regarding the degree of nearest neighbour association (the Gaussian variance), while not arbitrary, is certainly not based on any solid theory. The design choice of σ equalling 0.84 sets up vectors with a peak bin value of one having left and right neighbouring bins equalling 0.5; bins two degrees removed are for the most part zero. This, of course, occurs only if the presented scalar matches a vector quantization bin exactly; however, the degree of association for other scalar values is very similar.

Perhaps more important is the choice of the actual bin quantization values. Given that outliers will always exist in the training data, using a linear scale that includes the entire range for a given input or output is unsatisfactory for two reasons. The first is that it leaves no room for more extreme outliers during autonomous flight. The second is that the outliers will stretch the vector such that most of the flight time is contained within the middle one or two vector bin quantizations. The solution to both inadequacies comes in the form of using the statistical distribution of each input and output to calculate their associated vector structure. More specifically, the middle two bins of the vector associated with the x-gyro, for example, should have quantizations that envelope the region about the x-gyro average that covers 20% of the flight time. The second two from the middle should envelope a total of 40% of flight time, with the outer two incorporating virtually the full 100%.


Note that with this method, scalars and quantization bin values need to be normalized to their standardised random variable equivalents [5]. This structure results in increased resolution in the region of most stable hover, while still allowing for statistically outlying regions of control space to be represented. The resolution increase improves the network's ability to interpret the intricate movements associated with stable hover.
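One plausible way to construct such bins from logged flight data, sketched under the assumption that evenly spaced percentiles approximate the flight-time coverage figures given above (the paper's exact procedure may differ):

import numpy as np

def make_bin_values(flight_samples, n_bins=10):
    """Choose bin quantization values from the statistical distribution of a signal.

    Sketch only: bins are placed at evenly spaced percentiles of the logged
    flight data (5th, 15th, ..., 95th), so that the middle pair of bins spans
    roughly 20% of flight time about the mean, the next pair a total of 40%,
    and the outer bins cover virtually the full range. Samples and bin values
    are then normalised to their standardised (zero mean, unit variance) form [5].
    """
    x = np.asarray(flight_samples, dtype=float)
    percentiles = np.linspace(5, 95, n_bins)      # 5th .. 95th percentile
    bins = np.percentile(x, percentiles)
    mean, std = x.mean(), x.std()
    return (bins - mean) / std                    # standardised bin values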

3.2 Network Architecture

The network itself is a temporal network using a time window of ten samples (300ms), resulting in an input size of 100 nodes for each of the eight scalar inputs (800 in total). The outputs are treated only in their static form (40 nodes). This temporal nature allows the helicopter to react not only to direct stimulus, but also to the context within which it is presented. The choice of the actual internal architecture of the network, including the type of neuron units selected (excitatory/inhibitory or purely excitatory), is, like any feed-forward neural network trained using back-propagation, very much application dependent. Typically, a large number of hidden units allows for better convergence between the net outputs and the target outputs compared during training. However, a smaller number of hidden units results in better generalisation when exposed to unseen situations during testing, so, like most engineering problems, a trade off exists between the two terms of merit [6].

One method of minimizing the number of synaptic connections is to try to build some a priori knowledge of the system dynamics into the model [6]. The approach taken for this system is to use a form of partially connected network, the nature of which is described in Sections 4 and 5. Other parameters, such as learning rate, are also responsible for a trade off between speed of convergence and location of more optimal solutions; these are also discussed in the following section.
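For illustration, a sketch of how the 800-element network input might be assembled from the Gaussian-encoded INS history; the sizes follow those quoted above, while the partially connected grouping described in Sections 4 and 5 would further restrict which inputs feed which hidden units:

import numpy as np

WINDOW = 10       # 10-sample time window (300 ms at the 30 ms INS period)
BINS = 10         # 10-point Gaussian vector per scalar
N_INPUTS = 8      # 3 gyros, 3 accelerometers, pitch and roll
N_OUTPUTS = 4     # aileron, elevator, rudder, collective

def build_input_vector(encoded_history):
    """Concatenate the encoded INS history into the 800-element network input.

    encoded_history: array of shape (WINDOW, N_INPUTS, BINS) holding the
    Gaussian-encoded INS scalars for the last ten samples.
    """
    return np.asarray(encoded_history).reshape(WINDOW * N_INPUTS * BINS)  # 800 values

# The outputs are treated in their static form: N_OUTPUTS x BINS = 40 output nodes.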

4 Training Algorithm

The back propagation training algorithm is a method of supervised neural network learning. During training, the network is presented with a large number of input patterns (the training set). The desired corresponding outputs are then compared to the actual network output nodes. The error between the desired and actual response is used to update the weights of the network inter-connections. This update is performed after each pattern presentation. One run through the entire pattern set is termed an epoch. The training process continues for multiple epochs, until a satisfactorily small error is produced. The test phase uses a different set of input patterns (the test set). The corresponding outputs are again compared to a set of desired outputs. This error is used to evaluate the network's ability to generalize. Usually, the training set and/or architecture needs to be assessed at this point.

There are several parameters that can be adjusted to improve training convergence. The two most relevant are momentum α and learning rate η. If the error is considered to lie over a multi-dimensional energy surface, η designates the step sizes taken across this surface to reach a global minimum. Momentum, on the other hand, tries to push the process through any local minima. A learning rate that is too large results in the global minimum being stepped across; too small, and the convergence time is prohibitively large. With momentum, too large a value sees the process oscillate, while too small a value can result in entrapment in local minima. The actual shape (stretch) of the network neuron transfer functions has also been found to affect convergence [4]. As previously mentioned, the number of hidden units and the level of connectivity can have a large effect on training speed; however, a trade off exists between training and testing success.
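A minimal sketch of the per-pattern update described above, assuming a single hidden layer of sigmoid units and using the learning rate and momentum values quoted in Section 5 (0.02 and 0.95); this is illustrative only, not the project's actual training code:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_epoch(X, T, W1, W2, V1, V2, eta=0.02, alpha=0.95):
    """One epoch of per-pattern back-propagation with momentum.

    X: (n_patterns, n_in) inputs; T: (n_patterns, n_out) targets.
    W1, W2: weight matrices (bias column included); V1, V2: momentum terms.
    Returns the mean squared error over the epoch and the updated weights.
    """
    sse = 0.0
    for x, t in zip(X, T):
        x = np.append(x, 1.0)                 # bias input
        h = sigmoid(W1 @ x)                   # hidden activations
        h_b = np.append(h, 1.0)               # bias for output layer
        y = sigmoid(W2 @ h_b)                 # network output

        e = t - y                             # output error
        sse += float(e @ e)

        # Delta rule gradients for sigmoid units
        d_out = e * y * (1.0 - y)
        d_hid = (W2[:, :-1].T @ d_out) * h * (1.0 - h)

        # Weight updates with learning rate eta and momentum alpha
        V2 = alpha * V2 + eta * np.outer(d_out, h_b)
        V1 = alpha * V1 + eta * np.outer(d_hid, x)
        W2 += V2
        W1 += V1
    return sse / len(X), W1, W2, V1, V2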

4.1 Performance Evaluation

This success is most often gauged on the mean square error between the ANN's response to the training or testing set inputs and the corresponding targets. While this is certainly relevant for classification systems, the measure of merit for some control system ANN applications can be quite different. Given that the system described in this paper must learn to develop a generic trajectory approximation based on input stimulus, it makes more sense to evaluate the closeness between the target sequence and the network output sequence over time, rather than consider each output error separately. This is not to say that MSE should be ignored, only that it needs to be put in context as a gauge of the network's ability to produce reasonable sequences. One suggested method for evaluating such a network's success is proposed in [7], in which merit is based on the correlation between the actual and target time sequences. At this point, visual evaluation of plots of the two has been used as an exercise in benchmarking certain MSE ranges, which is essentially the same process; formal mathematical correlation may be incorporated into the final testing regime if deemed necessary.
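The correlation-based measure suggested in [7] could be computed along the following lines (a sketch; the formal correlation, if adopted, may be defined differently in the final testing regime):

import numpy as np

def sequence_correlation(actual, target):
    """Pearson correlation between an output sequence and its target sequence.

    actual, target: 1-D arrays of servo demands sampled over time. A value
    near 1 indicates that the network reproduces the shape of the pilot's
    trajectory, even if individual samples differ.
    """
    a = np.asarray(actual, dtype=float)
    t = np.asarray(target, dtype=float)
    return float(np.corrcoef(a, t)[0, 1])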

5 Experiments

Exhaustive experiments were initially carried out using excitatory/inhibitory (bipolar) neurons. While the results are not included, it was found that no significant improvement in training success resulted from varying any or all of the parameters previously discussed. Changing to purely excitatory (unipolar) units resulted in immediate improvements. The experiments detailed herein use only unipolar neurons.

The experiments to date have focussed on deduction of the best connectivity, activation unit type, and hidden unit number. These investigations were carried out using several training sets of varying sizes. Little has been investigated into the effect of varying initial weight ranges, as [4] found that for largely connected networks such variations have little effect on the rate of training convergence. The experiments of interest carried out thus far are as follows:

• First, a network with a large number of hidden units (40) was trained with a learning rate of 0.02, momentum of 0.95 and a fully connected architecture. This was an attempt to converge the network to a very low MSE. The intention was to then plot the network output on seen data to put MSE values into some context.

• The architecture was then changed to 10, 6 and 2 hidden units, so that a possibly worse MSE could be gauged visually, in order to test for possible ranges of acceptable MSE.

• A partially connected network was used, in which x-gyro, y-accelerometer and roll were connected to one set of hidden units, y-gyro, x-accelerometer and pitch to another, and z-gyro and z-accelerometer were connected to both. This was to look for any synaptic redundancy within the network.

• Test data simulations were run at five epoch training intervals, in an attempt to understand where, if at any point, the influence of over-training began to manifest itself in the generalization performance.

• The stretch factor for the hidden units was increased, in order to look for evidence of premature saturation.

5.1 Results

The relationship between the number of hidden units and the MSE on training data is shown in Table 1 below, which also shows a comparison between partially and fully connected networks. These results were compiled using a training set comprising 2.5 minutes of flight time.

Table 1 - Varied Architecture MSEs after 250 Epochs

Hidden Units            40      10      6       2
MSE Full (x 10-4)       0.25    1.13    1.52    2.24
MSE Partial (x 10-4)    0.36    1.11    1.89    2.21

Figure 4 shows the test performance at 5 epoch training intervals for a network trained on the same data set. This is for a varying number of hidden units, while using a constant η of 0.02. This learning rate was found to give optimal convergence speed versus final MSE; however, any learning rate within the range of 0.01 to 0.04 gave comparable results. The upper four plots are test MSE; the lower four are training MSE. It was also found that increasing the hidden stretch factor improved generalisation; however, for the 2.5 minute data set, acceptable MSE was never achieved.

Figure 4 - MSEs for Varying Hidden Number (test MSEs, taken at 10 epoch intervals, vs training MSEs over 0-500 epochs; MSE x 10-3; 40, 10, 6 and 2 hidden units)

Figure 5 shows the training and test MSE for a network with the same parameters trained on a flight lasting more than 6 minutes. Figure 6 compares the elevator sequences produced by the neural net (dashed line) with those of the pilot (full lines). It should be noted that variation of those parameters described for the smaller data set produced no considerable difference in performance when applied to training on the larger set.

The only parameter not varied for the larger set was the hidden unit number; this was not necessary since acceptable generalisation had seemingly been achieved with a 10 hidden unit architecture. As with the smaller training set, training on the larger set exhibited little performance variation between partially and fully connected networks.

Figure 5 - MSE on 6 Minute Training Sets (MSE x 10-4 on test and training sets for 10 hidden units)

Figure 6 - Elevator Demand vs Time (neural net vs pilot, over approximately 1800 samples)




6 Discussion of Results

When considering training merit as a degree of correlation between the desired and actual response sequences, rather than as a measure of classification accuracy on a pattern by pattern basis, it can be seen from Figures 5, 6 and 7 that a network with a final MSE of 0.00018 yields a reasonable level of performance.

The results in Table 1 give a fair indication as to the minimum number of hidden units required before considerable MSE degradation occurs. While these were logged from training on the small data set, the choice of architecture to be used when training on the larger data set was made based on these results. The clear relationship between the fully and partially connected network "MSE pairs" (Table 1) furthers this focus on network pruning, demonstrating a degree of synaptic redundancy within a fully connected architecture. The degree of redundancy was found to be even greater when training was carried out using the larger data set. This again allows interconnection minimization with virtually no compromise in network training convergence.

Figures 4 and 5 provide some interesting insight into the training requirements of the system. While the intention of the associated experiments in Figure 4 was to find the optimal number of training epochs before over-learning began, the results in fact indicated that the training regime was suffering from a data deficiency. This conclusion is based on the lack of variation shown across the span of epochs for varying architectures in Figure 4. This is not overly surprising, given a training set size of 2400 samples (around 75 seconds) and a connection number of the order of 5000. A generic approximation for the number of required training patterns is suggested in [6] to be equal to the total number of connections divided by the acceptable error level; this indicates around 40000 samples for a 10 hidden unit architecture aiming for 80% accuracy, which is achievable with 22 minutes of flying time. However, the results illustrated in Figure 5 show that the system trains acceptably on data sets far smaller than that estimated by the generic approximation (16000 patterns). This is somewhat in contrast to expected behaviour, given that the order of pattern presentation in a temporal network cannot be randomized, as is suggested in [3] and [6] for network classification improvement. This better than expected performance on such a small training set, coupled with the system's relative performance independence to common back propagation parameter variation, implies that the hover data, as pre-processed and presented to the network, formulates a relatively well behaved optimization system.
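As a rough check of the pattern-count approximation above, using the connection counts implied by the Section 3.2 architecture (biases ignored; the paper's own connection figures may differ slightly):

# Rule of thumb from [6]: required patterns = total connections / acceptable error.
inputs, hidden, outputs = 800, 10, 40
connections = inputs * hidden + hidden * outputs     # about 8400 weights
acceptable_error = 0.2                               # i.e. aiming for 80% accuracy
required_patterns = connections / acceptable_error   # about 42000, of the order of 40000
sample_period_s = 0.030                              # INS sampling period
flight_minutes = required_patterns * sample_period_s / 60   # roughly 21-22 minutes of hover data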

The results indicate that the optimal system architecture is an 800 input, 40 output, 10 hidden unit, partially connected network trained on a six minute hover flight.




7 Controller Implementation

The optimal neural network architecture described in the preceding section will comprise the bulk of the controller software residing on the control computer (PC104). Interface procedures provide services to the neural net through which it can both access INS input data (state information) and send controlling demands to the flight computer (HC12). These demands are then forwarded to the servos in PWM form (see Figure 1).

The neural net receives helicopter state updates every 7.6ms. The net processes the updated information in a feed-forward manner in order to produce updated servo demands. While the servos have their own associated closed loop controllers, the overall system is not considered to be in a closed loop configuration in the classical sense. Rather, the control software uses a "cause and effect" approach, such that its output demands are gauged on the states fed back via the INS in a fuzzy manner. Due to the frequency limitations of the flight computer demand generation, resulting from the PWM nature of the controlling signals, the rate of demand update at the actual servos is limited to 22ms. This is of no real concern to the neural net, as it was trained on data logged during a piloted flight subjected to the same hardware restrictions.
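A minimal sketch of the autonomous-flight loop this implies, assuming hypothetical interface functions (read_ins_state and send_servo_demands) wrapping the link to the HC12 flight computer:

import time

SERVO_UPDATE_PERIOD_S = 0.022   # demand rate at the servos is limited to 22 ms

def autonomous_flight_loop(network, read_ins_state, send_servo_demands):
    """Purely reactive control loop: INS state in, servo demands out.

    network.forward() is assumed to perform the feed-forward pass described in
    Section 3.2; read_ins_state() and send_servo_demands() are hypothetical
    wrappers around the RS-232 link to the HC12 flight computer.
    """
    last_sent = 0.0
    while True:
        state = read_ins_state()            # blocks until the next INS update (~7.6 ms)
        demands = network.forward(state)    # aileron, elevator, rudder, collective demands
        now = time.monotonic()
        if now - last_sent >= SERVO_UPDATE_PERIOD_S:
            send_servo_demands(demands)     # forwarded to the servos as PWM by the HC12
            last_sent = now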

The testing regime for the controller will involve the pilot taking the helicopter into a stable state of hover. The controller will then be given control one axis at a time until full autonomous hover is achieved. While preliminary testing will relinquish aileron and elevator control only, it is expected that by the end of the year the other two controls will also be transferred.

8 Conclusions and Future Work

This paper details the preliminary design of an ANN to be used as a purely reactive autonomous helicopter controller, using INS information as its only input stimulus. Particular attention was paid to the preprocessing methodology and the neural network architecture. It is found that a partially connected neural network, in which x-gyro, roll and y-accelerometer are grouped, y-gyro, pitch and x-accelerometer are grouped, and z-gyro and z-accelerometer are common to both, causes no degradation in training performance compared with a fully connected architecture. It is also found that this problem is particularly ill-suited to bipolar activation units. Varying the learning rate exhibits little effect on the overall performance so long as it is kept reasonably small (0.01 to 0.05). For the smaller data sets, increasing the stretch factor of the hidden layer units gives better (though never acceptable) performance on unseen data. Such a relationship is not evident when the larger training sets are used; convergence is swift for all stretch values tested. For the most part, successful training is found to be attributable to the training set.

This paper finds that, from an optimization viewpoint, the dynamically complex nature of a helicopter hover is in fact a "well behaved" system. The next step is to trial the controller on the helicopter; testing is expected to begin late July. This will include carrying out investigations into the cause and implications of the neural net divergence on rudder and collective control. Following that is the incorporation of the vision system into the loop. The intended use for the four-point algorithm described in [2] is primarily translational displacement estimation from the helipad center, and possibly velocity estimation relative to the helipad.


References

1. R. Miller, O. Amidi, and M. Delouis, "Arctic Test Flights of the CMU Autonomous Helicopter," Prepared for the Association for Unmanned Vehicle Systems Int. 1999, 26th Annual Symposium, http://www.ri.cmu.edu/projectlchopper (current January 2000).

2. O. Shakernia, et al., "Landing an Unmanned Air Vehicle: Vision Based Motion Estimation and Nonlinear Control," Asian Journal of Control, Vol. 1, No. 3, pp. 128-145, 1999.

3. J. Hertz, A. Krogh, and R.G. Palmer, "Recurrent Networks," Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company, CA, 1991, pp. 163-196.

4. G.F. Wyeth, A Hunt and Gather Robot, doctoral dissertation, Univ. of Queensland, St Lucia, Dept. Electrical Engineering and Computer Science, 1997.

5. D.S. Moore and G.P. McCabe, "Looking at Data: Distributions," Introduction to the Practice of Statistics, W.H. Freeman and Company, N.J., 1993, pp. 63-74.

6. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan Publishing Company, N.J., 1994.

7. L.C. Jain and N.M. Martin, Fusion of Neural Networks, Fuzzy Sets and Genetic Algorithms, CRC Press LLC, Boca Raton, 1999.