evolving large scale uav communication system

Evolving Large Scale UAV Communication System

Adrian AgoginoUCSC at NASA Ames

Mail Stop 269-3Moffett Field, CA 94035Adrian.K.Agogino@

nasa.gov

Chris HolmesParkerOregen State University

204 Rogers HallCorvallis, OR 97331

[email protected]

Kagan TumerOregen State University

204 Rogers HallCorvallis, OR [email protected]

ABSTRACTUnmanned Aerial Vehicles (UAVs) have traditionally beenused for short duration missions involving surveillance ormilitary operations. Advances in batteries, photovoltaicsand electric motors though, will soon allow large numbers ofsmall, cheap, solar powered unmanned aerial vehicles (UAVs)to fly long term missions at high altitudes. This will revo-lutionize the way UAVs are used, allowing them to formvast communication networks. However, to make effectiveuse of thousands (and perhaps millions) of UAVs owned bynumerous disparate institutions, intelligent and robust co-ordination algorithms are needed, as this domain introducesunique congestion and signal-to-noise issues. In this pa-per, we present a solution based on evolutionary algorithmsto a specific ad-hoc communication problem, where UAVscommunicate to ground-based customers over a single wide-spectrum communication channel. To maximize their band-width, UAVs need to optimally control their output powerlevels and orientation. Experimental results show that UAVsusing evolutionary algorithms in combination with appropri-ately shaped evaluation functions can form a robust com-munication network and perform 180% better than a fixedbaseline algorithm as well as 90% better than a basic evolu-tionary algorithm.

Categories and Subject DescriptorsI.2.11 [Computing Methodologies]: Artificial Intelligence—Multiagent systems

General TermsAlgorithms, Performance, Reliability

KeywordsUAVs, Evolution, Multiagent Systems

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.GECCO’12, July 7-11, 2012, Philadelphia, Pennsylvania, USA.Copyright 2012 ACM 978-1-4503-1177-9/12/07 ...$10.00.

1. INTRODUCTIONTraditionally, unmanned aerial vehicles (UAVs) have been

used for targeted surveillance and military operations. Whetherthey are powered for many hours by jet engine, or for afraction of an hour by electric motor, UAV missions tend tohave the same profile: They are launched, they carry out amission, then they return [5]. Such missions are naturallytime-limited, labor intensive and logistically challenging, es-pecially since the repeated takeoff and landings need to becoordinated with air traffic control. Currently the number ofsimultaneous UAV missions is in the single digits. However,technology may soon change this. For more than a decade,development has progressed on solar powered UAVs withbattery reserves, allowing them to fly for a certain amountof time after dark. Rapid industry advancements in batterycapacity, electric motor technology and solar cell efficiency,is allowing the progression of solar powered UAVs to proceedeven faster [17]. Once UAV technology goes over the criti-cal hump, where they can fly all night on charges receivedduring the day, the mission profiles of UAVs will change rad-ically: they will go up, and stay up. UAV missions couldlast for months if not years, with just a small turnover formaintenance.

In fact, at the margins, we are already there as the solarpowered QinetiQ’s Zephyr has been demoed, flying for morethan two weeks [15]. Permanently flying UAVs allow aircraftto be used in domains currently monopolized by satelliteand ground based-systems, including two-way communica-tions, continuous surveillance and broadcasts. In additionthe technologies in solar UAVs are relatively cheap, withcontinual price declines. Soon these aircraft will be availableto nearly all nations, institutions and perhaps even individ-uals. Also since they do not need oxygen to burn fuel, theycan be designed to fly at altitudes much higher than conven-tional aircraft can fly, allowing them be used independentlyof the current air traffic system. Due to these factors, therewill likely be an explosion in the number and uses of UAVsin the future [3].

We believe that genetic and evolutionary algorithms incombination with multiagent techniques will have a key rolein the complex problem of controlling such a large diverseset of aircraft. Multiagent systems match the distributednature of the problem, while evolution addresses the messynonlinear control and coordination issues. While the po-tential applications of these UAVs are vast, we believe thatevolution and multiagent systems are applicable to a widerange of UAV applications that share these common prop-erties:

• Long distance communication is easy due to UAVs be-ing line-of-sight to each other and to ground.

• Communication congestion is severe due to UAVs be-ing line-of-sight.

• Coordination of UAVs should be distributed due totheir large numbers, their geographic separation, andUAVs being owned by different institutions.

• Hardware failures and integration failures will be com-mon, due to differing age, types, manufactures andsheer volume of UAVs.

These properties make efficient control difficult with a top-down approach, but are a natural for adaptive and dis-tributed approaches such as evolutionary algorithms andmultiagent systems.

In this paper we apply these multiagent techniques com-bined with evolution to the specific domain of creating anair-to-ground communication network over a single channel.In some ways this model is related to current WiFi networksthat can share many users over a single channel. However,the dynamics of having the signals come from high altitudeUAVs make the problem considerably different than thatof terrestrial communication: Huge numbers of UAVs maybe accessible at once, congestion may be severe, and manyfailed or uncooperative UAVs will be line-of-sight and willhave to be dealt with. In these senses, this domain sharesmany of the qualities that we expect future UAV problems tohave. In addition this domain is important in itself as it al-lows UAV based communication networks to be used for suchpurposes as voice communication and data networks with-out the need of expensive and difficult to maintain ground-based hardware. In addition it allows such networks to becreated in an adhoc way, where different aircraft are ownedby different institutions. If setup properly, this paradigmmay open the path for UAV based communications to beas ubiquitous for long-range communication, as WiFi is forshort-range communication today.

The main contribution of this paper is to present an ap-plication of evolutionary algorithms to the domain of UAVcommunication that is both useful for downlink communi-cation and shows the potential for swarms of high-altitude,low-cost UAVs likely to be common in the future. In Sec-tion 2 we describe current and future UAV systems as wellas multiagent communication. In Sections 3 and 4 we de-scribe details of the UAV communication domain used inthis paper. In Section 5 we discuss how the evolving agentscan be used to maximize customer communication bit rates.In Section 6 we present experimental results, showing howthe evolving agents are able to perform well under varyingconditions. Finally in Section 7 we provide a discussion onthe impact of this application and the implications of theresults.

2. BACKGROUNDIn the near future, solar UAVs will play critical roles in

the military, industrial, scientific, and academic communi-ties [14, 17, 18]. These devices have seemingly limitlessapplications including communications, reconnaissance mis-sions, space launch platforms, and wireless power beaming[10, 14]. Recent missions including NASA’s Pathfinder-Plusand QinetiQ’s Zephyr (which remained airborne for over two

weeks nonstop) have advanced the state of the art in so-lar powered UAVs, taking them from limited mission lifeand endurance to the point they can remain operational forweeks at a time [10, 15]. As a result of the increasing capa-bilities and availability of these devices coupled with theirfalling costs, a plethora of novel domains and applicationswill emerge to utilize the newly developed technological ca-pabilities of these platforms [14].

Methods of controlling and coordinating networks of UAVshave been researched including genetic and evolutionary al-gorithms and reinforcement learning methods[9, 2, 16, 13,11, 7, 6]. Genetic algorithms have been used to evolve de-cision trees that allow UAV teams to collaborate in searchmissions, and to facilitate UAV task assignments [16, 19]. Inaddition evolutionary algorithms have been shown to be ef-fective in single player and multi-player UAV path planningdomains [8]. Genetic programming has also been shown ef-fective in UAV multi-task allocation domains when there islimited communication [4].

In addition to evolutionary and genetic algorithms, swarmtechniques have also been used to coordinate UAVs. In thecooperative hunters domain a swarm of UAVs, using handcoded optimization methods, is used to search for one ormore “smart” targets [2]. Another UAV control problemfocuses on an NP-complete task allocation problem whichassigned tasks to swarms of UAVs [12]. Frequently UAVsare utilized in reconiassance tasks involving Automatic Tar-get Recognition (ATR). In such domains it is desired to havea balance between high coverage of discovered targets andbroad area coverage. One high-performing approach to solv-ing this coordination and control problem utilizes ant-basedswarm methods [12].

3. UAV TO EARTH COMMUNICATIONAs we progress into the information age, communication

becomes an increasingly critical component of every day life.Today, cellular phones, laptops, hand held computers, andother wireless electronic devices have changed the way we seeand interact with the world. At the core of these advance-ments is a well designed wireless communication network,which handles the workload and facilitates information shar-ing between devices connected to the network. Current net-works rely on a series of radio towers to facilitate this infor-mation sharing work load. Traditional towers have workedwell to date, but they have several key drawbacks:

1. They are expensive to build.

2. They are expensive to maintain.

3. They have limited communication due to obstructions(cannot communicate “around” obstructions).

4. They have static placement (holes in coverage areas).

In this paper we focus on a subset of this domain wherethere is a set of UAVs that are flying at fixed locations (flyingin small circles) for long periods of time (perhaps monthsor years) and are transmitting data to a set of customersbelow (see Figure 1). UAVs have an advantage in sendingdata from high altitude in that they can have line-of-sightcommunication to many customers. In addition by virtueof being overhead, such UAVs can focus on what areas ofthe surface they will project most of their signal power to,allowing for better coverage.

In this domain, each UAV can communicate to multiplecustomers. In addition communication is done over a sharedchannel (over the same frequency band) analogous to theway WiFi networks transmit data. Using a shared channelallows the system to be very adhoc, where UAVs can comeand go, and can decide whether or not to participate in thesystem without any need for channel arbitration. Note thatfor simplicity we only look at the download problem, whereUAVs are sending information down to customers. Also wemake no assumptions on how the UAVs get their data feeds.We believe that this half of the problem is the most im-portant, as typical internet use tends to be dominated bydownload traffic. Although the uplink problem is fairly sim-ilar as long as it is done on a different channel than thedownlink.

Figure 1: UAV Communications. A set of UAVs athigh altitude transmit data to a set of customers onground over a single communication channel. Thetask of the system is to maximize average bitratecustomers receive. Multiple UAVs may communi-cate to single customer. A UAV communicates toat most one customer.

4. SIGNAL DYNAMICSWe assume that the UAVs are all at similar altitudes and

communicate through directional antennas pointed towardsthe ground. The amount of area on the ground that is cov-ered by the UAV is determined by the gain of its antenna.Antennas with low gain, transmit over a wider area, butwithin that area the strength of the signal is lower (see Fig-ure 2). Antennas with high gain, have more signal power inthe center of their area, but transmit over a smaller area.The maximum signal received from a UAV is proportionalto the inverse square of the gain radius for the antenna:

Smaxj = aPj/r

2j (1)

where a is a constant, Pj is the power transmitted fromUAV j, and rj is the signal half-power radius for UAV j.Smaxj is the amount of signal received directly at the center

of the transmission. Further from the center, the amountof signal received decreases exponentially according to thesignal radius:

Si,j = e−b

ri,jrj Smax

j (2)

where b is a constant and ri,j is the distance from customer iand the center of UAV j’s transmission. The noise received

by customer i is simply the sum of the signal from all theUAVs it is not communicating with:

Ni =∑j /∈Ji

Si,j + k , (3)

where Ji is the set of UAVs customer i is communicatingwith and k is a constant for background noise. The maxi-mum communication rate for customer i can then be esti-mated from the signal-to-noise ratio using Shannon’s law:

Ci,j = B log2(1.0 + Si,j/Ni) , (4)

where B is the bandwidth of the channel in Hz 1. The totaldata rate for customer i is the sum of the data rates for eachUAV the customer is communicating with:

Ci =∑j∈Ji

Ci,j . (5)

rj

High Gain Low Gain Attenuation

ri,j

Figure 2: Signal Dynamics. UAVs with high-gainantennas throw a strong signal over a small area.UAVs with low-gain antennas throw weaker signalover larger area (Left). The strength of the signaldepends on how far the customer is away from thecenter of the signal cone (Right).

5. SYSTEM EVALUATION FUNCTION ANDAGENTS

The objective of this problem is to maximize the averagedata rate of each customer:

G =1

n

n∑i=1

Ci , (6)

where there are n customers, and G is the system evaluationfunction. Combining equation 1, 2, 3, 4, 5, we obtain:

G =1

n

n∑i=1

∑j∈Ji

B log2

1.0 +

aPj

r2je−b

ri,jrj

∑j /∈Ji

aPj

r2je−b

ri,jrj + k

, (7)

putting our global objective in terms of our control vari-ables: UAV power level, Pj , and indirectly, ri,j , through theorientation of the UAV.

1For simplicity, certain factors, such as relative inversesquare distance signal attenuation are ignored that were de-termined to have little impact on performance.

These controls allow us to change the signal-to-noise char-acteristics at different locations on the ground. However,this is a difficult problem as increasing the signal for onecustomer may increase the noise for another. It is especiallydifficult, since we want this communication network to beadhoc, where it is controlled in a distributed way: UAVsare entering and leaving the system, and some UAVs failto cooperate or operate correctly. Fortunately, evolutionaryalgorithms and multiagent techniques are a natural matchto this problem.

There are many possible agent definitions and controls forthe UAVs in our domain, including altitude, antenna gain,power levels and antenna angle. Here we focus on the lasttwo: adjusting the power level Pj and orientation (the direc-tion the transmitter points to) of each UAV (see Figure 3).We control each of these actions through agents. The so-lution to the full problem consists of the power level andorientation values for all the UAVs. However to simplifythe problem, we break the task into a multiagent system,where a single agent controls both power level and orienta-tion for each UAV. To perform control, each agent makesdiscrete actions. For adjusting power, the action is scaledexponentially to the action:

Pj = Pezpj , (8)

where P is the base power and zpj is the action of the agentfor UAV j controlling its power. To control orientation, anagent chooses one of nine directions: either straight down, orone of eight cardinal directions around the UAV. The angleof the pointing is fixed so that the center of the new orien-tation is moved a distance of r what it would have been if ithad pointed strait down (see Figure 3 - right). As describedin the next section, the values of the controls for each agentare determined by an evolutionary algorithm.

High Power Low Power Orientation

Figure 3: Agent Actions. An agent can choosepower level of UAV within certain range. An agentcan also choose orientation of antenna. The Agentmust choose power levels and orientations to balancegiving more signal to their customers and less noiseto other customers.

5.1 Evolving AgentsThe objective of each agent is to evolve the best values of

power level and orientations that will lead to the best systemfitness evaluation function, G. The value of power level andorientation is determined by the agent’s current policy. Eachpolicy contains a discrete value of zpj determining power level

and a discrete value of zoj determining orientation direction.The power level is discretized to 10 different values. Alongwith the 9 possible orientations each policy for each agenthas a total of 90 different possible values.

An agent’s policy is determined by an evolutionary algo-rithm, where an agent evolves a population of policies. Atevery time step, an agent chooses a policy from its popula-tion of policies using an epsilon-greedy selector, where thebest policy is chosen with probability 1 − ε and a randompolicy is chosen with probability ε. The chosen policy tabledetermines the power and orientation of the UAV for thattime step. Once all the power levels and orientations arechosen the performance of the system is evaluated (see Fig-ure 4). The types of evaluations that can be used is discussedin Section 5.2.

Once the current choice of policy has been evaluated, theevaluation of the policy is updated with a learning rate al-pha: New Value = (1 − α) * Old Evaluation + α * NewEvaluation. Each agent then updates its population by elim-inating the lowest value member of the population, and thencopies the highest value member of population and mutatesit. Mutation is applied by taking a random table entry andsetting each value to a number between 0 and 9 taken froma uniform random distribution. Various other forms of mu-tation were tried including mutating more table entries ateach evolution step, but they did not improve performance.

Agent 1

Population 1

PolicyPolicy

Policy

Agent 2

Population 2

PolicyPolicy

Policy

Agent n

Population n

PolicyPolicy

Policy

Power Level,

Orientation for

UAV 1

Global Evaluation

Agent (Difference) Evaluations

Power Level,

Orientation for

UAV 2

Power Level,

Orientation for

UAV n

Figure 4: Evolving Populations of Agents. Eachagent has its own population of policies. At everystep a policy is chosen from the population, whichdetermines the power level and orientation of a sin-gle UAV. The choice of power level and orienta-tion can then be evaluated in two different ways:1) Global evaluation looks at the value of all of theactions of all of the agents and returns the samevalue for all agents, 2) Shaped agent-specific evalua-tions (implemented with the “difference evaluation”in this paper) make a separate evaluation for eachagent using information about all of the actions fromall agents.

5.2 Agent Evaluation FunctionsThe final issue that needs to be addressed is selecting the

fitness evaluation function for the evolving agents. The firstand most direct approach is to let each agent receive the

system performance as its evaluation function. However, inmany domains using such an evaluation function leads toslow evolution. We will therefore also set up a second evalu-ation function based on an agent-specific evaluation. Giventhat agents aim to maximize their own evaluation functions,a critical task is to create “good” agent evaluation functions,or evaluation functions that when maximized by the agentslead to good overall system performance. In this work wefocus on “difference” evaluation functions which aim to pro-vide an evaluation function that is both sensitive to thatagent’s actions and aligned with the overall system evalua-tion function [1, 20].

5.2.1 Difference Evaluation FunctionConsider difference fitness evaluation function of the form

[1, 20]:

Di = G(z) −G(z − zi) , (9)

where zi is the action of agent i (determined by its policy)as defined in Subsection 5.1, and z− zi are the actions of allthe agents with the action of agent i removed. The secondterm of the difference evaluation, G(z − zi), represents acounterfactual of what the performance of the system is likewhen agent i is removed from the system (i.e. its powerlevel is dropped to zero). By subtracting the counterfactualfrom the original evaluation, this difference in some senseevaluates the agents contribution to the system.

There are two advantages to using D: First, the secondterm removes a significant portion of the impact of otheragents in the system. This happens since the impact of ac-tions that are irrelevant to agent i are removed by the sub-traction. This benefit has been dubbed“learnability”(agentshave an easier time evolving) in previous literature [1, 20].Second, because the second term does not depend on the ac-tions of agent i, any action taken by agent i that improvesD, also improves G. This happens since any action that theagent takes can only affect the first term, because its ac-tion has been eliminated from the second term. This benefitwhich measures the amount of alignment between two eval-uation functions has been dubbed “factoredness” in previousliterature [1, 20].

Substituting Equation 7 into Equation 9, we obtain

Dj′ =1

n

n∑i=1

∑j∈Ji

B log2

1.0 +

aPj

r2je−b

ri,jrj

∑j /∈Ji

aPj

r2je−b

ri,jrj + k

−

1

n

n∑i=1

∑j∈Ji,j 6=j′

B log2

1.0 +

aPj

r2je−b

ri,jrj

∑j /∈Ji,j 6=j′

aPj

r2je−b

ri,jrj + k

,

where we are calculating the difference evaluation for agentj′. Note that the second term of the difference evaluationboth removes the signal received from this agent’s UAV andalso removes its noise.

6. EXPERIMENTSTo test the effectiveness of evolving agents in this UAV

communication domain, we perform an extensive set of ex-periments in simulation. In these simulations, (unless other-wise specified, such as in the scaling experiments) 100 UAVs

are placed at an altitude of 20 miles (representing approxi-mately the maximum altitude a solar powered aircraft canachieve). These UAVs are placed above a 10x10 mile squarearea. The task of the UAVs is to transmit data to customerswithin this 10x10 mile area. In all experiments the channelbandwidth is B = 1Mhz and the noise floor is k = 0.2.In addition, the gain radii, rj , are distributed randomly,uniformly between 0.35 miles and 1.05 miles to representa heterogeneous set of UAVs. For evolution, α = 0.2 andε = 0.25. All experiments results are performed over 30 tri-als. In additional all of our major performance conclusionsare statistically significant with p < 0.05. In this setup wetest the performance of the evolving agents in cases where:

1. Agents control UAVs with no failures and full commu-nication.

2. UAVs (i) fail; (ii) do not coordinate; or (iii) are incom-patible.

3. UAVs have restrictions on their observation capabili-ties.

4. The number of UAVs is scaled to 1000 UAVs.

In all of our scenarios, the number of customers is the sameas the number of UAVs. We do this to model the situationwhere a ground-based “customer” is likely to be a hotspot orsignal repeater. In situations where customers were individ-uals, the number of customers would likely be considerablymore than the number of UAVs.

For each of the cases above, we report results on fourdifferent types of agents to control the power and orientationof the UAVs:

• Static agents always choose maximum power, and straitdown orientation (M).

• Random agents have random evaluation function (R).

• Evolving agents directly maximize system evaluationfunction (G).

• Evolving agents use difference evaluation function (D).

The first two form baselines to asses performance. The nexttwo compare learning rates of traditional agents maximizinga common system evaluation function, and agents indirectlymaximizing the system evaluation function by directly max-imizing the difference evaluation function.

6.1 UAV PerformanceIn the first set of experiments, we have a single agent con-

trol both power level and orientation for a UAV. The resultsdisplayed in Figure 5 show how the first baseline algorithmis able to achieve a data rate of 200Kbits/s to each customer.The results also show that evolving agents maximizing thesystem evaluation function directly (G) perform significantlybetter, performing up to 300Kbits/s. This result shows thatevolution can be very helpful in choosing control policiesin this large system. However, agents evolved using the Devaluation function are able to perform even better, nearlydoubling the performance of the system. The improved per-formance of the difference evaluation has to do with it beingmore specific to the actions of the agent. When an agentchooses a good policy that policy is likely to be evaluatedwell using the difference evaluation. In contrast when using

150

200

250

300

350

400

450

500

550

600

0 500 1000 1500 2000 2500 3000 3500 4000 4500

Kbits

/s p

er C

usto

mer

Evolution Steps

DGRM

Figure 5: System of 100 UAVs where each agent op-timizes both power and orientation. Agents evolvedusing D evaluation function outperforms all othermethods by nearly 2-to-1. This is due to the D eval-uation eliminating irrelevant actions.

G to evaluate that policy it may not get a good evaluation,since the evaluation depends equally on the policy choicesof the 99 other agents.

6.2 Robustness to Failures, Non-coordination,and Incompatibility

For an adhoc UAV communications network to functionproperly, it must be robust to many types of failures. UAVsmay be of different ages, be in different states of repair andmay fail without notice. Even worse than failing completely,a UAV may still be transmitting at high power, but not com-municating with any customer, causing them to add noisewithout any benefit. In this section we show how robustan evolved agent based UAV network can be against thesevarious forms of failures.

0

100

200

300

400

500

600

0 20 40 60 80 100

Kbits

/s p

er C

usto

mer

Percent fail to Transmit

DGRM

Figure 6: Performance under transmission failures.D agents outperform all other methods with up to95% failures.

6.2.1 Failure to TransmitFirst we consider the case where UAVs must turn off all

transmissions due to failure or in order to conserve power.As agents fail, other agents must find ways to adapt to makeup for the loss. As seen in Figure 6, agents using D are ableto perform well, even under high failure rates. While agentsusing the baseline policy as well as agents using G are nothurt significantly from the failures, they still perform worsethan agents evolved using D up to a 95% failure rate.

150

200

250

300

350

400

450

500

550

600

0 20 40 60 80 100Kb

its/s

per

Cus

tom

erPercent fail to Coordinate

DGRM

Figure 7: Performance when agents fail to coordi-nate. D agents outperform all other methods.

0

100

200

300

400

500

600

0 20 40 60 80 100

Kbits

/s p

er C

usto

mer

Percent Incompatible

DGRM

Figure 8: Performance under “incompatible” agents.Agents evolved using D evaluation function outper-form all other methods with up to 90% failures.

6.2.2 Failure to CoordinateNext we consider the case where some UAVs fail to co-

ordinate with the rest of the system and end up taking un-coordinated locally greedy actions. To maximize local per-formance, these agents simply maximize their power level,without regard for the noise they are adding to the system.These agents still contribute to the overall system band-width, but they can potentially harm the system by raisingthe noise levels of other agents. As shown in Figure 7, agentsevolved using D are able to outperform all other methods.

In fact even when 100% of agents greedily maximize theirpower levels, agents evolved using the difference evaluationstill perform better since they are still able to efficientlychoose good orientations.

6.2.3 Failure of CompatibilityIn our final case, we consider the situation where some

of the UAVs are incompatible with the current network. Inthis case, incompatible UAVs still send out noise, but donot actually send data to any of the customers in the sys-tem. Such cases may be common, when protocols change,software is not written to specification or there are multipledifferent networks communicating in the same channel. Fig-ure 8 shows that this type of failure can be very harmful.Still, with a moderate number of incompatible UAVs, agentsevolved using the difference evaluation are able to performwell.

6.3 Observation RestrictionsIn order to demonstrate the concept and the suitability

of the multiagent evolutionary approach to this domain, theresults reported above assume that there are no restrictionsin the observational capabilities of the agents (for example incomputing or receiving the system objective and differenceevaluation functions, or in communicating signal-to-noise ra-tios for all of the customers). Because such assumptions arenot realistic, in this section we explore the impact of suchlimitation on the performance of each of the algorithms (notethat “observation restrictions” here also includes inter-UAVcommunication, but we use the term “observation” to differ-entiate from the main application of UAV to ground com-munication).

150

200

250

300

350

400

450

500

550

600

0 2 4 6 8 10

Kbits

/s p

er C

usto

mer

Observation Radii

DGRM

Figure 9: Performance of a 100 UAV system in thepresence of observation restrictions. Agents canonly observe actions of other agents that lie withina given radii from their location. Agents using Dare able to effectively use additional observationalinformation to coordinate and improve system per-formance, whereas agents using G can be negativelyimpacted by an increased amount of information.

As seen in Figure 9, agents evolved using D and G per-form equally when no observations are possible (agents haveaccess to only their own local information). As the levelof observation increases and agents receive more informa-

tion, an increase in performance could be expected. Thatis not what happens however; agents evolved using G actu-ally experience a decrease in performance as they gain ad-ditional information. This is because the agents don’t knowwhat to do with the extra information they receive. Agentsevolved using D are able to handle this information in a waythat benefits the system performance, allowing them to co-ordinate and make decisions that positively improve systemperformance. Agents evolved using G begin to improve per-formance when the observation radius is below 2.25. Thishappens since the agents are receiving enough informationto make more informed decisions, but not enough informa-tion to negatively impact their performance (too much in-formation causes noise on the agents’ evaluation function).After this point however, the amount of information eachagent receives becomes overwhelming and they are unableto coordinate their actions.

6.4 ScalabilityWhile all of the previous experiments have been performed

with 100 UAVs, we expect future UAV systems to be muchlarger. In this section we test the scalability of our approachby measuring the system performance when the number ofUAVs is scaled from 10 to 1000. To make scaling compa-rable, we also scale the number of customers and the areaof the land serviced by the same amount. The results showthat for more than 100 UAVs, the amount of data each cus-tomer receives is highly stable (see Figure 10). This re-sult suggests that agents evolved using difference evaluationfunctions should be able to efficiently control this system,even when there are a very large number of UAVs.

100

200

300

400

500

600

700

800

900

1000

0 100 200 300 400 500 600 700 800 900 1000

Kbits

/s p

er C

usto

mer

Number of Agents

DGRM

Figure 10: Performance versus scalability where thenumber of customers, agents, and world size arescaled proportionally. As seen, agents evolved us-ing D are able to outperform all other methods bynearly 50% to 100% for any given system size. Agentsusing D with a system size of 1000 UAVs performnearly as well as agents evolved using G when thesystem size is 50 UAVs. This is a twenty fold increasein system size while maintaining good performance.

7. DISCUSSION AND CONCLUSIONIn this paper, we present an important application of mul-

tiagent evolution: Coordinating large numbers of UAVs to

form an air-to-ground communication network. Due to ad-vances in solar, battery and motor technology, such largeUAV networks are becoming an attractive alternative toboth Earth-based and satellite based communication sys-tems. Indeed, such networks will be increasingly powerful,allowing for voice and data networks anywhere in the worldwithout need for expensive and brittle ground-based infras-tructure, or the need for expensive-to-launch and maintainsatellite systems. In addition, with minor modifications, theUAV coordination algorithms can be adapted for other typesof large scale UAV applications, ranging from observationsystems to microwave power delivery systems.

The application chosen in this paper exhibits a numberof salient features that we expect large UAV networks tohave. Due to their high altitude, long-range point-to-pointcommunication is easy. However, long-range signal conges-tion is prevalent. These properties contrast with terrestrialnetworks where point-to-point communication tends to beshort-range and congestion is localized. Such topologicaldifference make UAV-based networks much different thanground-based networks, and necessitates a higher level ofcoordination.

This paper shows how multiagent evolution can be usedto effectively coordinate these UAV networks. In our exper-iments even basic evolution (i.e. evolved using G) is shownto be helpful, performing considerably better than simplebaseline algorithms. However, agents evolving to maximizethe “difference evaluation function” achieve twice the level ofperformance. In addition agents using the difference evalua-tion are able to scale effectively to systems with large num-bers of UAVs. These results are also shown to be robust withrespect to numerous types of failures, incompatibilities andobservational restrictions that will be common in real-worldadhoc networks. The key to these results is that they arebased on large scale UAV coordination, and will extend toother domains where similar congestion exists. For example,two way communication and UAV-to-UAV communicationare natural extensions.

8. ACKNOWLEDGMENTSThis research was partially supported by NSF grants CNS-

0931591 and IIS-0910358.

9. REFERENCES[1] A. K. Agogino and K. Tumer. Analyzing and

visualizing multiagent rewards in dynamic andstochastic environments. Journal of AutonomousAgents and Multi Agent Systems, 17(2):320–338, 2008.

[2] Y. Altshuler, V. Yanovsky, I. Wagner, andA. Bruckstein. The cooperative hunters - efficientcooperative search for smart targets using UAVswarms. Second International Conference onInformatics in Control, Automation and Robotics(ICINCO), 2005.

[3] N. Andre. Design of solar powered airplanes forcontinuous flight. PhD thesis, TechnischeWissenschaften, ETH Zurich, 2008.

[4] G. J. Barlow, C. K. Oh, and S. F. Smith. Evolvingcooperative control on sparsely distributed tasks foruav teams without global communication. InGECCO’08, Atlanta, Ga, July 2008.

[5] O. Barnes, editor. Air Power: UAVs: The Wider

Context. Royal Air Force Directorate of DefenceStudies, 2009.

[6] A. Bell. Learning-based control and coordination ofautonomous UAVs. Master’s thesis, Oregon StateUniversity, 2011.

[7] H. Choi, Y. Kim, and H. Kim. Genetic algorithmbased decentralized task assignment for multipleunmanned aerial vehicles in dynamic environments.International Journal of Aeronautical and SpaceScience, 12(2):163–174, 2011.

[8] J. M. de la Cruz, E. Besada-Portas, L. Torre-Cubillo,B. Andres-Toro, and J. A. Lopez-Orozco. Evolutionarypath planner for uavs in realistic environments. InGECCO’08, pages 1477–1484, Atlanta, Ga, July 2008.

[9] P. Gaudiano, B. Shargel, E. Bonabeau, andB. Clough. Control of UAV swarms - what the bugscan teach us. In AIAA, 2003.

[10] S. Herwitz, L. Johnson, J. Arvesen, R. Higgins,J. Leung, and S. Dunagan. Precision agriculture as acommercial application for solar-powered unmannedaerial vehicles. In American Institute of Aeronauticsand Astronautics (AIAA), 2002.

[11] S. Karaman, T. Shima, and E. Frazzoli. Taskassignment for complex uav operations using geneticalgorithms. In AIAA Guidance, Navigation, andControl Conference, August 2009.

[12] D. Miller, P. Dasgupta, and T. Judkins. Distributedtask selection in multi-agent based swarms usingheuristic strategies. Lecture Notes in ComputerScience, 2007.

[13] J. Perron, J. Hogan, B. Moulin, J. Berger, andM. Belanger. A hybrid approach based on multiagentgeosimulation and reinforcement learning to solve auav patrolling problem. In Proceedings of the 2008Winter Simulation Conference, 2008.

[14] N. Picon. Effects of wireless power beaming in thespace industry: Modern applications and futurepossibilities. Triple Helix - Explorations in Science,Society and Law, 2011.

[15] QinetiQ. www.qinetiq.com, 2011.

[16] M. D. Richards, D. Whitley, and J. R. Beveridge.Evolving cooperative strategies for uav teams. InGECCO ’05, pages 1721–1728, Washington, D.C.,June 2005.

[17] G. Romeo, G. Frulla, and E. Cestino. Design of solarhigh altitude long endurance aircraft for multi payloadand operations. Aerospace Science and Technology,10(6):541 – 550, 2006.

[18] H. Runge, W. Rack, A. Ruiz-Leon, and M. Hepperle.A solar powered hale-uav for arctic research. In 1stCEAS European Air and Space Conference, 2007.

[19] T. Shima, S. J. Rasmussen, and A. G. Sparks. Uavcooperative multiple task assignments using geneticalgorithms. In 2005 American Control Conference,pages 2989–2994, Portland, OR, June 2005.

[20] K. Tumer and D. Wolpert, editors. Collectives and theDesign of Complex Systems. Springer, New York, 2004.

evolving large scale uav communication system

Documents