IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 2, NO. 1, JANUARY 2015 11
A Predator-prey Particle Swarm Optimization
Approach to Multiple UCAV Air Combat
Modeled by Dynamic Game Theory

Haibin Duan, Pei Li, and Yaxiang Yu
Abstract—Dynamic game theory has received considerable attention as a promising technique for formulating control actions for agents in an extended complex enterprise that involves an adversary. At each decision-making step, each side seeks the best scheme with the purpose of maximizing its own objective function. In this paper, a game theoretic approach based on predator-prey particle swarm optimization (PP-PSO) is presented, and the dynamic task assignment problem for multiple unmanned combat aerial vehicles (UCAVs) in a military operation is decomposed and modeled as a two-player game at each decision stage. The optimal assignment scheme of each stage is regarded as a mixed Nash equilibrium, which can be solved by using the PP-PSO. The effectiveness of the proposed methodology is verified by a typical example of an air military operation that involves two opposing forces: the attacking force Red and the defense force Blue.
Index Terms—Unmanned combat aerial vehicle (UCAV), game theory, air combat, predator-prey, particle swarm optimization (PSO), Nash equilibrium.
I. INTRODUCTION
COMPARED to unmanned combat aerial vehicles (UCAVs) that perform solo missions, greater efficiency and operational capability can be realized from teams of UCAVs operating in a coordinated fashion[1−5]. Designing UCAVs with intelligent and coordinated action capabilities to achieve an overall objective is a major part of multiple UCAV control in a complicated and uncertain environment[6−10]. A military air operation involving multiple UCAVs is a complex dynamic system with many interacting decision-making units, which may even have conflicting objectives. Modeling and control of such a system is an extremely challenging task, whose purpose is to seek a feasible and optimal scheme to assign the limited combat resources to specific units of the adversary while taking into account the adversary′s possible defense strategies[8, 11]. The difficulty lies not only in that it is often very difficult to mathematically describe the underlying
Manuscript received July 24, 2013; accepted July 18, 2014. This work was supported by National Natural Science Foundation of China (61425008, 61333004, 61273054), Top-Notch Young Talents Program of China, and Aeronautical Foundation of China (2013585104). Recommended by Associate Editor Changyin Sun.
Citation: Haibin Duan, Pei Li, Yaxiang Yu. A predator-prey particle swarm optimization approach to multiple UCAV air combat modeled by dynamic game theory. IEEE/CAA Journal of Automatica Sinica, 2015, 2(1): 11−18
Haibin Duan, Pei Li, and Yaxiang Yu are with the Science and Technology on Aircraft Control Laboratory, School of Automation Science and Electrical Engineering, Beihang University (BUAA), Beijing 100191, China (e-mail: [email protected]; [email protected]; [email protected]).
processes and objectives of the decision maker, but also in that the fitness of one decision maker depends on both its own control input and the opponent′s strategies.
Dynamic game theory has received increasingly intensive attention as a promising technique for formulating action strategies for agents in such a complex situation, which involves competition against an adversary. The advantage of game theory in solving control and decision-making problems with an adversarial opponent has been shown in many studies[12−15]. A game theoretic approach was proposed for target tracking problems in sensor networks in [14], where the target is assumed to be an intelligent agent that is able to maximize filtering errors through escaping behavior. Pursuit-evasion game formulations were employed in [16] for the development of improved interceptor guidance laws. Cooperative game theory was used to ensure team cooperation by Semsar-Kazerooni et al.[13], where a team of agents aimed to reach consensus on a common value for their output.
Although finding a Nash equilibrium in a two-player game may appear easy, since the zero-sum version can be solved in polynomial time by linear programming, the general problem has been proved to be PPAD-complete[17−18]. Computing Nash equilibria in games is therefore computationally extremely difficult. Based on the analogy of a swarm of birds or a school of fish, Kennedy and Eberhart developed a powerful optimization method, particle swarm optimization (PSO)[19−20], which addresses social interaction rather than purely individual cognitive abilities. As one of the most representative methods aiming at producing computational intelligence by simulating collective behavior in nature, PSO has been seen as an attractive optimization tool for its simple implementation procedure, good performance, and fast convergence speed. However, it has been shown that this method is easily trapped in local optima when coping with complicated problems, and various tweaks and adjustments have been made to the basic algorithm over the past decade[20−22]. To overcome the aforementioned problems, a hybrid predator-prey PSO (PP-PSO) was first proposed in [21] by introducing the predator-prey mechanism of the biological world into the optimization process.
Recently, bio-inspired computation for UCAVs has attracted much attention[23−25]. However, game theory and solutions to the task assignment problem have been studied independently. The main contribution of this paper is the development
of a game theoretic approach to dynamic UCAV air combat in a military operation based on the PP-PSO algorithm. The dynamic task assignment problem is handled from a game theoretic perspective, where the assignment scheme is obtained by solving the mixed Nash equilibrium using PP-PSO at each decision step.
The remainder of the paper is organized as follows: Section II describes the formulation of the problem, including the attrition model of a military air operation and its game theoretic representation. Subsequently, we propose a predator-prey PSO for mixed Nash equilibrium computation of a two-player, non-cooperative game in Section III. An example of an adversarial scenario of UCAV combat involving two opposing sides is presented in Section IV to illustrate the effectiveness and adaptability of the proposed methodology. Concluding remarks are offered in the last section.
II. A DYNAMIC GAME THEORETIC FORMULATION FOR UCAV AIR COMBAT
A. Dynamic Model of UCAV Air Combat
There are two combat sides in the UCAV air combat model. Specifically, the attacking side is labeled as Red and the defending force as Blue. Each side consists of different combat units, which are made up of different numbers of combat platforms armed with weapons. Each unit is fully described by its location, number of platforms, and the average number of weapons per platform. Thus, the state of the ith unit of force X at time k is defined as

\xi_i^X(k) = [x_i^X(k), y_i^X(k), p_i^X(k), w_i^X(k)],

where X denotes the Red force or the Blue force, [x_i^X(k), y_i^X(k)] represents the unit location, p_i^X(k) corresponds to the number of platforms of the ith unit at time k, and w_i^X(k) to the number of weapons on each platform in the ith unit of X. The number of platforms for the moving units changes according to the following attrition equation

p_i^X(k+1) = p_i^X(k) A_i^X(k).    (1)
The term A_i^X(k) in (1) represents the percentage of platforms in the ith unit of force X that survive the transition from time k to k+1. For each unit in force X, this percentage depends on the identities of the attacking and the attacked units, determined by the choice of target control, and is expressed as

A_i^X(k) = 1 - \sum_{j=1}^{N_d} Q_{ij}^{XY}(k) P_{ij}^{XY}(k).    (2)

It is assumed in (2) that N_d units of Y fire at the ith unit of X. The engagement factor Q_{ij}^{XY}(k) of the jth unit of Y attacking the ith unit of X at time k is computed from

Q_{ij}^{XY} = \beta_{ij}^{XY} [1 - \exp(-p_j^Y(k)/p_i^X(k))],    (3)

where \beta_{ij}^{XY} represents the probability that the jth unit of Y acquires the ith unit of X as a target, and is calculated by

\beta_{ij}^{XY} =
  1,                     if p_j^Y - p_i^X \ge 0,
  \exp(p_j^Y - p_i^X),   if p_j^Y - p_i^X < 0.    (4)
The attrition factor P_{ij}^{XY}(k) in (2) represents the probability of the platforms in the ith unit of X being destroyed by the salvo s_j^Y(k) fired from the jth unit of Y at time k, and is computed as follows

P_{ij}^{XY}(k) = 1 - (1 - \beta_w PK_{ij}^{XY})^{s_j^Y(k)},    (5)

where the term 0 \le \beta_w \le 1 represents the weather impact, which reduces the kill probability according to the weather condition, i.e., 1 corresponds to the ideal weather condition while 0 corresponds to the worst weather condition. PK_{ij}^{XY} is the probability of the ith unit of X being completely destroyed by the jth unit of Y under ideal weather and terrain conditions.

In the equation above, s_j^Y is the average effective kill factor when the jth unit of Y attacks the ith unit of X with salvo c_j^Y(k), and is calculated from

s_j^Y = \frac{c_j^Y(k)\, p_j^Y(k)/p_i^X(k)}{(p_j^Y(k)/p_i^X(k))^{c-1}},    (6)

where c_j^Y is the salvo size of the jth combat unit of Y and c is a constant referred to as the Weiss coefficient.

The control vector for each unit on both sides is chosen as
u_i^X(k) = [V_{xi}^X(k), V_{yi}^X(k), c_i^X(k), d_i^X(k)],    (7)

where V_{xi}^X(k) and V_{yi}^X(k) are respectively the relocation controls corresponding to the x-coordinate and the y-coordinate, d_i^X(k) is the number of units that fire at the ith unit of X, and c_i^X(k) is the salvo size control variable that decides how many weapons should fire. The number of weapons is updated according to

w_i^X(k+1) = w_i^X(k) - c_i^X(k).    (8)

The state equations of each unit engaged in an air combat are defined as

x_i^X(k+1) = x_i^X(k) + V_{xi}^X(k),
y_i^X(k+1) = y_i^X(k) + V_{yi}^X(k),
p_i^X(k+1) = p_i^X(k) A_i^X(k),
w_i^X(k+1) = w_i^X(k) - c_i^X(k).    (9)
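The per-step attrition dynamics of (1)-(6) can be made concrete with a short sketch. The following Python fragment is our own illustration, not the authors' Matlab code; the function names and the flat `attackers` list are assumptions, and the salvo expression follows the fraction form of (6).

```python
import math

def engagement_factor(p_y, p_x):
    """Q_ij^XY of (3), including the acquisition probability beta_ij^XY of (4)."""
    beta = 1.0 if p_y - p_x >= 0 else math.exp(p_y - p_x)
    return beta * (1.0 - math.exp(-p_y / p_x))

def kill_probability(p_y, p_x, c_y, pk, beta_w, c):
    """P_ij^XY of (5), using the effective salvo s_j^Y of (6)."""
    ratio = p_y / p_x
    s = c_y * ratio / ratio ** (c - 1)      # eq. (6)
    return 1.0 - (1.0 - beta_w * pk) ** s   # eq. (5)

def surviving_fraction(p_x, attackers, beta_w, c):
    """A_i^X of (2); `attackers` lists (p_y, c_y, pk) per firing unit of Y."""
    loss = sum(engagement_factor(p_y, p_x) * kill_probability(p_y, p_x, c_y, pk, beta_w, c)
               for (p_y, c_y, pk) in attackers)
    return 1.0 - loss

# One attrition step, eq. (1): 10 platforms fired at by one equal-sized unit
p_next = 10 * surviving_fraction(10, [(10, 2, 0.6)], beta_w=1.0, c=2)
```

With equal-sized units the engagement factor is 1 − e^{−1}, so in this example the attacked unit loses roughly half of its platforms in a single step.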
B. Game Theoretic Formulation for UCAV Air Combat
The problem of dynamic task assignment in the air combat is modeled from a game theoretic perspective in this paper. Suppose Red consists of N_R units (UCAVs) and fires C_R missiles during each attack or defense. There are N_B units in the Blue force, whose salvo size is also a constant, C_B. At each decision-making step k, each side decides which of its own units should be chosen to attack and which units of the opponent should be chosen as targets, with the purpose of maximizing its own objective function. Each combination of attacking and attacked units is seen as a pure strategy in the game. For each side, the number of pure strategies is calculated as

N_{RS} = C_{N_R}^{C_R} \cdot C_{N_B}^{C_R} \cdot C_R!,    (10)

N_{BS} = C_{N_R}^{C_B} \cdot C_{N_B}^{C_B} \cdot C_B!.    (11)
The payoff matrix for each side is an N_{RS} \times N_{BS} matrix, expressed as

M_R =
\begin{bmatrix}
J_R^{1,1}(k) & J_R^{1,2}(k) & \cdots & J_R^{1,N_{BS}}(k) \\
J_R^{2,1}(k) & J_R^{2,2}(k) & \cdots & J_R^{2,N_{BS}}(k) \\
\vdots & \vdots & \ddots & \vdots \\
J_R^{N_{RS},1}(k) & J_R^{N_{RS},2}(k) & \cdots & J_R^{N_{RS},N_{BS}}(k)
\end{bmatrix},    (12)

M_B =
\begin{bmatrix}
J_B^{1,1}(k) & J_B^{1,2}(k) & \cdots & J_B^{1,N_{BS}}(k) \\
J_B^{2,1}(k) & J_B^{2,2}(k) & \cdots & J_B^{2,N_{BS}}(k) \\
\vdots & \vdots & \ddots & \vdots \\
J_B^{N_{RS},1}(k) & J_B^{N_{RS},2}(k) & \cdots & J_B^{N_{RS},N_{BS}}(k)
\end{bmatrix}.    (13)
Each entry J_R^{i,j}(k) in M_R corresponds to the payoff of Red when it takes the ith pure strategy against the jth pure strategy of Blue. For the attacking force Red, the objective function J_R^{i,j}(k) is calculated as

J_R(k) = \sum_{i=1}^{N_R} \varepsilon_i \frac{p_i^R(k)}{p_i^R(0)} - \sum_{i=1}^{N_B} \tau_i \frac{p_i^B(k)}{p_i^B(0)},    (14)

where \varepsilon_i and \tau_i are weight coefficients. The objective function for the defending force Blue is calculated, by the same token, as

J_B(k) = -\sum_{i=1}^{N_R} a'_i \frac{p_i^R(k)}{p_i^R(0)} + \sum_{i=1}^{N_B} b'_i \frac{p_i^B(k)}{p_i^B(0)}.    (15)
From a game theoretic point of view, the cooperative UCAV task assignment problem is for the tagged side, Red, to maximize its own payoff at each decision step by calculating a mixed Nash equilibrium of the N_{RS} \times N_{BS} matrix game.
III. PREDATOR-PREY PSO FOR THE MIXED NASH SOLUTION
A. Predator-prey PSO
In the gbest model of PSO, each particle knows its current position and velocity in the solution space[21]. It also keeps the best solution it has found so far as pbest and the best solution of the whole swarm as gbest. The gbest model can be expressed as

v_{ij}(k+1) = \omega v_{ij}(k) + c_1 r_1 [p_{ij}(k) - x_{ij}(k)] + c_2 r_2 [g_j(k) - x_{ij}(k)],    (16)

x_{ij}(k+1) = x_{ij}(k) + v_{ij}(k+1),    (17)

where v_{ij}(k) and x_{ij}(k) respectively denote the velocity and position of the ith particle in the jth dimension at step k, c_1 and c_2 are weight coefficients, and r_1 and r_2 are random numbers between 0 and 1 that reflect the stochastic nature of the algorithm. The personal best position p_i corresponds to the position in the search space where particle i has attained its minimum fitness value. The global best position, denoted by g, is the position yielding the best fitness value among all the particles.
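For reference, the gbest model (16)-(17) fits in a few lines of Python. This is a generic sketch with common default parameter values (w = 0.7, c1 = c2 = 1.5, none of which are taken from the paper), minimizing the sphere function:

```python
import random

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    # random initial positions in [-1, 1]^dim, zero initial velocities
    x = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_f = [fitness(xi) for xi in x]
    g = pbest[min(range(n_particles), key=lambda i: pbest_f[i])][:]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = random.random(), random.random()
                # eq. (16): inertia + cognitive + social terms
                v[i][j] = (w * v[i][j] + c1 * r1 * (pbest[i][j] - x[i][j])
                           + c2 * r2 * (g[j] - x[i][j]))
                x[i][j] += v[i][j]                    # eq. (17)
            f = fitness(x[i])
            if f < pbest_f[i]:
                pbest_f[i], pbest[i] = f, x[i][:]
                if f < fitness(g):
                    g = x[i][:]
    return g

best = pso(lambda p: sum(t * t for t in p), dim=3)
```

On the 3-dimensional sphere function this reliably drives the returned position close to the origin.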
Unfortunately, the basic PSO algorithm easily falls into local optima. For this reason, the concept of predator-prey behavior is introduced into the basic PSO to improve its ability to find optima[26−28]. This adjustment takes a cue from the behavior of schools of sardines and pods of killer whales. In this model, particles are divided into two categories, predators and preys. Predators chase the center of the preys′ swarm, as if hunting the preys, while preys escape from the predators in the multidimensional solution space. After trading off predation risk against their energy, escaping particles take different escaping behaviors. The velocities of the predator and the prey in the PP-PSO are defined by
v_{dij}(k+1) = \omega_d v_{dij}(k) + c_1 r_1 [p_{dij}(k) - x_{dij}(k)] + c_2 r_2 [g_{dj}(k) - x_{dij}(k)] + c_3 r_3 [g_j(k) - x_{dij}(k)],    (18)

v_{rij}(k+1) = \omega_r v_{rij}(k) + c_4 r_4 [p_{rij}(k) - x_{rij}(k)] + c_5 r_5 [g_{rj}(k) - x_{rij}(k)] + c_6 r_6 [g_j(k) - x_{rij}(k)] - P a \,\mathrm{sign}[x_{dIj}(k) - x_{rij}(k)] \exp[-b |x_{dIj}(k) - x_{rij}(k)|],    (19)

where the subscripts d and r denote the predator and the prey respectively, p_{di} is the best position of the ith predator, p_{ri} is the best position of the ith prey, and g is the best position that all the particles have ever found. The inertia weights \omega_d and \omega_r are defined as

\omega_d = 0.2 \exp(-10\, iteration/iteration_{max}) + 0.4,    (20)

\omega_r = \omega_{max} - \frac{\omega_{max} - \omega_{min}}{iteration_{max}}\, iteration,    (21)

where \omega_d and \omega_r are the inertia weights of the predators and the preys, which regulate the trade-off between the global (wide-ranging) and local (nearby) exploration abilities of the swarm and are considered critical for the convergence behavior of PSO. iteration_{max} represents the maximum number of iterations, and \omega_{max} and \omega_{min} denote the maximum and minimum values of \omega_r, respectively. The index I is given by

I = \arg\min_k |x_{dk} - x_{ri}|,    (22)

i.e., I denotes the index of the predator nearest to the ith prey. In (19), P is used to decide whether the prey escapes or not (P = 0 or P = 1), and a and b are parameters that determine the difficulty of the prey escaping from the predators: the closer the prey and the predator, the harder it is for the prey to escape. Moreover, a and b are given by

a = x_{span},  b = \frac{100}{x_{span}},    (23)

where x_{span} is the span of the variable.
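A single-dimension prey update following (19) and (22)-(23) can be sketched as follows. The names are ours, and drawing the escape decision P as a Bernoulli variable is one plausible reading of the paper, not a detail it specifies:

```python
import math
import random

def prey_velocity(v, x, p_r, g_r, g, predators, x_span,
                  w_r, c4=1.5, c5=1.5, c6=1.5, escape_prob=0.5):
    """One-dimensional prey velocity update, eq. (19).
    `predators` lists the predator positions in this dimension."""
    a, b = x_span, 100.0 / x_span                      # eq. (23)
    x_dI = min(predators, key=lambda xd: abs(xd - x))  # nearest predator, eq. (22)
    P = 1 if random.random() < escape_prob else 0      # escape decision (assumed Bernoulli)
    r4, r5, r6 = (random.random() for _ in range(3))
    v_new = (w_r * v + c4 * r4 * (p_r - x) + c5 * r5 * (g_r - x)
             + c6 * r6 * (g - x))
    # repulsion from the nearest predator, stronger when it is close
    v_new -= P * a * math.copysign(1.0, x_dI - x) * math.exp(-b * abs(x_dI - x))
    return v_new
```

With all attractors at the prey's own position and the escape disabled, the update returns exactly zero; with the escape forced on, only the repulsion term −a·sign(x_dI − x)·exp(−b|x_dI − x|) remains.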
14 IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 2, NO. 1, JANUARY 2015
B. Nash Equilibrium
As a competitive (non-cooperative) solution concept for multi-objective, multi-criterion systems first proposed by Nash[29], a Nash equilibrium is basically a local optimum: a strategy profile (s_1, s_2, \cdots, s_n) such that no player can benefit from switching to a different strategy if nobody else switches, i.e., \forall j, \forall s'_j \in S_j,

U_{P_j}(s_1, \cdots, s_{j-1}, s_j, s_{j+1}, \cdots, s_n) \ge U_{P_j}(s_1, \cdots, s_{j-1}, s'_j, s_{j+1}, \cdots, s_n),    (24)

where U_{P_j} denotes the expected payoff of player j, and s_j and S_j respectively denote the chosen strategy of player j and its strategy set. Note that every dominant strategy equilibrium is a Nash equilibrium, but not vice versa. Every finite game has at least one Nash equilibrium. In this paper, the expected payoff is replaced by the objective functions used to calculate the payoff matrices, denoted by M_A (m \times n) and M_B (m \times n). We define the vector of mixed strategies for both sides as X_i = (x_{i1}, \cdots, x_{im}, y_{i1}, \cdots, y_{in}), such that x_{ik} \ge 0, y_{ik} \ge 0, \sum_{k=1}^{m} x_{ik} = 1, \sum_{k=1}^{n} y_{ik} = 1, where m = N_{RS} and n = N_{BS}. So, for each mixed strategy X_i, the Nash equilibrium solution (X^*, Y^*) must satisfy the conditions

X^* M_A (Y^*)' \ge X_i(1:m)\, M_A (Y^*)',
X^* M_B (Y^*)' \ge X^* M_B (X_i(m+1:m+n))'.    (25)
C. Proposed Approach for the Mixed Nash Equilibrium
To use the proposed algorithm to compute a Nash equilibrium, we define the fitness functions as

f[X_{di}^t] = \max_{1 \le j \le m} \{ M_R(j,:) (X_{di,m+1:m+n}^t)' - X_{di,1:m}^t M_R (X_{di,m+1:m+n}^t)' \} + \max_{1 \le j \le n} \{ X_{di,1:m}^t M_B(:,j) - X_{di,1:m}^t M_B (X_{di,m+1:m+n}^t)' \},    (26)

f[X_{ri}^t] = \max_{1 \le j \le m} \{ M_R(j,:) (X_{ri,m+1:m+n}^t)' - X_{ri,1:m}^t M_R (X_{ri,m+1:m+n}^t)' \} + \max_{1 \le j \le n} \{ X_{ri,1:m}^t M_B(:,j) - X_{ri,1:m}^t M_B (X_{ri,m+1:m+n}^t)' \}.    (27)

In the last two expressions, X_{di,1:m}(k) and X_{di,m+1:m+n}(k) denote the mixed strategies produced by the ith predator for the A force and the B force, respectively. Similarly, X_{ri,1:m}(k) and X_{ri,m+1:m+n}(k) denote the mixed strategies produced by the ith prey for the A and B forces. Note that the proposed variables must satisfy the following conditions:

X_{di,j}(k) \ge 0, X_{ri,j}(k) \ge 0,    (28)

\sum_{j=1}^{m} X_{di,j} = 1, \sum_{j=m+1}^{m+n} X_{di,j} = 1, \sum_{j=1}^{m} X_{ri,j} = 1, \sum_{j=m+1}^{m+n} X_{ri,j} = 1.    (29)

Importantly, the mixed Nash equilibrium corresponds to the minimum of the fitness function, and the optimal or sub-optimal solution is the one closest to zero. The detailed procedure of PP-PSO for mixed Nash equilibrium computation is demonstrated in Fig. 1.
PROCEDURE Mixed Nash equilibrium computing based on the PP-PSO
BEGIN
Step 1: Initialize.
Set the m × n payoff matrices M_A and M_B. Set the maximum iteration number N_max, the number of predators m_d, and the number of preys m_r. Randomly initialize the positions x_d and velocities v_d of the predators, both of dimension m_d by (m + n); do the same for x_r and v_r.
Step 2:
(1) Let k = 1;
(2) Calculate the fitness values of all the particles in iteration k, then find the minimum fitness value of the predators as pbest_d(k), that of the preys as pbest_r(k), and that of all the particles as pbest(k).
Step 3:
(1) Let k = k + 1;
(2) Update all the positions and velocities according to (18) and (19), then repeat (2) in Step 2.
Step 4: k ≥ N_max?
(1) Yes: stop and output the results;
(2) No: go to Step 3.
END
Fig. 1 Procedure of Nash equilibrium computing based on the PP-PSO.
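The fitness in (26)-(27) is the sum of the two players' best-response regrets, which vanishes exactly at a mixed Nash equilibrium. A minimal list-based Python version (our own naming, with M_R as the row player's payoff matrix):

```python
def nash_fitness(x, y, MR, MB):
    """Regret-sum fitness of eqs. (26)/(27) for mixed strategies x (row
    player, length m) and y (column player, length n)."""
    m, n = len(MR), len(MR[0])
    # expected payoff of each pure strategy against the opponent's mix
    red_rows = [sum(MR[i][j] * y[j] for j in range(n)) for i in range(m)]
    blue_cols = [sum(x[i] * MB[i][j] for i in range(m)) for j in range(n)]
    red_val = sum(x[i] * red_rows[i] for i in range(m))
    blue_val = sum(blue_cols[j] * y[j] for j in range(n))
    # zero if and only if (x, y) is a mixed Nash equilibrium
    return (max(red_rows) - red_val) + (max(blue_cols) - blue_val)

# matching pennies: the uniform mix is the unique equilibrium
MR = [[1, -1], [-1, 1]]
MB = [[-1, 1], [1, -1]]
assert abs(nash_fitness([0.5, 0.5], [0.5, 0.5], MR, MB)) < 1e-12
```

This is the quantity the PP-PSO particles minimize; any positive value measures how much some pure-strategy deviation would still pay.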
To validate the effectiveness of the proposed method, here we illustrate Nash equilibrium computation for both a zero-sum game and a non-zero-sum game using two simple examples. For a fair comparison between the two methods, both use the same maximum iteration number N_max = 100, the same population size m = 30, and the same upper and lower bounds for the inertia weights, \omega_{max} = 0.9 and \omega_{min} = 0.2. Besides, in our proposed PP-PSO, the numbers of predators and preys are set to m_d = 10 and m_r = 20, respectively.
Example 1. Consider the two-person zero-sum game and non-zero-sum game illustrated by Tables I and II[30].
TABLE II
PAY-OFF MATRIX OF A AND B IN A TWO PLAYER, NON-ZERO-SUM GAME

A \ B |  I           II          III         IV
I     |  (1, 1)      (235, 0)    (0, 235)    (0.1, 1.1)
II    |  (0, 235)    (1, 1)      (235, 0)    (0.1, 1.1)
III   |  (235, 0)    (0, 235)    (1, 1)      (0.1, 1.1)
IV    |  (1.1, 0.1)  (1.1, 0.1)  (1.1, 0.1)  (0, 0)
As we can see from the two tables, the first column and the first row represent the strategies of players A and B, respectively. For example, in the zero-sum game, each player has three strategies, which are specified by the number of rows and the number of columns. The payoffs are provided in the interior: the first number is the payoff received by the column player; the second is the payoff for the row player. To reduce statistical errors, each algorithm is tested 100 times independently on these two games. Evolution curves for the two-player, zero-sum game are depicted in Figs. 2∼4. In addition, the simulation results are illustrated in terms of the average fitness value, the best fitness value ever found (Tables III and IV), the minimum error, and the number of runs satisfying error ≤ 0.01, where the error is defined by the following expressions:
e_h(k) = \frac{\sum_{i=1}^{m_d} \|X_{di,1:m+n}(k) - E_S\| + \sum_{i=1}^{m_r} \|X_{ri,1:m+n}(k) - E_S\|}{m_d + m_r},    (30)

e_b(k) = \frac{\sum_{i=1}^{m_b} \|X_{i,1:m+n}(k) - E_S\|}{m_b},    (31)

where e_h(k) and e_b(k) denote the errors of our proposed PP-PSO and of the basic PSO (with population size m_b), respectively, and E_S represents the mixed Nash equilibrium solution of the game that the players participate in. Note that for the zero-sum game shown in Table I,

E_S = [21/52, 12/52, 19/52, 2/13, 3/13, 8/13].

And for the non-zero-sum game shown in Table II,

E_S = [1/3, 1/3, 1/3, 0, 1/3, 1/3, 1/3, 0].

Fig. 2. Comparison results of average fitness values for the two-player, zero-sum game.

Fig. 3. Comparison results of average errors for the two-player, zero-sum game.

Fig. 4. Comparison results of global best solutions for the two-player, zero-sum game.
TABLE III
COMPARISON RESULTS FOR THE TWO PLAYER, ZERO-SUM GAME

            Average fitness   Best fitness   Minimum error   Successful hits
PP-PSO      0.2439            5.92E-4        0.0091          79
Basic PSO   0.3153            0.0074         0.0145          2

TABLE IV
COMPARISON RESULTS FOR THE TWO PLAYER, NON-ZERO-SUM GAME

            Average fitness   Best fitness   Minimum error   Successful hits
PP-PSO      0.1352            3.836E-13      0.0069          91
Basic PSO   0.6358            0.2897         0.0085          14
It is reasonable to conclude from the simple examples demonstrated above that the proposed PP-PSO outperforms the basic PSO in terms of solution accuracy, convergence speed, and reliability for Nash equilibrium computation. It is therefore appropriate to use this method to solve the problem of multiple UCAV air combat modeled by dynamic game theory in the following section.
IV. GAME THEORETIC APPROACH TO UCAV AIR COMBAT BASED ON PP-PSO
A. Experimental Settings
To validate the effectiveness of the dynamic game theoretic formulation for UCAV air combat, a computational example is performed in Matlab 2009b using our proposed PP-PSO. Consider an adversarial scenario involving two opposing forces. The attacking force is labeled as the Red team, while the defending force is labeled as the Blue team. The mission
of the Blue force is to transport military supplies from its base to the battlefront, while the task of the Red force is to attack and destroy at least 80% of the Blue force's air transports and then return to its air base.
As shown in Fig. 5, the Blue force consists of one transportation unit, which is represented by the solid square, and two combat units. They are on the way back to the base after accomplishing a military mission. The Red force consists of three combat units and aims to destroy the Blue transportation unit. The Red force is also programmed to return to the base after the mission. For simplicity, each unit of both sides is assumed to consist of the same type of UCAVs, and each UCAV is equipped with a certain number of air-to-air missiles. Assume that the speed of the Red force is nearly 0.25 km/s while the speed of the Blue force is nearly 0.2 km/s, and that the state variables are updated every 2 minutes. So the positions of the Red force and the Blue force change by 30 km and 24 km, respectively, at each step. The configuration parameters used in the simulation for the Red force and the Blue force are listed in Table V and Table VI, respectively.
Fig. 5. Scenario of cooperative UCAV task assignment.
TABLE V
INITIAL CONFIGURATION OF Red FORCE

        (x_i^A(0), y_i^A(0))   p_i^A(0)   w_i^A(0)   PK_i^{a-b}   rm^A
UAV1    (200, 300) km          10         4          0.6          20 km
UAV2    (201, 299) km          10         4          0.4
UAV3    (202, 298) km          10         4          0.4

TABLE VI
INITIAL CONFIGURATION OF Blue FORCE

        (x_i^B(0), y_i^B(0))   p_i^B(0)   w_i^B(0)   PK_i^{b-a}   rm^B
UAV1    (2, 2) km              10         0          0            20 km
UAV2    (3, 2) km              7          3          0.7
UAV3    (4, 2) km              7          3          0.7
In the simulation, the objective functions of the two forces are chosen as

J_R(k) = \sum_{i=1}^{3} 0.4 \frac{p_i^R(k)}{p_i^R(0)} - \frac{p_1^B(k)}{p_1^B(0)} - \sum_{i=2}^{3} 0.3 \frac{p_i^B(k)}{p_i^B(0)},    (32)

J_B(k) = -\sum_{i=1}^{3} 0.6 \frac{p_i^R(k)}{p_i^R(0)} + 0.5 \frac{p_1^B(k)}{p_1^B(0)} + \sum_{i=2}^{3} 0.5 \frac{p_i^B(k)}{p_i^B(0)}.    (33)
B. Experimental Results and Analysis
Fig. 6 presents the flight trajectories of both sides in the air military operation, which result from the proposed game theoretic formulation of task assignment in a dynamic combat environment and the PP-PSO based solution methodology. The Red force starts from near its base and launches attacks to eliminate the Blue transportation force, which is on its way back to its base after a military mission. The task assignment schemes for both sides are calculated based on the proposed approach described above.
Fig. 6. Resulting trajectories of both sides from the proposed approach.
The detailed evolution and convergence behavior over time of the platform numbers in the combat units are shown in Fig. 7. The combat units of both sides start to fight at the 9th time step. After 3 time steps of engagement, the air military operation ends at the 11th time step, with Red defeating the Blue force and the surviving forces of both sides returning to their own bases. At the end of the combat, the Red force manages to destroy more than 90% of the platforms in Blue′s transportation unit, 78% of the platforms in B2, and 70% of the platforms in B3. Meanwhile, the Red team pays a price for the victory. The first unit R1 and the third unit R3 suffer slight damage, with 30% of their platforms destroyed in the attack. However, the second unit R2 of the Red force suffers serious damage, with more than 60% of its platforms destroyed in the engagement with the Blue force. This result can be explained from Fig. 8, where snapshots of the dynamic task assignment results at Steps 9, 10, and 11 are given. It illustrates that the engagement of both sides has a different pattern at each time step and shows the task assignment process to be dynamic in time. At the 9th and 10th time steps, the second unit R2 of the Red force takes actions in accordance with the resulting mixed Nash equilibrium and chooses the
Fig. 7. Number of platforms: (a) in R1; (b) in R2; (c) in R3; (d) in the transportation unit of the Blue force; (e) in B2; (f) in B3.
first and second units of the Blue force as targets, respectively. However, the Blue force insists on attacking R2 with its first unit B1, which, with its 10 combat platforms, is its most powerful unit. Consequently, R2 suffers the most serious damage among the three units of Red.
It is important to note that both the attacking side and the defending side take advantage of the proposed approach to acquire their assignment schemes over the engagement duration. Therefore, the engagement outcome mainly depends on the initial configuration of each force. The Red force has an advantage in the performance and number of its weapons, which
Fig. 8. Snapshots of dynamic task assignment results.
implies the likely result of Blue′s defeat. The experimental results coincide with the theoretical analysis and verify the effectiveness and feasibility of the proposed approach.
V. CONCLUSIONS
This paper developed a game theoretic method for UCAV combat based on the PP-PSO model. By considering both the adversary side and the attacking side as rational game participants, we represented the task allocation scheme as an optional policy set of both sides, and the cooperative task allocation results of both sides were achieved by solving the mixed Nash equilibrium using PP-PSO. An example of a military operation involving an attacking side Red and a defending side Blue was presented to verify the effectiveness and adaptive ability of the proposed method. Simulation results show that the combination of the game theoretic representation of the task assignment and the application of PP-PSO for the mixed Nash solutions can effectively solve the UCAV dynamic task assignment problem involving an adversarial opponent.
REFERENCES
[1] Richards A, Bellingham J, Tillerson M, How J. Coordination and control of multiple UAVs. In: Proceedings of the 2002 AIAA Guidance, Navigation, and Control Conference. Monterey, CA: AIAA, 2002. 145−146
[2] Alighanbari M, Kuwata Y, How J P. Coordination and control of multiple UAVs with timing constraints and loitering. In: Proceedings of the 2003 American Control Conference. Denver, Colorado: IEEE, 2003. 5311−5316
[3] Li C S, Wang Y Z. Protocol design for output consensus of port-controlled Hamiltonian multi-agent systems. Acta Automatica Sinica, 2014, 40(3): 415−422
[4] Duan H, Li P. Bio-inspired Computation in Unmanned Aerial Vehicles. Berlin: Springer-Verlag, 2014. 143−181
[5] Duan H, Shao S, Su B, Zhang L. New development thoughts on the bio-inspired intelligence based control for unmanned combat aerial vehicle. Science China Technological Sciences, 2010, 53(8): 2025−2031
[6] Chi P, Chen Z J, Zhou R. Autonomous decision-making of UAV based on extended situation assessment. In: Proceedings of the 2006 AIAA Guidance, Navigation, and Control Conference and Exhibit. Colorado, USA: AIAA, 2006.
[7] Ruz J J, Arelo O, Pajares G, de la Cruz J M. Decision making among alternative routes for UAVs in dynamic environments. In: Proceedings of the 2007 IEEE Conference on Emerging Technologies and Factory Automation. Patras: IEEE, 2007. 997−1004
[8] Jung S, Ariyur K B. Enabling operational autonomy for unmanned aerial vehicles with scalability. Journal of Aerospace Information Systems, 2013, 10(11): 516−529
[9] Berger J, Boukhtouta A, Benmoussa A, Kettani O. A new mixed-integer linear programming model for rescue path planning in uncertain adversarial environment. Computers & Operations Research, 2012, 39(12): 3420−3430
[10] Duan H B, Liu S. Unmanned air/ground vehicles heterogeneous cooperative techniques: current status and prospects. Science China Technological Sciences, 2010, 53(5): 1349−1355
[11] Cruz Jr J B, Simaan M A, Gacic A, Jiang H, Letelliier B, Li M, Liu Y. Game-theoretic modeling and control of a military air operation. IEEE Transactions on Aerospace and Electronic Systems, 2001, 37(4): 1393−1405
[12] Dixon W. Optimal adaptive control and differential games by reinforcement learning principles. Journal of Guidance, Control, and Dynamics, 2014, 37(3): 1048−1049
[13] Semsar-Kazerooni E, Khorasani K. Multi-agent team cooperation: a game theory approach. Automatica, 2009, 45(10): 2205−2213
[14] Gu D. A game theory approach to target tracking in sensor networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2011, 41(1): 2−13
[15] Duan H, Wei X, Dong Z. Multiple UCAVs cooperative air combat simulation platform based on PSO, ACO, and game theory. IEEE Aerospace and Electronic Systems Magazine, 2013, 28(11): 12−19
[16] Turetsky V, Shinar J. Missile guidance laws based on pursuit-evasion game formulations. Automatica, 2003, 39(4): 607−618
[17] Porter R, Nudelman E, Shoham Y. Simple search methods for finding a Nash equilibrium. Games and Economic Behavior, 2008, 63(2): 642−662
[18] Chen X, Deng X, Teng S-H. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM, 2009, 56(3): Article No. 14
[19] Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of the 1st IEEE International Conference on Neural Networks. Perth, Australia: IEEE, 1995. 1942−1948
[20] Eberhart R, Kennedy J. A new optimizer using particle swarm theory. In: Proceedings of the 6th International Symposium on Micro Machine and Human Science. Nagoya: IEEE, 1995. 39−43
[21] Higashitani M, Ishigame A, Yasuda K. Particle swarm optimization considering the concept of predator-prey behavior. In: Proceedings of the 2006 IEEE Congress on Evolutionary Computation. Vancouver, BC, Canada: IEEE, 2006. 434−437
[22] Liu F, Duan H B, Deng Y M. A chaotic quantum-behaved particle swarm optimization based on lateral inhibition for image matching. Optik - International Journal for Light and Electron Optics, 2012, 123(21): 1955−1960
[23] Edison E, Shima T. Genetic algorithm for cooperative UAV task assignment and path optimization. In: Proceedings of the 2008 AIAA Guidance, Navigation and Control Conference and Exhibit. Honolulu, Hawaii: AIAA, 2008. 340−356
[24] Duan H, Luo Q, Shi Y, Ma G. Hybrid particle swarm optimization and genetic algorithm for multi-UAV formation reconfiguration. IEEE Computational Intelligence Magazine, 2013, 8(3): 16−27
[25] Liu G, Lao S Y, Tan D F, Zhou Z C. Research status and progress on anti-ship missile path planning. Acta Automatica Sinica, 2013, 39(4): 347−359
[26] Duan H B, Yu Y X, Zhao Z Y. Parameters identification of UCAV flight control system based on predator-prey particle swarm optimization. Science China Information Sciences, 2013, 56(1): 1−12
[27] Duan H, Li S, Shi Y. Predator-prey based brain storm optimization for DC brushless motor. IEEE Transactions on Magnetics, 2013, 49(10): 5336−5340
[28] Pan F, Li X T, Zhou Q, Li W X, Gao Q. Analysis of standard particle swarm optimization algorithm based on Markov chain. Acta Automatica Sinica, 2013, 39(4): 381−389
[29] Nash J F. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 1950, 36(1): 48−49
[30] Yu Qian, Wang Xian-Jia. Evolutionary algorithm for solving Nash equilibrium based on particle swarm optimization. Journal of Wuhan University (Natural Science Edition), 2006, 52(1): 25−29 (in Chinese)
Haibin Duan  Professor at the School of Automation Science and Electrical Engineering, Beihang University, China. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2005. He is the Head of the Bio-inspired Autonomous Flight Systems (BAFS) Research Group. His research interests include multiple UAV cooperative control, biological computer vision, and bio-inspired computation. Corresponding author of this paper.

Pei Li  Ph.D. candidate at the School of Automation Science and Electrical Engineering, Beihang University, China. He received his bachelor degree from Harbin Engineering University in 2012. He is a member of the BUAA Bio-inspired Autonomous Flight Systems (BAFS) Research Group. His research interests include multiple UAV cooperative control and game theory.

Yaxiang Yu  Master student at the School of Automation Science and Electrical Engineering, Beihang University, China. She received her bachelor degree from Beihang University in 2007. She was a technician at the Changhe Aircraft Industries Group Co., Ltd. from 2007 to 2008. She is a member of the BUAA Bio-inspired Autonomous Flight Systems (BAFS) Research Group. Her research interests include multiple UAV cooperative control and bio-inspired computation.