Defending an Asset: A Linear
Quadratic Game Approach
DONGXU LI
General Motors Company
JOSE B. CRUZ, JR.
The Ohio State University
Techniques based on pursuit-evasion (PE) games are often
applied in military operations of autonomous vehicles (AV) in
the presence of mobile targets. Recently with increasing use of
AVs, new scenarios emerge such as surveillance and persistent
area denial. Compared with PE games, the actual roles of the
pursuer and the evader have changed. In these emerging scenarios
the evader acts as an intruder striking at some asset; at the same
time the pursuer tries to destroy the intruder to protect the asset.
Due to the presence of an asset, the PE game model with two
sets of players (pursuers and evaders) is no longer adequate.
We call this new problem a game of defending an asset(s) (DA).
In this paper we study DA games under the framework of a
linear quadratic (LQ) formulation. Three different DA games are
addressed: 1) defending a stationary asset, 2) defending a moving
asset with an arbitrary trajectory, and 3) defending an escaping
asset. Equilibrium game strategies of the players are derived for
each case. A repetitive scheme is proposed for implementation of
the LQ strategies, and we demonstrate with simulations that the
LQ strategies based on the repetitive implementation can provide
good control guidance laws for DA games.
Manuscript received July 7, 2008; revised August 1, 2009; released
for publication November 22, 2009.
IEEE Log No. T-AES/47/2/940827.
Refereeing of this contribution was handled by V. Krishnamurthy.
Authors’ addresses: D. Li, General Motors Company, R&D, 30500
Mound Rd., Warren, MI 48090, E-mail: ([email protected]); J. B.
Cruz, Jr., Dept. of Electrical and Computer Engineering, The Ohio
State University, 205 Dreese Lab, 2015 Neil Ave., Columbus, OH
43202.
0018-9251/11/$26.00 © 2011 IEEE
I. INTRODUCTION
The increasing use of autonomous vehicles (AV)
in modern military operations has recently led to
renewed interest in pursuit-evasion (PE) differential
games [1—4]. AVs deployed in a battlefield often
face intelligent opponents with mobility who can
usually make strategic decisions in response to AVs’
movements [5, 6]. A formulation based on one-sided
optimization is no longer adequate to model such AV
operations. As a tool for modeling decision-making
in conflict situations, game theory has advantages in
resolving the action-counter-action issues involved. PE
game models have been used in target tracking, strike
and object avoidance, etc.
In a typical PE game, one or a group of players
called pursuers go after another one or group of
players called evaders. To study the players’ optimal
strategies, the problem is usually formulated as a
zero-sum game in such a way that the pursuer(s)
tries to minimize a prescribed cost function while
the evader(s) tries to maximize the same function
[7, 8]. Dynamic programming (DP) is the only
general method for solving differential games. In
the literature a number of formal solutions regarding
optimal strategies in particular PE problems have been
achieved [7—10]. Due to the development of linear
quadratic (LQ) optimal control theory, a large portion
of the literature focuses on PE differential games with
a performance criterion in a quadratic form and linear
dynamics of the players [9, 11].
With the development of AV technologies and the
continued interest in AV applications, new scenarios
emerge such as surveillance and persistent area denial
[12]. A typical scenario usually involves an attacker,
who is about to attack some friendly assets. To protect
the assets, AVs are often tasked to destroy the attacker.
In this attack-defense problem, the attacker and the
AV are counterparts of the pursuer and the evader
in the sense that the AV wants to catch the attacker
while the attacker wants to avoid being caught. It
is also distinguished from PE games because the
players’ roles have changed. To chase or to escape
is no longer the mere task of the AV or the attacker.
Besides escaping, the attacker also needs to reach the
vicinity of the asset to attack. For the AV, it is crucial
that interception occurs before the attacker reaches the
asset. By nature this is a game played by the attacker
and the AV with involvement of an asset. We call it
the “game of defending an asset(s)” (DA). To the
authors’ knowledge, this is a first attempt to address
such a problem. As a first step, we focus on problems
where only three entities are present, i.e., an attacker,
an interceptor, and an asset.
Solution techniques that have been used for PE
games are not directly applicable to DA games. In
contrast to PE games, interception of the attacker here
is time sensitive and must occur before the potential
1026 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 47, NO. 2 APRIL 2011
attack. The term “capturability,” used in PE games
to indicate the pursuer’s capability of catching the
evader, is not adequate in a DA game when an
additional asset is taken into account. Taking an
arbitrarily long time in capturing is apparently not
acceptable. As we see in an example later, an optimal
pursuit strategy may not be a valid interception
strategy in a DA game.
The literature on DA games is very limited.
Only a special case has been discussed in [13]
and [7, p. 144], where the game was referred to
as the “game of defending a target.” In these two
references a problem in $\mathbb{R}^2$ was considered with a fixed target, and the players have simple motion
dynamics. A geometric approach has been used to
determine optimal control strategies of the players.
Although the study in this paper is motivated by
AV applications, readers may find it relevant to the
areas of missile guidance and navigation [14]. In a
conventional navigation problem, control laws are
designed for an interceptor to track a moving target.
Proportional navigation guidance law and its variants
have been the most widely employed techniques for
nonmaneuvering targets due to their simplicity and
ease of implementation [14]. However, they are not
adequate for DA games because of the involvement
of an extra asset and the fact that the attacker is
highly maneuverable. Another large class of relevant
guidance laws are designed under the framework of
optimal control theory, quite a large portion of which
are based on LQ optimal control theory [14].
For a DA game with an attacker, an interceptor,
and an asset, the main difficulty lies in the additional
terminal constraint of the game. In a (two-player) PE
game, the game terminates as long as the pursuer
is sufficiently close to the evader, while a DA
game may also end if the attacker and the asset
are sufficiently close. With this additional game
terminal constraint, meaningful game formulation
and theoretical development of solutions as well as
solution techniques become very difficult.
To satisfy the practical need for solving DA games
and avoid theoretical challenges, we approach DA
games with an approximate LQ formulation to make
use of the ample results in the LQ game theory. As
a practical approach, terminal penalty terms, also
referred to as soft constraints, are adopted to replace
the inherent hard constraints at the game terminal.
Three types of DA games are considered with respect
to the asset’s maneuverability, i.e., 1) the asset is
stationary; 2) the asset moves along an arbitrary
but known trajectory; and 3) the asset is escaping.
These cover all possible DA scenarios, and each has
a unique formulation and will be treated separately. To
apply the resulting LQ game strategies to “real-world”
DA problems, a practical algorithm based on repetitive
implementation is proposed. The performance of the
algorithm is demonstrated through simulations.
The paper is organized as follows. In Section II,
we introduce the DA problem and discuss difficulties
in problem formulation and solution techniques. In
Section III, a DA game is formulated with linear
dynamics and a quadratic objective based on soft
constraints. In Section IV, equilibrium strategies of
the players are derived for each of the three different
DA games regarding the asset’s maneuverability.
An implementation scheme is then introduced in
Section V to fill the gap between the LQ formulation
and “real-world” problems. In Section VI, we evaluate
the performance of the proposed strategies with
simulations of different DA games with simple
dynamics. Concluding remarks are provided in
Section VII.
II. THE PROBLEM OF DEFENDING AN ASSET
We first describe a general DA game and discuss
the theoretical difficulties. Consider an attacker, an
interceptor, and an asset in an $n_S$-dimensional space $S \subseteq \mathbb{R}^{n_S}$ with $n_S \in \mathbb{N}$. Let $x_a \in \mathbb{R}^{n_a}$, $x_i \in \mathbb{R}^{n_i}$, and $x_t \in \mathbb{R}^{n_t}$ be the state variables of the attacker, the interceptor, and the asset, respectively, with $n_a, n_i, n_t \ge n_S$. The dynamics of the players are given by the following differential equations in general terms:

$$\dot{x}_a(t) = f_a(x_a(t), u_a(t)) \quad \text{with} \quad x_a(0) = x_{a0}$$
$$\dot{x}_i(t) = f_i(x_i(t), u_i(t)) \quad \text{with} \quad x_i(0) = x_{i0}$$
$$\dot{x}_t(t) = f_t(x_t(t), u_t(t)) \quad \text{with} \quad x_t(0) = x_{t0}. \tag{1}$$

Here $u_a(t) \in U_a \subseteq \mathbb{R}^{m_a}$, $u_i(t) \in U_i \subseteq \mathbb{R}^{m_i}$, and $u_t(t) \in U_t \subseteq \mathbb{R}^{m_t}$ are the control inputs. Suppose that the first $n_S$ elements of $x_a$ (or $x_i$, $x_t$) stand for the physical position of the attacker (or interceptor, asset) in $S$.
In a DA game the asset (or attacker) is considered attacked (or intercepted) if it is within an $\varepsilon$-vicinity of the attacker (or interceptor) for some small $\varepsilon > 0$. Now we define a projection operator $P : \mathbb{R}^{n_a} \to S$ for the attacker as

$$P(x_a) = [x_{a1}, \ldots, x_{a n_S}]^T \in S. \tag{2}$$

That is, $P(x_a)$ returns the attacker's position in $S$. A similar operator can also be defined for both the interceptor and the asset; for simplicity, we use the same notation $P$. Given some $\varepsilon > 0$, we define the sets $\Lambda_1$ and $\Lambda_2$ as

$$\Lambda_1 = \{(x_i, x_a, x_t) \in \mathbb{R}^{n_i} \times \mathbb{R}^{n_a} \times \mathbb{R}^{n_t} \mid \|P(x_a) - P(x_t)\| \le \varepsilon\} \tag{3}$$
$$\Lambda_2 = \{(x_i, x_a, x_t) \in \Lambda_1^c \mid \|P(x_i) - P(x_a)\| \le \varepsilon\}$$

where $\Lambda_1^c$ is the complementary set of $\Lambda_1$ and $\|\cdot\|$ is the standard Euclidean norm. Here $\Lambda_1$ denotes the set of game states where the attacker has successfully reached the vicinity of the asset to perform attacks, whereas $\Lambda_2$ stands for the states where the attacker has been intercepted before it reaches the asset. Note that, in mathematical terms, attack and interception are abstracted to distance conditions between the relevant players. In general, the value of $\varepsilon$ for attack and for interception may differ. Let $\Lambda = \Lambda_1 \cup \Lambda_2$. Clearly, the set $\Lambda$ defines the terminal set of a DA game, i.e., the game terminates at time $T = \min\{t > 0 \mid (x_i(t), x_a(t), x_t(t)) \in \Lambda\}$. In other words, the game terminates when either attack or interception occurs. Here $T$ is regarded as the terminal time of the game.
With the notation introduced above, a DA game problem can be described as follows:

Given the initial states $x_a \in \mathbb{R}^{n_a}$, $x_i \in \mathbb{R}^{n_i}$, and $x_t \in \mathbb{R}^{n_t}$, the interceptor (together with the asset, in the case of a maneuverable asset) needs to find a proper control input $u_i(t)$ (and $u_t(t)$) as a function of time, based on its accessible information, such that the game ends in $\Lambda_2$, i.e., $(x_i(T), x_a(T), x_t(T)) \in \Lambda_2$; the attacker, meanwhile, tries to drive the state trajectory into $\Lambda_1$ by choosing a proper input $u_a(t)$.
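The terminal sets above translate directly into a termination test. The following Python sketch is ours, not from the paper; the function names `P` and `game_status` are illustrative. It checks whether a state triple lies in $\Lambda_1$, $\Lambda_2$, or neither, testing $\Lambda_1$ first because $\Lambda_2$ is defined on the complement of $\Lambda_1$:

```python
import numpy as np

def P(x, n_S):
    """Projection (2): a player's position is the first n_S state components."""
    return np.asarray(x, dtype=float)[:n_S]

def game_status(x_i, x_a, x_t, n_S, eps):
    """Classify the aggregate state: 'attack' if it lies in Lambda_1,
    'intercept' if in Lambda_2, 'running' otherwise.  Lambda_1 is tested
    first because Lambda_2 is defined on the complement of Lambda_1."""
    if np.linalg.norm(P(x_a, n_S) - P(x_t, n_S)) <= eps:
        return "attack"
    if np.linalg.norm(P(x_i, n_S) - P(x_a, n_S)) <= eps:
        return "intercept"
    return "running"
```

For planar players with position-velocity states ($n_S = 2$, $n_a = n_i = n_t = 4$), only the first two components enter the test, as in the projection (2).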
Generally speaking, the game described above is a game of kind¹ [8], for which an analytical solution is usually derived by introducing an auxiliary cost function as follows. Assume that the game ends in $\Lambda$ in finite time; then we can define

$$J = \begin{cases} 0 & \text{if } (x_a(T), x_i(T), x_t(T)) \in \Lambda_2 \\ 1 & \text{if } (x_a(T), x_i(T), x_t(T)) \in \Lambda_1. \end{cases}$$

In this way, the game is converted into a differential game of degree. The value function $V$ (if it exists) can take only two values, $V(x_a, x_i, x_t) = 1$ and $V(x_a, x_i, x_t) = 0$, and is therefore not differentiable. Please note that the description above is not a rigorous definition of a DA game: no information structure is specified, so the type of solutions has not yet been determined.
Rigorous study of a DA game and of possible solution techniques involves tremendous challenges. Optimal strategies of the players can be derived only when it is known where the game state ends given an initial state $(x_a, x_i, x_t)$. However, knowledge of terminal states implicitly requires the existence of a value function, i.e., $V(x_a, x_i, x_t) = 1$ or $V(x_a, x_i, x_t) = 0$, as defined above. A rigorous demonstration of the existence of solutions is not yet available and is far from trivial.
Even for a less complicated two-player PE game
[15], rigorous definition and treatment involve
many theoretical difficulties. In the literature
establishment of the existence of solutions usually
requires certain players’ information structures
(such as the nonanticipative strategy in [15]).
Lipschitz continuity of players’ dynamic equations
¹When we speak of a differential game of kind, we mean a game with finitely many, usually two, outcomes; the counterpart is called a differential game of degree, which has a continuum of possible payoffs. The latter is the concept mostly used in the field of differential games.
and continuity of the value V are necessary, and
the viscosity solution of the Hamilton-Jacobi-Isaacs
equation is adopted [16]. These assumptions are not
satisfied in a DA game.
Another difficulty lies in solution development,
which is related to dynamic programming. Suppose
that a value $V$ exists, and we define the sets $S_j = \{(x_a, x_i, x_t) \in \mathbb{R}^{n_a} \times \mathbb{R}^{n_i} \times \mathbb{R}^{n_t} \mid V(x_a, x_i, x_t) = j\}$ with $j \in \{0, 1\}$. Here the set $S_0$ (or $S_1$) contains all the initial states from which the interceptor (or attacker) can successfully force the game to end in $\Lambda_2$ (or $\Lambda_1$). Only with knowledge of $S_j$ may an optimal strategy of the interceptor (or attacker) be derived. However, a formal derivation of $S_j$ is very difficult. Compared with PE games, the dimension of the state
space increases dramatically with the additional asset.
The numerical method [17] based on a reachable set
calculation is applicable, but actual implementation
can be problematic because the time horizon can
be arbitrarily large, and the method suffers from
the exponential growth in the number of states.
In summary a general DA game is a very difficult
problem.
III. LINEAR QUADRATIC FORMULATION WITH SOFT CONSTRAINTS
Theoretical exploration of the challenges discussed in the previous section is beyond the scope of this paper. To meet the practical need for solving DA games, we take a pragmatic approach using LQ game theory. As can be seen, most of the difficulties in a DA game are related to the hard constraints imposed at the game terminal. In the optimal control and differential game literature [9, 18], hard constraints are often approximated in an LQ formulation by soft constraints with weighting parameters. By soft constraints we mean penalty terms introduced in the objective function with a fixed optimization horizon. With this approximation, a problem with a fixed horizon can be formulated; the key is to choose a proper optimization horizon. Following this idea, we reformulate a DA problem as an LQ game using soft constraints with a fixed horizon.
Suppose that each player in the game has independent dynamics:

$$\dot{x}_a(t) = A_a x_a(t) + B_a^0 u_a(t) \quad \text{with} \quad x_a(t_0) = x_{a0} \tag{4a}$$
$$\dot{x}_i(t) = A_i x_i(t) + B_i^0 u_i(t) \quad \text{with} \quad x_i(t_0) = x_{i0} \tag{4b}$$
$$\dot{x}_t(t) = A_t x_t(t) + B_t^0 u_t(t) \quad \text{with} \quad x_t(t_0) = x_{t0}. \tag{4c}$$

Here $x_a(t) \in \mathbb{R}^{n_a}$, $x_i(t) \in \mathbb{R}^{n_i}$, and $x_t(t) \in \mathbb{R}^{n_t}$ for $n_a, n_i, n_t \ge n_S$ and $t \ge t_0$; $u_a(t) \in U_a$, $u_i(t) \in U_i$, and $u_t(t) \in U_t$ are control inputs; and $A_a, A_i, A_t, B_a^0, B_i^0, B_t^0$ are real matrices of proper dimensions. For simplicity, we
write an aggregate dynamic equation as

$$\dot{x}(t) = A x(t) + B_a u_a(t) + B_i u_i(t) + B_t u_t(t) \tag{5}$$

where

$$x = \begin{bmatrix} x_a \\ x_i \\ x_t \end{bmatrix}, \quad A = \begin{bmatrix} A_a & 0 & 0 \\ 0 & A_i & 0 \\ 0 & 0 & A_t \end{bmatrix}, \quad B_a = \begin{bmatrix} B_a^0 \\ 0 \\ 0 \end{bmatrix}, \quad B_i = \begin{bmatrix} 0 \\ B_i^0 \\ 0 \end{bmatrix}, \quad \text{and} \quad B_t = \begin{bmatrix} 0 \\ 0 \\ B_t^0 \end{bmatrix}.$$

Here $x \in \mathbb{R}^n$ with $n = n_i + n_a + n_t$. We assume that each player can access the state $x$ at any time $t$, and feedback strategies are considered, i.e., $\gamma_a : \mathbb{R}^n \times \mathbb{R} \to U_a$, $\gamma_i : \mathbb{R}^n \times \mathbb{R} \to U_i$, and $\gamma_t : \mathbb{R}^n \times \mathbb{R} \to U_t$ for the attacker, the interceptor, and the asset, respectively. Namely, given $x \in \mathbb{R}^n$ and time $0 \le t < T$, $\gamma_a(x, t) \in U_a$, $\gamma_i(x, t) \in U_i$, and $\gamma_t(x, t) \in U_t$. Denote by $\Gamma_a$, $\Gamma_i$, and $\Gamma_t$ the sets of admissible feedback strategies of the players. In this paper, three different DA games regarding the asset's maneuverability are discussed, and the asset's strategy is only relevant in one case.
We consider an objective function of the following form:

$$J(\gamma_a, \gamma_i, \gamma_t; x_0) = \int_0^T \big( u_a(\tau)^T u_a(\tau) + \lambda u_t(\tau)^T u_t(\tau) - u_i(\tau)^T u_i(\tau) + w_a^I \|P(x_a(\tau)) - P(x_t(\tau))\|^2 - w_i^I \|P(x_i(\tau)) - P(x_a(\tau))\|^2 \big)\, d\tau + w_a \|P(x_a(T)) - P(x_t(T))\|^2 - w_i \|P(x_i(T)) - P(x_a(T))\|^2. \tag{6}$$

In (6), $\gamma_a, \gamma_i, \gamma_t$ are the feedback strategies; $\lambda \in \{0, -1\}$ is a binary variable²; $u_a$, $u_i$, $u_t$ are the control inputs associated with each player's strategy; and $w_a$, $w_i$, $w_a^I$, $w_i^I > 0$ are the weighting scalars on the costs induced by the distance between the attacker and the asset and by that between the interceptor and the attacker. Instead of hard constraints imposed at the game terminal, soft constraint (penalty) terms with a fixed time duration $T$ are used. In this formulation, $T$, $w_a$, $w_i$, $w_a^I$, $w_i^I$ are the design parameters.
The objective $J$ can be written in a quadratic form with respect to $x$, $u_a$, $u_i$, and $u_t$. First, $\|P(x_a) - P(x_t)\|^2 = x^T Q^a x$ and $\|P(x_i) - P(x_a)\|^2 = x^T Q^i x$ with $Q^a$ and $Q^i$ specified as

$$Q^a = \begin{bmatrix} I^{n_S}_{n_a \times n_a} & 0_{n_a \times n_i} & -I^{n_S}_{n_a \times n_t} \\ 0_{n_i \times n_a} & 0_{n_i \times n_i} & 0_{n_i \times n_t} \\ -I^{n_S}_{n_t \times n_a} & 0_{n_t \times n_i} & I^{n_S}_{n_t \times n_t} \end{bmatrix}$$

and

$$Q^i = \begin{bmatrix} I^{n_S}_{n_a \times n_a} & -I^{n_S}_{n_a \times n_i} & 0_{n_a \times n_t} \\ -I^{n_S}_{n_i \times n_a} & I^{n_S}_{n_i \times n_i} & 0_{n_i \times n_t} \\ 0_{n_t \times n_a} & 0_{n_t \times n_i} & 0_{n_t \times n_t} \end{bmatrix}$$

where $0_{p \times q}$ is the zero matrix of dimension $p \times q$, and $I^{n_0}_{n_1 \times n_2}$ ($n_1, n_2 \ge n_0$) is the $n_1 \times n_2$ matrix in which the submatrix formed by the first $n_0$ rows and $n_0$ columns is an identity matrix and all the remaining entries are zero.

²Here $\lambda = -1$ indicates that the game is played among all three players, while $\lambda = 0$ is associated with the case where the asset's control is not relevant.

Then $J$ in (6) is equivalent to
$$J = \int_0^T \big( u_a^T u_a + \lambda u_t^T u_t - u_i^T u_i + x^T(\tau) Q x(\tau) \big)\, d\tau + x^T(T) Q_f x(T) \tag{7}$$

where $Q = \mathcal{Q}(w_a^I, w_i^I)$ and $Q_f = \mathcal{Q}(w_a, w_i)$, with the function $\mathcal{Q}(\cdot, \cdot)$ defined as $\mathcal{Q}(w_a, w_i) = w_a Q^a - w_i Q^i$.

This is a zero-sum game between the attacker and the interceptor (or the interceptor together with the asset). The attacker seeks a strategy $\gamma_a \in \Gamma_a$ to minimize $J$, while the interceptor (possibly along with the asset) tries to maximize $J$ with $\gamma_i \in \Gamma_i$ (and $\gamma_t \in \Gamma_t$). This can be viewed as a dual tracking problem: the attacker wants to track the asset while avoiding the interceptor, and at the same time the interceptor tries to follow the attacker closely.

By casting the DA problem as an LQ game, we circumvent the technical difficulties, and derivation of game strategies becomes possible. Under the LQ game framework, existence of solutions is equivalent to the existence of a solution of the underlying Riccati equation, which may be checked under certain conditions [8, 19].

In what follows, we discuss solution techniques for each of the three different DA games regarding the mobility of the asset: a DA game with 1) a fixed asset, 2) a moving asset following an arbitrary trajectory, and 3) an escaping asset.
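Since $Q^a$ and $Q^i$ are just padded-identity blocks, $\mathcal{Q}(w_a, w_i) = w_a Q^a - w_i Q^i$ can be assembled mechanically. The Python sketch below is ours, not the paper's code; `I_pad` and `build_Q` are illustrative names, and the block layout assumes the state ordering $x = [x_a; x_i; x_t]$ of (5):

```python
import numpy as np

def I_pad(n0, rows, cols):
    """I^{n0}_{rows x cols}: leading n0 x n0 identity block, zeros elsewhere."""
    M = np.zeros((rows, cols))
    M[:n0, :n0] = np.eye(n0)
    return M

def build_Q(wa, wi, n_S, n_a, n_i, n_t):
    """Assemble Q(wa, wi) = wa*Q^a - wi*Q^i for the state x = [x_a; x_i; x_t]."""
    n = n_a + n_i + n_t
    a, i, t = slice(0, n_a), slice(n_a, n_a + n_i), slice(n_a + n_i, n)
    Qa = np.zeros((n, n))          # Q^a encodes ||P(x_a) - P(x_t)||^2
    Qa[a, a] =  I_pad(n_S, n_a, n_a); Qa[a, t] = -I_pad(n_S, n_a, n_t)
    Qa[t, a] = -I_pad(n_S, n_t, n_a); Qa[t, t] =  I_pad(n_S, n_t, n_t)
    Qi = np.zeros((n, n))          # Q^i encodes ||P(x_i) - P(x_a)||^2
    Qi[a, a] =  I_pad(n_S, n_a, n_a); Qi[a, i] = -I_pad(n_S, n_a, n_i)
    Qi[i, a] = -I_pad(n_S, n_i, n_a); Qi[i, i] =  I_pad(n_S, n_i, n_i)
    return wa * Qa - wi * Qi
```

A quick sanity check is that $x^T Q^a x$ reproduces the squared attacker-asset distance and $x^T Q^i x$ the squared interceptor-attacker distance, independently of the non-position state components.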
IV. GAME SOLUTIONS FOR DA GAMES WITH DIFFERENT TYPES OF ASSETS
A. Review of LQ Game Theory
Let us first review the LQ game theory that will be the major tool used in this paper. Consider a
game involving two players with the following linear
dynamics:
$$\dot{x}(t) = A x(t) + B_1 u_1(t) + B_2 u_2(t). \tag{8}$$
The objective function is
$$J = \int_0^T \big( u_1(\tau)^T u_1(\tau) - u_2(\tau)^T u_2(\tau) + x^T(\tau) Q x(\tau) \big)\, d\tau + x^T(T) Q_f x(T). \tag{9}$$
Suppose that both of the players have access to the
state variables, and state feedback strategies are of
interest. The following theorem specifies saddle-point
equilibrium strategies.
THEOREM 1 The game with the players' dynamics in (8) and the objective $J$ in (9) admits a feedback saddle-point solution given by $u_1^*(t) = \gamma_1^*(x(t), t) = K_1^*(t) x(t)$ and $u_2^*(t) = \gamma_2^*(x(t), t) = K_2^*(t) x(t)$, with $K_1^*(t) = -B_1^T Z(t)$ and $K_2^*(t) = B_2^T Z(t)$, where $Z(t)$ is bounded, symmetric, and satisfies

$$\dot{Z} = -A^T Z - Z A - Q + Z (B_1 B_1^T - B_2 B_2^T) Z \quad \text{with} \quad Z(T) = Q_f. \tag{10}$$
Readers are referred to [8] and [20] for a detailed
proof.
Theorem 1 provides solutions to a two-player
zero-sum LQ game. The implicit assumption here
is that a bounded solution for (10) exists over the
time horizon [0,T]. This nonlinear matrix differential
equation is called a Riccati equation. In the control literature it is often required that the matrices $Q$ and $Q_f$ be positive semidefinite for the Riccati equation to admit a bounded solution [8]. However, in the DA game formulated above, $Q$ and $Q_f$ in (7) are neither positive nor negative semidefinite. Thus the existence of solutions for the Riccati equation may be an issue.
We will address this issue later in Section V when
implementation is discussed.
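Theorem 1 reduces strategy computation to integrating the Riccati equation (10) backwards from $Z(T) = Q_f$. The following Python sketch (ours, not from the paper; a fixed-step RK4 integrator, with `solve_riccati` and `feedback_gains` as illustrative names) shows one way this could be done numerically:

```python
import numpy as np

def solve_riccati(A, B1, B2, Q, Qf, T, steps=1000):
    """Integrate the game Riccati equation (10) backwards from Z(T) = Qf
    with a fixed-step RK4 scheme; returns Z on a uniform grid over [0, T]."""
    S = B1 @ B1.T - B2 @ B2.T
    rhs = lambda Z: -A.T @ Z - Z @ A - Q + Z @ S @ Z
    dt = T / steps
    Z = Qf.astype(float).copy()
    traj = [Z.copy()]
    for _ in range(steps):                   # RK4 for the time-reversed ODE
        k1 = rhs(Z)
        k2 = rhs(Z - 0.5 * dt * k1)
        k3 = rhs(Z - 0.5 * dt * k2)
        k4 = rhs(Z - dt * k3)
        Z = Z - (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(Z.copy())
    traj.reverse()                           # traj[0] ~ Z(0), traj[-1] = Qf
    return traj

def feedback_gains(Z, B1, B2):
    """Saddle-point gains of Theorem 1: u1* = K1 x, u2* = K2 x."""
    return -B1.T @ Z, B2.T @ Z
```

A convenient sanity check is the scalar case $A = 0$, $B_1 = 1$, $B_2 = 0$, $Q = 1$, $Q_f = 0$, for which (10) becomes $\dot{Z} = Z^2 - 1$ with backward solution $Z(t) = \tanh(T - t)$.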
B. Game of Defending a Stationary Asset
We first consider a DA game with a fixed asset.
Without loss of generality we assume that the asset
is at the origin. Ignore the asset’s dynamics, and
the aggregate dynamic equation of the game in (5)
becomes
_x(t) = Ax(t) + Baua(t) + Biui(t) (11)
where
x=
·xa
xi
¸, A=
·Aa 0
0 Ai
¸
Ba =
·B0a0
¸, and Bi =
·0
B0i
¸:
Here x 2 Rn with n= ni+ na. The objective of thegame given in (6) with xt(¿) = 0 for all 0· ¿ · T and
¸= 0 turns into
J(°i,°a;x0)
=
Z T
0
(uTa ua¡ uTi ui +wIakP(xa)k2¡wIi kP(xi)¡P(xa)k2)d¿
+wakP(xa(T))k2¡wikP(xi(T))¡P(xa(T))k2: (12)
Similar to (7), we rewrite the objective in a quadraticform as follows.
J(°i,°a; x0) =
Z T
0
(ua(¿)Tua(¿)¡ uTi (¿ )ui(¿) + xT(¿)Qx(¿ ))d¿
+ xT(T)Qf x(T) (13)
where Q = Q1(wIa,w
Ii ) and Qf = Q1(wa,wi) with the
mapping Q1 :R£R 7!Rn£n defined as
Q1(wa,wi) =
·(wa¡wi)InSna£na wiI
nSna£ni
wiInSni£na ¡wiInSni£ni
¸(14)
where In0n1£n2 (n1,n2 ¸ n0) is an n1£ n2 matrix in whichthe submatrix formed by the first n0 rows and n0columns is an identity matrix, and all the remainingentries are zeros. Here the attacker is the minimizerand the interceptor is the maximizer. The formulationof this game is similar to that used in [9], where anLQ PE differential game is formulated. Theorem 1 fora general two-player LQ game is applicable.
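For a concrete sense of the mapping in (14), the sketch below (ours; `I_pad` and `Q1` are illustrative names) builds it with numpy. A useful check is that $\bar{x}^T \mathcal{Q}_1(w_a, w_i) \bar{x} = w_a \|P(x_a)\|^2 - w_i \|P(x_i) - P(x_a)\|^2$ holds regardless of the velocity components of the states:

```python
import numpy as np

def I_pad(n0, rows, cols):
    """I^{n0}_{rows x cols}: leading n0 x n0 identity block, zeros elsewhere."""
    M = np.zeros((rows, cols))
    M[:n0, :n0] = np.eye(n0)
    return M

def Q1(wa, wi, n_S, n_a, n_i):
    """The mapping Q_1 of (14), acting on the reduced state [x_a; x_i]."""
    return np.block([
        [(wa - wi) * I_pad(n_S, n_a, n_a),  wi * I_pad(n_S, n_a, n_i)],
        [ wi * I_pad(n_S, n_i, n_a),       -wi * I_pad(n_S, n_i, n_i)],
    ])
```

The resulting matrix is symmetric but, as noted above, indefinite in general, which is why the existence of a bounded Riccati solution must be checked rather than assumed.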
C. Game of Defending a Moving Asset with an Arbitrary Trajectory
In this section we consider a DA game involving an asset that moves along an arbitrary trajectory. In practice, this game is relevant to a warfare scenario where assets need to be transported under protection. Suppose that the asset moves in $\mathbb{R}^{n_S}$ following a predetermined trajectory $x_t(\cdot)$. The movement of the asset is known to both the attacker and the interceptor.

Consider the dynamics of the attacker and the interceptor in (4a)-(4b). The asset's control in (4c) is known due to its known trajectory. Again the game is played between the interceptor and the attacker. With $\lambda = 0$, the objective function in (6) becomes

$$J(\gamma_a, \gamma_i; x_0) = \int_0^T \big( u_a(\tau)^T u_a(\tau) - u_i(\tau)^T u_i(\tau) + w_a^I \|P(x_a(\tau)) - P(x_t(\tau))\|^2 - w_i^I \|P(x_i(\tau)) - P(x_a(\tau))\|^2 \big)\, d\tau + w_a \|P(x_a(T)) - P(x_t(T))\|^2 - w_i \|P(x_i(T)) - P(x_a(T))\|^2. \tag{15}$$

By inspection of the objective (15), this DA game is closely related to an LQ regulator problem with a reference state trajectory [21], on which we base the derivation of the LQ game strategies. The following theorem provides the saddle-point strategies of the players.
THEOREM 2 Suppose that the asset's trajectory $x_t(t)$ is known. The DA game with the dynamics in (4a) and (4b) and the objective $J$ in (15) admits a feedback saddle-point solution under the strategies

$$u_a^* = \gamma_a^*(\bar{x}, t) = -\bar{B}_a^T Z_{11} \bar{x} - \bar{B}_a^T b \tag{16}$$

$$u_i^* = \gamma_i^*(\bar{x}, t) = \bar{B}_i^T Z_{11} \bar{x} + \bar{B}_i^T b \tag{17}$$

where $\bar{B}_a$, $\bar{B}_i$, $\bar{x}$ are defined in (11); the $\bar{n} \times \bar{n}$ ($\bar{n} = n_a + n_i$) matrix $Z_{11}$ is bounded and satisfies

$$\dot{Z}_{11} + \bar{A}^T Z_{11} + Z_{11} \bar{A} + Q_{11} - Z_{11} (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T) Z_{11} = 0 \quad \text{with} \quad Z_{11}(T) = Q_{f_{11}}. \tag{18}$$

Here $Q_{11}$, $Q_{12}$, $Q_{f_{11}}$, $Q_{f_{12}}$ are the corresponding submatrices of the matrices $Q$ and $Q_f$ in (7), partitioned as

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{12}^T & Q_{22} \end{bmatrix} \quad \text{and} \quad Q_f = \begin{bmatrix} Q_{f_{11}} & Q_{f_{12}} \\ Q_{f_{12}}^T & Q_{f_{22}} \end{bmatrix}$$

where $Q_{11}$ and $Q_{f_{11}}$ are $\bar{n} \times \bar{n}$ matrices; $\bar{A}$ is given in (11); and the time-varying vector $b$ is specified by

$$\dot{b}(t) = [-\bar{A}^T + Z_{11} (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T)] b(t) - Q_{12} x_t \tag{19}$$

with $b(T) = Q_{f_{12}} x_t(T)$.
PROOF For the time being, we first assume that the trajectory $x_t(\cdot)$ of the asset is generated by an autonomous linear system (without control input) as

$$\dot{x}_t = A_t x_t \quad \text{with} \quad x_{t0} \text{ given}. \tag{20}$$

Later we show that this assumption is not necessary. Combining the dynamic equations (4a)-(4b) with (20), we can write an aggregate dynamic equation as

$$\dot{x}(t) = A x(t) + B_a u_a(t) + B_i u_i(t) \tag{21}$$

where $x$, $A$, $B_a$, $B_i$ are exactly the terms defined in (5). Equation (21) is almost the same as (5) except for the lack of the input term from the asset. The objective (15) can be rewritten as

$$J(\gamma_i, \gamma_a; x_0) = \int_0^T \big( u_a(\tau)^T u_a(\tau) - u_i(\tau)^T u_i(\tau) + x^T(\tau) Q x(\tau) \big)\, d\tau + x^T(T) Q_f x(T) \tag{22}$$

where $Q$ and $Q_f$ are defined in (7). Note that $Q = \mathcal{Q}_2(w_a^I, w_i^I)$ and $Q_f = \mathcal{Q}_2(w_a, w_i)$ with

$$\mathcal{Q}_2(w_a, w_i) = \begin{bmatrix} (w_a - w_i) I^{n_S}_{n_a \times n_a} & w_i I^{n_S}_{n_a \times n_i} & -w_a I^{n_S}_{n_a \times n_t} \\ w_i I^{n_S}_{n_i \times n_a} & -w_i I^{n_S}_{n_i \times n_i} & 0_{n_i \times n_t} \\ -w_a I^{n_S}_{n_t \times n_a} & 0_{n_t \times n_i} & w_a I^{n_S}_{n_t \times n_t} \end{bmatrix} \tag{23}$$

where $I^{n_0}_{n_1 \times n_2}$ is defined in (14).
Apply Theorem 1 to the game with the objective in (22) and the players' dynamics in (21). Clearly, if the Riccati equation

$$\dot{Z} = -A^T Z - Z A - Q + Z (B_a B_a^T - B_i B_i^T) Z \quad \text{with} \quad Z(T) = Q_f \tag{24}$$

admits a solution $Z$ over the interval $[0, T]$, the saddle-point strategies of the interceptor and the attacker can be specified as

$$u_a^*(t) = -B_a^T Z(t) x(t) \quad \text{and} \quad u_i^*(t) = B_i^T Z(t) x(t). \tag{25}$$

Next we partition the $n \times n$ matrix $Z$ in (25) as

$$Z = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{12}^T & Z_{22} \end{bmatrix}$$

where $Z_{11}$ is an $n_1 \times n_1$ matrix with $n_1 = n_a + n_i$; $Z_{12}$ is an $n_1 \times n_t$ matrix; and accordingly $Z_{22}$ is an $n_t \times n_t$ matrix. The matrices $Q$ and $Q_f$ are partitioned in the same way into submatrices $Q_{ij}$ and $Q_{f_{ij}}$ ($i, j \in \{1, 2\}$). By inspection of $\mathcal{Q}_1$ in (14) and $\mathcal{Q}_2$ in (23), the submatrices $Q_{11}$ and $Q_{f_{11}}$ satisfy $Q_{11} = \bar{Q}$ and $Q_{f_{11}} = \bar{Q}_f$. Furthermore, note the relationship between $A$ in (5) and $\bar{A}$ in (11), as well as those between $B_a, B_i$ and $\bar{B}_a, \bar{B}_i$. With the submatrices defined above, the Riccati equation (24) can be presented as three separate equations in terms of $Z_{ij}$, $Q_{ij}$, and $Q_{f_{ij}}$:

$$\dot{Z}_{11} + \bar{A}^T Z_{11} + Z_{11} \bar{A} + \bar{Q} - Z_{11} (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T) Z_{11} = 0 \quad \text{with} \quad Z_{11}(T) = \bar{Q}_f \tag{26a}$$

$$\dot{Z}_{12} + Z_{12} A_t + \bar{A}^T Z_{12} + Q_{12} - Z_{11} (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T) Z_{12} = 0 \quad \text{with} \quad Z_{12}(T) = Q_{f_{12}} \tag{26b}$$

$$\dot{Z}_{22} + Z_{22} A_t + A_t^T Z_{22} + Q_{22} - Z_{12}^T (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T) Z_{12} = 0 \quad \text{with} \quad Z_{22}(T) = Q_{f_{22}}. \tag{26c}$$
It should be noted that the existence of solutions for (24) over the interval $[0, T]$ is equivalent to that for (26a). This partition of the Riccati equation allows us to decompose the saddle-point strategy of the interceptor (or the attacker) into two parts. Note that $x^T = [\bar{x}^T, x_t^T]$, and the interceptor's optimal control in (25) is

$$u_i^* = \bar{B}_i^T Z_{11} \bar{x} + \bar{B}_i^T Z_{12} x_t. \tag{27}$$

Here the first term is feedback of the attacker's and interceptor's states, and the second term is driven by the motion of the asset.

We then define $b \triangleq Z_{12} x_t$. Taking the time derivative of $b$ and making use of $\dot{Z}_{12}$ in (26b), we obtain that $b(t)$ satisfies the following differential equation:

$$\begin{aligned} \dot{b}(t) &= \frac{d}{dt}(Z_{12} x_t) = \dot{Z}_{12} x_t + Z_{12} \dot{x}_t \\ &= Z_{12}(A_t x_t) - (Z_{12} A_t + \bar{A}^T Z_{12} + Q_{12}) x_t + Z_{11} (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T) Z_{12} x_t \\ &= [-\bar{A}^T + Z_{11} (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T)] Z_{12} x_t - Q_{12} x_t \\ &= [-\bar{A}^T + Z_{11} (\bar{B}_a \bar{B}_a^T - \bar{B}_i \bar{B}_i^T)] b(t) - Q_{12} x_t. \end{aligned} \tag{28}$$
Here the final condition is easily determined as $b(T) = Z_{12}(T) x_t(T) = Q_{f_{12}} x_t(T)$. Integrating the differential equation backwards, we obtain the function $b(t)$. Substituting $b$ into (27), we attain the saddle-point strategy

$$u_i^* = \bar{B}_i^T Z_{11} \bar{x} + \bar{B}_i^T b. \tag{29}$$

Note that replacing $Z_{12} x_t$ with $b$ removes the dependence of the control $u_i^*$ on $Z_{12}$ and eventually eliminates the dependence on $A_t$ through (26b). It becomes clear that the saddle-point equilibrium strategy actually does not depend on the assumption of the asset's linear dynamics in (20): the saddle-point strategy $\gamma_i^*$ can be solved through $b$ given an arbitrary trajectory $x_t(\cdot)$.

Finally, the saddle-point equilibrium strategy $\gamma_a^*$ of the attacker can be derived similarly, i.e.,

$$u_a^* = -\bar{B}_a^T Z_{11} \bar{x} - \bar{B}_a^T b.$$
The saddle-point strategies of the attacker and the interceptor each have two terms. For the interceptor, the first term in (29) is feedback of the state variables, through which its strategy is coupled with the attacker's strategy. Note that (26a) is exactly the Riccati equation (10) specialized to the DA game with a stationary asset, so this feedback term is exactly the saddle-point strategy of the interceptor when a stationary asset is considered. The second term in (29) is a feed-forward term that depends solely on the asset's motion. Compared with the DA game with a fixed asset, this term accounts for the continuous relocation of the origin (where the fixed asset would be located) due to the movement of the asset.
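Since $b$ in (19) depends on $Z_{11}$, both can be swept backwards in time together, with the asset trajectory supplied as an arbitrary callable, as Theorem 2 allows. Below is a rough Python sketch of ours (backward Euler for brevity; `solve_Z11_and_b` and `interceptor_control` are illustrative names, not the paper's implementation):

```python
import numpy as np

def solve_Z11_and_b(Abar, Ba, Bi, Q11, Qf11, Q12, Qf12, x_t, T, steps=4000):
    """Sweep (18) for Z11 and (19) for b jointly backwards in time with a
    simple backward Euler scheme.  x_t is the known (arbitrary) asset
    trajectory, supplied as a callable t -> asset state."""
    S = Ba @ Ba.T - Bi @ Bi.T
    dt = T / steps
    Z = Qf11.astype(float).copy()
    b = Qf12 @ x_t(T)                      # terminal condition b(T) = Q_f12 x_t(T)
    Zs, bs = [Z.copy()], [b.copy()]
    for k in range(steps):
        t = T - k * dt
        dZ = -(Abar.T @ Z + Z @ Abar + Q11 - Z @ S @ Z)   # rhs of (18)
        db = (-Abar.T + Z @ S) @ b - Q12 @ x_t(t)         # rhs of (19)
        Z, b = Z - dt * dZ, b - dt * db    # step backwards in time
        Zs.append(Z.copy()); bs.append(b.copy())
    Zs.reverse(); bs.reverse()             # index 0 now corresponds to t = 0
    return Zs, bs

def interceptor_control(Bi, Z11, b, xbar):
    """Feedback-plus-feed-forward saddle-point control of (29)."""
    return Bi.T @ Z11 @ xbar + Bi.T @ b
```

In the scalar toy case $\bar{A} = 0$, $\bar{B}_a = 1$, $\bar{B}_i = 0$, $Q_{11} = 1$, $Q_{12} = -1$, zero terminal weights, and $x_t \equiv 1$, one can verify by hand that $Z_{11}(t) = \tanh(T - t)$ and $b(t) = -\tanh(T - t)$, which makes a convenient check of the sweep.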
D. Game of Defending an Escaping Asset
In this section we consider the case where the asset can control its motion to avoid the attacker. Consider the dynamics of the asset in (4c) and the objective function in (7) with $\lambda = -1$. To escape attack, the asset acts as a maximizer with feedback strategy $\gamma_t$. In this case, both the interceptor and the asset are maximizers, and the attacker is the minimizer. Note that the interceptor and the asset are independent players, but they both perform maximization over
TABLE I
The LQRHA Algorithm at Each Time $t_k$

1. Input the state $x$ at time $t_k$.
2. Select the parameters $w_a$, $w_i$ ($w_a^I$, $w_i^I$), and $T_k$.
3. Solve for the saddle-point equilibrium feedback strategies $\gamma_a^*$, $\gamma_i^*$ (and $\gamma_t^*$) over the time interval $[t_k, t_k + T_k)$.
4. Output $\gamma_a^*$, $\gamma_i^*$ (and $\gamma_t^*$) for implementation over the next $\Delta t$ interval.
a common objective. Furthermore, they both have access to the same information. With the same information base, maximization performed by each of them individually is equivalent to simultaneous maximization over an augmented decision space that combines the control spaces of both players. Thus this three-player game can actually be reduced to a two-player zero-sum game in which the interceptor and the asset are viewed as one player.
To make use of two-player game theory, we rewrite the aggregate dynamic equation as

$$\dot{x}(t) = A x(t) + B_a u_a(t) + B_{it} u_{it}(t) \tag{30}$$

where

$$B_{it} = \begin{bmatrix} 0 & 0 \\ B_i^0 & 0 \\ 0 & B_t^0 \end{bmatrix}, \qquad u_{it}^T = [u_i^T, u_t^T].$$

Here $A$, $B_a$, $x$ are those defined in (5). The game then becomes a standard two-player LQ differential game, and Theorem 1 applies.
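The reduction amounts to stacking the interceptor's and asset's input matrices into the single block matrix $B_{it}$ of (30). A minimal sketch of ours (`augment_defenders` is an illustrative name):

```python
import numpy as np

def augment_defenders(B0i, B0t, n_a):
    """Stack the interceptor and asset inputs into one 'team' player:
    B_it as in (30), acting on the aggregate state [x_a; x_i; x_t]."""
    n_i, m_i = B0i.shape
    n_t, m_t = B0t.shape
    B_it = np.zeros((n_a + n_i + n_t, m_i + m_t))
    B_it[n_a:n_a + n_i, :m_i] = B0i      # interceptor rows, interceptor inputs
    B_it[n_a + n_i:, m_i:] = B0t         # asset rows, asset inputs
    return B_it
```

Multiplying $B_{it}$ by the stacked input $u_{it} = [u_i; u_t]$ then reproduces $[0; B_i^0 u_i; B_t^0 u_t]$, so the three-player dynamics collapse to the two-player form required by Theorem 1.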
V. IMPLEMENTATION OF THE LQ GAME STRATEGIES
Although the formulation of an LQ DA game
with a fixed horizon enjoys the availability of
analytical solutions, its practical usefulness in solving
real-world DA games with inherent hard constraints
remains to be tested. The main challenge lies in the
approximation introduced when hard constraints are
replaced by soft constraints with a fixed terminal time.
To extend the usefulness of the LQ formulation to
practical problems, we propose a repetitive algorithm.
We first choose $\Delta t > 0$ as the sampling time interval. At each sampling time $t_k = t_0 + k \Delta t$ for $k \in \{0, 1, 2, \ldots\}$, select $T_k > \Delta t$ as the optimization horizon used with the quadratic objective (6). Saddle-point equilibrium strategies $\gamma_a^*$, $\gamma_i^*$ (and possibly $\gamma_t^*$) are solved over the horizon interval $[t_k, t_k + T_k)$. For actual implementation, however, these strategies are used only within the next $\Delta t$ interval, $[t_k, t_k + \Delta t)$. This procedure is then repeated at the next sampling time $t_{k+1} = t_k + \Delta t$. The detailed procedure at each time $t_k$ is described in Table I, and we call the implementation scheme the LQ receding horizon algorithm (LQRHA). In this algorithm, $w_a$, $w_i$, $w_i^I$, $w_a^I$, $\Delta t$, and $T_k$ are the design parameters.
With the LQRHA algorithm, the real-world DA
problems may be approached by the LQ game design.
To successfully apply the LQRHA algorithm, the
parameters ¢t and Tk need to be chosen properly.
In practice a good understanding of the scenario
and simulation studies can provide insights. Here
we only provide general guidelines on how to
choose them. As we know, Tk is used as an estimated
duration of a game or a time horizon of interest to
the decision-maker. If Tk is too large, part of the
resulting game strategies would be less relevant to
the current game situation, or the strategies may be
over planned. On the other hand, if Tk is too small, the
resulting strategies tend to be myopic since decision
is made only based on the prediction of the very
immediate future. Regarding ¢t repetition every ¢t
provides a self-correction or feedback mechanism.
It is important for players to constantly update their
strategies according to emerging situations because
in a game the other player’s behavior is hard to
predict, and the estimated planning horizon Tk may
be inaccurate. In practice a proper ¢t can be chosen
based on the player’s ability to predict the evolution
of the game.
The choice of T_k may face a practical constraint from the corresponding Riccati equation. In the LQ game design, it is necessary that the Riccati equation (10) admit a bounded solution over the interval [0, T_k]; in other words, the interval [0, T_k] must contain no "escape time" [19]. The existing literature on optimal control and dynamic game theory normally requires the matrices Q and Q_f to be positive or negative semi-definite to ensure the existence of solutions to the Riccati equation. Unfortunately this is not the case in the DA games discussed above. The following theorem allows the existence of solutions to be checked.
THEOREM 3 The Riccati differential equation (RDE) (10) has a bounded solution over [0, T] if and only if the following matrix linear differential equation

[ Ẋ(t) ]   [  A    −S  ] [ X(t) ]        [ X(T) ]   [ I_n ]
[ Ẏ(t) ] = [ −Q   −A^T ] [ Y(t) ],      [ Y(T) ] = [ Q_f ]        (31)

has a solution on [0, T] with X(·) nonsingular over [0, T]. In (31), A, Q, and S = B_1 B_1^T − B_2 B_2^T are those in (10). Moreover, if X(·) is invertible, Z(t) = Y(t) X^{-1}(t) is a solution of (10).
Readers are referred to [20, p. 194] or [19, p. 354] for a proof.
In what follows we use Theorem 3 to determine a proper T_k in the LQRHA algorithm. First we choose a tentative horizon T̄_k based solely on the nature of the problem and a simulation study. The factors to consider include the players' speeds, the area to be covered by the interceptor, and the time horizon of interest. Suppose that T̄_k is determined as a function of the state variables, T̄_k = T(x_k). Then we check whether there exists a finite escape time T_e with T_e ∈ [0, T̄_k] by solving (31). If T_e ∉ [0, T̄_k], simply choose T_k as T̄_k; otherwise set T_k = (T̄_k − T_e) − δ for some δ > 0. Following this procedure, the Riccati equation (10) has a bounded solution over [0, T_k]. Since the equation is time invariant, T_e only needs to be calculated once.
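Theorem 3 suggests a numerical recipe for locating T_e: integrate (31) backward from t = T and watch for X(t) becoming singular. The scalar sketch below is an illustrative toy, not the paper's DA dynamics: it assumes A = 0, S = 1, Q = −1, Q_f = 0, for which the backward solution is X(t) = cos(T − t), so the escape time is exactly T − π/2.

```python
import math

def escape_time(A, S, Q, Qf, T, dt=1e-4):
    """Integrate the scalar version of (31) backward from t = T; return the first
    t in [0, T] where X(t) crosses zero (i.e., Z = Y/X escapes), else None."""
    X, Y, t = 1.0, Qf, T          # terminal condition [X(T); Y(T)] = [1; Qf]
    def f(X, Y):                  # right-hand side of (31): Xdot = A X - S Y, Ydot = -Q X - A Y
        return A * X - S * Y, -Q * X - A * Y
    while t > 0.0:
        X_prev = X
        k1x, k1y = f(X, Y)        # one RK4 step of size -dt (backward in time)
        k2x, k2y = f(X - 0.5 * dt * k1x, Y - 0.5 * dt * k1y)
        k3x, k3y = f(X - 0.5 * dt * k2x, Y - 0.5 * dt * k2y)
        k4x, k4y = f(X - dt * k3x, Y - dt * k3y)
        X -= dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        Y -= dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        t -= dt
        if X_prev * X <= 0.0:
            return t              # X(t) singular: finite escape time of the RDE
    return None                   # no escape time; (10) is bounded on [0, T]

te = escape_time(A=0.0, S=1.0, Q=-1.0, Qf=0.0, T=2.0)
print(te)  # close to 2 - pi/2 ~ 0.4292, since here X(t) = cos(T - t)
```

In the matrix case the same idea applies with det X(t) (or its smallest singular value) monitored instead of a scalar sign change.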
VI. NUMERICAL EXAMPLES
In this section, we demonstrate the usefulness of the LQ strategies with simulations. Three DA games in R² are considered, with different maneuverabilities of the asset.
A. Simple Motion Dynamics of the Players
Consider the following players' dynamics in x-y coordinates:

ẋ_a = v_a u_a cos(θ_a),    ẏ_a = v_a u_a sin(θ_a)
ẋ_i = v_i u_i cos(θ_i),    ẏ_i = v_i u_i sin(θ_i)        (32)
ẋ_t = v_t u_t cos(θ_t),    ẏ_t = v_t u_t sin(θ_t)

with proper initial conditions. Define the aggregate state as x_ς = [x_ς, y_ς]^T with the subscript ς ∈ {a, i, t} standing for attacker, interceptor, or asset. In (32), x_ς, y_ς are the displacements along the x and y axes; v_ς is the speed, which is a constant; u_ς, θ_ς are the control inputs, where u_ς ∈ [0, 1] is a scaling factor and θ_ς is the moving orientation. Next we convert the nonlinear dynamic equations in (32) into a linear form:

[ ẋ_a ]   [ v_a   0  ] [ u_ax ]     [ ẋ_i ]   [ v_i   0  ] [ u_ix ]     [ ẋ_t ]   [ v_t   0  ] [ u_tx ]
[ ẏ_a ] = [  0   v_a ] [ u_ay ],   [ ẏ_i ] = [  0   v_i ] [ u_iy ],   [ ẏ_t ] = [  0   v_t ] [ u_ty ].    (33)

In (33), (u_ςx, u_ςy) are the control inputs with the constraint √(u_ςx² + u_ςy²) ≤ 1. Since (u_ς, θ_ς) and (u_ςx, u_ςy) form a one-to-one mapping with θ_ς ∈ [0, 2π), the equations in (32) and (33) are equivalent.
Despite the constraint on the boundedness of the control inputs in (33), the LQ approach is still used to design the feedback control law γ_ς. To ensure boundedness, the following nonlinear function φ(·) is used:

φ(r) = r if ‖r‖ ≤ v_ς;    φ(r) = v_ς r/‖r‖ if ‖r‖ > v_ς,    for r ∈ R^m with m ≥ 1.    (34)

Given an LQ strategy γ_ς, the actual control u_ς applied in the simulations is u_ς = φ(γ_ς(x)).

Fig. 1. Reachable set and optimal strategy. (a) Capture at coincidence. (b) Capture with radius ε.
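A direct transcription of (34): the LQ feedback is passed through φ so the applied control never exceeds the speed bound. A small sketch, assuming a planar (m = 2) setting; the gain matrix K below is a hypothetical linear feedback, not the paper's specific LQ gain.

```python
import numpy as np

def phi(r, v_max):
    """Saturation map (34): identity inside the ball of radius v_max,
    radial projection onto the sphere outside it."""
    r = np.asarray(r, dtype=float)
    n = np.linalg.norm(r)
    return r if n <= v_max else v_max * r / n

# The actual control applied in simulation is u = phi(gamma(x)); gamma here is
# an assumed linear feedback u = -K x for illustration only.
K = np.array([[0.8, 0.0], [0.0, 0.8]])
x = np.array([3.0, -4.0])           # ||-K x|| = 0.8 * 5 = 4 exceeds the bound
u = phi(-K @ x, v_max=1.0)
print(np.linalg.norm(u))            # 1.0: saturated to the unit speed bound
```

Note that φ preserves the direction of the LQ feedback; only its magnitude is clipped.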
B. Defending a Stationary Asset
1) Existence of Solutions for the Riccati Equation: We first consider a DA game with a fixed asset and show existence of solutions for the corresponding Riccati equation (10). The existence result is presented in a general R^{n_S} space with n_S > 0. Consider the simple linear dynamics; the matrices in (11) can be specified as A_a = A_i = 0_{n_S}, B'_a = v_a I_{n_S}, B'_i = v_i I_{n_S}, and u_a, u_i ∈ R^{n_S} with v_a, v_i > 0. The game objective is given in (13). For this DA game, the corresponding Riccati equation (10) has a bounded solution under certain conditions.

THEOREM 4 For a game of defending a stationary asset with the objective in (13) and simple linear dynamics, the corresponding Riccati equation (10) has a bounded solution if one of the following is true: 1) v_a < v_i; 2) w^I_i < w^I_a and w_i = 0; 3) w^I_i = w^I_a and v_a = v_i.

PROOF Refer to Appendix I.
2) An Optimal Strategy: In the DA game above, if v_a = v_i, the players' optimal strategies can be derived using a geometric approach [7, p. 144]. For readers' convenience, we summarize the approach as follows. First we define the reachable set as the set of all initial positions of the asset in R² that the attacker can successfully reach under some strategy, without being intercepted, no matter how the interceptor moves. We denote by R(x_a, x_i) the reachable set given the initial positions x_a, x_i. Fig. 1(a) illustrates the set R(x_a, x_i) when the attacker is considered intercepted only upon coincidence with the interceptor: x_i(t) = x_a(t) for some t > 0. When v_a = v_i, R(x_a, x_i) is the half of the plane on the attacker's side. The plane is divided by the line L that passes through the midpoint M of the segment between the interceptor and the attacker and is perpendicular to that segment. In the case where x_t ∉ R(x_a, x_i), an optimal strategy should drive the attacker (or interceptor) towards the point O, which is the point on L closest to the asset. If x_t ∈ R(x_a, x_i), the attacker may simply move towards the asset, and nothing can be done to enforce an interception. On the other hand, if the attacker is considered intercepted when it falls within an ε-vicinity of the interceptor, the dividing line L in Fig. 1(a) turns into a hyperbola passing through the midpoint N between the attacker and the capture circle. The hyperbola has asymptotes passing through the midpoint M between the attacker and the interceptor, which are also perpendicular to the tangents of the capture circle that pass through the attacker. The reachable set is the region that contains the attacker, as shown in Fig. 1(b). Similarly, an optimal strategy drives the attacker (or interceptor) towards the point O.
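For the equal-speed, point-capture case of Fig. 1(a), the geometric strategy reduces to elementary vector algebra: L is the perpendicular bisector of the attacker-interceptor segment, and O is the asset's orthogonal projection onto L. A sketch under exactly those assumptions (v_a = v_i, capture at coincidence):

```python
import numpy as np

def optimal_point(xa, xi, xt):
    """Closest point O on the bisector L of segment [xa, xi] to the asset xt."""
    xa, xi, xt = (np.asarray(p, dtype=float) for p in (xa, xi, xt))
    M = 0.5 * (xa + xi)                       # midpoint of attacker and interceptor
    n = (xi - xa) / np.linalg.norm(xi - xa)   # unit normal of the bisector L
    return xt - np.dot(xt - M, n) * n         # orthogonal projection of xt onto L

# Initial positions used later in the stationary-asset simulation, asset at origin.
O = optimal_point(xa=[3, 3], xi=[3, -1], xt=[0, 0])
print(O)  # [0. 1.]: both players should head for this point on L
```

When x_t ∉ R(x_a, x_i), both players racing to this O is the saddle-point behavior the geometric argument prescribes.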
3) Simulation Results: In the following
simulations the asset is at the origin. Let v_a = v_i = 1, and the initial positions are x_a0 = [3, 3]^T and x_i0 = [3, −1]^T. Clearly the asset is not in R(x_a0, x_i0). To illustrate the performance of the LQ game strategy with the LQRHA implementation, the interceptor uses the LQ game strategy against the attacker, who exploits the optimal strategy in Fig. 1(b). In the LQRHA algorithm the planning horizon T_k is chosen as T_k = (‖x_a‖ − ε)/v_a at each t_k, which equals the minimum time possible for the attacker to reach the asset. Choose the sampling time step Δt = 0.1 and the capture radius ε = 0.5. For simplicity the same radius ε is used for both the attacker and the interceptor. Note that all the parameters chosen above are unitless. In the LQ game design, the weights w_i, w_a, w^I_i, w^I_a in the objective (13) are specified as follows. We set w_i = w_a = 10, and for comparison two sets of w^I_i, w^I_a are used, i.e., w^I_i = w^I_a = 0 and w^I_i = w^I_a = 10. By
Fig. 2. Interception trajectories under LQ strategies with different w^I_i, w^I_a.
Fig. 3. Interception trajectories under interceptor's direct and optimal strategies.
Theorem 4, the corresponding Riccati equation (10) has a bounded solution.
The simulated game trajectories with the strategies specified above are shown in Fig. 2. With the LQRHA implementation of the LQ strategies, the interceptor can successfully intercept the attacker against its optimal strategy. Regarding the difference induced by the parameters w^I_i, w^I_a, the strategy resulting from larger w^I_i, w^I_a performs better, and the attacker is further away from the asset when intercepted. From Table II, the interception also takes less time. With positive w^I_i, w^I_a, the interceptor appears to be more aggressive and tracks the attacker more closely. This is because of the effective penalty terms in the integral in (7).
A suboptimal feedback strategy of the interceptor, u^d_i = −v_i (x_a − x_i)/‖x_a − x_i‖ (x_a ≠ x_i), is also considered; we call it the "direct strategy." This strategy corresponds to a line-of-sight guidance law in missile guidance, which drives the interceptor straight towards the attacker. It is also an optimal pursuit strategy in the PE game without the asset.
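The direct strategy is simple enough to state in a few lines. Note one assumption in this sketch: the velocity applied to x_i points from the interceptor toward the attacker, matching the described line-of-sight behavior (the paper's leading minus sign in u^d_i reflects how the control enters its LQ formulation).

```python
import numpy as np

def direct_strategy(xa, xi, vi):
    """Line-of-sight pursuit: move at full speed vi straight toward the attacker.
    Undefined at xa == xi, which corresponds to capture."""
    d = np.asarray(xa, dtype=float) - np.asarray(xi, dtype=float)
    return vi * d / np.linalg.norm(d)

# A few Euler steps against a (hypothetically) frozen attacker:
# the gap shrinks by vi*dt per step along the line of sight.
xa, xi, dt = np.array([3.0, 3.0]), np.array([3.0, -1.0]), 0.1
for _ in range(10):
    xi = xi + dt * direct_strategy(xa, xi, vi=1.0)
print(np.linalg.norm(xa - xi))  # 4.0 - 10*0.1 = 3.0
```

Against a maneuvering attacker, as the simulations below show, this memoryless law ignores the asset entirely, which is exactly why it fails in the DA setting.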
In the following simulations the interceptor uses both the direct and the optimal strategy against the attacker's optimal strategy. The resulting players'

TABLE II
Interception/Attack Time Comparison

Strategy                        Optimal   LQ (w^I_i = w^I_a = 10)   LQ (w^I_i = w^I_a = 0)   Direct
Interception/Attack Time (s)     3.1             3.5                       3.8                 3.9

trajectories are illustrated in Fig. 3. The interceptor fails to intercept the attacker under the direct strategy, because no prediction of the attacker's movement is available without taking the asset into account. Since time is critical in a DA game, the direct strategy is disadvantageous. From the right plot of Fig. 3, when both players use the optimal strategies, the players' trajectories are straight lines. The interception time, or the time for the attacker to reach the asset, is given in Table II.
To better understand the impact of the design parameters, we show by simulation how w_a, w^I_a, w_i, w^I_i can affect the performance of the LQ strategies. To illustrate the influence exclusively from the parameters w_a, w_i in (12), the DA game with exactly the same initial conditions and simulation parameters is considered. The
Fig. 4. Game trajectories under interceptor's LQ game strategies with different w_i.
attacker still uses the optimal strategy. We first set w^I_i = w^I_a = 0, w_a = 1 and use different values of w_i. The corresponding players' trajectories under different w_i are shown in Fig. 4. When w_i is small, i.e., w_i = 0.1 or 1 in Fig. 4, the interceptor fails to intercept the attacker. This is because the penalty on the distance is relatively small compared to that on the control energy. When w_i becomes larger, the focus shifts to interception. When w_i increases from 0.1 to 1, the interceptor gets closer to the attacker. When w_i further increases to 10 and 100, the penalties on control consumption become negligible, and the interceptor can successfully intercept the attacker. Little difference can be observed between the two trajectories because of the boundedness constraint in (34), under which the energy used by the feedback control cannot increase indefinitely.
The influence of the weight on the accumulative penalty term, w^I_i, is illustrated as follows. Here we set w_i = w_a = 0 and w^I_a = 1. Different values of w^I_i are used, and the resulting players' trajectories are shown in Fig. 5. It can be seen that a larger w^I_i leads to a more efficient interception strategy. With a large w^I_i, the accumulative penalty term on the distance becomes dominant, so that the interceptor can track the attacker more closely. However, in a DA game, better tracking of the attacker does not necessarily translate into better interception performance. Without a properly positioned interceptor, blindly tracking the attacker can undermine its interception effectiveness.
In addition to the monotonicity in the weighting parameters above, the impacts of the parameters w_i, w_a, w^I_i, w^I_a can be better understood through the Riccati equation (24). The weighting parameters w_i, w_a appear in the terminal condition Q_f, while w^I_i, w^I_a appear in Q, which influences how the Riccati equation evolves backwards. The planning horizon T_k is also an important factor, which determines the relative importance of w_i, w_a and w^I_i, w^I_a to the final solution matrix. When T_k is relatively small (zero in the extreme case), w_i, w_a become the dominant factor through the terminal condition Q_f. As T_k increases, the influence of w^I_i, w^I_a grows through integration, and eventually they become dominant, especially when the Riccati solution converges.
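This dominance pattern can be reproduced with a scalar stand-in for the RDE: with A = 0, ż(t) = −q + s z(t)², z(T) = q_f, backward integration drives z(0) toward the steady state √(q/s) once the horizon is long enough, regardless of the terminal weight q_f. This is an illustrative scalar analogue of our assumption about (10), not the paper's matrix equation:

```python
def z0(qf, T, q=1.0, s=1.0, dt=1e-3):
    """z(0) from backward Euler on zdot = -q + s*z**2 with z(T) = qf."""
    z = qf
    for _ in range(int(round(T / dt))):
        z += dt * (q - s * z * z)   # backward step: z(t - dt) = z(t) - dt*zdot(t)
    return z

for T in (0.2, 1.0, 3.0):
    print(T, round(z0(qf=0.0, T=T), 3), round(z0(qf=5.0, T=T), 3))
# Short horizons: z(0) stays close to the terminal weight qf.
# Long horizons: both runs approach sqrt(q/s) = 1, and qf no longer matters.
```

Here q plays the role of the integral weights w^I and q_f that of the terminal weights w, mirroring the Q_f-dominated small-T_k regime and the Q-dominated large-T_k regime described above.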
In the following simulations, let the attacker use its optimal strategy and the interceptor exploit different LQ game strategies obtained from different sets of w_i, w_a, w^I_i, w^I_a. Once chosen, these parameters are fixed throughout the simulations. Different T_k are used, but once chosen, T_k is fixed. To demonstrate the impact of T_k on the game result, the attacker's position at the game terminal, relative to the asset, is used as a measure of the effectiveness of a game strategy. Fig. 6 illustrates the results of the LQ game strategies with three sets of weighting parameters under a changing planning horizon T_k. In all cases, when T_k is small (less than 0.1), the attacker can successfully reach the asset. When T_k = 0, the LQ strategy becomes exactly the direct strategy, since its guidance comes solely from the instantaneous penalty
Fig. 5. Game trajectories under interceptor's LQ game strategies with different w^I_i.
Fig. 6. Distances between attacker and asset at game terminal under different T_k.
on distance. Once T_k is greater than 0.1, interception becomes successful. As T_k increases, the interceptor behaves differently, and the distance between the attacker and the asset at the game terminal varies until it converges when T_k > 1. This is due to the convergence of the solution to the Riccati equation (10). Since the parameters w^I_i, w^I_a in the integral terms are the same in all three cases, they converge to the same point. This implies that if T_k is chosen sufficiently large, so that the solution matrix converges, the parameters w_i, w_a have no effect on the resulting strategy, since it is only implemented within a Δt interval. In summary, the terminal condition Q_f, the weighting matrix Q, and the evolution time T_k all have an impact on the solution matrix Z of the Riccati equation, and in turn influence the linear gain in the resulting game strategy. An additional complication from T_k might be involved when an escape time
Fig. 7. Interception region I under different interceptor's strategies.
exists, which imposes a practical constraint on T_k. In practice, w_i, w_a, w^I_i, w^I_a, and T_k need to be chosen carefully and jointly to achieve the desired game performance.
Next we show the interception capability of the LQ game strategy. Let the initial position of the attacker be (3, 3). We focus on the set of the interceptor's initial positions from which it can successfully intercept the attacker under different strategies. Let us call this set the "interception region" and denote it by I. The interception region depends on the strategies of both the attacker and the interceptor. Suppose that the attacker exploits the optimal geometric strategy. Under the LQ game strategies associated with w_a = w_i = 10 and two different sets of w^I_a, w^I_i (w^I_a = w^I_i = 0 and w^I_a = w^I_i = 10), we calculate the interception regions. For comparison, the interception regions associated with the optimal and the direct strategy of the interceptor are also calculated. In Fig. 7, the perimeters of I under different interceptor strategies are depicted. By definition, each interception region I (inside the perimeter) contains all the points from which the interceptor can successfully intercept the attacker. Clearly, the I associated with the interceptor's optimal strategy is the largest set. The sets I associated with the LQ game strategies are only slightly smaller. Again the direct strategy has the worst interception performance. The results suggest that the LQ game design with the LQRHA algorithm can determine a fairly good practical interception guidance law.
TABLE III
Simulation Parameters

                   Attacker    Interceptor   Asset
Speed                1.5            2          —
Initial Position   (−4.5, 6)    (−9, −9)     (2, 2)
Finally the LQRHA algorithm has also been
applied to determine the attacker’s strategies against
the interceptor’s optimal strategy, and similar results
have been observed.
C. Defending a Mobile Asset with an Arbitrary Trajectory

In this section we consider a DA problem with a mobile asset that moves along a known trajectory. The dynamic equations in (33) are considered, and the asset's movement is described as follows:

ẋ_t = 0.5 t
ẏ_t = −0.5 t − 2 sin(π t / 5).

The speeds and the initial positions of the players are specified in Table III.
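The prescribed trajectory can be integrated in closed form from the initial position (2, 2); the closed form below is our own derivation for illustration, not stated in the paper. The sketch checks a forward-Euler rollout of the asset's motion against it:

```python
import math

def asset_closed_form(t):
    """Integral of xdot_t = 0.5 t, ydot_t = -0.5 t - 2 sin(pi t / 5) from (2, 2)."""
    x = 2.0 + 0.25 * t * t
    y = 2.0 - 0.25 * t * t + (10.0 / math.pi) * (math.cos(math.pi * t / 5.0) - 1.0)
    return x, y

# Forward-Euler rollout of the same ODE; it should agree with the closed form to O(dt).
x, y, dt, n = 2.0, 2.0, 1e-3, 5000
for k in range(n):
    t = k * dt
    x += dt * 0.5 * t
    y += dt * (-0.5 * t - 2.0 * math.sin(math.pi * t / 5.0))
xc, yc = asset_closed_form(n * dt)
print(abs(x - xc) < 0.01, abs(y - yc) < 0.01)  # True True
```

Since the trajectory is known in advance, it can be fed to the players' strategies as an exogenous signal at each sampling time of the LQRHA loop.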
The LQ game approach with the LQRHA algorithm is used to determine both the attacker's and the interceptor's strategies. Note that the Riccati equation associated with this DA game has a bounded solution. Let the sampling time interval Δt = 0.1 and ε = 0.5. At each sampling time t_k = t_0 + kΔt, the
Fig. 8. Interception trajectories under LQ game strategies with different w^I_i.
optimization horizon T_k is chosen as

T_k = min{(‖x_a − x_t‖ − ε)/v_a, (‖x_i − x_a‖ − ε)/v_i}.

The weighting parameters in the objective function (15) are chosen as w_a = w^I_a = w_i = 10. We simulate the game under different values of w^I_i, namely w^I_i = 1 and w^I_i = 10.
Fig. 8 depicts the interception trajectories under the LQ game strategies associated with different values of w^I_i, using the LQRHA algorithm. With a relatively small w^I_i (w^I_i = 1), the attacker succeeds in reaching the asset, while with w^I_i = 10 the interceptor can closely track and successfully intercept the attacker. As in the previous games, a bigger w^I_i translates into better tracking. Based on a number of simulations, the LQ strategy with the LQRHA algorithm can provide fairly good guidance laws for both the attacker and the interceptor in this DA game with a mobile asset moving along an arbitrary trajectory.
D. Defending an Escaping Asset
In the following DA game with an escaping
asset, the players are moving with the simple motion
dynamics given in (33). Consider the objective in
(22). We first present a theorem on the existence of
solutions for the associated Riccati equation, which is
a counterpart of Theorem 4.
THEOREM 5 In the game of defending an escaping
asset with the objective function (22) and simple linear
dynamics (33), if va < vi and vt < va, the corresponding
Riccati equation (10) has a bounded solution for any
T > 0.
PROOF Readers are referred to Appendix II for a
proof.
TABLE IV
Simulation Parameters

                   Attacker   Interceptor   Asset
Speed                 1            1         0.5
Initial Position    (0, 4)      (0, −4)    (4, 2.5)
Fig. 9. Illustration of asset's escaping strategy.
Before the simulation we first analyze the possible strategies of the players. The game parameters are given in Table IV. Recall the players' optimal strategies derived in Section VI-B2. With the given initial conditions, the reachable set of the attacker, R(x_a, x_i), lies in the half plane above the x axis. Thus, if the asset tries to escape, its best chance is to move south to cross the x axis, as shown in Fig. 9. In this example, it takes at least 5 s for the asset to cross the x axis at the point C. Considering the attacker's speed and its distance to point C, we can see that the asset can barely escape from the attacker.
Fig. 10. Interception trajectories under LQ game strategies with different w_i.
In the following simulation, we choose Δt = 0.1 and T_k as

T_k = min{(‖x_a − x_t‖ − ε)/(v_a − v_t), (‖x_i − x_a‖ − ε)/v_i}.

Let w^I_i = w^I_a = 0 and w_a = 10, and we simulate the game with different values of w_i: w_i = 1 and w_i = 10. The simulation results are illustrated in Fig. 10. The left plot is associated with w_i = 1. The interceptor falls short because the penalty on control energy is relatively large. Without sufficient support from the interceptor, the asset moves to the right to avoid the attacker. The interception performance improves with a larger w_i. In this case, the trajectories of all three players under the LQ game strategies closely resemble the strategies illustrated in Fig. 9.
In addition to the examples presented above, we have simulated a number of scenarios. The results suggest that, with a proper choice of design parameters, the LQ game approach can determine good guidance laws in practical DA games, outperforming simpler designs such as the direct strategy.
For practical problems the choice of the design parameters w^I_ς, w_ς, and T_k is important. Depending on the application, the parameters can be chosen from the interceptor's or the attacker's perspective. One issue that we have observed in simulations is that towards the end of a game, the choice of T_k becomes much more crucial, and an improper T_k can impact the game outcome negatively. One guideline is to choose a small T_k near the end of a DA game. Alternatively, a different strategy may be adopted. For instance, near the game terminal the interceptor and the attacker may be closely engaged, i.e., the interceptor can successfully intercept the attacker under the direct strategy. If so, the interceptor can simply use the direct pursuit strategy for the rest of the game. This suggests a two-phase interception process: 1) during the DA phase, the LQ strategy with the LQRHA algorithm is used; 2) during the interception phase, the interceptor goes straight after the attacker. This multi-phase implementation is similar to that used in an LQ optimal control problem with a hard terminal constraint [18], wherein open-loop control is used near the problem terminal to drive the system to a desired state, instead of state feedback, so as to avoid an arbitrarily large feedback gain.
VII. CONCLUSIONS

In this paper we have studied DA game problems under an LQ game framework. From a practical point of view, the inherent hard constraint has been approximated and replaced by soft constraints with a fixed optimization horizon. Three DA games, distinguished by the mobility of the asset, have been addressed: 1) defending a stationary asset; 2) defending a moving asset with an arbitrary trajectory; 3) defending an escaping asset. Equilibrium strategies of the players have been derived, and we have proved the existence of solutions for the Riccati equations associated with the LQ games with simple linear dynamics. For implementation, a receding-horizon-based algorithm (LQRHA) has been proposed. The usefulness of the LQ game approach has been demonstrated by simulations of selected scenarios, with an optimal strategy used for comparison. Overall, the LQ strategies with the LQRHA implementation can provide good control guidance laws for the players in DA games.
APPENDIX I. PROOF OF THEOREM 4

We first introduce some necessary notation. Define H ≜ {A, Q, B_a, B_i}, and denote by Ric[H](W) the right-hand side of the Riccati equation in (10) with a matrix W of proper dimension in place of the matrix Z. Define the set R^H_⋄ ≜ {W ∈ R^{n×n} | W = W^T and Ric[H](W) ⋄ 0} with ⋄ ∈ {≥, ≤, =}. We denote by W(·, X_0) the solution of (10) with W(t_0, X_0) = X_0 for some arbitrary and fixed t_0. The following lemma is needed to prove Theorem 4.

LEMMA 1 Suppose that there exist W_1 ∈ R^H_≤ and W_2 ∈ R^H_≥ with W_1 ≤ W_2. Then, for any W_0 ∈ R^{n×n} with W_0 = W_0^T and W_1 ≤ W_0 ≤ W_2, the solution W(t, W_0) exists for t ∈ (−∞, t_0], with

W_1 ≤ W(t, W_1) ≤ W(t, W_0) ≤ W(t, W_2) ≤ W_2    for t ∈ (−∞, t_0].

PROOF Refer to Theorem 3.1 in [22] for a proof.
PROOF OF THEOREM 4 The proof of Theorem 4 is an adaptation of Lemma 1. First of all, the simple linear dynamics in R^{n_S} can be rewritten in the form of (11) with matrices A, B_i, B_a specified as (writing matrices row by row, rows separated by semicolons)

A = 0_{2n_S},    B_a = [v_a I_{n_S}; 0],    B_i = [0; v_i I_{n_S}].

The matrix Q_1 in (14) is

Q_1(w_a, w_i) = [(w_a − w_i) I_{n_S}, w_i I_{n_S}; w_i I_{n_S}, −w_i I_{n_S}],

so that Q, Q_f in (13) can be determined accordingly. Now we construct matrices W_1 and W_2 with the properties specified in Lemma 1. Define

S ≜ B_a B_a^T − B_i B_i^T = [v_a² I_{n_S}, 0; 0, −v_i² I_{n_S}].

Let W_1 be

W_1 ≜ W_11 + W_12 ≜ ω_11 [−I_{n_S}, I_{n_S}; I_{n_S}, −I_{n_S}] + ω_12 [0, 0; 0, −I_{n_S}]

with parameters ω_11, ω_12 ≥ 0 to be determined. Substituting W_1 into Ric[H] (with H ≜ {A, Q, B_a, B_i}), we obtain

Ric[H](W_1) = −Q + W_1 S W_1
            = −Q + W_11 S W_11 + W_12 S W_12 + W_11 S W_12 + W_12 S W_11
            = −[(w^I_a − w^I_i) I_{n_S}, w^I_i I_{n_S}; w^I_i I_{n_S}, −w^I_i I_{n_S}]
              + ω_11² (v_a² − v_i²) [I_{n_S}, −I_{n_S}; −I_{n_S}, I_{n_S}]
              + ω_12² [0, 0; 0, −v_i² I_{n_S}]
              + ω_11 ω_12 [0, v_i² I_{n_S}; v_i² I_{n_S}, −2 v_i² I_{n_S}].        (35)
For any x = [x_a^T, x_i^T]^T,

x^T Ric[H](W_1) x = w^I_i ‖x_i − x_a‖² − w^I_a ‖x_a‖² + ω_11² (v_a² − v_i²) ‖x_i − x_a‖²
                    − ω_12² v_i² ‖x_i‖² − 2 ω_11 ω_12 v_i² ‖x_i‖² + 2 ω_11 ω_12 v_i² x_i^T x_a.        (36)

Next we show that for each of the three cases in Theorem 4, x^T Ric[H](W_1) x ≤ 0 for suitable ω_11, ω_12. First, if v_a < v_i, we choose ω_12 = 0, and (36) becomes

x^T Ric[H](W_1) x = w^I_i ‖x_i − x_a‖² − w^I_a ‖x_a‖² + ω_11² (v_a² − v_i²) ‖x_i − x_a‖².

Since v_a < v_i, we can clearly choose ω_11 sufficiently large so that x^T Ric[H](W_1) x ≤ 0 for any x ∈ R^{2n_S}. Second, if w^I_i < w^I_a, we can choose ω_11 = 0, and (36) similarly becomes

x^T Ric[H](W_1) x = w^I_i ‖x_i − x_a‖² − w^I_a ‖x_a‖² − ω_12² v_i² ‖x_i‖².

Since w^I_i < w^I_a, we can choose ω_12 sufficiently large so that x^T Ric[H](W_1) x ≤ 0 for any x ∈ R^{2n_S}. Third, if w^I_i = w^I_a and v_a = v_i, (36) becomes

x^T Ric[H](W_1) x = w^I_i ‖x_i‖² − 2 w^I_i x_i^T x_a − ω_12² v_i² ‖x_i‖² − 2 ω_11 ω_12 v_i² ‖x_i‖² + 2 ω_11 ω_12 v_i² x_i^T x_a.

Choosing ω_11, ω_12 such that ω_11 ω_12 v_i² = w^I_i clearly gives x^T Ric[H](W_1) x ≤ 0. So far we have shown that in each case there exists some W_1 with W_1 ∈ R^H_≤.

Now we check whether Q_f ≥ W_1. Given any x ∈ R^{2n_S},

x^T (Q_f − W_1) x = (ω_11 − w_i) ‖x_i − x_a‖² + w_a ‖x_a‖² + ω_12 ‖x_i‖².        (37)

It can easily be checked that for each of the cases above, x^T (Q_f − W_1) x ≥ 0 with a proper ω_11; that is, Q_f ≥ W_1.

Next we construct the matrix W_2 as

W_2 = [ω_2 I_{n_S}, 0_{n_S}; 0_{n_S}, 0_{n_S}]        (38)

with parameter ω_2 > 0 to be determined. For the matrix Ric[H](W_2) = −Q + W_2 S W_2, given any x = [x_a^T, x_i^T]^T,

x^T Ric[H](W_2) x = (ω_2² v_a² − w^I_a + w^I_i) ‖x_a‖² + w^I_i ‖x_i‖² − 2 w^I_i x_a^T x_i.

Clearly there exists some C_1 > 0 such that when ω_2 ≥ C_1, x^T Ric[H](W_2) x ≥ 0 for any x ∈ R^{2n_S}, i.e., W_2 ∈ R^H_≥. Furthermore, for any x = [x_a^T, x_i^T]^T,

x^T (W_2 − Q_f) x = (ω_2 − w_a + w_i) ‖x_a‖² − 2 w_i x_a^T x_i + w_i ‖x_i‖².

There exists some C_2 > 0 such that when ω_2 ≥ C_2, x^T (W_2 − Q_f) x ≥ 0 for any x ∈ R^{2n_S}, i.e., W_2 ≥ Q_f. Let ω_2 ≥ max{C_1, C_2}. Hence W_1 ≤ Q_f ≤ W_2, and by Lemma 1 a bounded solution exists for all t ≤ T with T < ∞.
APPENDIX II. PROOF OF THEOREM 5

PROOF The proof is similar to that of Theorem 4 in Appendix I. We can construct the matrices (rows separated by semicolons)

W_1 = ω_1 [−I_{n_S}, I_{n_S}, 0; I_{n_S}, −I_{n_S}, 0; 0, 0, 0_{n_S}]

W_2 = ω_2 [I_{n_S}, 0, −I_{n_S}; 0, 0_{n_S}, 0; −I_{n_S}, 0, I_{n_S}]

with some ω_1, ω_2 ≥ 0 to be determined. As in the proof of Theorem 4, it can easily be shown that Lemma 1 applies with proper ω_1 and ω_2. Thus a bounded solution exists for all t ≤ T with T < ∞.

REFERENCES
[1] Hespanha, J., Prandini, M., and Sastry, S.
Probabilistic pursuit-evasion games: A one-step Nash
approach.
In Proceedings of the 39th IEEE Conference on Decision
and Control, Sydney, Australia, 2000, 2272—2277.
[2] Schenato, L., Oh, S., and Sastry, S.
Swarm coordination for pursuit evasion games using
sensor networks.
In Proceedings of the International Conference on Robotics
and Automation, Barcelona, Spain, 2005, 2493—2498.
[3] Li, D. and Cruz, Jr., J. B.
Improvement with look-ahead on cooperative pursuit
games.
In Proceedings of the 44th IEEE Conference on Decision
and Control, San Diego, CA, Dec. 2006.
[4] Li, D., Cruz, Jr., J. B., and Schumacher, C.
Stochastic multi-player pursuit-evasion differential games.
International Journal of Robust and Nonlinear Control, 18
(2008), 218—247.
[5] Cruz, Jr., J. B., Simaan, M., Gacic, A., Jiang, H.,
Letellier, B., and Li, M.
Game-theoretic modeling and control of a military air
operation.
IEEE Transactions on Aerospace and Electronic Systems,
37, 4 (2001), 1393—1405.
[6] Cruz, Jr., J. B., Simaan, M., Gacic, A., and Liu, Y.
Moving horizon Nash strategies for a military air
operation.
IEEE Transactions on Aerospace and Electronic Systems,
38, 3 (2002), 989—999.
[7] Isaacs, R.
Differential Games: A Mathematical Theory with
Applications to Warfare and Pursuit.
New York: Wiley, 1965.
[8] Basar, T. and Olsder, G.
Dynamic Noncooperative Game Theory (2nd ed.).
Philadelphia: SIAM, 1998.
[9] Ho, Y. C., Bryson, Jr., A. E., and Baron, S.
Differential games and optimal pursuit-evasion strategies.
IEEE Transactions on Automatic Control, AC-10, 4 (1965),
385—389.
[10] Willman, W.
Formal solutions for a class of stochastic pursuit-evasion
games.
IEEE Transactions on Automatic Control, 14, 5 (1969),
504—509.
[11] Turetsky, V. and Shinar, J.
Missile guidance laws based on pursuit-evasion game
formulations.
Automatica, 39, 4 (2003), 607—618.
[12] Liu, Y., Cruz, Jr., J. B., and Schumacher, C.
Pop-up threat models for persistent area denial.
IEEE Transactions on Aerospace and Electronic Systems,
43, 2 (2007), 509—521.
[13] Pachter, M.
Simple-motion pursuit-evasion in the half plane.
Computers & Mathematics with Applications, 13, 1—3
(1987), 69—82.
[14] Zarchan, P.
Tactical and Strategic Missile Guidance.
Reston, VA: American Institute of Aeronautics and
Astronautics, 2007.
[15] Evans, L. and Souganidis, P.
Differential games and representation formulas for
solutions of Hamilton-Jacobi-Isaacs equations.
Indiana University Mathematics Journal, 33, 5 (1984),
773—797.
[16] Fleming, W. and Souganidis, P.
On the existence of value functions of two-player,
zero-sum stochastic differential games.
Indiana University Mathematics Journal, 38, 2 (1989),
293—314.
[17] Mitchell, I., Bayen, A., and Tomlin, C.
A time-dependent Hamilton-Jacobi formulation of
reachable sets for continuous dynamic games.
IEEE Transactions on Automatic Control, 50, 7 (2005),
947—957.
[18] Bryson, Jr., A. E.
Dynamic Optimization.
Menlo Park, CA: Addison-Wesley Longman, 1999.
[19] Basar, T. and Bernhard, P.
H∞-Optimal Control and Related Minimax Design Problems (2nd ed.).
Boston: Birkhauser, 1995.
[20] Engwerda, J.
LQ Dynamic Optimization and Differential Games.
Hoboken, NJ: Wiley, 2005.
[21] Anderson, B. and Moore, J.
Optimal Control: Linear Quadratic Methods.
Upper Saddle River, NJ: Prentice-Hall, 1989.
[22] Freiling, G. and Jank, G.
Existence and comparison theorems for algebraic
Riccati equations and Riccati differential and difference
equations.
Journal of Dynamical and Control Systems, 2, 4 (1996),
529—547.
Dongxu Li received the B.S. degree in engineering physics from Tsinghua
University, P.R. China, in 2000, and the M.S. degree in mechanical engineering
and the Ph.D. degree in electrical engineering from The Ohio State University,
Columbus, OH, in 2002 and 2006, respectively.
He is currently working as a researcher in R&D at General Motors Company.
Before joining GM, he was a postdoctoral researcher in the Department
of Electrical and Computer Engineering at The Ohio State University. His
research interests include control theory and applications in automotive systems,
cooperative control of networked systems, and dynamic game theory.
Jose B. Cruz, Jr. received the B.S. degree in electrical engineering from the
University of the Philippines (UP) in April 1953, the S.M. degree in electrical
engineering from the Massachusetts Institute of Technology (MIT) in June 1956,
and the Ph.D. degree in electrical engineering from the University of Illinois,
Urbana-Champaign, in October 1959.
He is a Distinguished Professor of Engineering and a Professor of Electrical
and Computer Engineering at The Ohio State University (OSU). Previously,
he served as Dean of the College of Engineering at OSU from 1992 to 1997,
and as a Professor of Electrical and Computer Engineering at the University of
California in Irvine (UCI) from 1986 to 1992 and at the University of Illinois
from 1965 to 1986. He was a visiting professor at MIT and Harvard University
in 1973 and visiting associate professor at the University of California, Berkeley
in 1964–1965. He served as an instructor at UP, 1953–1954, and a research
assistant at MIT, 1954–1956.
Dr. Cruz, Jr., is the author or coauthor of six books, 23 chapters in
research books, and more than 250 articles in research journals and refereed
conference proceedings. He was elected a member of the National Academy
of Engineering (NAE) in 1980, Chair of the NAE Electronic Engineering Peer
Committee 2003, NAE Membership Committee 2003–2007, and he was elected
a corresponding member of the National Academy of Science and Technology
(NAST, Philippines) in 2003. He is a Fellow of the American Association for
the Advancement of Science (AAAS), elected 1989; a Fellow of the American
Society for Engineering Education (ASEE), elected in 2004; a Fellow of IFAC,
elected in 2007; recipient, Curtis W. McGraw Research Award of ASEE,
1972; recipient, Halliburton Engineering Education Leadership Award, 1981;
Distinguished Member, IEEE Control Systems Society, designated in 1983;
recipient, IEEE Centennial Medal, 1984; recipient, IEEE Richard M. Emberson
Award, 1989; recipient, ASEE Centennial Medal, 1993; and recipient, Richard E.
Bellman Control Heritage Award, American Automatic Control Council (AACC),
1994.
In addition to membership in ASEE, AAAS, NAE, NAST, he is a member
of the Institute for Operations Research and Management Science (INFORMS),
the Philippine American Association for Science and Engineering (PAASE,
Founding member 1980, Founding President-Elect 1981, Chairman of the Board
1998–2000), Philippine Engineers and Scientists Organization (PESO), National
Society of Professional Engineers (NSPE), Sigma Xi, Phi Kappa Phi, and Eta
Kappa Nu.
He has served on various professional society boards and editorial boards,
and he served as an officer of professional societies, including IEEE wherein he
was President of the IEEE Control Systems Society in 1979, Editor of the IEEE
Transactions on Automatic Control, a member of the IEEE Board of Directors
from 1980 to 1985, IEEE Vice President for Technical Activities in 1982 and
1983, and IEEE Vice President for Publication Activities in 1984 and 1985. He
served as Chair of the Engineering Section of AAAS in 2004–2005. He served as
a member of the Board of Examiners for Professional Engineers for the State of
Illinois, 1984–1986.