ieee transactions on automatic control, vol. 56, no. 8

13
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011 1849 Hamilton–Jacobi Formulation for Reach–Avoid Differential Games Kostas Margellos, Student Member, IEEE, and John Lygeros, Fellow, IEEE Abstract—A new framework for formulating reachability problems with competing inputs, nonlinear dynamics, and state constraints as optimal control problems is developed. Such reach–avoid problems arise in, among others, the study of safety problems in hybrid systems. Earlier approaches to reach–avoid computations are either restricted to linear systems, or face numerical difficulties due to possible discontinuities in the Hamil- tonian of the optimal control problem. The main advantage of the approach proposed in this paper is that it can be applied to a general class of target-hitting continuous dynamic games with nonlinear dynamics, and has very good properties in terms of its numerical solution, since the value function and the Hamil- tonian of the system are both continuous. The performance of the proposed method is demonstrated by applying it to a case study, which involves the target-hitting problem of an underactuated underwater vehicle in the presence of obstacles. Index Terms—Differential game theory, Hamilton–Jacobi equations, hybrid systems, optimal control, reachability. I. INTRODUCTION R EACHABILITY for continuous and hybrid systems has been an important topic of research in the dynamics and control literature. Numerous problems regarding safety of air traffic management systems [1]–[3], flight control [4]–[7] ground transportation systems [8], [9], etc., have been formu- lated in the framework of reachability theory. In most of these applications, the main aim was to design suitable controllers to steer or keep the state of the system in a “safe” part of the state space. The synthesis of such safe controllers for hybrid systems relies on the ability to solve target problems for the case where state constraints are also present. The sets that represent the so- lution to those problems are known as capture basins [10]. One direct way of computing these sets was proposed in [11] and [12] and was formulated in the context of viability theory [10]. Following the same approach, the authors of [13] and [14] formulated viability, invariance, and pursuit-evasion gaming problems for hybrid systems and used nonsmooth analysis tools to characterize their solutions. Computational tools to support this approach have been developed by [15]. Manuscript received November 23, 2009; revised June 13, 2010 and November 09, 2010; accepted January 05, 2011. Date of publication January 13, 2011; date of current version August 03, 2011. This work was supported by the European Commission under the project CATS, FP6-TREN-036889. Recommended by Associate Editor P. Shi. The authors are with the Automatic Control Laboratory, Department of Electrical Engineering and Information Technology, Swiss Fed- eral Institute of Technology (ETH), 8092 Zürich, Switzerland (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TAC.2011.2105730 An alternative, indirect way of characterizing such problems is through the level sets of the value function of an appropriate optimal control problem. By using dynamic programming, for reachability/invariant/viability problems without state constraints, the value function can be characterized as the viscosity solution to a first-order partial differential equation in the standard Hamilton–Jacobi form [16]–[18]. Numerical algorithms based on level set methods have been developed by [19] and [20], have been coded in efficient computational tools by [18] and [21], and can be directly applied to reachability computations. In the case where state constraints are also present, this target-hitting problem is the solution to a reach–avoid problem in the sense of [1]. The authors of [1] and [22] developed a reach–avoid computation, whose value function was charac- terized as a solution to a pair of coupled partial differential equations. In [21], [23], and [24], the authors proposed another characterization, which involved only one Hamilton–Ja- cobi-type partial differential equation together with an in- equality constraint. These methods, however, are hampered both from a theoretical and a numerical point of view by the fact that the Hamiltonian of the system is in general discontin- uous [22]. In this case, there is no theoretical characterization of the value functions as viscosity solutions of variational equations/inequalities. In [25] and [26], a scheme based on ellipsoidal techniques to compute reachable sets for control systems with constraints on the state was proposed. This approach was restricted to the class of linear systems. In [27], this approach was extended to a list of interesting target problems with state constraints. The calcu- lation of a solution to the equations proposed in [25]–[27] is in general not easy apart from the case of linear systems, where duality techniques of convex analysis can be used. In this paper, we propose a new framework of characterizing reach–avoid sets of nonlinear control systems as the solution to an optimal control problem. Related problems in the absence of competing inputs have recently been treated in [28]. We con- sider the case where we have competing inputs and hence adopt the gaming formulation proposed in [17]. We first restrict our at- tention to a specific reach–avoid scenario, where the objective of the control input is to make the states of the system hit the target at the end of the time horizon and without violating the state constraints. We then generalize our approach to the case where the controller aims to steer the system toward the target not nec- essarily at the terminal, but at some time within the specified time horizon. Both problems could be treated as pursuit–eva- sion games. The contribution of this paper is that it provides a clear characterization of two nonlinear reach–avoid problems, and a proof that the corresponding reach–avoid sets are deter- mined by the level sets of nonsmooth value functions similar to [27], which in turn are the unique continuous viscosity solu- 0018-9286/$26.00 © 2011 IEEE

Upload: others

Post on 10-Apr-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011 1849

Hamilton–Jacobi Formulation for Reach–AvoidDifferential Games

Kostas Margellos, Student Member, IEEE, and John Lygeros, Fellow, IEEE

Abstract—A new framework for formulating reachabilityproblems with competing inputs, nonlinear dynamics, and stateconstraints as optimal control problems is developed. Suchreach–avoid problems arise in, among others, the study of safetyproblems in hybrid systems. Earlier approaches to reach–avoidcomputations are either restricted to linear systems, or facenumerical difficulties due to possible discontinuities in the Hamil-tonian of the optimal control problem. The main advantage ofthe approach proposed in this paper is that it can be applied toa general class of target-hitting continuous dynamic games withnonlinear dynamics, and has very good properties in terms ofits numerical solution, since the value function and the Hamil-tonian of the system are both continuous. The performance of theproposed method is demonstrated by applying it to a case study,which involves the target-hitting problem of an underactuatedunderwater vehicle in the presence of obstacles.

Index Terms—Differential game theory, Hamilton–Jacobiequations, hybrid systems, optimal control, reachability.

I. INTRODUCTION

R EACHABILITY for continuous and hybrid systems hasbeen an important topic of research in the dynamics

and control literature. Numerous problems regarding safety ofair traffic management systems [1]–[3], flight control [4]–[7]ground transportation systems [8], [9], etc., have been formu-lated in the framework of reachability theory. In most of theseapplications, the main aim was to design suitable controllers tosteer or keep the state of the system in a “safe” part of the statespace. The synthesis of such safe controllers for hybrid systemsrelies on the ability to solve target problems for the case wherestate constraints are also present. The sets that represent the so-lution to those problems are known as capture basins [10]. Onedirect way of computing these sets was proposed in [11] and[12] and was formulated in the context of viability theory [10].Following the same approach, the authors of [13] and [14]formulated viability, invariance, and pursuit-evasion gamingproblems for hybrid systems and used nonsmooth analysis toolsto characterize their solutions. Computational tools to supportthis approach have been developed by [15].

Manuscript received November 23, 2009; revised June 13, 2010 andNovember 09, 2010; accepted January 05, 2011. Date of publication January13, 2011; date of current version August 03, 2011. This work was supportedby the European Commission under the project CATS, FP6-TREN-036889.Recommended by Associate Editor P. Shi.

The authors are with the Automatic Control Laboratory, Departmentof Electrical Engineering and Information Technology, Swiss Fed-eral Institute of Technology (ETH), 8092 Zürich, Switzerland (e-mail:[email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TAC.2011.2105730

An alternative, indirect way of characterizing such problemsis through the level sets of the value function of an appropriateoptimal control problem. By using dynamic programming,for reachability/invariant/viability problems without stateconstraints, the value function can be characterized as theviscosity solution to a first-order partial differential equationin the standard Hamilton–Jacobi form [16]–[18]. Numericalalgorithms based on level set methods have been developed by[19] and [20], have been coded in efficient computational toolsby [18] and [21], and can be directly applied to reachabilitycomputations.

In the case where state constraints are also present, thistarget-hitting problem is the solution to a reach–avoid problemin the sense of [1]. The authors of [1] and [22] developed areach–avoid computation, whose value function was charac-terized as a solution to a pair of coupled partial differentialequations. In [21], [23], and [24], the authors proposed anothercharacterization, which involved only one Hamilton–Ja-cobi-type partial differential equation together with an in-equality constraint. These methods, however, are hamperedboth from a theoretical and a numerical point of view by thefact that the Hamiltonian of the system is in general discontin-uous [22]. In this case, there is no theoretical characterizationof the value functions as viscosity solutions of variationalequations/inequalities.

In [25] and [26], a scheme based on ellipsoidal techniques tocompute reachable sets for control systems with constraints onthe state was proposed. This approach was restricted to the classof linear systems. In [27], this approach was extended to a listof interesting target problems with state constraints. The calcu-lation of a solution to the equations proposed in [25]–[27] is ingeneral not easy apart from the case of linear systems, whereduality techniques of convex analysis can be used.

In this paper, we propose a new framework of characterizingreach–avoid sets of nonlinear control systems as the solution toan optimal control problem. Related problems in the absence ofcompeting inputs have recently been treated in [28]. We con-sider the case where we have competing inputs and hence adoptthe gaming formulation proposed in [17]. We first restrict our at-tention to a specific reach–avoid scenario, where the objective ofthe control input is to make the states of the system hit the targetat the end of the time horizon and without violating the stateconstraints. We then generalize our approach to the case wherethe controller aims to steer the system toward the target not nec-essarily at the terminal, but at some time within the specifiedtime horizon. Both problems could be treated as pursuit–eva-sion games. The contribution of this paper is that it provides aclear characterization of two nonlinear reach–avoid problems,and a proof that the corresponding reach–avoid sets are deter-mined by the level sets of nonsmooth value functions similarto [27], which in turn are the unique continuous viscosity solu-

0018-9286/$26.00 © 2011 IEEE

Page 2: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

1850 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011

tions to variational equations of a form similar to [29] and [30].In addition to theoretical support for the use of computationaltools, the numerical advantage of this approach is that the prop-erties of the value function and the Hamiltonian (both of themare continuous) enable the use of existing tools based on LevelSet Methods [23], or other tools for solving variational equa-tions [29], to compute the solution of the problem numerically.Another advantage of this paper is that it provides a theoreti-cally solid formulation for the reach–avoid operator, which isthe core of the hybrid algorithm of [1] and [22], and consists analternative approach for the viability-based algorithm of [14].

To illustrate our approach, we consider the motion of an au-tonomous underwater vehicle in the presence of a disturbancecurrent, whose mathematical modeling was studied in detail in[31]. The objective in this case is to determine the set of initialstates from which, for any disturbance, the underwater vehiclecan hit a target set while avoiding some fixed obstacles.

In Section II, we pose two reach–avoid problems for contin-uous systems with competing inputs and state constraints andformulate them in the optimal control framework. Section IIIprovides the characterization of the value functions of theseproblems as the viscosity solution to two variational equations.In Section IV, we present an application of this approach to thenavigation of an underactuated underwater vehicle in the pres-ence of obstacles. Finally, in Section V, we provide some con-cluding remarks and directions for future work.

II. DIFFERENTIAL GAMES AND REACH–AVOID PROBLEMS

A. Differential Game Problem Formulation

Consider the continuous time control system ,and an arbitrary time horizon , with ,

, , and . Let ,denote the set of Lebesgue measurable functions from

the interval to , and respectively. Consider also twobounded, Lipschitz continuous functions ,

to be used to encode the target and state constraintsrespectively.

Assumption 1: The sets and are compact.The functions , and are bounded, Lipschitzcontinuous in , and continuous in and .

Under Assumption 1, the system admits a unique solutionfor all , ,

and for each initial state . For , this solutionwill be denoted by

(1)

Let be a bound such that for all ,and , and for all and

Let also and be such that

and

and

In a game setting, it is essential to define the information patternsthat the two players use. Following [17] and [32], we restrict thefirst player to play nonanticipative strategies. A nonanticipativestrategy is a function such that for all

and for all , if for almost every, then for almost every . We

then use to denote the class of nonanticipative strategies.Consider the sets , related to the level sets ofand , respectively. For technical purposes,

assume that is closed, whereas is open. Then, andcould be characterized as

B. Reach–Avoid at the Terminal Time

Let represent a set that we would like to reach whileavoiding a set . One would like to characterize the set ofthe initial states from which trajectories can start and reach theset at the terminal time without passing through the setover the time horizon . To answer this question, one needsto determine whether there exists a choice of suchthat for all , the trajectory satisfiesand for all . The set of initial conditionsthat have this property is then

(2)

Now, introduce the value function

(3)

1 can be thought of as the value function of a differential game,where is trying to minimize, whereas is trying to maximizethe maximum between the value attained by at the end of thetime horizon and the maximum value attained by along thestate trajectory over the horizon . Based on [16], [17], and[29], we will show that the value function defined by (3) is theunique viscosity solution of the following variational equation:

(4)

with terminal condition . It is theneasy to link the set of (2) to the level set of thevalue function defined in (3).

Proposition 1: .Proof: We first show that

holds. Consider a point, and for the sake of contradiction assume

that . The latter implies that such that

. Equiva-lently, there exists , such that for all ,

1Note that this � is different from the set that � takes values from, which wasdefined in Section II-A. Throughout the paper, it will always be clear from thecontext to which � we refer.

Page 3: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

MARGELLOS AND LYGEROS: HAMILTON–JACOBI FORMULATION FOR REACH–AVOID DIFFERENTIAL GAMES 1851

there exists , so that. The

last statement is equivalent to there exists , such thatfor all , there exists , such that

or there existssuch that . Or in other words,there exists , such that for all , there exists

, so that or there exists. The last statement is

equivalent to , which is a contradiction.We now show that .

Consider such that , and for the sake ofcontradiction assume that . This impliesthat for all , there exists , such that

or there exists such that. Then, there exists such

that for all , there exists , such that, or there exists such

that . However,implies that for all there exists a strategy

such that. Hence for all

, and also for all, . The last argument

implies that , and for all ,and so also for , . Bychoosing , the last statement establishes a contradictionand completes the proof.

C. Reach–Avoid at Any Time

Another related problem that one might need to characterizeis the set of initial states from which trajectories can start, and forany disturbance input can reach the set not at the terminal, butat some time within the time horizon , and without passingthrough the set until they hit . In other words, we would liketo determine the set

(5)

Based on [33], define the augmented input asand consider the dynamics

(6)

In Assumption 1, is assumed to be continuous inand , and Lipschitz continuous in . Hence, since is the aug-mented input, is not binary but takes values in [0, 1], and

is affine in , if satisfies Assumption 1

so will . Let denote the solutionof the augmented system, and define , and similarly tothe previous case. Following [33], for every , thepseudo-time variable is given by

(7)

Consider , as it was defined in [33], such that .In [33, Lemma 6], was proven to be the limit of a conver-gent sequence of functions, its existence was verified, and it wasshown that

(8)

for any . Based on the analysis of [33], (8) implies thatthe trajectory of the augmented system visits only the subsetof the states visited by the trajectory of the original system inthe time interval .

Define now the value function

One can then show that is related to the set .Proposition 2: For ,

.The proof of this proposition is given in Appendix A.

III. CHARACTERIZATION OF THE VALUE FUNCTION

A. Basic Properties of

We first establish the consequences of the principle of opti-mality for .

Lemma 1: For all and all ,we have (9), shown at the bottom of the page. Moreover, for all

.The proof for the second part is straightforward and follows

from the definition of . The proof for the first part is given inAppendix B.

Next, we show that is a bounded, Lipschitz continuousfunction.

Lemma 2: There exists a constant such that for all

The proof of this Lemma is given in Appendix B.

(9)

Page 4: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

1852 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011

B. Variational Equation for

We now introduce the Hamiltonian ,defined by

Lemma 3: There exists a constant such that for all, and all

The proof of this fact is straightforward (see [16] or[34, Lemma 2]). We are now in a position to state and prove thefollowing theorem, which is the main result of this section.

Theorem 1: The function is the unique viscosity solutionover of the variational equation

with terminal condition .Proof: Uniqueness follows from [29, Lemmas 2 & 3 and

Proposition 1]. Note also that bydefinition of the value function. Therefore, it suffices to showthe following.

1) For all and for all smooth, if attains a local maximum at ,

then

2) For all and for all smooth, if attains a local minimum at ,

then

The case is automatically captured by [35, p. 546].Part 1: Consider an arbitrary and a

smooth such that has a localmaximum at . Then, there exists such that for all

with

We would like to show that

Since by Lemma 1 , eitheror, . For the former, the claim

holds, whereas for the latter, it suffices to show that there existssuch that for all

For the sake of contradiction, assume that for all , thereexists such that for some

Since is smooth and is continuous, then based on [17], wehave that

for all and some , where denotesa ball centered at with radius . Because is compact, thereexist finitely many distinct points

, and such that and for

Define by setting for , if. Then

Since is smooth and is continuous, there existssuch that for all with

Finally, define by forall . It is easy to see that is now nonanticipativeand hence . Thus for all and all

such that

Page 5: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

MARGELLOS AND LYGEROS: HAMILTON–JACOBI FORMULATION FOR REACH–AVOID DIFFERENTIAL GAMES 1853

By continuity, there exists such thatfor

all . Therefore, for all

Let be such that.

Case 1.1: If , then for we have

(10)Then, by the dynamic programming argument of Lemma 1, wehave

We can choose such that

and set . Since by Lemma 1for all , we

have that.

Hence

Since (10) holds for all , it will also hold for ,and hence the last argument establishes a contradiction.

Case 1.2: If then for we have that for all

Since by Lemma 1

then if

we can choose such that

which establishes a contradiction.If

then we can choose such that

or equivalently , since . Basedon our initial hypothesis that , there exists a

such that . If we take ,we establish a contradiction.

Part 2: Consider an arbitrary and asmooth such that has a localminimum at . Then, there exists such that for all

with

We would like to show that

Since we have , it suf-fices to show that

.This implies that for all , there exists a such that

For the sake of contradiction, assume that there existssuch that for all there exists such that

Since is smooth, there exists such that for allwith

Page 6: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

1854 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011

Hence, following [17], for and any

By continuity, there exists such thatfor all

. Therefore, for all

However, by the dynamic programming argument of Lemma 1,we can choose a such that we have the equationshown at the bottom of the page. The last statement establishesa contradiction and completes the proof.

C. Variational Equation for

Consider the value function defined in the previous section.The following theorem proposes that is the unique viscositysolution of another variational equation.

Theorem 2: is the unique viscositysolution of the variational equation

(11)

with terminal condition .Proof: By Theorem 1, is the unique viscosity so-

lution of (4), subject to . If we

let , then, following theproof of [33, Theorem 2], we have that

Consequently, the two variational equations (4) and (11) areequivalent, and so is the viscosity solution of (11).

Since the solution to (11) is unique [29], one could easilyshow that

IV. CASE STUDY: UNDERWATER VEHICLE MOTION IN THE

PRESENCE OF OBSTACLES

To illustrate the theoretical formulation of Section II, we con-sider the motion of an underactuated underwater vehicle in thepresence of a disturbance current. Based on the modeling ap-proach of [31], we focus on the problem of steering the vehicletoward a specified target set while avoiding fixed obstacles inthe navigation space.

A. Mathematical Modeling

Following the detailed derivation of [31], we consider the mo-tion of a three-degrees-of-freedom underwater vehicle with twoback thrusters but no side thruster. For the kinematic equationsof the vehicle, the following model was assumed:

Page 7: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

MARGELLOS AND LYGEROS: HAMILTON–JACOBI FORMULATION FOR REACH–AVOID DIFFERENTIAL GAMES 1855

The state variables in represent the cartesiancoordinates and the orientation of the vehicle, whereas ,are the components of the linear velocity vector, and is theangular velocity. Variable denotes the amplitude and thedirection of the disturbance current.

In [31], the dynamic and kinematic equations of motion werestudied separately, and a forward reachability analysis for thedynamical subsystem was performed, in order to determine anestimate for the bounds of , and . By adopting theseresults, we further consider, as in [31], that isthe control input, which consists of the velocities along the twodegrees of freedom, and to act as a bounded disturbance,since it is the velocity along the unactuated degree of freedom.

B. Reach–Avoid Formulation

The objective in this problem is to identify the set of ini-tial states for which there exists a control input , such thatfor any disturbance , the vehicle can reach a target withinsome specified time interval, while avoiding some fixed obsta-cles denoted by . This is a reach–avoid at any time problem,and based on the analysis of Section II-B, the value function

, that characterizes the desired set, is the viscosity solution of(11). The target set is characterized by constraints of the form

and . Similarly,represents the obstacles in the motion space and could be ex-

pressed as , where, and denotes the number

of obstacles. To encode these constraints in the reach–avoid set-ting of Section II, we define functions and

, such that characterizes the set and, where determines

the obstacle . A natural choice is to choose to be the signeddistance to the set . Then

ifif

where stands for the usual distance tothe set . Similarly, is defined to be the signed distance tothe set respectively. The functions and will then beLipschitz by construction; to keep them bounded, we saturatedthem at the Lipschitz constants and , respectively.

The Hamiltonian of the system, as defined in Section III-B,is given by

The input values that optimize can be then easily com-puted as

ifif

ifif

ifif

where for (see [31] for a detailedderivation). Although these inputs depend in general on the state

of the system (through the costate vector ), they are not neces-sarily feedback, but nonanticipative strategies. For a single inputsetting though, the optimal control inputs would also be feed-back.

To enforce the constraints represented by numerically, aprocedure called “masking” is used in the level set methods toensure that the value function will not enter in the obstacle re-gion . Alternatively, numerical tools of [29] for solving varia-tional equations could be used. In both methods, as also statedin [21], at each time-step and for all grid points , the valuefunction is computed as , where

is the numerical solution of the partial differential equa-tion, which appears as the second term in (11). A similar pro-cedure is followed for , where the second term of (4) issolved instead.

C. Simulation Results

For the numerical computation, we used four fixed obstaclesand considered m/s, (aligned with the -axis)to be the current disturbance amplitude and orientation respec-tively. The orientation of the underwater vehicle can vary inthe interval . For the simulations, m/s,

m/s, rad/s, rad/s,m/s, and m/s were chosen from

[31] to be the extrema of the control and disturbance inputs.Contour slices of the resulting set , for anddifferent values of , are shown in Fig. 1(a)–(c). The reachablesets include all states inside the area determined by the solidlines and, as expected, do not include points inside the fixed ob-stacles denoted by rectangles. The filled square represents thetarget set that the vehicle aims to reach, whereas the dashedsquare indicates the boundary of the motion space. For com-parison purposes, the dashed lines depict the reachable sets at

.So far, the disturbance current was assumed to have constant

magnitude and direction. In a worst-case setting, the angleof the current can be considered as an additional disturbanceinput , which is also trying to maximize the Hamiltonianof the system. The maximum value of is attained for

. The numerically computed reachable set forthis case is depicted in Fig. 2(a). It implies that only the pointsthat belong to this set can reach the target for any disturbancedirection and for any value of . The transparent cube indicatesthe boundary of the motion space.

For a more realistic implementation of the worst-case sce-nario, the state space could be augmented with

, and so the derivative of the current’s angle, in-stead of the actual angle, could be treated as an additional distur-bance input. Fig. 2(b) depicts a 3-D projection of the 4-D reach-able set for s. This projection represents a union over thereachable sets that correspond to each disturbance angle, and asexpected, it is a superset of the one of Fig. 2(a) (conservativecase) since the disturbance does not change direction instanta-neously any more, and subset of that of Fig. 1(d), where thedisturbance was assumed to have constant direction. The mainpurpose of this example was to illustrate the proposed formula-tion numerically, and from an application point of view, furtherinvestigation is required.

All simulations were performed on an Intel Core 2 Duo2.66-GHz processor running Windows 7 and using the Level

Page 8: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

1856 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011

Fig. 1. Contour plots of ����� �� �� for (a) � � ����� rad, (b) � ������ rad, and (c)� � � rad. (d) 3-D representation of ����� �� ��.

Set Method Toolbox [23] (ver. 1.1) on MATLAB 7.10. Sincethe Level Set Method Toolbox is based on gridding the statespace, the memory and computational cost grow exponentiallywith the dimension of the system, and hence the algorithm

Fig. 2. (a) Worst-case analysis with � as additional disturbance input.(b) Worst-case analysis with �� as additional disturbance input.

TABLE INUMERICAL STATISTICS OF THE REACHABILITY COMPUTATIONS

suffers from the “curse of dimensionality” [21]. On the otherhand, assuming that an accurate enough grid is used, tightapproximations of the (in general irregular and nonconvex)reachable sets can be achieved. The details for the numericalimplementation of the specific example are summarized inTable I. As expected, the first two cases, which were performedon the same grid, required similar time and memory usage,whereas the 4-D implementation led to a significant increaseboth in memory and computational time.

V. CONCLUSION

A new framework of controlling nonlinear systems with stateconstraints and competing inputs was presented, and a proof thatthe value function of the resulting reach–avoid problem is theviscosity solution to a variational equation was provided. Theformulation was based on reachability and game theory and hasthe advantage of maintaining the continuity in the value func-tion and the Hamiltonian of the system. As a consequence, ithas very good numerical properties in the sense that standard nu-merical tools can be now formally used. The effectiveness of theproposed approach was verified numerically in the target-hitting

Page 9: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

MARGELLOS AND LYGEROS: HAMILTON–JACOBI FORMULATION FOR REACH–AVOID DIFFERENTIAL GAMES 1857

problem of an underactuated underwater vehicle in the presenceof obstacles. For the numerical implementation, standard toolsbased on Level Set Methods were used.

In future work, we plan to extend the proposed approach toformulate games in the case where the obstacle function is time-and/or control-dependant. Another issue would be to provide asystematic methodology in order to construct numerically thetheoretically optimal control policy. This is in general difficultto construct since it requires the computation of derivatives ofthe value function, which is a process very sensitive to numericalerrors. Finally, the developed reach–avoid operator provides analternative formulation for the viability-based approaches andcould be extended and incorporated in existing algorithms forverification of hybrid systems.

APPENDIX A

A. Proof of Proposition 2

Proof:Part 1: Following [33, Lemma 8], we first show that

. Consider, and for the sake of contradiction, as-

sume that . Then, there exists such that forall ,

. This in turnimplies that for all , there exists suchthat either , or there exists

such that .Consider now the implications of . Equa-

tion (5) implies that there exists a such thatfor all , and so also for , we can define

. Then, for this and , there existssuch that and for all

. Choose the freezing inputsignal as

forfor .

If we combine with , we can get the input , whichwill generate a trajectory

ifif

(12)Case 1.1: Consider first the case where for all

. For , we havethat

Since , we showed before that, i.e. .

Thus, from (12) we have that .Since is already nonanticipative, and anonanticipative strategy for can be designed, willalso be nonanticipative. Therefore, the previous statementestablishes a contradiction.

Case 1.2: Consider now the case where for allthere exists such that

. Since we showedthat for all , we canconclude from (12) that for all

If , we have that

Thus, .Hence, for all , we have that

. Since in Case 1.1 was shown to be nonanticipative, wehave a contradiction.

Part 2: Next, we show that. Consider such that and as-

sume for the sake of contradiction that . Then,for all there exists such that ei-ther for all , , or thereexists such that

.Consider the strategy (note that

follows from the implications of and will be de-fined in the sequel), which consists of a strategy ,as defined in Section II-B, and an additional scalar componentthat corresponds to . Following [33, Lemma 8], by eliminatingthis scalar component, we can extract from

. By the implications of , we can thenchoose the that corresponds to that . In [33,Lemma 4], it was proven that the set of states visited by the aug-mented trajectory is a subset of the states visited by the originalone. We therefore have that for all

(13)or there exists such that

(14)and similarly, . By(13) and (14), and based on the definition of and , we con-clude that there exists a such that either for all

(15)

or for some

(16)

Since , then for all , there ex-ists a nonanticipative strategy suchthat

. Hence, for all, , and for all

Page 10: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

1858 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011

, . For , thelast argument implies that

and for

If we choose , the last statements contradict (15) and(16) and complete the proof.

APPENDIX B

A. Proof of Lemma 1

Proof: Following [17, Theorem 3.1], we can define

We will then show that for all ,and . Then, since is arbitrary,

.Case 1: . Fix and choose

such that

Similarly, choose such that we have the firstequation shown at the bottom of the page. For any ,we can define and such that

for all and for all. Define also by

ifif .

It is easy to see that is nonanticipative.By uniqueness, incase , and also

if .Hence, we have the second equation shown at the bottom of

the page. Therefore, .Case 2: . Fix and choose now

such that

(17)

By the definition of

Page 11: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

MARGELLOS AND LYGEROS: HAMILTON–JACOBI FORMULATION FOR REACH–AVOID DIFFERENTIAL GAMES 1859

Hence, there exists a such that

(18)

Let for all , and forall . Let also to be the restrictionof the nonanticipative strategy over . Then, for all

, we define . Hence, we have

and so there exists a such that

(19)

We can define

ifif .

Therefore, from (18) and (19)

which together with (17) implies .

B. Proof of Lemma 2

Proof: Since and are bounded, is also bounded. Forthe second part fix and . Let andchoose such that

By definition

We can choose such that

and hence

For all

where is the Lipschitz constant of . By theGronwall–Bellman Lemma [36, p. 86], there exists aconstant such that for all

Let be such that

Then

Page 12: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

1860 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8, AUGUST 2011

Case 1:

Case 2:

Thus, in any case. The same argument with the roles of , reversed

establishes that .Since is arbitrary

Finally, consider and . Without loss ofgenerality, assume that . Let and choosesuch that

By definition

Therefore, we can choose such that

where is the restriction of over . Then,for all , we define , and

. By uniqueness, for all we have that

Case 1: Assume that

where is the Lipschitz constant of .Case 2: Assume that

Let be such that

Then

where is the Lipschitz constant of . In any case we havethat

A symmetric argument shows that, and since is arbitrary, this

concludes the proof.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers fortheir constructive comments.

Page 13: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 8

MARGELLOS AND LYGEROS: HAMILTON–JACOBI FORMULATION FOR REACH–AVOID DIFFERENTIAL GAMES 1861

REFERENCES

[1] C. Tomlin, J. Lygeros, and S. Sastry, “A game theoretic approach tocontroller design for hybrid systems,” Proc. IEEE, vol. 88, no. 7, pp.949–969, Jul. 2000.

[2] C. Tomlin, G. Pappas, and S. Sastry, “Conflict resolution for air trafficmanagement: A study in multiagent hybrid systems,” IEEE Trans.Autom. Control, vol. 43, no. 4, pp. 509–521, Apr. 1998.

[3] K. Margellos and J. Lygeros, “Air traffic management with target win-dows: An approach using reachability,” in Proc. IEEE Conf. DecisionControl, 2009, pp. 145–150.

[4] C. Tomlin, J. Lygeros, and S. Sastry, “Aerodynamic envelope protec-tion using hybrid control,” in Proc. Amer. Control Conf., 1998, pp.1793–1796.

[5] J. Lygeros, C. Tomlin, and S. Sastry, “Controllers for reachability spec-ifications for hybrid systems,” Automatica, vol. 35, pp. 349–370, 1999.

[6] A. M. Bayen, I. M. Mitchell, M. Oishi, and C. Tomlin, “Aircraft au-tolander safety analysis through optimal control-based reach set com-putation,” J. Guidance, Control, Dynam., vol. 30, no. 1, pp. 68–77,2007.

[7] I. Kitsios and J. Lygeros, “Final glide-back envelope computation forreusable launch vehicle using reachability,” in Proc. IEEE Conf. Deci-sion Control, 2005, pp. 4059–4064.

[8] C. Livadas and N. Lynch, “Formal verification of safety-critical hy-brid systems,” Hybrid Syst., Comput. Control, vol. 1386, LNCS, pp.253–272, 1998.

[9] J. Lygeros, D. N. Godbole, and S. Sastry, “Verified hybrid controllersfor automated vehicles,” IEEE Trans. Autom. Control, vol. 43, no. 4,pp. 522–539, Apr. 1998.

[10] J. Aubin, Viability Theory. Boston, MA: Birkhauser, 1991.[11] P. Cardaliaguet, “A differential game with two players and one target,”

SIAM J. Control Optimiz., vol. 34, no. 4, pp. 1441–1460, 1996.[12] P. Cardaliaguet, M. Quincampoix, and P. Saint-Pierre, “Pursuit differ-

ential games with state constraints,” SIAM J. Control Optimiz., vol. 39,no. 5, pp. 1615–1632, 2001.

[13] J. P. Aubin, J. Lygeros, M. Quincampoix, S. Sastry, and N. Seube, “Im-pulse differential inclusions: A viability approach to hybrid systems,”IEEE Trans. Autom. Control, vol. 47, no. 1, pp. 2–20, Jan. 2002.

[14] Y. G. Gao, J. Lygeros, and M. Quincampoix, “On the reachabilityproblem for uncertain hybrid systems,” IEEE Trans. Autom. Control,vol. 52, no. 9, pp. 1572–1586, Sep. 2007.

[15] P. Cardaliaguet, M. Quincampoix, and P. Saint-Pierre, “Set valuednumerical analysis for optimal control and differential games,” inAnnals of the International Society of Dynamic Games, M. Bardi, T.Raghaven, and T. Papasarathy, Eds. Boston, MA: Birkhauser, 1999,pp. 177–247.

[16] J. Lygeros, “On reachability and minimum cost optimal control,” Au-tomatica, vol. 40, no. 6, pp. 917–927, 2004.

[17] L. Evans and P. Souganidis, “Differential games and representationformulas for solutions of Hamilton–Jacobi–Isaacs equations,” IndianaUniv. Math. J., vol. 33, no. 5, pp. 773–797, 1984.

[18] I. Mitchell, A. M. Bayen, and C. Tomlin, “Validating a Hamilton Jacobiapproximation to hybrid reachable sets,” in Hybrid Systems: Compu-tation and Control, M. Di Benedetto and A. Sangiovanni-Vincentelli,Eds. New York: Springer-Verlag, 2001, pp. 418–432.

[19] S. Osher and J. Sethian, “Fronts propagating with curvature-depen-dent speed: Algorithms based on Hamilton–Jacobi formulations,” J.Comput. Phys., vol. 79, pp. 12–49, 1988.

[20] J. A. Sethian, Level Set Methods: Evolving Interfaces in Geometry,Fluid Mechanics, Computer Vision, and Materials Science. NewYork: Cambridge Univ. Press, 1996.

[21] I. Mitchell and C. Tomlin, “Level set methods for computations inhybrid systems,” in Hybrid Systems: Computation and Control, M.Di Benedetto and A. Sangiovanni-Vincentelli, Eds. New York:Springer-Verlag, 2000, pp. 310–323.

[22] C. Tomlin, “Hybrid control of air traffic management systems,” Ph.D.dissertation, Dept. Elect. Eng. Comput. Sci., University of California,Berkeley, CA, 1998.

[23] I. M. Mitchell, “Application of level set methods to control and reacha-bility problems in continuous and hybrid systems,” Ph.D. dissertation,Stanford University, Stanford, CA, 2002.

[24] M. Oishi, “Recovery from error in flight management systems: Appli-cations of hybrid reachability,” presented at the Adv. Process ControlAppl. Ind. Workshop, 2006.

[25] A. B. Kurzhanski and P. Varaiya, “Ellipsoidal techniques for reacha-bility under state constraints,” SIAM J. Control Optimiz., vol. 45, no. 4,pp. 1369–1394, 2006.

[26] A. B. Kurzhanski, I. M. Mitchell, and P. Varaiya, “Optimization tech-niques for state-constrained control and obstacle problems,” J. Optimiz.Theory Appl., vol. 128, no. 3, pp. 499–521, 2006.

[27] A. B. Kurzhanski and P. Varaiya, “Optimization methods for targetproblems of control,” in Proc. Math. Theory Netw. Syst. Conf., 2002,pp. 1–10.

[28] O. Bokanowski, N. Forcadel, and H. Zidani, “Reachability and minimaltimes for state constrained nonlinear problems without any control-lability assumption,” 2009 [Online]. Available: http://hal.archives-ou-vertes.fr/docs/00/39/55/89/PDF/BFZ-Obstacle.pdf

[29] I. J. Fialho and T. Georgiou, “Worst case analysis of nonlinear sys-tems,” IEEE Trans. Autom. Control, vol. 44, no. 6, pp. 1180–1196, Jun.1999.

[30] E. Barron and H. Ishii, “The Bellman equation for minimizing the max-imum cost,” Nonlinear Anal, Theory, Methods Appl., vol. 13, no. 9, pp.1067–1090, 1989.

[31] D. Panagou, K. Margellos, S. Summers, J. Lygeros, and K. Kyri-akopoulos, “A viability approach for the stabilization of an underac-tuated underwater vehicle in the presence of current disturbances,” inProc. IEEE Conf. Decision Control, 2009, pp. 8612–8617.

[32] P. P. Varaiya, “On the existence of solutions to a differential game,”SIAM J. Control Optimiz., vol. 5, no. 1, pp. 153–162, 1967.

[33] I. Mitchell, A. M. Bayen, and C. Tomlin, “A time-dependentHamilton–Jacobi formulation of reachable sets for continuous dy-namic games,” IEEE Trans. Autom. Control, vol. 50, no. 7, pp.947–957, Jul. 2005.

[34] J. Lygeros, “Reachability, viability and invariance: An approach basedon minimum cost optimal control,” Dept. Eng., University of Cam-bridge, Cambridge, U.K., Tech. rep. CUED/F-INFENG/TR.444, 2002.

[35] L. Evans, Partial Differential Equations. Providence, RI: Amer.Math. Soc., 1998.

[36] S. Sastry, Nonlinear Systems: Analysis, Stability and Control. NewYork: Springer-Verlag, 1999.

Kostas Margellos (S’09) received the diplomain electrical and computer engineering from theUniversity of Patras, Patras, Greece, in 2008, andis currently pursuing the Ph.D. degree in automaticcontrol at ETH Zürich, Zürich, Switzerland.

His research interests include reachability analysisand control of nonlinear and hybrid systems, withapplications to complex networks such as powergrids and collision avoidance algorithms in air trafficmanagement.

John Lygeros (S’91–M’97–SM’06–F’11) receivedthe B.Eng. degree in electrical engineering and M.Sc.degree in systems control from Imperial College ofScience Technology and Medicine, London, U.K., in1990 and 1991, respectively, and the Ph.D. degree inelectrical engineering and computer sciences fromthe University of California (UC), Berkeley, in 1996.

From 1996 to 2000, he held a series of researchappointments at the National Automated HighwaySystems Consortium; the Massachusetts Institute ofTechnology, Cambridge; and UC Berkeley. In par-

allel, he also worked as a part-time Research Engineer with SRI International,Menlo Park, CA, and as a Visiting Professor with the Department Mathematics,Universite de Bretagne Occidentale, Brest, France. Between 2000 and 2003,he was a University Lecturer with the Department of Engineering, Universityof Cambridge, Cambridge, U.K., and a Fellow of Churchill College. Between2003 and 2006, he was an Assistant Professor with the Department of Electricaland Computer Engineering, University of Patras, Patras, Greece. In 2006, hejoined the Automatic Control Laboratory, ETH Zürich, Zürich, Switzerland,where he is currently serving as the Head of the laboratory. His researchinterests include modeling, analysis, and control of hierarchical, hybrid, andstochastic systems, with applications to biochemical networks, air trafficmanagement, power grids, and camera networks.