FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
DOCTORAL PROGRAMME IN ELECTRICAL AND COMPUTER ENGINEERING
Dynamic programming for computer based feedback control of continuous-time systems
by
Jorge Manuel Estrela da Silva
A thesis proposal submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
(Electrical and Computer Engineering)
in the University of Porto, School of Engineering
October 2011
Doctoral Committee:
Professor Fernando M. F. Lobo Pereira, Chair (FEUP)
Professor Hasnaa Zidani (ENSTA - ParisTech)
Professor Fernando A. C. C. Fontes (FEUP)
Abstract
This document proposes the research directions to be followed by the author in fulfilment of the requirements
of the Ph.D. degree in the Doctoral Programme in Electrical and Computer Engineering. The focus of this work is
on the development of efficient methodologies for the application of dynamic programming techniques on control
engineering problems. We work in the framework of deterministic differential games in order to encompass the
effect of disturbances and model uncertainty in the control design. The proposal is substantiated by preliminary
work. The main achievements of the preliminary work consist of a survey of numerical methods for the solution of
the dynamic programming equations, development of a numerical approach for the proposed control strategy and
application of the numerical approach to an engineering application. The main application consists of the motion
control of an Autonomous Underwater Vehicle (AUV). The AUV motion is modelled as a 4-dimensional system
of first-order nonlinear ordinary differential equations. However, the adversarial input is assumed to be generated
by a discrete-time system at the same rate as the control cycle. The main line of future research will be the case
of mixed discrete-time (for the control input) and continuous-time (disturbances) inputs. We also plan to extend
the design methodology to the case of general Hybrid Systems.
Acknowledgements
There is still some road ahead and I will leave the complete list of acknowledgements for the final thesis document.
However, I would like to start by thanking Professor Fernando Lobo Pereira for all the trust that he placed in
me during this work. I would like to thank Professor Hasnaa Zidani and Professor Fernando Fontes for agreeing
to serve on my Doctoral committee.
I would also like to dedicate a special acknowledgement to Eng. João Borges de Sousa. His support and
guidance were of vital importance to some of the results reported in this document. The main incentive to
explore the application of numerical methods to optimal control engineering applications came from him,
back when I was taking the Hybrid Systems course that he taught during my first year of the PDEEC. He was
responsible for arranging the visit of Professor Ian Mitchell to FEUP, in November 2010, and for making my presence
at the BIRS Workshop on Advancing Numerical Methods for Viscosity Solutions and Applications, Canada, in
February 2011, possible (I also acknowledge the financial support from Università di Roma Tre for the travel
expenses and I thank, in particular, Professor Roberto Ferretti for arranging this support). Professor Ian Mitchell
gave me precious insights on the numerical methods for HJI PDEs during his stay at Porto. The BIRS workshop
gave me the chance to meet some very active members of this field. The suggestion to apply dynamic
programming techniques to AUV motion control, in the context of the C4C European project, also came from Eng.
João Borges de Sousa. He had the courtesy of providing all the necessary facilities for the experiments with the
AUV from the Underwater Systems and Technology Laboratory of FEUP.
I acknowledge the financial support of the C4C European Project for the participation in the following conferences:
ECC2009, Physcon2009, AIS2010, CDC2010, IFAC2011, Physcon2011.
I thank my institution, Instituto Superior de Engenharia do Porto (ISEP), and the Fundação para a Ciência e a
Tecnologia for partial funding of this research, under the PROTEC program.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Two mode strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Summary of preliminary work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background 7
2.1 Dynamic Programming: a brief introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Dynamic programming for deterministic continuous-time systems . . . . . . . . . . . . . . . . 7
2.3 Receding horizon control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Numerical methods for the dynamic programming equations . . . . . . . . . . . . . . . . . . . . 10
2.4.1 Control-theoretic schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.2 Level set methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.3 Efficient methods for boundary value problems . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.4 Applications to hybrid systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Synthesis of the Feedback Controller 19
3.1 Solution of the two mode strategy by dynamic programming . . . . . . . . . . . . . . . . . . . . 19
3.2 Numerical algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 The semi-Lagrangian scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Constrained finite and infinite horizon problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.5 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.1 Continuous-time LQR: unconstrained case . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.2 State and input constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.3 Sampled data system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5.4 Minimum time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6 Semi-Lagrangian receding horizon scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6.1 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6.2 Numerical example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4 A Motivating Application 31
4.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 Six degree of freedom model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 Model simplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Path following problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.2 Control design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.3 Simulation and experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Formation control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5 Research Plan 41
Bibliography 44
A Appendix 51
A.1 Dijkstra’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
A.2 Semi-Lagrangian scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
A.3 Upwind difference scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
A.4 Essentially Non-oscillatory Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Chapter 1
Introduction
1.1 Motivation
The importance of dynamic optimization techniques has been recognized in several areas. From chemical to
aerospace engineering, from finance to biology, there are several niches where the employment of such techniques
was found to be profitable. In the context of feedback control of dynamic systems, one may be interested in the
design of a control law that makes the closed loop system meet a certain optimization criterion while satisfying a
set of state and input constraints.
The objectives of the control laws can be roughly divided into two main classes: output regulation, which - in
the context of dynamic optimization - may be translated into an infinite horizon problem; and reachability of some
given target (target problem). Common formulations of the reachability problem are in terms of time-optimality or
energy-optimality. For certain combinations of system dynamics and optimization criteria, it is possible to obtain
analytical expressions for such control laws; one classical example of that is the linear-quadratic regulator (LQR),
an infinite horizon optimal control problem (OCP). However, for general systems with input and state constraints,
it is generally impossible to derive an analytical expression for the control law.
The focus of this research is on dynamic programming (DP) techniques [16] for the feedback control of con-
tinuous time systems in a deterministic setting. The main goal is to develop an efficient control design procedure
for the deployment of computer based controllers with specific global robustness and performance properties.
For this purpose, the proposed control strategy is based on a set of dynamic optimization problems. We use the
framework of deterministic differential games [22, 55, 58] in order to ensure the robustness properties. Differen-
tial games generalize the dynamic programming techniques for systems with competing inputs. We will consider
the zero-sum case with two competing inputs. These inputs will be designated as control input and adversarial
input. In this sense, the effect of disturbances and model uncertainty can be modelled as a vectorial adversarial
input. Such disturbances and uncertainties will be handled on a worst-case basis. We assume computer controlled
systems and, therefore, the usual discrete-time feedback control law.
The DP approach is based on the concept of value function. The value function gives the optimal cost of
the optimal control problem as a function of the current state and, possibly, of time. The dynamic programming
principle (DPP) provides a constructive procedure to compute the value function. Finally, given the value function,
the optimal control can be determined as a state feedback.
However, traditionally, DP has been considered infeasible for practical problems. This is because, in general,
the value function must be determined using numerical methods. This poses two main challenges: the actual
computation of the value function; and efficient storage of the value function on the target system. Both challenges
are associated with the so-called curse of dimensionality. In fact, for general approaches, the processing and storage
requirements grow exponentially with the system dimension. The computation of the value function is the most
demanding task in this process. However, it must be remarked that this computation has to be performed only
once for each set of problem data. Assuming the problem data is constant (i.e., no change of state constraints,
optimization criteria or system parameters is expected during system operation), this computation can be performed
offline and, therefore, state of the art workstations, General Purpose Computing on Graphics Processing Units
(GPGPU), clusters of computers and more powerful solutions may be used to minimize the computation time.
On the other hand, given the value function, the computation of the optimal feedback control is many orders
of magnitude faster. The main question here is what kind of representation of the value function should be
employed in order to allow accurate computation of the optimal controls while minimizing the required storage
space. Notice that such optimization is also important for the computation of the value function. However, the latter
avenue poses more delicate problems: it is harder to find the optimal discretization while still computing the value
function; loss of accuracy incurred by the discretization can propagate and accumulate during the computation of
the value function.
Given these difficulties, many applications of dynamic optimization consist of open loop solutions. For in-
stance, the target problem is usually solved either as an open loop problem or by decoupling it into two different
sub-problems: optimal trajectory (or path) planning and trajectory (or path) tracking. The open loop approach consists
of computing the optimal control sequence for the given problem data and feeding that sequence to the system.
This approach clearly suffers from lack of robustness with respect to model uncertainties and disturbances. The
trajectory tracking approach, while in general ensuring target reachability, may require trajectory replanning in
order to account for large deviations from the expected trajectory; otherwise, optimality may be severely impaired
and state constraints may not be met. Moreover, this approach still requires the design of an appropriate feedback
controller. The same applies to path tracking (also known as path following); the main difference is that path
following relaxes the dependence on time.
In the past decades, Model Predictive Control (MPC), also known as receding horizon control, has been one
of the most active fields of research on dynamic optimization based feedback control. The key idea of MPC
is to solve a finite horizon OCP at each control instant and use the first value of the optimal control sequence.
This technique demands high processing capabilities from the target system. Moreover, ensuring stability of an MPC
regulator is not a straightforward task; the choice of the prediction horizon and terminal cost play an essential role
in this subject. The main application of MPC is the optimal regulation of systems with input and state constraints.
For piecewise-affine systems with input and state constraints, efficient solutions can be found, for certain classes
of optimization criteria. For more general systems and optimization criteria, the search for efficient solutions is
still an open field, sometimes referred to as Nonlinear MPC (NMPC) (see, e.g., [50]).
In spite of the steady evolution of computer systems, finding a sufficiently accurate numerical solution for an
optimal control problem in real time (even in the finite horizon case) is still an infeasible task for many applica-
tions, even if only due to economic considerations. In order to overcome this problem, the MPC community developed
the concept of explicit MPC (eMPC) [5, 95]. The main idea of this technique consists of performing the op-
timization effort during the design phase and then storing a discrete set of values from which the optimal control
can be easily determined (e.g., by interpolation, or by picking a pre-determined gain for the respective region of the
state space). However, such an approach suffers from the curse of dimensionality. Moreover, the parallel between the
eMPC and DP approaches is obvious: offloading the bulk of the optimization to the design phase; efficient online
computation of the state feedback using some stored table of values.
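The online phase of such a table-based controller can be sketched as follows. This is only a minimal illustration, not the eMPC machinery of [5, 95]: it assumes a control table precomputed offline on a rectangular grid over a 2-dimensional state space, and recovers the control at an arbitrary state by bilinear interpolation; all names are hypothetical.

```python
from bisect import bisect_right

def interp_control(table, xs, ys, x, y):
    """Recover the control at state (x, y) from an offline-computed table.

    table[i][j] holds the optimal control computed during the design phase
    at grid node (xs[i], ys[j]); xs and ys are sorted grid coordinates.
    Bilinear interpolation between the four surrounding nodes is used.
    """
    # locate the grid cell containing (x, y), clamped to the grid
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * table[i][j]
            + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1]
            + tx * ty * table[i + 1][j + 1])
```

For a table generated from a linear feedback law the interpolation is exact; in general the grid resolution bounds the control error, which is precisely the storage versus accuracy trade-off discussed above.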
Therefore, in spite of having been dismissed as intractable by many in the past, dynamic programming seems
to be gradually becoming a realistic approach for practical applications. Numerical methods for the computation
of the value function for problems involving continuous-time and continuous state space systems have been de-
veloped in the past two decades (see [19, 24, 30, 67] and references therein). Such methods can take advantage
of parallel computing (see, e.g., [44]); of course this is not enough to defeat the curse of dimensionality, but it
can be exploited in order to reduce computation times, to refine the discretization or to make it feasible to handle
problems with a few additional dimensions (depending on the available number of machines and desired
accuracy). Dynamic programming methods are deeply connected to Hamilton-Jacobi (HJ) Partial Differential
Equations (PDE). In fact, the differential version of the DPP leads either to the Hamilton-Jacobi-Bellman
(HJB) PDE, if only the control input is considered, or to the Hamilton-Jacobi-Isaacs (HJI) PDE, if an adversarial
input is assumed. The results and techniques for the numerical solution of such PDE can be employed in the
computation of value functions.
One major issue of digital control systems is their limited control rate. The main factors for this limitation
are the available actuation rates (i.e., the rates at which the actuators can receive new references) and available
computing power (which may limit the frequency at which the estimation and control algorithms are executed).
One common approach for the design of digital control systems for continuous-time systems is to neglect the
control rate during the design of the control law, i.e., to assume a pure continuous-time system, and then to design
the computational system in such a way that it allows a control rate that does not hamper the expected performance
of the control law. This may lead to an economically infeasible design or, on the other hand, it may lead the
designer to accept a performance worse than what could be obtained if the control rate were taken into account
in the design of the control law. Moreover, in certain situations, this may lead to catastrophic results. Figure 1.1
shows one such scenario. The rectangle represents an obstacle that must be avoided by the system. The initial
state of the system is inside the cell with vertices x_{i-1,j-1}, x_{i,j-1}, x_{i-1,j}, x_{i,j}. If a sufficiently high control rate
Figure 1.1: Different trajectories generated by different sampling rates. The star marks the target and the shaded
rectangle is an obstacle to avoid.
is considered, the finite difference scheme is able to generate a trajectory that successfully avoids the obstacle
(composed of the smaller dashed arrows). However, if a lower control rate is used, the system will eventually
collide with the obstacle. This happens because, due to the sample-and-hold mechanism, the same initial optimal
control is applied during the entire duration of the larger control period.
This sampled data control problem has been identified and approached in the literature (see, e.g., [71] and
references therein). One of the goals of this research is to develop methods that explicitly take into account the
finite control rate at the design phase.
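The degradation caused by a too-low control rate can be reproduced with a toy example (an illustration added here, not from the application chapters). The sketch below applies the stabilizing feedback u = -2x to the scalar plant x' = x + u under a sample-and-hold scheme: the closed loop contracts when the hold period is short, but the very same law destabilizes the plant when the period is too long.

```python
import math

# Scalar plant  x' = x + u  under sampled feedback  u = -2 * x(t_k),
# held constant over each control period of length dt (zero-order hold).
# Integrating exactly over one period gives
#   x(t_k + dt) = x_k * exp(dt) + u_k * (exp(dt) - 1),
# so each period multiplies the state by  m(dt) = 2 - exp(dt):
# the loop is stable only while |2 - exp(dt)| < 1, i.e. dt < ln(3).
def step(x, dt):
    u = -2.0 * x                     # feedback computed at the sampling instant
    return x * math.exp(dt) + u * (math.exp(dt) - 1.0)

def simulate(x0, dt, horizon=6.0):
    x = x0
    for _ in range(round(horizon / dt)):
        x = step(x, dt)
    return abs(x)

fast = simulate(1.0, 0.1)   # dt well below ln(3): state decays toward 0
slow = simulate(1.0, 1.5)   # dt above ln(3): the same law diverges
```

A design that accounts for the hold period would either lower the gain or, as proposed here, build the period directly into the dynamic programming recursion.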
1.1.1 Two mode strategy
During this research, both infinite horizon differential games and target problems will be considered. This is
motivated by the following control strategy.
Let us consider systems with smooth dynamics described by ẋ = f(x, a, b), where x ∈ R^n is the state vector, a
is a bounded control input and b is a bounded adversarial input modelling the effect of disturbances and model
uncertainty. It is desirable that the system satisfies x ∈ R whenever possible. The set R models the maximum
tolerance for system operation. The control design is driven by the set of state constraints defined by x ∈ R and
also by a given running cost for regular operation. By regular operation we mean that the vehicle is satisfying the
state constraints.
In practice, it cannot be assumed that the system will always meet the state constraints: unforeseen distur-
bances may drive the system outside R; the system may start outside R; even if the system starts inside R, there
may be no trajectory inside R for all the desired time horizon.
The above mentioned problems are related to the concept of invariance. Invariance analysis is a research topic
that has been and still is extensively studied by the community. We consider the definition of Maximal Robust
Controlled Invariant Set (MRCIS) from [18] but we remark that similar definitions exist (e.g., discriminating
kernel [7]). In simple terms, the Maximal Controlled Invariant Set (MCIS) is the set of states from which a system
is able to satisfy the state constraints afterwards. The definition of MCIS does not consider disturbances. The
MRCIS is the set of states from which a system is able to satisfy the state constraints afterwards, even when
subject to some set of disturbances.
Since the given R is not necessarily invariant, we define the first problem.
Problem 1. Find the MRCIS S ⊂ R.
In order to meet the state constraints at all times, the system's state must be inside S. Whenever the system
is outside S, we opt to bring the system to the interior of S in minimal time. In order to accomplish this, two
additional problems are considered:
Problem 2. Derive a control law that, whenever the system is outside S, drives the system into S as
soon as possible.
Problem 3. Derive a control law that, whenever the system is inside S, makes the system stay inside
S while minimizing the time integral of a given smooth function L(x,a) ≥ 0 (the running cost) over a large time.
The running cost is assumed to be bounded for x ∈ R and the optimization horizon is assumed to be much larger
than the time for the system to reach an equilibrium or steady periodic behaviour.
More formally, Problem 2 consists of finding a control law f_mt(x) such that, for any given initial condition
x(t0) = x0 ∉ S, f_mt(x0) = a(t0), where a(.) is the optimal control of an unconstrained minimum time problem
with boundary conditions x(t0) = x0 and x(t_f) ∈ S. Problem 3 is solved using a constrained infinite horizon
differential game. Therefore, Problem 3 consists of finding a control law f_S(x) such that, for any given initial
condition x(t0) = x0 ∈ S, f_S(x0) = a(t0), where a(.) is the optimal control of a differential game with running
cost L(x,a,b), terminal cost Ψ(x) = 0, t_f = 0, t0 → −∞ and state constraints defined by x ∈ R (or x ∈ S, which
leads to the same solution, as will be discussed later).
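The resulting controller is a simple supervisory switch between the two laws. The sketch below shows only the switching logic, with a hypothetical interface: the names in_mrcis, f_mt and f_S are placeholders for the set-membership test and the two feedback laws described above.

```python
def two_mode_control(x, in_mrcis, f_mt, f_S):
    """Supervisory switch between the two feedback laws.

    in_mrcis(x) -- membership test for the set S from Problem 1, e.g. a
                   thresholded, interpolated value function (placeholder).
    f_mt(x)     -- minimum time law from Problem 2 (placeholder).
    f_S(x)      -- constrained infinite horizon law from Problem 3 (placeholder).
    """
    # inside S: regulate optimally; outside S: recover in minimum time
    return f_S(x) if in_mrcis(x) else f_mt(x)
```

In the actual controller each of the three components is obtained numerically, as described in Chapters 3 and 4.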
1.2 Summary of preliminary work
The work so far has focused on the development of a numerical approach for the implementation of the proposed
control strategy. We briefly describe the achievements accomplished so far.
We explored the subject of computing the numerical solution of undiscounted (non-vanishing running cost)
constrained infinite horizon problems. Especially for the case of differential games, no general results seem to exist.
Theoretical results do exist under certain conditions (e.g., [13, Chapter VII], [64], [12]) but practical applications
are not common. Under some controllability assumptions and for some classes of running cost, it can be shown
that V(t,x) − (V(x) − ct) → 0 as t → −∞, where V(t,x) : (−∞,0] × R^n → R is the solution of the infinite horizon
problem and c is the average cost of the optimal trajectories in the large time (the ergodic cost). We follow
the approach of solving the infinite horizon problem as a finite horizon problem with increasing horizon. Each
iteration of the numerical algorithm corresponds to a step backward in time. On each iteration, the algorithm
checks whether the current solution provides a suitable approximation of the asymptotic solution V (x)− ct. For
the envisaged applications, the ergodic cost is assumed to be non-zero and no a priori knowledge is assumed
concerning the large time behaviour. In general, the optimal trajectories may not converge to the minimizer of
the value function, which precludes recasting this problem as a target problem. We have some results with level
set methods [90] but we found the semi-Lagrangian discretization of the dynamic programming equation more
suitable for our approach (see section 3.2).
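The increasing-horizon idea can be illustrated on a toy discrete problem (this is only a sketch of the stopping test with made-up data, not the semi-Lagrangian scheme of section 3.2): each backward step eventually adds the same constant c to the value function, so the iteration can be stopped once V_{k+1} − V_k has settled to that constant.

```python
# Toy undiscounted problem on five discrete states x = 0..4, with
# non-vanishing running cost L(x) = 1 + |x - 2| and moves a in {-1, 0, +1}.
# One backward step of the dynamic programming recursion per iteration:
#   V_{k+1}(x) = min_a [ L(x) + V_k(x + a) ].
# V_k itself grows without bound, but V_{k+1} - V_k settles to the constant
# ergodic cost c (here c = min_x L(x) = 1, attained by parking at x = 2),
# which plays the role of the convergence check described above.
STATES = range(5)

def running_cost(x):
    return 1 + abs(x - 2)

def backward_step(V):
    new = []
    for x in STATES:
        moves = [x + a for a in (-1, 0, 1) if 0 <= x + a <= 4]
        new.append(running_cost(x) + min(V[m] for m in moves))
    return new

V = [0.0] * 5
for _ in range(50):
    V_next = backward_step(V)
    diffs = [vn - v for vn, v in zip(V_next, V)]   # converges to [c] * 5
    V = V_next
```

After convergence, V(x) − ck recovers the relative (differential) value used to rank the states.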
The typical semi-Lagrangian (SL) solver based on first order interpolation leads to sub-optimal control
laws for non-smooth value functions, with excessive or anticipated action near the non-smooth regions. Moreover,
the computed MCIS is an over-approximation of the exact MCIS. These effects can be minimized by refining
the grid near those regions. However, memory constraints may sometimes prevent such an approach. The
alternative technique of avoiding cells near the boundary of the computed MCIS may lead to over-conservative
sub-approximations of the MCIS. We propose a new algorithm for the finite and infinite horizon optimal control
problems. This algorithm is based on the SL scheme from [30] and uses some elements from receding horizon
control. The proposed algorithm gives an accurate sub-approximation of the MCIS up to the grid resolution. The
smearing introduced by the discontinuity at the boundary of the MCIS can be adjusted by tuning the algorithm
parameters.
We use the SL scheme as an explicit implementation of the discrete-time dynamic programming principle
where the time-step of the SL scheme corresponds to the control rate. Therefore, the control rate is taken into
account in a straightforward way during the process of deriving the control law. The main requirement
for this approach is the employment of an adequate ODE solver. This approach is verified with an instance of
the discrete-time LQR problem. We observe a close correspondence between the control law obtained by the
numerical scheme and the one obtained by the solution of the algebraic Riccati equation.
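For reference, the Riccati benchmark used in such a comparison can be reproduced by fixed-point iteration; the scalar sketch below (with made-up plant data, not the instance used in the experiments) iterates the discrete-time Riccati recursion and extracts the optimal feedback gain.

```python
def dare_scalar(a, b, q, r, iters=200):
    """Fixed-point iteration of the scalar discrete-time Riccati equation.

    P <- q + a*P*a - (a*P*b)**2 / (r + b*P*b), with the optimal feedback
    gain K = b*P*a / (r + b*P*b), so that u_k = -K * x_k.
    """
    p = q
    for _ in range(iters):
        p = q + a * p * a - (a * p * b) ** 2 / (r + b * p * b)
    return p, b * p * a / (r + b * p * b)

p, k = dare_scalar(a=1.1, b=1.0, q=1.0, r=1.0)   # hypothetical plant data
# the closed-loop pole a - b*k lies strictly inside the unit circle
```

The gain K obtained this way is what the numerically computed control law should approach on a sufficiently fine grid.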
We apply the current developments on an engineering application, namely the motion control of one of the
Autonomous Underwater Vehicles (AUV) developed at the University of Porto. These AUVs are equipped with a
low cost embedded computer that runs all the navigation and control software. Therefore, the AUV can perform
the planned missions without any remote intervention; nowadays, even the deployment, retrieval and recharging
operations are being automated in order to keep human intervention at a minimal level. Hence, we found it
important to use a control methodology that provides a systematic way to optimize the system operation taking
into account a large time horizon. One such example is the trade-off between tracking accuracy and hardware
wear. The implemented controller consists of the following components [90]: the controlled invariant set, the
minimum time feedback control law, the feedback control law associated with the infinite horizon problem and the
switching logic. The AUV is modelled as a 4-dimensional system of first order nonlinear differential equations.
The controller was analysed in simulation and has already been used in field operations. We are not aware of any
application of this kind in the literature.
Robotic problems typically deal with periodic (e.g., joint angles), bounded (e.g., hydraulic cylinder) and/or
unbounded (e.g., distance to destination) state variables. Unbounded state variables, or even state variables with
a large range, pose a great difficulty in the discretization of the state space: it is not computationally
feasible to use a grid covering an unbounded set. However, by exploiting the structure of the solution, that problem
can be circumvented in certain situations thus allowing the implementation of a numerical control law for the
unbounded working space. This is done by inspection of the value function computed for a sufficiently large
bounded set. We use this approach in the above mentioned application.
Some of the material presented in Chapters 3 and 4 has already been published, in [89], [90] and [32].
1.3 Outline
Chapter 2 presents a brief survey of the dynamic programming equations and related numerical methods. Chapter 3
describes preliminary work regarding the numerical synthesis of the proposed control strategy. Chapter 4 describes
the application of the control design methodology to the motion control of an AUV. Some experimental results are
presented. Chapter 5 describes the topics for future research.
Chapter 2
Background
2.1 Dynamic Programming: a brief introduction
Dynamic programming is a research topic introduced by Richard Bellman during the 1950s [16], mo-
tivated by the problems of multistage decision processes. The basic idea behind dynamic programming (DP) is
presented in the form of the principle of optimality, originally stated as follows [16, p. 83]: “An optimal policy has
the property that, whatever the initial state and optimal first decision may be, the remaining decisions constitute
an optimal policy with regard to the state resulting from the first decision.” Another way to interpret the principle
of optimality is as follows: if the trajectory segments x_ab(.) : [t0, t1] → R^n and x_bc(.) : [t1, t2] → R^n compose the
optimal trajectory from a to c, then the segment x_bc(.) is also the optimal trajectory from b to c. In control theory,
the most well-known form of the principle of optimality (also known as dynamic programming principle) is the
Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE), to be presented shortly.
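In its discrete form, the principle is easy to verify on a toy shortest-path problem. The sketch below (an illustrative example with a made-up graph, added here for concreteness) computes costs-to-go by the backward Bellman recursion; the optimal a → c route passes through b, and its tail coincides with the optimal path from b, exactly as the principle states.

```python
# Costs-to-go to node 'c' on a small acyclic weighted graph (made-up data),
# via the backward Bellman recursion  V(x) = min_y [ w(x, y) + V(y) ].
edges = {'a': {'b': 1, 'd': 4}, 'b': {'c': 2, 'd': 1}, 'd': {'c': 5}}

def value(x, target='c'):
    if x == target:
        return 0
    return min(w + value(y, target) for y, w in edges[x].items())

# The optimal a -> c route is a -> b -> c with total cost 3; its b -> c tail
# is itself the optimal path from b:  value('a') == 1 + value('b').
```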
Dynamic programming is based on the concept of value function. For a given dynamic optimization problem,
the value function V(t,x) gives the cost¹ of the optimal trajectory starting from state x at time t. Bellman showed
that the value function, whenever smooth, can be computed as the solution of an HJB PDE. In [55], similar results
were proved for systems with competing inputs; this work introduced the Hamilton-Jacobi-Isaacs (HJI) PDE and
defined the field of differential games.
The lack of a precise or elegant definition of solution for the non-smooth case was an open challenge for
several years. The most successful answer to this challenge was the theory of viscosity solutions, first formalized
in [29] and a major breakthrough for the development of theory and applications based on the HJB and HJI PDE.
2.2 Dynamic programming for deterministic continuous-time systems
Consider the system described by
ẋ(t) = f(x(t), a(t)) (2.1)
and the cost functional
J(t0, x0, a) = Ψ(x(t_f)) + ∫_{t0}^{t_f} L(y(x0, τ, a), a(τ)) dτ (2.2)
where y(x, t,a) is the state of the system at time t when subject to the input sequence a(.) and initial state x(0) = x;
L(x,a) and Ψ(x) are the running and terminal costs, respectively.
Consider the optimal control problem (OCP)
min_{a ∈ 𝒰_a} J(t0, x0, a) (2.3)
¹The term cost is associated with minimization problems; in his seminal work, Bellman considers a maximization problem and employs
the term return.
subject to:
a(.) ∈ 𝒰_a (2.4)
x(t) ∈ X (2.5)
where 𝒰_a is the set of measurable input signals such that a(t) ∈ U_a, with U_a a compact set; and (2.5) defines
the state constraints. For the infinite and finite horizon control problems, we set t_f = 0 and define t0 ∈ (−∞, 0]. We
consider only the case of bounded inputs since for most engineering systems unbounded inputs are not physically
meaningful.
Consider also the following variation of the OCP presented above, where
t_f = min{t : x(t) ∈ T} (2.6)
and T is a given target set. This is known as the optimal cost to reach problem or simply as the target problem. This
version of the OCP has a static solution, in the sense that it does not depend on t0. In that case, the cost functional
can be defined as J(x,a).
The fundamental object of the dynamic programming framework is the value function [16]. The value function
associated to the OCP (2.3) is defined as follows:
V(t0,x0) = inf_{a∈Ua} J(t0,x0,a) (2.7)
For the static OCP the value function is independent of t and it is defined as V(x0) = inf_{a∈Ua} J(x0,a).
One of the main appeals of the dynamic programming approach resides in the fact that, given the static value function V(x), the optimal control can be determined in state feedback form fc(x) = a*(0), where a*(·) is given by

a* ∈ argmin_{a∈Ua} { ∫_0^Δ L(y(x,τ,a), a(τ)) dτ + V(y(x,Δ,a)) } (2.8)
with ∆ ∈ (0, t f − t0]. Notice that (2.8) assumes a static value function. The same idea can be applied to time-
dependent value functions, leading to a time-dependent control law. However, the dependence on the time variable
is undesirable, especially if the objective is to optimize the performance of the system for an undefined period of
time or, ultimately, for the infinite horizon case. Since, in general, numerical solutions are employed, the extra
dimension incurred by the time variable would lead to increased computational requirements. That approach will
not be considered here further.
In what concerns the infinite horizon OCP (IHOCP), we remark that, for a controllable system under suitable
cost, V (t,x) converges to Vih(x)− ct as t →−∞, where c is the ergodic cost; in those cases, the time-invariant
Vih(x) is used in (2.8) in order to compute the optimal control.
The optimal control law for a system with a sample and hold scheme can be described by the discrete time
version of (2.8):
fc(x) ∈ argmin_{a∈Ua} { ∫_0^Δ L(yΔ(x,τ,a), a) dτ + V(yΔ(x,Δ,a)) } (2.9)
where y∆(x, t,a) = y(x, t, [0,∆]× a). Notice that, in spite of the discrete-time designation, the continuous-time
system model is still employed.
Of course, in order to apply either (2.8) or (2.9) the value function must be computed first. The dynamic
programming approach defines the value function in a constructive fashion, the so called dynamic programming
principle (DPP). The DPP for discrete time systems is expressed as follows:
V(t,x) = inf_{a∈Ua} { ∫_0^Δ L(yΔ(x,τ,a), a) dτ + V(t+Δ, yΔ(x,Δ,a)) }, x ∉ T, t < t_f (2.10)
For the time-dependent case we define the terminal condition
V (t f ,x) = Ψ(x) (2.11)
while for the static case we define the boundary condition

V(x) = Ψ(x), x ∈ T (2.12)
As ∆ goes to zero, V (t,x) is a viscosity solution of the Hamilton-Jacobi-Bellman (HJB) partial differential
equation (PDE) (see, e.g., [46] or [13]):
∂V/∂t(t,x) + H(∇V(t,x), x) = 0 (2.13)

where

H(p,x) = inf_{a∈Ua} { p · f(x,a) + L(x,a) } (2.14)
is the Hamiltonian. The notion of viscosity solution must be introduced because, in general, the (classical) gradient
∇V (t,x) may be undefined in some regions of the state space. The viscosity solution, introduced in [29], is a type
of weak (generalized) solution that corresponds to the “physical” solution of the dynamic optimization problem;
accordingly, the HJB PDE formulation assumes some form of compatible generalized gradient. This form of the
DPP is more amenable to mathematical analysis. However, it does not take into account the effect of the limited control rate (at least not in a straightforward way, as opposed to the discrete-time DPP).
In general, it is not possible to find an analytical expression for the value function. The DPP, either in the
form of (2.10) or as an HJB PDE (2.13), is explored to develop numerical solvers for the computation of the value
function.
The dynamic programming formulation can be extended to encompass the existence of adversarial inputs,
leading to the framework of differential games [55, 58]. The dynamical system is described by
ẋ(t) = f(x(t), a(t), b(t)) (2.15)

with a(·) defined as before and b(·) ∈ Ub ≡ {b : [t0, t_f] → Ub, measurable}, where Ub is a compact set.
Adversarial inputs are useful to encompass the effects of disturbances and model uncertainty. Intuitively, at
each instant, the adversarial input b will try to drive the system to the worst possible result (i.e., maximum cost)
while the control input a will do the opposite. In this setting, the optimal control problem (OCP) is defined as
follows:
sup_{β∈Δb} inf_{a∈Ua} J(t0, x0, a, β[a]) (2.16)
where

J(t0, x0, a, b) = Ψ(x(t_f)) + ∫_{t0}^{t_f} L(y(x0,τ,a,b), a(τ), b(τ)) dτ (2.17)
and Δb = {β : Ua → Ub : t > 0, a(s) = â(s) ∀s ≤ t ⇒ β[a](s) = β[â](s) ∀s ≤ t} is the set of nonanticipating strategies for the adversarial input b. The concept of nonanticipating (or causal) strategy, introduced in [38] for differential games, is necessary so that the corresponding value function may comply with the dynamic programming principle. The choice of the sequence sup inf in (2.16) is not accidental. This formulation corresponds to the so called
upper value of the game. This formulation gives informational advantage to the maximizing input (adversary), in
the sense that it is assumed that, besides the current state of the system, the adversary knows the current choice for
input a at the time it chooses b. The lower value solution can be defined in an analogous way. Since we want to
model the worst case effect of disturbances we prefer to follow a conservative approach and give the informational
advantage to the adversarial input and therefore use the upper value solution. These ideas become clearer in the formulation of the dynamic programming principle for differential games.
The discrete-time DPP corresponding to the upper value solution of the differential game is the following:
V(t,x) = inf_{a∈Ua} sup_{b∈Ub} { ∫_0^Δ L(yΔ(x,τ,a,b), a, b) dτ + V(t+Δ, yΔ(x,Δ,a,b)) } (2.18)
The feedback control law is given by
fc(x) ∈ argmin_{a∈Ua} max_{b∈Ub} { ∫_0^Δ L(yΔ(x,τ,a,b), a, b) dτ + V(yΔ(x,Δ,a,b)) } (2.19)
We assume systems such that V(x) can be obtained as described before for the single player case. The right-hand side of (2.18) can be seen as a static game; since this game may not have a value, the inf sup order cannot be chosen
arbitrarily (even disregarding the fact that it must comply with the formulation of the OCP). As stated above, this
formulation leads to a conservative control law: minimize the maximal cost that can be forced by the adversarial
input, taking into account that the adversarial input will know the choice of the control input; this corresponds to
choosing, among all possible actions, the action for which the maximum cost that can be imposed by the adversary
is minimal.
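For intuition, the min-max law (2.19) can be evaluated by brute force once Ua and Ub are replaced by finite discretizations. The following sketch does that for a hypothetical scalar game (dynamics ẋ = a + 0.5 b, quadratic running cost, and V(x) = x² standing in for a value function computed beforehand); none of these modelling choices are taken from the text.

```python
import numpy as np

# Exhaustive-search evaluation of the min-max feedback law (2.19):
# f_c(x) = argmin_a max_b { dt*L(x,a,b) + V(y_dt(x, dt, a, b)) }.
# The toy game (xdot = a + 0.5*b, quadratic cost) and V are illustrative.

def feedback(x, V, dt=0.05,
             Ua=np.linspace(-1, 1, 21), Ub=np.linspace(-1, 1, 21)):
    best_a, best_worst = None, np.inf
    for a in Ua:
        # inner maximization: the adversary sees the candidate control a
        worst = max(dt * (x**2 + a**2) + V(x + dt * (a + 0.5 * b))
                    for b in Ub)
        if worst < best_worst:              # outer minimization over a
            best_a, best_worst = a, worst
    return best_a

V = lambda x: x**2                          # stand-in for a precomputed V
print(feedback(2.0, V))                     # pushes hard toward the origin
```

The nested loops make the conservative structure explicit: for each candidate control, the worst disturbance is found first, and only then is the least-bad control selected.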
The value function corresponding to this problem has also been shown to be a viscosity solution of a first order
PDE, the Hamilton-Jacobi-Isaacs (HJI) PDE (see [13] and references therein). The corresponding Hamiltonian is
H(p,x) = inf_{a∈Ua} sup_{b∈Ub} { p · f(x,a,b) + L(x,a,b) } (2.20)
2.3 Receding horizon control
The receding horizon control approach, also known as Model Predictive Control (MPC), has been extensively
researched for the design of feedback controllers (see, e.g., [23]). Conceptually (the usual formulation assumes a
discrete time model), the objective for MPC is to compute a state feedback control law fc(x) = a∗(0) where
a* ∈ argmin_{a∈Ua} { ∫_0^Δ L(y(x,τ,a), a(τ)) dτ + F(y(x,Δ,a)) } (2.21)
and F(x) is a given terminal cost. In classical MPC implementations, the right-hand side of (2.21) is computed at each control instant. Notice that (2.21) is a relaxation of the IHOCP. The main idea is to replace the IHOCP value function V(x) by a given terminal cost function F(x), thus avoiding the costly computation of V(x). In order to compensate for this, a sufficiently large optimization horizon Δ must be considered at each control instant. Therefore, the choice of Δ is a compromise between the deviation from the optimal solution and the complexity of the computations. Additionally, in general, the stability of the derived control law depends on the choice of F(x). Notice that, in the limiting case of Δ = 0, F(x) must be a control Lyapunov function (in the sense of [93]) in order to ensure asymptotic stability of the closed loop control system.
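A minimal receding-horizon loop in the spirit of (2.21) can be sketched as follows; the scalar plant, the discretized input set and the exhaustive search over input sequences are illustrative simplifications of ours (a practical MPC solver would use a numerical optimization routine instead).

```python
import numpy as np
from itertools import product

# Receding-horizon sketch of (2.21): at each control instant, search a short
# discretized input sequence, apply only the first element, repeat.
# Plant (xdot = x + a), costs and horizon are illustrative choices.

def mpc_step(x, N=5, dt=0.1, U=(-1.0, 0.0, 1.0)):
    best_a0, best_cost = None, np.inf
    for seq in product(U, repeat=N):         # exhaustive search over sequences
        z, cost = x, 0.0
        for a in seq:                        # Euler rollout of the model
            cost += dt * (z**2 + 0.1 * a**2) # running cost L
            z += dt * (z + a)                # unstable linear plant
        cost += 10.0 * z**2                  # terminal cost F(x)
        if cost < best_cost:
            best_a0, best_cost = seq[0], cost
    return best_a0                           # state-feedback value f_c(x)

# closed loop: only the first input of each minimizing sequence is applied,
# then the optimization is repeated from the new state
x = 0.5
for _ in range(40):
    x += 0.1 * (x + mpc_step(x))
print(x)
```

The closed loop keeps the unstable plant near the origin; with the coarse input set the state chatters within a band set by the step size rather than converging exactly.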
In general, the right hand side of (2.21) is solved using numerical techniques. Such computation can be
infeasible for some applications. Explicit MPC (eMPC) [5] was developed in order to cope with that problem; in eMPC, the RHS of (2.21) is computed off-line for a discrete set of states and stored in the control system.
Like dynamic programming approaches, eMPC is subject to the curse of dimensionality. Dynamic programming
techniques have also been used for the solution of the MPC problem (see, e.g., [28]).
The min-max approach for the design of robust optimal state feedback control laws for nonlinear systems has also been employed in the context of MPC. For instance, [49] employs multi-parametric nonlinear programming
in the context of explicit nonlinear MPC; see [5] for a survey on explicit MPC and the connections between the
optimal control formulation and the parametric optimization problem for piecewise affine systems.
2.4 Numerical methods for the dynamic programming equations
Numerical computation of the value function is deeply connected with numerical methods for PDE. This is not
surprising since, as seen previously, the value function associated to an OCP can be computed as the viscosity
solution of a suitable HJB or HJI PDE. On the other hand, applications for numerical methods for PDE can
be found in other fields besides control. Front propagation problems, arising, e.g., from the fields of physics
and computer vision, can also be described by PDE. In a certain sense, the computation of the value function
is analogous to a front propagation problem. For instance, in the target problem, the boundary condition would
correspond to the initial description of the front. The increasing sequence of level sets of the value function would
then correspond to different positions of the front over time, i.e., they would correspond to the propagation of the
initial front (i.e., the boundary condition). As will be shown, this analogy has been used to extend front propagation (or moving interface) algorithms to the solution of the HJB and HJI PDE.
Numerical schemes for the solution of PDE may be roughly divided into Eulerian, Lagrangian and semi-Lagrangian schemes.
In a Lagrangian scheme, the front is tracked by elements (also called markers or particles) that move along
with it. For instance, in a two dimensional domain, an interface could be approximated by a connected set of
segments. The front propagation would then be given by computing the trajectories of the markers, i.e., of the
segments’ endpoints. The Lagrangian formulation, along with the techniques to maintain an accurate description
of the front are collectively referred to as front tracking methods [73, pg. 26].
In an Eulerian scheme, the front evolution is captured (i.e., sampled) at fixed grid points2. The type of grid
may range from the Cartesian grid to unstructured grids. In practice, the computation is confined to a bounded
partition of the state space. Notice that for a state space of dimension n with a regular grid of k points per
dimension, the memory requirements are on the order of N = k^n. In general, these schemes involve the numerical approximation of the spatial derivatives by finite-difference methods. Moreover, in order to ensure stability, the time step for the numerical integration (corresponding to the time discretization) may have to take into account the Courant-Friedrichs-Lewy (CFL) condition. The CFL condition imposes an upper bound on the ratio between the maximum propagation during a time step and the grid resolution (the Courant number, Δt Σ_i |v_i|/Δx_i, where v_i is the maximum velocity for dimension i, e.g., f(x,u*) for a controlled system, and Δx_i is the grid spacing along dimension i); in simpler terms, the higher the resolution of the grid, the smaller the integration time step will be required to be, and hence the more costly the computation will be. For a first order explicit time discretization,
the scheme must comply with the CFL condition and, additionally, use upwind finite differences or add artificial
dissipation (using, for instance, the Lax-Friedrichs scheme) in order to ensure numerical stability. The introduction
of artificial dissipation to compensate for the lack of an upwind scheme has a negative impact on numerical accuracy in the case of nonsmooth solutions, causing increased numerical diffusion. On the other hand,
implementation of dimension independent upwind schemes may be a complex task, especially for the HJB and HJI
PDE where, due to the local optimization in the Hamiltonian, the choice of the velocity vector depends on the
choice of the one-sided derivative along each dimension.
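As a small illustration of the CFL condition in the form given above, the largest admissible time step for an explicit scheme can be computed directly from the maximum velocities and the grid spacings (the numbers below are arbitrary):

```python
# CFL-admissible time step for an explicit scheme: the Courant number
# dt * sum_i |v_i| / dx_i must stay below a chosen bound (here 0.9).
# Velocities and spacings are illustrative numbers.

def cfl_dt(v_max, dx, courant=0.9):
    """Largest dt with dt * sum(|v_i| / dx_i) <= courant."""
    return courant / sum(abs(v) / h for v, h in zip(v_max, dx))

print(cfl_dt(v_max=[2.0, 1.0], dx=[0.1, 0.1]))
```

Halving every grid spacing halves the admissible time step, which is the quantitative content of the "finer grid, smaller step, costlier computation" remark above.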
Semi-Lagrangian (SL) schemes are also grid based (Eulerian component). However, instead of computing the
discrete approximation of the spatial derivatives, they evaluate the system trajectories emanating from the grid
nodes during a given time-step (Lagrangian component). In general, the endpoint of these trajectories will not
coincide with any grid node and interpolation methods must be used. In a certain sense, in terms of computational
burden, SL schemes replace the computation of the spatial derivatives by the interpolation procedure. The stability
of SL schemes does not depend on the CFL condition. This allows a more flexible trade-off between accuracy
and convergence rate. However, special care has to be taken in the numerical computation of the trajectories.
For large time steps, a simple Euler scheme may introduce significant errors; therefore, a more accurate ODE
solver may be required. Moreover, for SL schemes, the local optimization, analogous to the evaluation of the
Hamiltonian in finite difference schemes, becomes harder. Notice that, while for finite difference schemes the
local optimization only depends on the spatial derivatives and on the velocity equation (the system dynamics, for
control systems), SL schemes deal with the integration of the running cost along the trajectories (which must be
computed either analytically or numerically) and interpolation of the current solution of the PDE at the endpoints
of the trajectories. Only in special cases is it possible to obtain a closed form solution for this local optimization, or even a form amenable to the application of efficient numerical static optimization techniques, especially for large Courant numbers; in general, this will be a nonconvex problem, requiring a certain degree of exhaustive search to avoid local solutions.
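The structure of a single semi-Lagrangian update, as described above, can be sketched as follows: one Euler step along the trajectory for each candidate action, followed by first order (bilinear) interpolation of the current value grid at the endpoint. The dynamics, cost and grids below are illustrative assumptions.

```python
import numpy as np

# One semi-Lagrangian update at a single grid node: follow the trajectory
# for one time step (Euler here; a more accurate ODE solver may be needed
# for large steps), then interpolate the current value grid at the endpoint.

def bilinear(V, xs, ys, p):
    """First-order interpolation of V (on the grid xs x ys) at point p."""
    i = int(np.clip(np.searchsorted(xs, p[0]) - 1, 0, len(xs) - 2))
    j = int(np.clip(np.searchsorted(ys, p[1]) - 1, 0, len(ys) - 2))
    tx = (p[0] - xs[i]) / (xs[i + 1] - xs[i])
    ty = (p[1] - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * V[i, j] + tx * (1 - ty) * V[i + 1, j]
            + (1 - tx) * ty * V[i, j + 1] + tx * ty * V[i + 1, j + 1])

def sl_update(V, xs, ys, x, dt, f, L, actions):
    """min over the action set of: running cost + interpolated V at endpoint."""
    return min(dt * L(x, a) + bilinear(V, xs, ys, x + dt * f(x, a))
               for a in actions)

xs = ys = np.linspace(-1, 1, 21)
V = np.add.outer(xs**2, ys**2)              # stand-in for the current iterate
f = lambda x, a: np.array([a, -x[0]])       # toy dynamics
L = lambda x, a: 1.0                        # unit running cost
print(sl_update(V, xs, ys, np.array([0.5, 0.5]), 0.1, f, L, [-1.0, 0.0, 1.0]))
```

The local optimization here is a plain enumeration over a small action set; as noted above, in general this inner problem is nonconvex and some degree of exhaustive search is hard to avoid.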
Other combinations can be found in the literature. For instance, in [60], the authors describe an algorithm
based on a Lagrangian description of the front. However, on each iteration of front propagation, the algorithm
assumes the existence of an underlying grid. For each grid point, only one point of the front is considered (the
nearest one). In [40], elements of SL and particle methods are combined.
This survey focuses mainly on the Eulerian and SL approaches. In both approaches there is the question of whether to use first or higher order methods. In the former, that applies to the finite difference approximation of the derivatives; in the latter, that applies to the interpolation. Second and higher order methods provide higher accuracy for the same grid size in exchange for a higher computational cost; however, in order to avoid introducing oscillations in the solution, the numerical schemes must be able to deal with the potential lack of smoothness of the solution. Essentially Non-Oscillatory (ENO) and Weighted ENO (WENO) schemes [51, 77] are commonly used for that purpose. In what concerns the discretization in time, the order of the integration scheme is relevant for the accuracy of the schemes; for finite difference schemes, it may also contribute to stability [73, pg. 38].
2.4.1 Control-theoretic schemes
In [41], Falcone, based on some previous results, developed a fully discrete iterative (Gauss-Seidel) algorithm
for the numerical solution of deterministic infinite horizon control problems. The algorithm is based on a semi-Lagrangian scheme. Assuming continuous dynamics, Euler time discretization and first order simplicial interpolation, Falcone proved uniqueness of the solution, error bounds and first order rate of monotone convergence to the viscosity solution (for the discounted case). This scheme was later extended to target problems and differential games, both for unconstrained and state constrained scenarios (see [13, 14] and also [30, 42] for more recent results and surveys). In what concerns implementation aspects, the algorithm was extended to an arbitrary number of dimensions (in [25] the authors use multi-dimensional linear, quadratic and cubic interpolation) and parallel execution (see, e.g., [44]).

2The terms nodes and grid points are used interchangeably.
Actually, this scheme computes the solution of the discrete-time dynamic programming equation. Neverthe-
less, as expected (see, e.g., [46, Theorem 8.1]), as the time-step goes to zero the result converges to the solution of
the corresponding HJI PDE. However, the smaller the time-step the smaller the rate of convergence, but, in prac-
tice, the HJI PDE can be solved with good accuracy for reasonably sized time-steps [43]. Notice that, for Courant
numbers less than unity, the scheme is basically solving implicit interpolations. In those cases, an infinite number
of iterations would be required to reach the fixed point. In practice, the machine precision will be reached in a
finite number of iterations and acceptable results can be achieved with even fewer iterations. A usual approach is to stop the computation when the L1 or L∞ norm of the difference between the solutions of successive iterations is
below a certain threshold. The algorithm can be accelerated by periodically performing iterations with the current
computed controls, in a policy iteration fashion [17], thus avoiding the costly local optimization.
In [24], the authors describe numerical algorithms based on the concepts of viability theory [7]. The authors
present algorithms for the following problems: over-approximation of the viability and discriminating kernels
(maximal controlled invariant set and maximal robust controlled invariant set); target problem and associated
capture basin (backward reachable set). The numerical schemes can be characterized as SL with zero order
interpolation. In what concerns the zero order interpolation, the algorithms consider all points in a neighbourhood
of the trajectory’s endpoint and pick the value of the optimizer of the update expression. This procedure is used
in order to ensure that the algorithms compute over-approximations of the solution. However, especially for the target problem, such an interpolation scheme introduces severe accuracy errors. Therefore, the authors describe a
grid refinement procedure and show convergence to the exact solution as the grid spacing goes to zero.
2.4.2 Level set methods
Level Set Methods (LSM) were first formally presented in [75] to compute the motion of a hypersurface Γ(t)⊂Rn
(also designated as interface or front). An example of a hypersurface, in R3, is the surface of a sphere. LSM have many applications, especially in image analysis, computer vision and computational physics, as can be seen in [85], [73] and [76].
LSM are especially adequate for problems that otherwise would involve discontinuous solutions, such as some instances of the target problem. When using the narrow band technique, which consists of solving the PDE only in the neighbourhood of the level set at each iteration, very efficient implementations can be obtained. Moreover, for a target problem, LSM compute the approximate solution for each node of the grid in a finite number of iterations (if this value is finite). Of course, the accuracy will depend on the Courant number (the accuracy increases as the Courant number decreases) and grid resolution. The application of LSM to the finite and infinite horizon problems seems to be less common (in fact, the author is not aware of any previous publication concerning this subject). One possible reason is the fact that, for smooth dynamics, the value function associated to such problems is continuous, unless state constraints are considered. Nevertheless, LSM can be used for the solution of this problem by considering an augmented system with time as an extra dimension, i.e., with state space defined by [0,T] × X, where X is the state space of the original system, and boundary condition on {T} × X. We used this approach in [90].
The main idea of LSM is to embed Γ in a nicer function (the implicit function) φ(t,x), x ∈ Rn; by nicer, it is meant that φ(t,x) will be at least Lipschitz continuous in x. It is assumed that Γ divides the domain into separate subdomains, such that it is possible to define its interior, Γ−, and its exterior, Γ+. Moreover, φ(t,x) has the following property:

φ(t,x) < 0, x ∈ Γ−(t)
φ(t,x) = 0, x ∈ Γ(t)
φ(t,x) > 0, x ∈ Γ+(t)
The front is then given by the zero level set of φ:

Γ(t) = {x : φ(t,x) = 0} (2.22)
If we take the time derivative of each member of φ(t,x(t)) = 0 we get the convection equation:

φt(t,x) + ∇φ(t,x) · f(t,x) = 0 (2.23)

with f(t,x) = dx(t)/dt and φ(0,x) given as initial condition. One common approach is to consider φ(0,x) as the signed distance to Γ(0) (the distance is negative if x is inside Γ). Nevertheless, even with smooth initial conditions, φ(t,x) may develop kinks (i.e., discontinuities in the first derivative). Therefore, LSM also rely on the concept of viscosity solution to pick the correct solution and, as stated before, the computation of the derivatives must take into account this lack of smoothness.
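As a small illustration of these ideas, the sketch below initializes φ as the signed distance to a circular front and performs one explicit step of the level set equation for a constant normal expansion speed (φt + c‖∇φ‖ = 0). The grid, the parameters and the use of central differences (adequate here only because φ is still smooth near the front) are illustrative choices of ours.

```python
import numpy as np

# Signed-distance initialization of the implicit function for a circular
# front of radius 0.5 (phi < 0 inside, phi > 0 outside), plus one explicit
# time step of the normal-speed form of the level set equation.

n = 101
xs = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
phi = np.hypot(X, Y) - 0.5                  # signed distance to the circle

# phi_t + c * ||grad phi|| = 0, constant expansion speed c > 0
c, dt, dx = 1.0, 0.005, xs[1] - xs[0]
gx = np.gradient(phi, dx, axis=0)           # central differences
gy = np.gradient(phi, dx, axis=1)
phi = phi - dt * c * np.hypot(gx, gy)

# a node that started exactly on the front is now inside it (phi < 0):
# the zero level set has moved outward by roughly c*dt
i = int(np.argmin(np.abs(xs - 0.5)))
j = int(np.argmin(np.abs(xs)))
print(round(phi[i, j], 3))
```

Note that the signed distance already has a kink at the circle's centre, so a production implementation would use the upwind or ENO/WENO derivative approximations discussed above instead of plain central differences.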
In [65] (see also [67] for current developments), a toolbox for the numerical solution of the HJI PDE was
developed using LSM. An implementation of this toolbox for Matlab™, the toolboxLS, is publicly available.
The options and several scenarios of application of the toolbox are described in detail in [66]. The toolbox can
deal with continuous nonlinear dynamics. The toolbox provides Total Variation Diminishing (TVD) integration
routines and dimension independent ENO/WENO spatial derivative schemes of varying order. Therefore the user
may choose a trade-off between the accuracy and computation time. In [69], the authors describe the application
of the toolbox to reachability analysis of continuous-time systems. The toolbox can also be used on the numerical
solution of the target problem (see, e.g., [68] for the double integrator system). In both cases, the user is required to introduce the expression of the Hamiltonian (either in closed form, if possible, or in an algorithmic way). For this type of problem, the current version of the toolbox uses central differencing ENO/WENO schemes for the spatial derivatives and the Lax-Friedrichs scheme for the numerical approximation of the Hamiltonian. State constraints can also be handled by the toolbox. Notice that, for the HJB PDE, the system flow (velocity) is chosen from a given set in order to minimize ∇φ(t,x) · f(t,x). As seen before, for control systems, the trajectories obey dynamic constraints of the form ẋ = f(t,x,a), where a can be chosen from a given set Ua; that can be written as a differential inclusion ẋ ∈ A(x) ≡ {f(t,x,a) : a ∈ Ua}. Therefore, the level set equation for the time optimal
problem becomes
φt(t,x) + inf_{a∈A(x)} ∇φ(t,x) · a = 0 (2.24)
With similar considerations to the ones discussed previously in the formulation of differential games, the level set
equation for the time optimal differential game becomes
φt(t,x) + inf_{a∈Ua} sup_{b∈Ub} ∇φ(t,x) · f(t,x,a,b) = 0 (2.25)
This formulation assumes a unity running cost. For the autonomous case, other running costs can be considered by using the modified flow function f(x,a,b)/L(x,a,b) and taking t as the cost.
In [19], the authors describe an algorithm for front propagation based on a nonlinear discretization scheme
(Ultra Bee). The Ultra Bee scheme is known to have good anti-diffusive properties. The algorithm is not a LSM
in the classical sense, because the implicit function is a discontinuous function, of the form:
φ(t,x) = −1, x ∈ Γ(t) ∪ Γ−(t)
φ(t,x) = 1, x ∈ Γ+(t)
Instead of tracking φ(t,x) = 0, this algorithm tracks the changes of φ(t,x) from 1 to −1. The authors describe
an efficient dimension independent implementation, using a narrow band approach. The scheme is at most first
order, but numerical experiments showed that it tracks the front propagation much more accurately than a classical
Runge Kutta-ENO (both second order) LSM without reinitialization.
Semi-Lagrangian numerical schemes for LSM can also be found in the literature. This class of schemes is not
subject to the CFL condition. For instance, in [40] the authors combine a semi-Lagrangian scheme with a particle
based method; the particle method is used in order to support the detection and partial correction of distortions on
the front propagation. The authors use first order interpolation methods in order to ensure unconditional stability
and still claim acceptable levels of numerical diffusion.
In what concerns the choice between SL and finite difference schemes as building blocks for LSM, the same
considerations discussed previously for the general schemes apply. Therefore, no easy answer exists for this question.
2.4.3 Efficient methods for boundary value problems
Let us start by showing how level set methods may be related to stationary boundary value problems. Consider, for instance, V(x) as the minimum time to reach x starting from ∂Ω. In the context of LSM, the position of the front at time t is given by Γ(t) = {x : V(x) = t}, with Γ(0) = ∂Ω. As was done for the general LSM, if we take the derivatives of both members of V(x(t)) = t we obtain ∇V(x) · ẋ = 1. When studying the motion of hypersurfaces, it is usual to consider the velocity normal to the front:

fn(t,x) = f(t,x) · ∇φ(t,x) / ‖∇φ(t,x)‖2

where ‖·‖2 is the Euclidean norm. Therefore, after considering the boundary condition we obtain

‖∇V(x)‖2 fn(x) = 1, ∀x ∈ Ω (2.26)
V(x) = 0, ∀x ∈ ∂Ω (2.27)
with x ∈ Rn and Ω ⊂ Rn. Equation (2.26) is designated as the Eikonal equation. For physical systems, this equation may be used to describe the propagation of a front in an isotropic medium, i.e., a medium where the propagation speed is the same in all directions. The propagation speed fn(x) is assumed to be always positive (front expansion) or always negative (front contraction). Without loss of generality, we will assume the former.
As for the general formulation of LSM, this formulation is of little interest for control applications, unless we assume systems of the form ẋ ∈ B^n_0(r), where B^n_0(r) is a hypersphere of radius r centred at the origin of Rn. However, in order to explain the basic idea of the algorithms, we will consider this simple case for the time being.
The structure of the Eikonal equation allows the implementation of very efficient algorithms. This is because
the characteristic lines of this PDE coincide with the gradient of the solution. One such class of algorithms was
first developed in [98], based on the Dijkstra’s algorithm [36] (a brief description of the Dijkstra’s algorithm is
presented on Appendix A.1). A similar class of algorithms was developed independently by the Level Set commu-
nity and coined Fast Marching Method (FMM) [83, 84]. The algorithms compute the solution in a “single pass”
(i.e., each node is visited at most r times, where r is the number of its neighbours) and in a monotone fashion,
resembling the propagation of a wave. The Dijkstra’s algorithm, the Tsitsiklis’s algorithm and the FMM are fun-
damentally dynamic programming algorithms differing only in the way that the values of new points are computed
(update procedure), which in the latter two cases is suitable for continuous domains. This difference is of central
importance. In order to appreciate that fact, consider the problem of finding the minimum distance from the origin
to a point (x,y) in R2 for a fully actuated particle assuming an obstacle-free environment. Assuming an orthogonal
grid, and independently of how fine the considered discretization of the domain is, the Dijkstra's algorithm will always produce the wrong result |x| + |y| (the Manhattan norm), as opposed to the expected Euclidean norm. On
the other hand, both Tsitsiklis’s algorithm and the FMM converge to the exact solution.
The update procedure in [98] differs from the one used in the general FMM. In the former, the update function performs a local minimization over all points of the convex combination of the neighbouring points (i.e., intermediate points are considered), in a semi-Lagrangian fashion. The latter uses an Eulerian scheme; only grid points are considered, and the value of each point is computed using an upwind difference scheme (see Appendices A.2 and A.3 for examples of both cases).
The computational complexity of these methods is O(N log(N)), where N is the number of nodes. The log(N) term is due to the heap-sort data structure used to store the ordered list of considered nodes. In [102] (see also [79]), the authors develop an FMM with O(N) complexity. They achieve that by exchanging the heap-sort data structure for an untidy priority queue, which has O(1) complexity on insert and remove operations.
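The Dijkstra-like structure is perhaps easiest to see in code. The sketch below solves the unit-speed Eikonal equation (‖∇V‖2 = 1) from a point source, with the classical first order upwind update in place of Dijkstra's edge relaxation; the grid size, the clamped border handling and the use of tentative neighbour values in the update are simplifications of ours.

```python
import heapq
import numpy as np

def fmm(n, src, h=1.0):
    """Single-pass solver for ||grad V|| = 1 with V(src) = 0 on an n-by-n grid."""
    V = np.full((n, n), np.inf)
    done = np.zeros((n, n), dtype=bool)
    V[src] = 0.0
    heap = [(0.0, src)]                       # priority queue of tentative nodes
    while heap:
        _, (i, j) = heapq.heappop(heap)
        if done[i, j]:
            continue                          # stale entry, already finalized
        done[i, j] = True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            p, q = i + di, j + dj
            if 0 <= p < n and 0 <= q < n and not done[p, q]:
                # smallest (upwind) neighbour value along each axis
                a = min(V[max(p - 1, 0), q], V[min(p + 1, n - 1), q])
                b = min(V[p, max(q - 1, 0)], V[p, min(q + 1, n - 1)])
                a, b = min(a, b), max(a, b)
                if b - a >= h:                # information from one axis only
                    new = a + h
                else:                         # two-sided quadratic update
                    new = 0.5 * (a + b + np.sqrt(2 * h * h - (a - b) ** 2))
                if new < V[p, q]:
                    V[p, q] = new
                    heapq.heappush(heap, (new, (p, q)))
    return V

V = fmm(41, (20, 20))
print(V[20, 40], V[40, 40])
```

Along a grid axis the computed values are exact, while toward the corner the result approximates the Euclidean distance, in contrast with the Manhattan value 40 that plain Dijkstra would produce on the same grid.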
Fast Sweeping Method
The fast sweeping method (FSM) is a numerical method with O(N) computational complexity. It was first pre-
sented under this designation in [97] (see also [78] for more recent work and additional references). As opposed
to the FMM, it is an iterative method. However, for some problems, the number of iterations is independent of the
grid size, hence the claimed O(N) computational complexity. The designation of the method comes from the fact
that each iteration consists of 2^n sweeps of the defined grid, where n is the dimension of the problem. One sweep consists of visiting and updating each grid point in a given order. The idea is to change the travel direction one dimension at a time until all 2^n combinations are performed (e.g., for a one dimensional problem defined in [a,b] the sweeps would be from a to b and then from b to a). The local update scheme can be chosen as best suits the problem at hand: upwind finite differences, finite differences with artificial dissipation and semi-Lagrangian schemes can all be found in the literature.

2A characteristic line can be interpreted as the optimal flow of a given particle.

In [9], the authors propose an extension of the FSM, the locking method,
that avoids the processing of nodes whose value would not be changed by the update procedure in the current
sweep.
Obviously, the O(N) complexity does not imply that the FSM will always perform better than other methods with higher computational complexity. First, for problems of practical size, the constant factors not explicitly shown in the O(·) classification play a determinant role in the computation time. Moreover, depending on the complexity of the characteristic lines, the number of iterations required to find the exact solution may vary from one to infinity. FSM have very good results for the Eikonal equation; the basic equation is solved in one iteration and related problems are solved in a finite number of iterations (see, e.g., [62]). However, in the general case, the method may not converge to the exact solution in a finite number of iterations.
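The sweeping structure can be sketched for the same unit-speed Eikonal problem used above for fast marching; the 2^n = 4 orderings are generated with itertools.product and the local update is the standard first order upwind scheme. All sizes and tolerances are illustrative choices of ours.

```python
import itertools
import numpy as np

def fsm(n, src, h=1.0, tol=1e-12, max_iter=20):
    """Iterative Gauss-Seidel solver for ||grad V|| = 1 with V(src) = 0."""
    V = np.full((n, n), np.inf)
    V[src] = 0.0
    orders = list(itertools.product((1, -1), repeat=2))   # 2^n sweep directions
    for _ in range(max_iter):
        old = V.copy()
        for sx, sy in orders:                             # alternate orderings
            for i in range(n)[::sx]:
                for j in range(n)[::sy]:
                    if (i, j) == src:
                        continue
                    a = min(V[max(i - 1, 0), j], V[min(i + 1, n - 1), j])
                    b = min(V[i, max(j - 1, 0)], V[i, min(j + 1, n - 1)])
                    a, b = min(a, b), max(a, b)
                    if np.isinf(a):                       # no information yet
                        continue
                    if b - a >= h:
                        new = a + h
                    else:
                        new = 0.5 * (a + b + np.sqrt(2 * h * h - (a - b) ** 2))
                    V[i, j] = min(V[i, j], new)           # monotone update
        if np.max(np.abs(V - old)) < tol:                 # iteration-to-iteration change
            break
    return V

V = fsm(41, (20, 20))
print(V[20, 40])
```

For this isotropic point-source problem the sweeps converge after a handful of iterations; as noted above, for characteristics that turn many times the required number of iterations can grow without bound.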
In [103], the author shows several avenues for parallel implementations of the FSM. One obvious approach is to assign each sweep to a different processor. At the end of this first phase we will have 2^n solutions, corresponding to the 2^n sweeps. The second and final phase of the iteration will consist of taking the minimum of the 2^n solutions at each grid point. This can also be done in parallel, by domain decomposition. The decomposition is straightforward because no overlapping or communication between subdomains is required. However, it can be seen that, in the sweeping phase, this parallel implementation will only take advantage of 2^n processors at most. The authors do not explore the fact that, in some upwind schemes, the local computation can exploit further parallelism; in practice, some of the max and min operations in the formulas correspond to different computation branches that could be done in parallel.
In order to further improve parallelism, domain decomposition must be applied in the sweeping phase. In
this case, communication between subdomains must occur. This is done by defining “thin” boundaries between
the subdomains. The nodes on those boundaries will have tentative values from each domain. In the second
phase, the minimum is taken and that value is used in the following iterations, again on each subdomain. This
way, information propagates naturally between subdomains. The author bases the results only on the Eikonal equation. Numerical results confirm that the greater the number of subdomains, the greater the number of required iterations. However, in spite of this increase, significant speedups may be achieved (e.g., with 25 subdomains, the number of required iterations increased by a factor of 3 in the worst case example).
Anisotropic problems
When the system flow is given by ẋ ∈ A(x) and A(x) is not an origin-centred hypersphere for some x, the problem is designated as anisotropic. In this case, the maximum speed may depend not only on the position but also on the direction of the path. Most control problems are anisotropic; for instance, the computation of minimum time paths for the unicycle, with model (ẋ1, ẋ2, ẋ3) ∈ A(x), where A(x) = {(cos(x3), sin(x3), u) : u ∈ [−1,1]}, is an anisotropic problem.
The single-pass methods described in the previous section were devised for isotropic problems. They can only be applied to special cases of anisotropic problems. In [88], the authors show that those algorithms fail when the characteristic direction is not in the same simplex used to compute the gradient of the solution. In [6], the authors formally define a set of properties that anisotropic problems must meet in order to be treatable by the FMM. That work is an extension of the results previously published in [74]. Robotics-inspired examples are presented in [6] to show that the FMM can be applied in situations where the Hamiltonian is described by norms other than the Euclidean norm. However, the assumed underlying dynamics are still described by the trivial relation ẋ ∈ B₀ⁿ(r(x)), i.e., the velocity set is a ball of radius r(x) centred at the origin. It is also shown that, under an appropriate linear transformation, some anisotropic problems become suitable for the FMM. A similar idea had already been exposed in [97]. However, these remarks and advances are still of limited applicability for general control systems.
The same problem affects FSM variants developed specifically for the Eikonal equation. Extensions of the local update scheme were developed in order to handle general convex (e.g., [78] and [57]) and nonconvex Hamiltonians (e.g., [26]). Nonconvex Hamiltonians may arise, for instance, in differential games. However, in the anisotropic case the FSM does not, in general, offer the same kind of overwhelming results it provides for the Eikonal equation.
In [88] the authors present convergence proofs for a new method (presented earlier in [86]) that handles a larger class of anisotropic problems. Two versions of the new method are described (semi-Lagrangian and fully Eulerian) and they are designated "Ordered Upwind Methods" (OUM). This method is still classified as single-pass, in the sense that once a node value is accepted as final, it is never computed again; however, as opposed to the classical FMM, a node may be evaluated more times than the number of adjacent nodes. The limiting assumption of this method is that A(x) must be a compact set with the origin in its interior, for every x in the continuous state space. Moreover, the performance of the OUM degrades as A(x) deviates from a hypersphere centred at the origin, i.e., as the coefficient of anisotropy Υ = max_{a∈A(x)} ‖a‖ / min_{a∈A(x)} ‖a‖ deviates from one. This can be understood by the following
analysis of the method. The OUM defines the concept of accepted front, which is the set of points that define the boundary between the set of accepted nodes (the nodes with their final value computed) and the set of considered nodes (the neighbours of the accepted nodes). When computing the value of a node x, instead of using only the direct neighbours (as in the FMM), all nodes from the accepted front within a certain distance of x are considered. The accepted front is a simplicial surface whose vertices are the accepted nodes with non-accepted neighbours. In order to ensure the "ordered" acceptance of nodes (i.e., monotone, in the sense of their values), that distance must be proportional to the coefficient of anisotropy; hence the degradation of the algorithm as Υ grows. The
authors describe the application of the method to two-dimensional minimum time problems. However, extension to higher dimensions seems a difficult task. The main challenge would be the efficient implementation of the upwinding update formula. This formula must find the trajectory with minimal cost among a set of trajectories starting from the current point and finishing on the accepted front; as the dimension grows, it becomes harder to find the intersection of the trajectory with the (n−1)-simplices (i.e., simplices with n vertices) of the accepted front. The OUM have O(Υ^(n−1) N log N) computational complexity.
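The coefficient of anisotropy can be made concrete with the unicycle velocity set from the earlier example. Note that this set does not actually contain the origin in its interior, so it violates the OUM assumption; the computation below merely illustrates the max/min speed ratio:

```python
# Sketch: estimating the anisotropy coefficient Y = max ||a|| / min ||a||
# over the velocity set A(x) = {(cos(x3), sin(x3), u) : u in [-1, 1]} of the
# unicycle example, by sampling u (sample count is arbitrary).
import math

def anisotropy_unicycle(x3, n_samples=2001):
    speeds = []
    for k in range(n_samples):
        u = -1.0 + 2.0 * k / (n_samples - 1)        # u in [-1, 1]
        a = (math.cos(x3), math.sin(x3), u)          # an element of A(x)
        speeds.append(math.sqrt(sum(c * c for c in a)))
    return max(speeds) / min(speeds)

# ||a|| = sqrt(1 + u^2) ranges over [1, sqrt(2)], so Y = sqrt(2) ~ 1.414,
# independently of the heading x3.
print(anisotropy_unicycle(0.7))
```

A value of Υ ≈ 1.414 is close to one, but for strongly direction-dependent dynamics Υ, and hence the OUM stencil radius, can be much larger.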
In [100], the author extends the OUM in order to handle time-dependent problems without adding an extra dimension to accommodate the time variable. The author considers both cases: the solution when starting from the boundary (easier, because the reference t = 0 is available) and the solution when going to the boundary (harder, because the time to reach the boundary is not known a priori).
In [31], the author presents a new method with connections to the FMM. The proposed method, the Buffered Fast Marching Method (BFMM), can handle very general system dynamics, of the form ẋ(t) = f(x(t), a(t), b(t)), with f continuous in both input variables and Lipschitz continuous with respect to x, uniformly in a. This method can
handle target problems with a single control input and also target problems in the context of zero-sum two player
differential games. Like in the OUM, the value of the considered nodes may have to be recomputed more times
than the number of adjacent nodes. The method considers a set of intermediate nodes (the buffer between the
accepted set and the narrow band). At each iteration, the node with minimum value in the narrow band is moved
to the buffer (instead of being immediately accepted as having the final value) and its neighbours are sent to the
narrow band (considered nodes). Periodically, the algorithm uses the iterative algorithm from [30] to compute the
value of each node in the buffer, taking the narrow band as temporary boundary condition. That computation is
made twice, but on each computation a different constant value is assigned temporarily to the nodes in the narrow
band (the minimum value in the buffer and infinity). If the computed value of a node in the buffer is not affected
by the value of the narrow band then that node is removed from the buffer and marked as accepted. The efficiency
of this method seems to be very application dependent (for instance, for the lunar landing problem, based on 3 different grid sizes, it seems to be O(N²), while for other two-dimensional problems it seems to be O(N)), and the author warns that without careful tuning the computation times may be higher than with the basic iterative
scheme from [30]. The author compares the performance of the BFMM against a FSM with the SL local update
scheme from [30]. The results show that the performance differences are problem dependent. For instance, the
BFMM is slower on a problem with high coefficient of anisotropy while the FSM is slower on problems where the
characteristic directions take a “complex” form (e.g., finding the optimal path on a region with lots of obstacles).
As a final remark, for anisotropic static boundary value problems, none of these methods seems to provide substantially better results than the narrow band LSM (even taking into account the extra computational cost of the reinitialization step for the narrow band LSM). However, this is only a conjecture because, to the best of our knowledge, there are no published comparisons between these algorithms.
It must also be remarked that, in spite of their limitations, the single-pass methods for isotropic problems (as well as the FSM) have several applications. Even in robotics, some applications can be found (e.g., high-level path-planning).
2.4.4 Applications to hybrid systems
In [21], the authors extend the FMM to solve optimal control problems on some classes of hybrid systems. One
example is the problem of finding the minimum time path to reach any point in a building with several floors
connected by stairs. The authors model each floor as a discrete state. The traversal of a stair is modelled as a
transition between discrete states with a fixed cost and with guard (necessary condition for the transition) defined
by the physical position of the stair. The extension of the algorithm is straightforward: for each discrete state,
define a discretization of the respective continuous state; on each iteration, propagate the front and check if any
transition may be triggered; if the transition implies a change of the discrete state (the usual case, i.e., unless it is
a self-transition) propagate the front only if the destination is still unvisited.
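The multi-floor idea can be illustrated with Dijkstra's algorithm, the discrete analogue of the FMM front propagation, on a graph whose nodes are (floor, cell) pairs; the stair is a fixed-cost guarded transition between floors. Grid size, stair position and costs below are hypothetical:

```python
# Sketch: minimum time over two "floors" (1-D corridors) connected by a stair,
# using Dijkstra as a discrete stand-in for the FMM front propagation.
import heapq

def min_time(n, stair_cell, stair_cost, start, goal):
    """States are (floor, cell); moving to an adjacent cell costs 1."""
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (f, c) = heapq.heappop(pq)
        if (f, c) == goal:
            return d
        if d > dist.get((f, c), float("inf")):
            continue                        # stale queue entry
        nbrs = [(f, c - 1), (f, c + 1)]
        if c == stair_cell:                 # guard: transition enabled here only
            nbrs.append((1 - f, c))         # discrete state change (other floor)
        for nf, nc in nbrs:
            if 0 <= nc < n:
                step = stair_cost if nf != f else 1.0
                if d + step < dist.get((nf, nc), float("inf")):
                    dist[(nf, nc)] = d + step
                    heapq.heappush(pq, (d + step, (nf, nc)))
    return float("inf")

t = min_time(n=10, stair_cell=7, stair_cost=3.0, start=(0, 2), goal=(1, 0))
# walk 5 cells to the stair, pay 3 to change floor, walk 7 cells back: t = 15
```

The guard is exactly the "check if any transition may be triggered" step of the extended algorithm, and the per-floor grids correspond to the per-discrete-state discretizations.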
In [87], the authors apply OUM to the solution of hybrid optimal control problems. The examples assume
anisotropic dynamics in R2. Otherwise, the handling of the hybrid dynamics is performed as in [21]. We highlight
two examples presented there. In the first example, state jumps with a predefined cost are allowed at certain
positions of the state space. This is analogous to the possibility of taking a bus at certain points instead of going
on foot. In the second example, the system may switch once between dynamics (e.g., walking versus skating) by
incurring a predefined cost.
In [56], the authors propose an algorithm, based on the fast sweeping method, to compute the value function
corresponding to the optimal control of systems with autonomous and controlled transitions between continuous
dynamics (no guards are considered for the controlled transitions). Switching costs may be considered and the
running cost may be a function of both continuous and discrete states. However, the described method requires the
Hamiltonian to be convex. No input constraints are assumed and only systems with one-dimensional unconstrained
input are considered. The input value is artificially limited by considering a quadratic penalization in the cost
function. These assumptions are taken in order to simplify the local minimization of the Hamiltonian. A Lax-
Friedrichs scheme is employed in order to obtain a consistent numerical approximation of the Hamiltonian. The
algorithm is illustrated on a model of a physical system (optimal control of hysteresis in smart actuators). The
system can be modelled by the four-state automaton of Fig. 2.1.

[Figure 2.1: Hybrid automaton modeling the benchmark system of [56]: four discrete states a, b, c, d; autonomous transitions guarded by [x2(t) ≥ K] and [x2(t) < K]; continuous dynamics ẋ(t) = f1(x,u), f2(x,u) and f3(x,u).]

It can be observed that this example is very similar
to switched systems, in the sense that at any moment the system may switch either between states a and c or b and
d. Therefore, the discrete state is a function of the value of the discrete input (analogous to combinatorial logic
in digital systems). This is different from the decision of taking the bus in the examples above, where the event
of taking the bus triggers an irreversible change to a new discrete state. In the bus example, the discrete state is
actually used as a memory element (analogous to sequential logic in digital systems). The autonomous transitions
follow the logic of regional dynamics, i.e., the dynamics change as a function of the current state; once again, this
is combinatorial logic. Under these assumptions, the authors do not have to explicitly encode information about
the discrete state in the value function.
The application of the basic LSM to optimal control of hybrid systems is not straightforward. In [68], the authors provide some hints regarding the application of the ToolboxLS to such problems. However, that is done
in an ad hoc fashion, as the following comment by the authors summarizes: “Hybrid systems are not directly
supported by the toolbox, because there is no consistent modeling language or notation for general nondetermin-
istic, nonlinear hybrid systems. Until such time as one is available, we hope that the example in section 5 and
those in (edited) [66] make clear that analysis of such systems using the toolbox is still quite feasible even if the
discrete dynamics must coded by hand.” The example consists of a stochastic hybrid model, which will not be
analysed here. The toolbox is used for the reachability analysis of a three mode hybrid system in [66], see Fig. 2.2.
The system switches between states sequentially, with the first transition being controlled, and the second one au-
tonomous. The main idea of the solution approach is to propagate the unsafe or safe sets from discrete state to
discrete state, using the so called reach-avoid operator (“set of initial conditions from which trajectories can reach
one set while avoiding other along the way”). Another example of the same methodology can be found in [96].
Figure 2.2: Example from [65].
Chapter 3
Synthesis of the Feedback Controller
3.1 Solution of the two mode strategy by dynamic programming
In this section we describe the solution approach for the control strategy described in the introduction. Recall that
this approach involves the solution of three problems: computation of the MRCIS, an infinite horizon problem
and a minimum time problem. We assume time-independent (autonomous) systems.
Let us define VS(t,x) as the time-dependent value function associated with the infinite horizon problem. We assume the considered systems meet the requirements such that VS(t,x) − (VS(x) − ct) → 0 as t → −∞. Taking this into account, our objective will be the computation of VS(x), so that a static feedback control law may be defined. This computation can be approached in two ways:
• First, find S; then, define VS(x) = ∞, ∀x ∉ S, and compute VS(x), ∀x ∈ S.
• Define VS(x) = ∞, ∀x ∉ R, and compute VS(x) for x ∈ R. As t → −∞, VS(x) will become infinite for all x ∉ S.
The second approach underlines the idea that prior computation of S is not required, since the MRCIS can be found as a by-product of the solution of the infinite horizon problem, i.e., S = {x : VS(x) ≠ ∞}. However, in practice, those computations will be performed using numerical techniques. In general, numerical methods will not be able to compute S exactly; the discrete representation of the state space is the main obstacle for this. If the resulting set, S̄, is an over-approximation of S, then S̄ cannot be used as the target when the system is outside S. If S̄ were used as target, the resulting trajectories could bounce in the band corresponding to S̄ \ S and never go into S. In order to deal with this issue, we define an additional problem:
Problem 4. Find a suitable under-approximation T of the MRCIS S.
By suitable we mean that T should be as close to S as possible while ensuring that the discrete implementation of fS(x) is able to make the system's state remain inside S for all trajectories departing from T. On the other hand, Problem 1 is recast as follows:
Problem 5. Find a suitable over-approximation S̄ ⊂ R of the MRCIS.
With that in mind, T will be the target for the minimum time problem. We define Vmt(x) as the value function associated with this problem.
Given the value functions, the respective optimal control is found using (2.19): fmt(x) is derived from Vmt(x) and fS(x) is derived from VS(x). Finally, the following state feedback control strategy is defined:

f(q,x) =
    fmt(x), if q = q0
    fS(x),  if q = q1    (3.1)
with the discrete dynamics defined as

q′ = ζ(q,x) ≡
    q0, if x ∉ S
    q0, if q = q0 ∧ x ∉ T
    q1, if q = q0 ∧ x ∈ T
    q1, if q = q1 ∧ x ∈ S    (3.2)
Intuitively, the two modes of operation, q0 and q1, correspond to "travelling into T (in minimal time)" and "staying inside S", respectively. The control system selects the appropriate feedback control law (either fmt(x) or fS(x)) for each mode of operation.
3.2 Numerical algorithm
The dynamic programming community has extensively studied the target, the finite horizon and the infinite horizon problems. For time-independent systems, the target problem is cast as the computation of a static (time-independent) value function in a straightforward way; the classical minimum time and pursuer-evader problems fall into this category.
The finite horizon problem can be solved by computing a time-dependent value function for an initial value
problem. In fact, the problem is solved backward in time. Thus the procedure begins with the terminal condition
V (0,x) = Ψ(x), where Ψ(x) is the terminal cost, and the value function is computed for t ∈ [−T,0), where T is
the horizon.
Under very general assumptions, the solution of the discounted (vanishing running cost) infinite horizon problem is known to converge to a static value function. However, in the context of this work and due to the envisaged applications, our interest lies in the undiscounted version (non-vanishing running cost) of the constrained infinite horizon problem. In general, theoretical results for this problem assume some form of compactness of the state space (see [12]). Since we are considering operation inside the MRCIS, our formulation meets this assumption.
The approximate solution for this problem will be computed as in the finite horizon case, by taking a sufficiently
large horizon.
At the time this research was initiated, the only publicly available numerical solver for problems of arbitrary
dimension we were aware of was the ToolboxLS [67]. Preliminary experiments were performed with this tool [90],
assuming infinite sampling rate. However, no satisfactory results were found for the solution of the infinite horizon
problem. Two approaches were used for the infinite horizon problem. The first approach is based on the LSM but
it requires the consideration of a new state variable, in order to cast time as a space variable. The second approach
consists of using the toolbox as an ordinary finite difference scheme for the time-dependent HJI PDE. The former approach leads to increased processing and memory requirements, due to the additional dimension; the latter leads to very diffusive results due to the Lax-Friedrichs scheme employed by the ToolboxLS.
The numerical experiments of this research are computed using a numerical solver implemented by the author.
This solver is based on the semi-Lagrangian scheme developed by Maurizio Falcone and co-authors (see [30]
and references therein). As stated before, this scheme implements the discrete-time DPP. This is suitable for our
approach to data-sampled systems.
We implemented a dimension independent version of this scheme using the C++ programming language. The
current implementation uses a regular grid for space discretization. Let us denote by Ω the set of grid nodes
covering the desired region of the state space X .
The user must define f (x,a,b), L(x,a), the grid parameters, the time-step and the initial or boundary condi-
tions. The boundary elements are marked in a grid with the same structure as the one used to store the value
function.
In order to perform the local optimization of (2.10), the user must provide a discrete representation of the input
space (e.g., a set of uniformly spaced values). In fact, for many practical applications, the input space is a discrete
one. For instance, computer controlled systems have digital interfaces with the actuators. Therefore this approach
is suitable from the application point of view.
Except for periodic state variables (such as angular position, for instance), there is no simple way to compute V(y∆(x,∆,a)) if y∆(x,∆,a) lies outside the defined computational space. In general, extrapolation schemes will introduce large errors. The safer approach consists of defining a sufficiently large computational space such that its boundaries do not affect the region of interest. In our implementation, the algorithm assumes that the system must respect x ∈ X. If the underlying problem does not naturally lead to the definition of such a state constraint, the computational space must be enlarged such that these artificial constraints do not affect the solution in the region of interest. Additional state constraints can be defined by setting the value of the forbidden nodes to a special constant defined as infinity.
20
3.3 The semi-Lagrangian scheme
The solver has two slight different versions, directed to the target problem and for the finite/infinite horizon
problems, respectively. However, the kernel of the algorithm is the same for both versions. The central procedure
consists of the application of the DPP (2.10) to every grid node. This procedure returns an approximation of
the value function and the corresponding optimal input at every grid node. For each considered input value, the
procedure computes
∫ ∆0 L(y∆(x,τ ,a),a)dτ +V (y∆(x,∆,a))
and then chooses the minimum. This entails two
main sub-procedures:
1. The numerical solution of the augmented set of ordinary differential equations (ẋ, ċ) = (f(x,a), L(x,a)) in order to obtain y∆(x,∆,a) and the cost to go from x to y∆(x,∆,a);
2. Interpolation of V (x) in order to compute V (y∆(x,∆,a)) from the values at the neighbouring nodes. The
following interpolation schemes were implemented: multilinear and second and third order ENO/WENO
schemes.
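These two sub-procedures can be sketched in isolation (a fixed-step RK4 step for the augmented ODE, and bilinear interpolation, i.e., the multilinear scheme for n = 2; the dynamics and grid below are hypothetical):

```python
# Sketch of the two sub-procedures of the SL update:
# (1) one fixed-step RK4 integration of the augmented ODE (x', c') = (f, L);
# (2) bilinear (multilinear, n = 2) interpolation of V at the arrival point.
def rk4_step(f, x, dt):
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def bilinear(V, h, p):
    """Interpolate grid values V[i][j] (spacing h, origin 0) at an interior point p."""
    i, j = int(p[0] // h), int(p[1] // h)
    s, t = p[0] / h - i, p[1] / h - j
    return ((1 - s) * (1 - t) * V[i][j] + s * (1 - t) * V[i + 1][j]
            + (1 - s) * t * V[i][j + 1] + s * t * V[i + 1][j + 1])

# Augmented state (x1, x2, c) with dynamics f(x) = (-x1, -x2) and L = 1:
aug = rk4_step(lambda z: [-z[0], -z[1], 1.0], [1.0, 2.0, 0.0], 0.1)
# aug[:2] approximates (e^{-0.1}, 2 e^{-0.1}); aug[2] is the accrued cost 0.1
```

The last component of the augmented state accumulates the running cost, so a single integration yields both y∆(x,∆,a) and the cost-to-go term of the DPP.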
Concerning the solution of the ordinary differential equations (ODE) for the computation of y∆(x(t),∆,a,b), the current implementation provides the Euler scheme, a fixed-step fourth order Runge-Kutta scheme, and allows the choice of any of the variable step ODE solvers from the GNU Scientific Library (GSL) [48]. As the time step becomes larger, the variable time-step ODE solvers become the only acceptable choice; however, the computation time also increases and may become the main contribution to the total CPU time.
As pointed out in [11], it is possible to compute these trajectories just once and to store every y∆(x(t),∆,a,b) and the respective ∫_0^∆ L(y∆(x,τ,a),a) dτ in main memory, thus avoiding repeating the computation in every iteration. In fact, it is even more efficient to store y∆(x(t),∆,a,b) in terms of barycentric coordinates with respect to the nodes used for interpolation. However, either approach may require a prohibitive amount of memory: for a grid with N nodes, a system with M possible input values and n state variables, the extra required memory will be the size of the floating point data type (e.g., 8 bytes) multiplied either by NM(n+1), if only the points y∆(x(t),∆,a,b) are stored, or by NM·2ⁿ, if the barycentric coordinates for the multilinear scheme are stored.
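A back-of-the-envelope calculation makes the scale concrete (the grid, input count and dimension below are hypothetical, chosen only to match the 4-dimensional AUV setting):

```python
# Extra memory for the two caching strategies discussed above, assuming a
# hypothetical grid of 100 nodes per dimension, n = 4 states, M = 25 inputs
# and 8-byte doubles.
N, M, n, B = 100**4, 25, 4, 8

arrival_points = B * N * M * (n + 1)   # store y and the accrued running cost
barycentric    = B * N * M * 2**n      # store the 2^n interpolation weights

print(arrival_points / 2**30, "GiB")   # ~93 GiB
print(barycentric / 2**30, "GiB")      # ~298 GiB
```

Even on this modest grid, both caches exceed typical main memory, which is why the current implementation recomputes the trajectories instead.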
The numerical solution for the target problem is obtained as described in [30] and other related publications.
The algorithm uses Gauss-Seidel iterations, i.e., the computation always uses the most recent values as opposed
to relying only on the values from the previous iteration. More specifically, the nodes are visited as in the Fast
Sweeping Method. The main challenge for the target problem, in what concerns numerical methods, resides in
the fact that the respective value function may be discontinuous even for the unconstrained case. In general, even
with ENO/WENO interpolation schemes, the SL scheme will be diffusive near the discontinuities. This will be a
topic of future work. In what follows, we discuss the constrained finite and infinite horizon problems with more
detail.
3.4 Constrained finite and infinite horizon problems
The finite and infinite horizon problems are associated to time-dependent value functions V (t,x). The computation
of V(t,x) starts from the terminal condition V(0,x). Each subsequent iteration corresponds to a step backward in time. For a time-step ∆, the result of the first iteration, V1, will be the approximation of V(−∆,x), the result of the second iteration, V2, will be the approximation of V(−2∆,x), and so on. In practice, we are only interested in the final iteration. For the infinite horizon case, we assume that, for a sufficiently large number of iterations, the value function converges to V(t,x) = V(x) − ct, where c is a constant. Therefore, for a basic implementation, only two grids must be kept: one holding the result from the previous iteration (for the first iteration, this will be V(0,x)) and another for the result of the current iteration. The results of iteration i are based exclusively on the results of iteration i−1 (Jacobi iterations).
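The two-grid Jacobi scheme can be sketched on a toy one-dimensional discrete-time DPP (grid, dynamics and running cost below are hypothetical, and a nearest-node lookup stands in for interpolation):

```python
# Toy backward value iteration with two grids (Jacobi):
# V_{i+1}(x) = min_a [ L(x,a)*dt + V_i(next node) ], on a 1-D grid where the
# input a in {-1, 0, 1} moves the state one node and L(x,a) = |x - mid| + a^2.
def value_iteration(n, dt, iters):
    prev = [0.0] * n                            # terminal condition V(0, x) = 0
    for _ in range(iters):
        cur = [0.0] * n                         # fresh buffer: Jacobi, not Gauss-Seidel
        for x in range(n):
            best = float("inf")
            for a in (-1, 0, 1):                # discrete input set
                y = min(max(x + a, 0), n - 1)   # clamp to the grid
                cost = (abs(x - n // 2) + a * a) * dt
                best = min(best, cost + prev[y])
            cur[x] = best
        prev = cur                              # iteration i reads only iteration i-1
    return prev

V = value_iteration(n=21, dt=0.1, iters=200)
# The cost vanishes at the middle node, so V converges to a static V(x) with
# V[10] = 0 and increasing values away from the centre.
```

Since the running cost vanishes at the equilibrium node, the ergodic constant c is zero here and V(t,x) converges directly to the static V(x).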
The Jacobi iterations facilitate the parallel execution of the algorithm. At present, parallelism is exploited by creating a partition of the computational space and assigning each subset of the partition to a different processor. Notice, however, that each processor may have to access data outside its own subset of nodes, namely near the boundaries. A multi-threaded version was implemented in order to take advantage of the now common multiprocessor systems with shared memory (e.g., the typical "dual core" personal computers fall into this classification). The Jacobi iterations eliminate potential data access conflicts between the threads of computation: on each iteration i, the main function creates a new buffer to store Vi, starts the threads and waits for their termination; in order to compute Vi, every thread accesses only the results obtained at iteration i−1, without changing them, and stores the results in the new buffer (Vi). Since there is a partition of the computational space, each node of Vi can only be written by a single thread. We remark that, except for eventual bottlenecks on the memory bus, this scheme scales linearly with the number of processors.
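The threading pattern, disjoint write slices over a shared read-only buffer, can be sketched as follows. (The local update rule is a placeholder just to exercise the pattern; note also that pure-Python threads will not actually speed up this loop because of the CPython GIL, whereas the thesis implementation is in C++.)

```python
# Sketch of the shared-memory parallel Jacobi step: each thread writes a
# disjoint slice of V_{i+1} while freely reading all of V_i.
from concurrent.futures import ThreadPoolExecutor

def parallel_jacobi_step(prev, n_threads=4):
    n = len(prev)
    cur = [0.0] * n                        # new buffer for V_{i+1}

    def work(lo, hi):
        for i in range(lo, hi):            # threads may read outside their slice
            left = prev[i - 1] if i > 0 else prev[i]
            right = prev[i + 1] if i < n - 1 else prev[i]
            cur[i] = 0.5 * (left + right)  # placeholder local update

    bounds = [(k * n // n_threads, (k + 1) * n // n_threads)
              for k in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as ex:
        list(ex.map(lambda b: work(*b), bounds))
    return cur

out = parallel_jacobi_step([float(i) for i in range(8)])
# out == [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.5]
```

Because each index of `cur` is written by exactly one thread and `prev` is never modified, no locking is required, which is the property the text attributes to the Jacobi iterations.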
For the unconstrained case with smooth dynamics and running cost, the value function is continuous. However, the state constraints introduce discontinuities in the value function. By definition, V(t,x) = ∞ for x ∉ X. Moreover, the introduction of state constraints leads to the consideration of the maximal controlled invariant set (MCIS). Depending on the system dynamics, state constraints and input constraints, some of the trajectories beginning in X cannot be kept inside it for all times. Therefore, the cost associated with the respective initial state is defined as infinity. This is well captured by the infinite horizon problem. In this sense, the MCIS is defined as {x : lim_{t→−∞} V(t,x) ≠ ∞}.
If one or more vertices of the cell containing y∆(x,∆,a) are defined as infinity, special care has to be taken when performing the interpolation. One possibility is to define V(t,x) = ∞ whenever one or more vertices of the cell containing y∆(x,∆,a) are defined as infinity. This is a conservative approximation: only trajectories passing through well defined cells (i.e., cells not containing vertices defined as infinity) will be accepted. However, for certain systems and grid resolutions, this may lead to useless solutions. The forbidden nodes will tend to absorb nodes corresponding to eventually valid trajectories. Moreover, the absorption of nodes will have a cascading effect, i.e., nodes marked as forbidden will tend to absorb the nodes in their neighbourhood. This happens whenever the reach set from node x, over the defined time-step, does not intersect a well defined cell; the effect is more pronounced as the Courant number decreases. After some iterations, the computed value function may be finite only on a small region or even infinite everywhere. Therefore, this method produces an under-approximation of the MCIS: whenever the intersection of the reach set from x with the MCIS is a very narrow region (less than one grid cell), the method will disregard x and, as explained, this will cascade. Moreover, some of the resulting trajectories will be sub-optimal, as they steer away from the absorbing grid cells.
On the other hand, we may relax the definition of infinity and, instead, consider a very large value Kinf when performing the interpolation. This will produce an over-approximation of the MCIS. The cost near the boundary of the MCIS will be higher than the optimal one, because the trajectories will try to avoid the cells with the artificially high costs. The extent of this diffusive effect (sometimes informally designated as smearing) is analogous to the absorption effect of the previous approach: it also depends on the reach set from each node and on the grid resolution. In general, the higher the grid resolution, the smaller the region affected by the smearing. Analogously to the previous case, the region affected by the smearing will produce sub-optimal results. This happens because the trajectory tries to escape at all costs from the region with the artificially high values introduced by the smearing effect.
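The two treatments of infinite vertices can be contrasted on a one-dimensional linear interpolation (the value of Kinf below is hypothetical):

```python
# Sketch of the two treatments of "infinite" vertices during interpolation.
INF = float("inf")
K_INF = 1.0e6   # hypothetical large constant standing in for infinity

def interp_conservative(v0, v1, s):
    """Reject the cell if any vertex is infinite (under-approximates the MCIS)."""
    if v0 == INF or v1 == INF:
        return INF
    return (1 - s) * v0 + s * v1

def interp_relaxed(v0, v1, s):
    """Clip infinities to K_inf (over-approximates the MCIS, causes smearing)."""
    v0 = min(v0, K_INF)
    v1 = min(v1, K_INF)
    return (1 - s) * v0 + s * v1

# A trajectory landing 10% into a cell with one forbidden vertex:
a = interp_conservative(2.0, INF, 0.1)   # infinite: the node may be absorbed
b = interp_relaxed(2.0, INF, 0.1)        # 0.9*2 + 0.1*K_inf: large but finite
```

The relaxed variant keeps the node alive but inflates its value, which is exactly the smearing described above; the conservative variant discards it, which is the absorption effect.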
This method implicitly retains all the feasible trajectories but, on the other hand, it may exhibit "false positives". In this context, a false positive is a state x marked as finite (i.e., Vi(x) < Kinf) for which no feasible trajectory can be generated by the resulting control law. This means that a false positive can either correspond to a state outside the exact MCIS or to a state belonging to the MCIS for which the synthesized controller cannot produce a feasible trajectory. If the system is known to have a nonempty equilibrium set Se, or a nonempty set Ss of cells free of smearing effects and with finite values at their vertices, these false positives can be identified by exhaustive simulation of the system trajectories: the trajectories are simulated until they either reach Ss or Se. However, this is a time consuming procedure. Moreover, for nonzero ergodic cost and for a sufficiently high number of iterations, the finite values of V(t,x) may start approaching Kinf. This can lead to very inaccurate results (affecting the decision for both inputs) and to slow convergence. In order to avoid this, the algorithm computes the approximation of V(x) = V(t,x) − ct in the following way:
1. Start with i = 0 and no target set. For all x, set V0(x) = 0 and f0(x) = 0.
2. Apply the DPP on each grid node, in order to obtain V′i+1(x) and fi+1(x).
3. Set Vi+1 = V′i+1 − min V′i+1.
4. If ‖fi+1(x) − fi(x)‖p < δtol,p,f and ‖Vi+1(x) − Vi(x)‖p < δtol,p,v, where δtol,p,f and δtol,p,v are given thresholds, stop. Otherwise, set i = i+1 and go to step 2.
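Steps 1 to 4 can be sketched on a degenerate problem where the DPP simply adds a spatially constant cost c∆ per iteration (a trivial stand-in for step 2, used only to show how the normalisation in step 3 removes the −ct drift):

```python
# Sketch of the normalised iteration: with a spatially constant running cost,
# V'_{i+1} = V_i + c*dt everywhere, so subtracting the minimum leaves V = 0
# and the stopping condition triggers immediately.
def normalised_iteration(n, c_dt, tol, max_iter=1000):
    V = [0.0] * n                                  # step 1: V_0 = 0
    for i in range(max_iter):
        Vp = [c_dt + v for v in V]                 # step 2: DPP (trivial here)
        m = min(Vp)
        Vn = [v - m for v in Vp]                   # step 3: subtract the minimum
        delta = max(abs(a - b) for a, b in zip(Vn, V))
        V = Vn
        if delta < tol:                            # step 4: stopping condition
            return V, i + 1
    return V, max_iter

V, iters = normalised_iteration(n=5, c_dt=0.3, tol=1e-12)
# converges in a single iteration; the normalised V is identically zero
```

Without step 3, the same sequence would grow by c∆ per iteration and the stopping test on ‖Vi+1 − Vi‖ would never be met.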
We consider the norms 0 (number of nonzero components), 1 and infinity for the stopping condition. Ideally, we would like ‖fi+1(x) − fi(x)‖p to become zero and ‖Vi+1(x) − Vi(x)‖p to converge to the machine precision. However, in most cases, that may require an unreasonable amount of time. Additionally, due to the space discretization, the final solution may not be the exact solution even if those norms go to zero. Therefore, in practice, it is acceptable to define more relaxed thresholds.
3.5 Numerical experiments
3.5.1 Continuous-time LQR: unconstrained case
Consider the infinite horizon optimal control of the continuous-time linear system (state x ∈ R², input u ∈ R)

ẋ(t) = [ 0  1 ] x(t) + [ 0 ] u(t)    (3.3)
       [ 0 −1 ]        [ 1 ]
with cost functional

min J(x,u) = ∫_0^∞ ( xᵀ(τ) x(τ) + u²(τ) ) dτ    (3.4)
for any given x(0) = x0. It is well known that the optimal control for this problem is given by a time-invariant linear state feedback control law, i.e., u(t) = −Kx(t). The associated value function, V(x0) = min{J(x,u) : x(0) = x0}, can also be derived in closed form. For the current example, we have
V(x) = xᵀ [ 2 1 ] x    (3.5)
          [ 1 1 ]

and K = [ 1 1 ].
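The closed-form solution can be checked numerically: with Q = I and R = 1, the matrix P of (3.5) must satisfy the algebraic Riccati equation AᵀP + PA − PBBᵀP + Q = 0, and the gain is K = BᵀP (plain 2×2 arithmetic, no external libraries):

```python
# Check that P from (3.5) solves the algebraic Riccati equation for (3.3)-(3.4)
# and yields the gain K = [1 1].
A = [[0.0, 1.0], [0.0, -1.0]]
P = [[2.0, 1.0], [1.0, 1.0]]   # value function matrix from (3.5)
Q = [[1.0, 0.0], [0.0, 1.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

At_P = matmul(transpose(A), P)
P_A = matmul(P, A)
# With B = (0, 1)', the term P B B' P has entries P[i][1] * P[j][1].
PBBtP = [[P[i][1] * P[j][1] for j in range(2)] for i in range(2)]

residual = [[At_P[i][j] + P_A[i][j] - PBBtP[i][j] + Q[i][j]
             for j in range(2)] for i in range(2)]
K = [P[1][0], P[1][1]]   # K = B'P = second row of P

# residual is the zero matrix and K = [1, 1], matching the text
```

This confirms (3.5) and the stated gain before comparing against the value function produced by the numerical solver.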
3.5.2 State and input constraints
In what follows, we consider the state constraint x(t) ∈ X, where X = {x ∈ R^2 : ‖x‖∞ ≤ 10}, and the input
constraint |u(t)| ≤ umax. The input space is composed of 801 equally spaced values between −umax and umax. The
numerical computations used the Euler scheme and multilinear interpolation.
It is possible to derive an analytical description of the MRCIS for this problem. Analysis of the extremal
trajectories shows that the minimum distance to the boundary of X along the x1 dimension, in the first and third
quadrants of the x1− x2 phase plane, is given by
|x2| − umax ln((umax + |x2|)/umax). (3.6)
The boundary of this set is represented in Fig. 3.1(a). Fig. 3.1(b) presents a detail of the boundary of the MCIS for different values of umax. For the unconstrained case, the optimal control will never exceed u = 20 on the
considered domain. Let us define ΩLQR as the maximal invariant set, with respect to X, of the closed loop system
composed of (3.3) with feedback control u=−Kx. We compute this particular set by integration of the trajectories
emanating from each grid node on X and elimination of those nodes corresponding to trajectories going through
R2 \X. When designing for the constrained case, the controller will use all the available range of controls in order
to avoid the forbidden domain. As expected, the MCIS for umax = 20 is a superset of ΩLQR. This means that the
absolute value of the control input will be higher than |−Kx| when going through Ωumax=20 \ΩLQR; for x ∈ΩLQR
the optimal control will coincide.
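The elimination procedure just described can be sketched as follows; the forward Euler step, stopping radius and time horizon below are illustrative choices, not the settings used in our experiments:

```python
def stays_in_box(x0, bound=10.0, dt=5e-3, t_max=20.0, eps=1e-2):
    """Simulate the closed loop xdot = (A - BK)x from x0 and report whether
    the trajectory remains inside X = {x : |x|_inf <= bound}.
    For A = [[0,1],[0,-1]], B = [0;1] and K = [1,1], A - BK = [[0,1],[-1,-2]]."""
    x1, x2 = x0
    t = 0.0
    while t < t_max and max(abs(x1), abs(x2)) > eps:
        if abs(x1) > bound or abs(x2) > bound:
            return False           # trajectory crossed the boundary of X
        x1, x2 = x1 + dt * x2, x2 + dt * (-x1 - 2.0 * x2)
        t += dt
    return True

# Grid nodes whose closed loop trajectories pierce the boundary of X are
# eliminated from the approximation of the invariant set.
omega_lqr = [(i, j) for i in range(-10, 11) for j in range(-10, 11)
             if stays_in_box((float(i), float(j)))]
```

For instance, the corner node (10, 10) is eliminated, since ẋ1 = x2 > 0 there immediately pushes x1 beyond the boundary of X.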
The main challenge of this problem, in terms of numerical computation, resides in the fact that the optimal
trajectories are tangent to the boundary of the MCIS in the curved segments depicted on figure 3.1(a) and, with
more detail, on 3.1(b). Therefore, the interpolation procedure uses points near the discontinuity. As expected, this
introduces some smearing. This smearing leads to inaccurate computation of the optimal control near the boundary
of the MCIS. This effect can be attenuated by using a finer grid in these regions, as illustrated in Fig. 3.2, which shows that, at the boundary of the MCIS, the absolute value of the computed control exceeds the optimal value as
the resolution decreases. Fig. 3.3 shows the trajectories obtained by integration of the system equations with input
u(t) given by the numerical dynamic programming control law.
[Figure] (a) umax = 40. (b) From left to right: |u(t)| ≤ 40 (the same as Fig. 3.1(a)); |u(t)| ≤ 20; using the unconstrained LQR; |u(t)| ≤ 10.
Figure 3.1: Boundary of the MCIS for different scenarios.
Figure 3.2: Difference between the optimal control derived from the 201×201 grid and the one derived from the
401× 401 grid. The lower resolution leads to a sub-optimal higher actuation near the boundary of the invariant
set.
[Figure] (a) 201×201 grid. (b) 401×401 grid.
Figure 3.3: Evolution of x1(t) (dash-dot line) and x2(t) (dashed line) with u(t) (solid line) given by the numerical
DP control law for the LQR problem.
3.5.3 Sampled data system
As discussed previously, computer controlled systems are subject to finite control rates. Typically, a sample and
hold control scheme is employed. For linear systems, there are systematic methodologies to obtain an exact
discrete-time equivalent model for the composition of the original continuous-time model with the sample and
hold scheme. However, the same is not true for nonlinear systems. In the following example we obtain the
optimal control for the discrete-time system without using the actual discrete-time model. All computations are
based on the continuous-time model and the time step of the SL scheme is set to the value of the control period.
Let us consider again the example of the LQR problem. It is well known that, if a sample and hold control
scheme is used, the optimal gain K changes according to the control rate. The discrete-time version of the LQR
can be computed on Matlab using the following command:
lqr(c2d(ss([0 1; 0 -1], [0; 1], [1 1], 0), ts), [1 0; 0 1], 1)
where ts is the control period. Table 3.1 shows the optimal K for different control rates. We remark that using
the gains derived for the continuous-time LQR in the time sampled system may lead to an unstable closed loop
system (in this example, instability would occur for 1/∆ < 0.5 Hz).
Table 3.1: Optimal K for different control rates
Control rate (Hz) K
0.2 (0.2193, 0.2183)
2 (0.7827, 0.7767)
20 (0.9753, 0.9753)
200 (0.9975, 0.9975)
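The same gains can be reproduced without Matlab. The sketch below assumes the default zero-order-hold discretization performed by c2d and iterates the discrete-time Riccati recursion to a fixed point; the closed-form matrix exponential used here is specific to the A of (3.3):

```python
import math

def dlqr_gain(ts, tol=1e-13, max_iter=1000000):
    """Discrete-time LQR gain for the ZOH discretization of (3.3),
    with Q = I and R = 1 (the same computation as the Matlab command)."""
    e = math.exp(-ts)
    Ad = [[1.0, 1.0 - e], [0.0, e]]   # exp(A*ts) for A = [[0,1],[0,-1]]
    Bd = [ts - (1.0 - e), 1.0 - e]    # integral of exp(A*s)B over [0, ts]
    Q = [[1.0, 0.0], [0.0, 1.0]]
    R = 1.0
    P = [row[:] for row in Q]
    for _ in range(max_iter):
        PA = [[sum(P[i][k] * Ad[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]
        AtPA = [[sum(Ad[k][i] * PA[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
        PB = [sum(P[i][k] * Bd[k] for k in range(2)) for i in range(2)]
        BtPB = sum(Bd[i] * PB[i] for i in range(2))
        BtPA = [sum(Bd[k] * PA[k][j] for k in range(2)) for j in range(2)]
        # P <- Q + Ad'P Ad - Ad'P Bd (R + Bd'P Bd)^(-1) Bd'P Ad
        # (Ad'P Bd equals (Bd'P Ad)' because P stays symmetric)
        Pn = [[Q[i][j] + AtPA[i][j] - BtPA[i] * BtPA[j] / (R + BtPB)
               for j in range(2)] for i in range(2)]
        delta = max(abs(Pn[i][j] - P[i][j])
                    for i in range(2) for j in range(2))
        P = Pn
        if delta < tol:
            break
    return [BtPA[j] / (R + BtPB) for j in range(2)]
```

For example, dlqr_gain(0.5) should reproduce the 2 Hz row of Table 3.1.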
The value function and the corresponding optimal control map were computed for each of the control rates indicated in Table 3.1. The computational domain was [−10,10] × [−10,10] and the grid resolution was 301×101;
the inputs were chosen from a set of 151 uniformly spaced values in [−20,20]. The differential equations were
solved using the Runge-Kutta-Fehlberg method from the GSL, with absolute and relative tolerances of 10^−12. All
interpolations were performed with the multilinear scheme.
The resulting numerical control map fc(x) is very similar to the one given by the analytical solution. This
comparison was based on the numerical approximation of the gradient of fc(x). A finite difference method was
used for that purpose. We observed that the numerical solution is somewhat noisy in the sense that the numerical
gradient is not constant when computed between adjacent grid nodes. If the numerical gradient is computed using
more distant nodes (e.g., 20 nodes apart), the numerical results become similar to those presented in Table 3.1.
Those observations are also supported by the simulation of the trajectories of the closed loop system. The trajecto-
ries are computed by applying the piecewise constant optimal feedback control for the considered control period.
The trajectories obtained with the numerical control law are very similar to those obtained by the exact discrete
time LQR (see Fig 3.4). As expected, the simulations show degradation of the system performance when the gains
of the continuous time LQR are employed.
3.5.4 Minimum time
We consider again system (3.3) but with a different control objective and input constraints:
min J(T,x,u) = ∫_0^T 1 dτ (3.7)
such that x(0) = x0, x(T) = 0, x(t) ∈ X (as defined before) and |u(t)| ≤ 10.
As is well known, the optimal control law for this problem is of bang-bang type. The switching surface,
corresponding to the states where the control action should change sign, is defined as
x1 = −x2 + umax ln(x2/umax + 1), x2 ≥ 0 (3.8)
x1 = −x2 − umax ln(−x2/umax + 1), x2 ≤ 0 (3.9)
[Figure] (a) Trajectory on the x1−x2 plane. (b) Accumulated cost ∫_0^t L(x(τ),u(τ)) dτ.
Figure 3.4: Numerical solution for ∆ = 0.5 s (dashed thick line) vs exact discrete-time LQR solution (thin line) vs
solution using the gains from the continuous-time LQR problem (dashed thin line).
The value function on the left side of the switching surface is defined as:
V(x) = −(x1 + x2)/umax + 2 ln(√(((x2 − umax)/umax) e^((x1+x2)/umax) + 1) + 1) (3.10)
The value function on the right side of the switching surface is defined as:
V(x) = (x1 + x2)/umax + 2 ln(√(((−x2 − umax)/umax) e^(−(x1+x2)/umax) + 1) + 1) (3.11)
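Equations (3.8)-(3.11) can be cross-checked numerically (with umax = 10, as in the example): on the switching surface the two branches of the value function must agree, and their common value must equal the time for x2 to decay to zero under the opposite extremal control. The max(·, 0) guards below are only there to absorb floating point roundoff exactly on the surface:

```python
import math

UMAX = 10.0   # input bound used in this example

def switch_x1(x2):
    """Switching surface (3.8)-(3.9): x1 as a function of x2."""
    if x2 >= 0.0:
        return -x2 + UMAX * math.log(x2 / UMAX + 1.0)
    return -x2 - UMAX * math.log(-x2 / UMAX + 1.0)

def v_left(x1, x2):
    """Minimum time (3.10), valid on the left side of the surface."""
    s = (x1 + x2) / UMAX
    arg = (x2 - UMAX) / UMAX * math.exp(s) + 1.0
    return -s + 2.0 * math.log(math.sqrt(max(arg, 0.0)) + 1.0)

def v_right(x1, x2):
    """Minimum time (3.11), valid on the right side of the surface."""
    s = (x1 + x2) / UMAX
    arg = (-x2 - UMAX) / UMAX * math.exp(-s) + 1.0
    return s + 2.0 * math.log(math.sqrt(max(arg, 0.0)) + 1.0)

# A point on the switching surface:
x2 = 3.0
x1 = switch_x1(x2)
```

At this point both branches evaluate to ln(1 + x2/umax) = ln 1.3 ≈ 0.2624, which is the time for x2 to decay from 3 to 0 under u = −umax; both branches also vanish at the origin.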
Figure 3.5 shows several trajectories using the controller derived from the numerical value function on a
201×201 grid. We observe that the trajectories consistently start to “bend” several grid nodes before hitting the
switching surface. This means that the control switch is made before the optimal instant. Once again, this is
caused by smearing near a non-smooth region, the switching surface, in this case. As described for the boundary
of the invariant set (figure 3.2), this effect is minimized when we define finer grid on the region containing the
switching surface, i.e., the numerical switching surface converges monotonically to the exact switching surface as
the grid is refined.
Figure 3.5: The thin lines represent the system trajectories from different initial states using the control law
derived from the numerical solution on a 201×201 grid. It can be seen that the control law produces a change in
the actuation before reaching the ideal switching surface (thick line).
The computations were performed on a grid of 801×401 nodes and with time-step ∆= 0.02. The computation
of the value function, using FSM-like iterations and assuming the bang-bang control (only two possible actuation
values), took 17 seconds to converge to a maximum change between iterations of 10^−9 on an Intel T7250 CPU
based computer. If we assume no knowledge about the optimal control signal, we must consider a larger number
of actuation values. For 101 equally spaced actuation values, the computation time increased to 320 seconds when
using value-iteration and 170 seconds when using policy-iteration. Note that the increase in computation time is
not proportional to the number of input values.
For the simulation of the system trajectories, we also allowed the control law to take intermediate values (see
figure 3.6). Due to the linear interpolation, the DP controller actually chooses intermediate control values near the
discontinuity of the switching surface.
As expected, the numerical control law produces some chattering when the trajectory is near the switching
surface. Since the control value is switched too early, the trajectory tends to deviate from the neighbourhood of the switching surface; when that deviation becomes too large, with respect to the accuracy and resolution of
the numerical value function, the controller takes a corrective action to bring the trajectory nearer the switching
surface.
Figure 3.6: Evolution of x1(t) (dash-dot line) and x2(t) (dashed line) with u(t) (solid line) given by the numerical
DP control law for the optimal time problem on a 801×401 grid with ∆ = 0.02. On the left figure, the controller is only allowed to use |u(t)| = 10 (bang-bang control action); on the right figure, the controller is allowed to use |u(t)| ≤ 10.
3.6 Semi-Lagrangian receding horizon scheme
3.6.1 The algorithm
In order to overcome the limitations of the approaches described in the previous section, we propose an algorithm
(SLRH) based on concepts of receding horizon control. Recall that, for each x, the basic semi-Lagrangian scheme
computes the corresponding optimal trajectory over one time-step. The key idea of our algorithm is to compute
the optimal trajectory for a horizon of N∆ > 1 time-steps when the trajectory does not end in a well defined cell. By optimizing over a larger horizon, the endpoint of the trajectory has more chances of reaching a well defined cell. Let us define Vi,N∆ ≡ {Vi, Vi−1, . . . , Vi−N∆+1} for i ≥ N∆ − 1 and Vi,N∆ ≡ {Vi, Vi−1, . . . , V0} for i < N∆ − 1. For N∆ = 1, this algorithm is equivalent to the basic SL scheme with the strict definition of infinity.
At present, we only consider the control input (single player). The case of competing inputs (zero-sum two player differential game) will be a topic of future work.
As for the basic scheme, for solving the optimal control problem for a time horizon T , the algorithm starts with
V0(x) = Ψ(x), ∀x ∈ Ω. It then proceeds iteratively, calling Algorithm 1 for i = 1, . . . , NT, where NT is the smallest integer greater than T/∆. The main algorithm is based on the procedure ComputeVx; this procedure is described
by Algorithm 2. Ku is a constant that marks the node’s value as undefined. All the cells to which that node belongs
will be identified as not well defined.
With a sufficiently large N∆, this algorithm would be able to avoid any distortion (i.e., sub-optimality) intro-
duced by the discontinuity at the boundary of the MCIS. However, the processing time and storage requirements
grow exponentially with N∆. In order to further minimize these requirements, we consider a reduced input space
Ua,i for the larger horizons. For many problems, one sensible approach is to consider only extremal controls when
optimizing for more than one time-step (e.g., Ua,1 = Ua and Ua,i = {min(Ua), max(Ua)}, i > 1). Additionally,
as soon as one valid cell is found, the procedure stops extending the optimization horizon. This accelerates the
computation but, on the other hand, it may prevent the evaluation of better trajectories. In general, these two
optimizations will lead to sub-optimality near the boundary.
Input: Ω, i, Vi−1,N∆, N∆
Output: Vi,N∆
foreach x ∈ {x : x ∈ Ω ∧ Vi−1(x) ≠ ∞} do
    (Vi(x), defined) ← ComputeVx(x, Vi−1,N∆, N∆);
    if defined = false then
        if i ≥ N∆ then
            j ← i − 1;
            while j > i − N∆ ∧ j > 0 do
                Vj(x) ← ∞;
                j ← j − 1;
            end
        else
            Vi(x) ← Ku;
        end
    end
end
Vi,N∆ ← {Vi} ∪ Vi−1,N∆ \ {Vi−N∆}
Algorithm 1: Compute Vi using the semi-Lagrangian receding horizon scheme.
Input: x, Vi−1,N∆, N∆
Output: Vi(x), defined
Sconsidered ← {(x, 0)}; Vi(x) ← ∞;
level ← 1;
finished ← false;
defined ← false;
repeat
    Snew ← ∅;
    foreach (z, g) ∈ Sconsidered do
        foreach a ∈ Ua,level do
            Calculate y∆(z,∆,a);
            c ← g + ∫_0^∆ L(y∆(z,τ,a), a) dτ;
            v ← Vi−level(y∆(z,∆,a));
            if v is well defined then
                if Vi(x) > c + v then
                    Vi(x) ← c + v;
                    defined ← true;
                end
            else
                Snew ← Snew ∪ {(y∆(z,∆,a), c)};
            end
        end
    end
    if level = N∆ ∨ i − level = 0 ∨ defined then
        finished ← true;
    else
        Sconsidered ← Snew;
        if Sconsidered = ∅ then
            finished ← true;
        end
        level ← level + 1;
    end
until finished;
Algorithm 2: Compute Vi(x) using up to N∆ time-steps; each element of Sconsidered pairs a trajectory endpoint z with its accumulated running cost g.
The memory requirements for this implementation may seem higher than for the basic implementation. In
fact, the basic implementation only requires the storage of two sets of values: the values computed at each grid node at the current iteration, Vi, and the values from the previous iteration, Vi−1. The proposed algorithm requires the storage of N∆ + 1 sets of values. However, by careful tuning of the maximum optimization horizon, N∆, and of the reduced input spaces Ua,i, the grid resolution can be lower than in the basic semi-Lagrangian scheme while still providing a better
sub-approximation of the MCIS. Nevertheless, for the real-time implementation of the control law, the worst case
optimization time should be kept within the processing capabilities. Therefore, the value of N∆ must be chosen
with that in mind.
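A toy version of ComputeVx for the scalar dynamics ẋ = a illustrates the horizon extension mechanism. The data layout (one dict per past iteration, with missing entries marking values that are not well defined) is an illustrative choice, not the actual implementation:

```python
INF = float("inf")

def compute_vx(x, i, V_hist, step, inputs_at_level, L, N_delta):
    """Sketch of ComputeVx: extend the lookahead horizon one step at a
    time while every trajectory endpoint falls on a value that is not
    well defined (a missing entry in V_hist[k])."""
    considered = [(x, 0.0)]        # pairs (endpoint, accumulated cost)
    vi, defined = INF, False
    level = 1
    while level <= N_delta and i - level >= 0:
        new_considered = []
        for z, g in considered:
            for a in inputs_at_level(level):
                y = z + step * a                 # Euler step of xdot = a
                c = g + step * L(z, a)           # accumulated running cost
                v = V_hist[i - level].get(round(y, 9))
                if v is not None:                # endpoint is well defined
                    if c + v < vi:
                        vi, defined = c + v, True
                else:
                    new_considered.append((y, c))
        if defined or not new_considered:
            break
        considered = new_considered
        level += 1
    return vi, defined

# Minimum-time flavour (L = 1): the value is known only at node 0.1 of
# iteration 0 and iteration 1 is entirely undefined, so the node x = 0.3
# can only be resolved by extending the horizon to two time-steps.
V_hist = [{0.1: 0.0}, {}]
value, ok = compute_vx(0.3, 2, V_hist, 0.1,
                       lambda level: [-1.0, 1.0], lambda z, a: 1.0, 4)
```

In the example the one-step endpoints 0.2 and 0.4 are both undefined, but the two-step endpoint 0.1 is defined, yielding the accumulated cost 0.2.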
3.6.2 Numerical example
Consider the following system:
ẋ = [sin(x2); a] (3.12)
with x = [x1 x2]′ and input a ∈ Ua, where Ua is a set of 31 equally spaced values from −0.25 to 0.25. We define Ua,i = {−0.25, 0.25}, i > 1. The system is constrained to [−2,2] × [−π/2,π/2]. The grid resolution is 201×201. All the experiments are performed with ∆ = 0.1.
Fig. 3.7 shows the contour of the approximated MCIS for different values of N∆. For N∆ = 1, a significant
contraction of the MCIS is observed. The results improve significantly even with small values of N∆. However,
we remark that this is highly dependent on the Courant number. A change in the grid resolution or time step leads
to different results. In this case, the classical SL (using the relaxed definition of infinity) produces a noticeably
rough over-approximation of the MCIS.
Figure 3.7: Computation of the MCIS. The thicker line corresponds to the exact MCIS. The inner contours cor-
respond to the results of the SLRH with different N∆: 1, 2 and 4 (from the inner to the outer contour). The outer
contour corresponds to the over-approximation obtained with the classical SL scheme.
Table 3.2 shows the number of nodes in the approximation of the MCIS for different values of N∆. It also
shows the CPU time for 68 iterations. This is the number of iterations required by the algorithm to converge to
the approximation of the MCIS for N∆ = 1. For N∆ ≥ 40, the computation times do not differ significantly. This
is because the extra computation time near the boundary is negligible when compared to the computation of the
remaining nodes. Moreover, the number of nodes for which the entire optimization horizon N∆∆ must be used
gets smaller as N∆ increases.
Table 3.2: Approximation of the MCIS with the SLRH algorithm
N∆ Number of nodes CPU Time (s)
1 17197 32
2 23351 38
3 23715 39
4 24303 39
10 25726 41
20 26035 42
40 26108 44
50 26120 44
60 26121 44
70 26121 45
Chapter 4
A Motivating Application
In this section we describe the application of the dynamic programming based two mode control approach to
the motion control of an Autonomous Underwater Vehicle (AUV) in two different settings: path following and
trajectory tracking.
The developments are targeted at the small sized AUVs (Fig. 4.1) developed at the Underwater Systems and Technologies Laboratory from Porto University. These AUVs have a torpedo shaped hull about 1.6 m long, with a diameter of 20 cm, weighing about 35 kg in air. Their maximum forward speed is 2 m/s.
The control design is based on simplified models. The simplification consists of assuming weakly coupled motions and modelling some dynamical terms as disturbances. In order to clarify the assumptions and simplifications, we start by presenting a brief description of a full six degree of freedom model for AUVs.
4.1 System model
4.1.1 Six degree of freedom model
The motion of autonomous underwater vehicles is best described by a nonlinear model. In what follows, we follow, with minor variations, the formulation of [47] and the notation from the Society of Naval Architects and Marine Engineers (SNAME) [61]. In order to define the model, two coordinate frames are considered: body-fixed and earth-fixed. The motions in the body-fixed frame are described by 6 velocity components ν = [ν1^T, ν2^T]^T = [u, v, w, p, q, r]^T (respectively surge, sway, heave, roll, pitch and yaw), relative to a constant velocity coordinate frame moving with the ocean current. The six components of position (x, y, z, typically in the north, east, down convention) and attitude (roll, pitch, yaw) in the earth-fixed frame are η = [η1^T, η2^T]^T = [x, y, z, φ, θ, ψ]^T. The earth-fixed reference frame can be considered inertial for the AUV.
The velocities in both reference frames are related through the Euler angle transformation
η̇ = J(η2)ν + c (4.1)
Figure 4.1: NAUV autonomous underwater vehicle from University of Porto, waiting to be deployed.
31
where c is the velocity of slowly varying currents and
J(η2) = [J1(η2) 0; 0 J2(η2)]
J1(η2) = [cψcθ, cψsθsφ − sψcφ, sψsφ + cψcφsθ; sψcθ, cψcφ + sφsθsψ, sθsψcφ − cψsφ; −sθ, cθsφ, cθcφ]
J2(η2) = [1, sφ tanθ, cφ tanθ; 0, cφ, −sφ; 0, sφ/cθ, cφ/cθ]
where s· and c· abbreviate sin(·) and cos(·).
The equations of motion are composed of the standard terms for the motion of an ideal rigid body and, ad-
ditionally, the terms due to hydrodynamic forces and moments. We skip the details regarding the hydrodynamic
terms; they can be found in [47]. The main relevant aspect is that most of those terms are hard to identify. The
identification requires either expensive in-water tests or very specialized software packages. Heuristic formulas
are available in the literature but, in practice, they lead to significant modelling errors.
The dynamics of the AUV are described in the body-fixed frame by the following nonlinear equations of
motion:
Mν̇ + C(ν)ν + D(ν)ν + L(ν)ν + g(η2) = τ (4.2)
where M is the constant inertia and added mass matrix of the vehicle (the added mass being a hydrodynamic effect), C(ν) is the Coriolis and centripetal matrix, D(ν) is the damping matrix, L(ν) is the lift matrix, g(η2) is the vector of restoring forces and moments and τ is the vector of body-fixed forces and moments from the actuators.
The considered AUVs are not fully actuated. They use a propeller for actuation in the longitudinal direction
and 4 tail fins, arranged in a cruciform fashion, for attitude control. The thrust produced by the propeller is
approximately proportional to the square of its angular speed; the steady state surge is practically proportional to
the propeller angular speed. Each fin produces a lift force perpendicular to the fluid flow and a quadratic damping force in the direction of the fluid flow; the lift force is approximately proportional to the fin's deflection angle with
respect to the fluid flow, for small deflection angles (less than 30 degrees). For instance, the formulas for the top
rudder fin’s force and moment are the following:
Yf = ρ CLF Sfin (u^2 δr − uv − xfin u r) (4.3)
Nf = xfin Yf (4.4)
where ρ is the water density, Sfin is the fin's face area, CLF depends essentially on the geometrical aspect of the fin, xfin is the position of the fin relative to the origin of the body fixed coordinate frame and δr is the fin's deflection angle with respect to the plane generated by the x and z axes of the body fixed coordinate frame.
For safety reasons, the AUVs are in general slightly buoyant. The center of gravity is slightly below the center
of buoyancy, providing a restoring moment in pitch and roll which is useful for these underactuated vehicles. The
restoring term g(η2) is described by the following expression:
g(η2) = [(W − B) sinθ; −(W − B) cosθ sinφ; −(W − B) cosθ cosφ; zG W cosθ sinφ; zG W sinθ; 0]
where W is the vehicle’s weight and B is the buoyancy force.
In what concerns parameter identification, the most challenging parameters are those in D(ν), L(ν), the added
mass parameters and those related to the actuation system. In general, the elements of the damping matrix are
defined so that linear and quadratic components arise (e.g., Xu+Xu|u||u| for D11). For an AUV with port/starboard,
top/bottom and fore/aft symmetries, M and D(ν) and L(ν) can be assumed to be diagonal. However, that is not
often the case. The AUVs considered in this work can be assumed to be port/starboard symmetric. On the other
hand, significant asymmetries are found both from the fore to the aft and from top to bottom; this is due to
the actuation system in the tail, the sensors on the nose, the antennas on top and the internal disposition of the
hardware. In [33], we compute the numerical parameters for a model of one of the AUVs from the University of
Porto using basic measurements and empirical formulas from the literature. The procedure can be easily adapted
to AUVs of similar shape.
32
4.1.2 Model simplification
In [52, 53], Healey and co-authors first reported the approach of decoupling AUV control design into loosely
interacting control modes. Since then, this practice has become common (see, e.g., [92] and [1]). Three modes
of control are generally considered: (forward) speed control, steering (or lateral control) and diving (or depth
control); additionally, roll control is sometimes also considered.
Let us consider operation at the horizontal plane (constant depth) with constant speed. Let us start by consid-
ering a reduced model, with state described by (x,y,ψ,v,r); u is assumed to be a nonnegative constant uref and the
remaining variables, (z,φ ,θ ,w, p,q), are assumed to be 0. However, in practice, even with active compensation,
the variables (z,φ ,θ ,u,w, p,q) are not constant; depending on the disturbances and on the reference trajectory,
those states will have more or less pronounced variations. In order to encompass the effect of such variations,
(z,φ ,θ ,u∆ ≡ u− uref,w, p,q) can be considered as bounded disturbances. Moreover, a low cost AUV may not
have sensors for measuring v. Since an accurate model for the v dynamics may not be available, we opt to model v
as an additional bounded disturbance. The z variable has no effect on the considered type of motion; even so, we
are left with a disturbance vector composed of seven inputs (φ ,θ ,u∆,v,w, p,q). In order to simplify the control
design, we consider the following model:
ẋ = u cos(ψ) + cx (4.5)
ẏ = u sin(ψ) + cy (4.6)
ψ̇ = r + cψ (4.7)
ṙ = k1 r + k2 r|r| + k3 δr + cr (4.8)
where cx, cy, cψ and cr are inputs used to model the effect of environmental disturbances and model uncertainty.
Note that cx and cy include the effect of currents. The term k3δr models the torque produced by the top and bottom
fin of the AUV with respect to the body fixed z axis.
Notice that this simplified model is also applicable for wheeled and aerial vehicles operating on the horizontal
plane.
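A minimal Euler simulation of (4.5)-(4.8) can be sketched as follows; the gains k1, k2, k3, the speed u and the step size are placeholder values, not identified parameters of the actual vehicles:

```python
import math

def simulate(x0, delta_r, dist, u=2.0, k1=-1.0, k2=-0.5, k3=1.0,
             dt=0.01, t_end=10.0):
    """Euler integration of the simplified planar model (4.5)-(4.8).
    delta_r(t) is the fin deflection input and dist(t) returns the
    disturbance tuple (cx, cy, cpsi, cr)."""
    x, y, psi, r = x0
    t = 0.0
    while t < t_end:
        cx, cy, cpsi, cr = dist(t)
        x += dt * (u * math.cos(psi) + cx)                            # (4.5)
        y += dt * (u * math.sin(psi) + cy)                            # (4.6)
        psi += dt * (r + cpsi)                                        # (4.7)
        r += dt * (k1 * r + k2 * r * abs(r) + k3 * delta_r(t) + cr)   # (4.8)
        t += dt
    return x, y, psi, r

# With no disturbances and no fin deflection the vehicle keeps its
# initial heading and moves along a straight line at speed u.
straight = simulate((0.0, 0.0, 0.0, 0.0),
                    lambda t: 0.0, lambda t: (0.0, 0.0, 0.0, 0.0))
```

Nonzero cx, cy reproduce the drift caused by currents, while cψ and cr perturb the heading and yaw rate dynamics directly.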
4.2 Path following problem
Operations with autonomous vehicles often entail the following of prescribed paths such as roads, aerial corridors
or simple concatenations of way-points. The basic idea of path following is to follow some given geometric path
with no specific time parametrization. In its simplest setting, a path following controller must ensure forward
motion along the given path while making the cross-track error converge to zero. For paths with zero curvature,
the cross-track error is the shortest distance between the vehicle and the path. The main characteristic of path
following control is its independence with respect to time, as opposed to trajectory tracking problems, where
the desired reference is a function of the time variable. This time independence provides additional freedom which
can be exploited to provide a control strategy with increased robustness. The change of variables corresponding
to the formalization of the path following problem may lead either to a system of reduced dimension, while
maintaining the same number of inputs, (as in [54], for instance) or to a system with the same dimension, but with
an additional input corresponding to the free time parametrization (see [2], for instance). After this formalization,
the path following problem can be seen as an ordinary nonlinear (or even linear) control problem; however, due
to its practical applicability, this problem has been targeted as an application for several control approaches in the
more diverse settings (e.g., [99], [72], [3]).
We will consider the case of single vehicle operation on the horizontal plane and the problem of following a
path within a given maximum cross-track error. Additionally, the motion will be optimized with respect to a given
cost function. That means that the objective is to maintain the vehicle inside a neighbourhood of the reference
path; this neighbourhood is sometimes designated as tube. The idea of this control strategy is to enable a more
relaxed control for systems subject to persistent measurement errors and disturbances: the control effort will be
stronger when the system is near the boundary of the tube and more relaxed when it is near the reference path.
For instance, autonomous underwater vehicles have strong limitations in what concerns the estimation of their
exact position. Given that uncertainty, it does not make sense to be very stringent in what concerns following a
path that the vehicle cannot even determine very accurately. Moreover, persistent disturbances may also prevent
stabilization to the reference path, as will be verified later. Finally, in practice, if the control system is implemented
in a computer based platform, limitations due to the finite control rate and discretization of the actuator commands
can make stabilization to a single point (in the Lyapunov sense, e.g., [91]) an impossible task.
33
[Figure: body velocities u, v; Frenet axes T, N; cross-track error d; angles ψr and ψt relative to the North-East axes]
Figure 4.2: Coordinate system (North-East-Down convention).
The path following problem has been studied for wheeled mobile robots (see [80] and [94] for early ap-
proaches), for under-actuated marine vehicles (see [39] and [37] for instance) and aerial vehicles (see [70] for
instance). Some tube based approaches can be found in the literature for the path following problem (see,
e.g., [4], [2]); however, those approaches do not provide straightforward procedures for the definition of the
tube and for tuning the performance of the system when inside the tube. In the context of path-following for ma-
rine vehicles, robustness with respect to model uncertainty was approached in [59]; however, robustness against
disturbances is not handled by those results.
4.2.1 Model
Consider a frame with its origin at the nearest point of the path. We assume there is some scheme to disambiguate
the choice in case of multiple candidates. This frame (the Frenet frame) has a T axis tangent to the path and a N
axis normal to the path. The orientation of the T axis with respect to the earth fixed frame is ψt and we define the
vehicle’s angle relative to the path as ψr = ψ −ψt (see Fig. 4.2). The cross-track error d is the shortest distance
between the vehicle (i.e., the origin of the body fixed frame) and the path. Thus, the cross-track error dynamics
may be described by the following model:
ẋ = f(x,a,b) = [u sin(ψr) + cn; r + cψr; k1 r + k2 r|r| + k3 δr + cr] (4.9)
where x = [d ψr r]′, a = [δr] and b = [cn cψr cr]′. This model incorporates the following terms: cn = −cx sin(ψt) + cy cos(ψt), ct = cx cos(ψt) + cy sin(ψt) and cψr = −(u cos(ψr) + ct)κ + cψ, where κ is the path's
curvature at the origin of the Frenet frame (for a straight path, κ = 0). The bounds for cn, ct and cψr are defined
accordingly.
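The terms cn and ct are just the earth-fixed disturbance (cx, cy) expressed in the normal and tangent directions of the Frenet frame, so the projection preserves the disturbance magnitude. A quick check with hypothetical current values:

```python
import math

def frenet_disturbance(cx, cy, psi_t):
    """Project the earth-fixed disturbance (cx, cy) onto the normal (cn)
    and tangent (ct) axes of a path with local orientation psi_t."""
    cn = -cx * math.sin(psi_t) + cy * math.cos(psi_t)
    ct = cx * math.cos(psi_t) + cy * math.sin(psi_t)
    return cn, ct

# Hypothetical example: a 0.3 m/s current heading east, path at 30 degrees.
cn, ct = frenet_disturbance(0.3, 0.0, math.radians(30.0))
```

Since the transformation is a rotation, cn² + ct² = cx² + cy² for any path orientation, which is why the bounds on (cn, ct) follow directly from the bounds on (cx, cy).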
Furthermore, in order to encompass the penalization of chattering in the design of the control laws, we consider
another model with δr as an additional state variable. In this case, the reference for the servomotors, δr,d , becomes
the system input; δr,d is modelled as a discrete variable, with the same resolution as the one provided by the actual
target system. We assume first order linear dynamics for the servomotors. Therefore, the system model with
actuator dynamics is as follows:
ẋ = f(x,a,b) = [u sin(ψr) + cn; r + cψr; k1 r + k2 r|r| + k3 δr + cr; k4(−δr + δr,d)] (4.10)
where x = [d ψr r δr]′, a = [δr,d] and b = [cn cψr cr]′.
4.2.2 Control design
We consider the two mode control strategy. For normal operation, the control law ( fS(x)) should be derived taking
into account the following state constraints:
• |d(t)| ≤ dmax, where dmax is the maximum acceptable cross-track error.
• |ψr(t)| ≤ ψmax ≡ arccos(ct,max/u); this constraint avoids backward motion along the path.
• |r(t)| ≤ rmax, where rmax is the maximum attainable yaw rate solely under the influence of the control input.
Therefore we have
x(t) ∈R ≡ [−dmax,dmax]× [−ψmax,ψmax]× [−rmax,rmax] (4.11)
Additionally, for some inspection applications (e.g., when using a camera), it is important to minimize the
occurrence of fast changes in the vehicle’s heading. Therefore, we define the following expression for the running
cost:
L(x,a,b) = Kd d^2 + Kψr ψr^2 + Kr r^2 (4.12)
where x = [d ψr r]′, a = [δr] and b = [cn cψr cr]′.
In order to derive the two control laws, two value functions are defined:
• VS(x), associated to the infinite horizon DG defined by (2.16) with t0 = −∞, t f = 0, L(x,a,b) given by
(4.12), system dynamics given by (4.10) and state constraints given by (4.11);
• Vmt(x), associated to a minimal time to reach (MTTR) DG defined by (2.16) with t0 = 0, t_f = min{t : x(t) ∈ T}, L(x,a,b) = 1, system dynamics given by (4.9) and target set T.
At this point, the control laws fS(x) and fmt(x) are obtained by applying (2.19) to VS(x) and Vmt(x), respectively.
It can be seen that the actuator dynamics are neglected for the MTTR OCP. Recall that the actuator dynamics
were modelled in order to allow the minimization of chattering. Otherwise, their dynamics do not have a strong
impact in the solution of the OCPs. However, the minimization of chattering conflicts with the minimization of
the time to reach. Therefore, taking into account that the system is not expected to enter q0 mode very often,
we give preference to the latter objective. As a side effect, we get a solution more amenable to computational
implementation (one dimension less).
Since the control law will not be defined in analytical form, its domain should be carefully defined. Obviously,
it is not possible to compute or store a numerical control law for an unbounded domain. However, by exploring
the structure of the solution and observing practical aspects of the problem, that problem can be avoided. In this
application, this only applies to fmt(x), since the domain of fS(x) is the bounded set R defined in (4.11).
Therefore, let us consider the domain of fmt(x). For the periodic ψr ∈ S1 state variable, only the range [−π,π]
has to be considered. For the r state variable, any measurements exceeding rmax in absolute value will be assumed
as spurious and truncated to sign(r(t))rmax. Frequent occurrences of this truncation will point to abnormal con-
ditions and, ultimately, will trigger safety measures not discussed here (e.g., vehicle surfacing). Thus, the only
remaining unbounded state variable is the cross-track error (state d).
Physical insight tells us that, for absolute values of the cross-track error above a certain threshold, the optimal action will always be to make the vehicle approach a direction perpendicular to the path. Therefore, it is expected that the projection of the computed control map on the subspace defined by ψr × r will be constant for all |d| > dπ/2, where dπ/2 is some constant whose value depends on the model parameters. Thus, the optimal control may be obtained by truncating d(t) to sign(d(t))dπ/2 whenever |d(t)| > dπ/2. This means that the optimal control may be determined for the unbounded domain using only the values stored on the grid covering [−dπ/2, dπ/2] × [−π, π] × [−rmax, rmax].
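The truncation scheme can be sketched as follows. This is only an illustration with our own function name; the bound values are the ones reported in the numerical results of this section (dπ/2 = 9, rmax = 0.4).

```python
import numpy as np

# Bounds of the stored domain of f_mt (values from the text: d_pi/2 = 9,
# r_max = 0.4); the function name is illustrative.
D_PI2, R_MAX = 9.0, 0.4

def clamp_to_stored_domain(d, psi_r, r):
    """Map an arbitrary state into [-d_pi/2, d_pi/2] x [-pi, pi] x [-r_max, r_max]."""
    d = float(np.clip(d, -D_PI2, D_PI2))           # truncate cross-track error
    psi_r = (psi_r + np.pi) % (2 * np.pi) - np.pi  # wrap periodic heading error
    r = float(np.clip(r, -R_MAX, R_MAX))           # discard spurious yaw rates
    return d, psi_r, r
```

Any measured state is first mapped through this function and only then used to index the stored control grid.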
The constant dπ/2 will be found by inspection of the control law; however, in order to verify this, the computational space along the d dimension must at least include [−dπ/2, dπ/2]. Therefore, an estimate of dπ/2 must be known beforehand. A rough estimate of dπ/2 can be found by considering the 2-dimensional reduced model:

dπ/2 = (u + cv,max π/2)/(rmax − cr,max) (4.13)

This is the minimum radius of curvature, u/(rmax − cr,max), expanded to account for the drift imposed by the worst case disturbance.
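Plugging in the NAUV parameters quoted later in this section (u = 1.5, cv,max = 0.5, rmax = 0.4), and assuming the bound on the yaw-rate disturbance is zero for straight paths, the rough estimate (4.13) evaluates as follows; it is only an order-of-magnitude check and, indeed, it underestimates the threshold dπ/2 = 9 later found by inspection:

```python
import math

# Rough estimate (4.13) of d_pi/2 from the 2-dimensional reduced model,
# using parameter values quoted later in the section (c_r,max = 0 assumed
# for straight paths).
u, cv_max, r_max, cr_max = 1.5, 0.5, 0.4, 0.0

d_pi2_estimate = (u + cv_max * math.pi / 2) / (r_max - cr_max)
print(round(d_pi2_estimate, 2))  # 5.71: same order of magnitude as the inspected value 9
```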
Figure 4.3: Fin angle with (thick line) and without (thin line) penalization of fin angular velocity.
4.2.3 Simulation and experimental results
The controllers were implemented in C++ and integrated in the AUV control software. During the development
phase, the control software was also tested in the simulation environment described in [33]. This simulation envi-
ronment consists of replacing the sensor and actuator drivers by a simulation task offering the exact same software
interfaces. The CPU time consumed by the simulation task is sufficiently low so that the whole application can
be tested on the target CPU while standing on the bench. This allows us to detect potential problems in the
implementation of the control system before deploying the vehicles on water.
The model parameters for the NAUV are the following: u = 1.5, k1 = −3, k2 = −3, k3 = 5.5, k4 = 20, rmax = 0.4, δr,max = 0.26, cv,max = 0.5, rv,max = 0 (straight paths); the input δr,d(t) is chosen from a set of 31 uniformly spaced values, ranging from −δr,max to δr,max; cv(t) is chosen from a set of 21 uniformly spaced values, ranging from −cv,max to cv,max; the control period is ∆ = 0.05 seconds; the maximum acceptable cross-track error is defined as dmax = 6 meters; after some experiments, we found that Kd = 1, Kr = 300 and Kδr = 300 provided satisfactory results.
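The discretization of the input sets can be reproduced directly; note that the odd numbers of points (31 and 21) ensure that the zero input is exactly representable:

```python
import numpy as np

# Uniformly spaced input sets from the text: 31 fin-angle commands and
# 21 disturbance values, both symmetric about zero.
delta_r_max, cv_max = 0.26, 0.5
delta_r_set = np.linspace(-delta_r_max, delta_r_max, 31)
cv_set = np.linspace(-cv_max, cv_max, 21)
```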
The value function VS(x) was computed on a 301 × 361 × 21 × 31 grid. The choice of this grid size was a trade-off between: a) computation time and memory requirements; b) accuracy. In general, the grid resolution has a direct impact on the accuracy of the final solution of the numerical computation. This is because errors may propagate and even accumulate during the computation of the value function. However, after its computation, the value function may be resampled on a less dense grid or on a multiresolution grid, such as in [95]. The objective of such resampling is to reduce the storage requirements in the target control system and, eventually, to enable loading of all data into main memory; for instance, the NAUV's computer system has 1 GB of RAM and therefore allows that approach. In cases where not enough main memory is available, the data can be loaded as needed. This is a reasonable alternative, especially if the data is stored on a solid-state drive (SSD). In general, the seek time of these devices is small enough to meet the real-time requirements of most applications.
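As a rough storage check (assuming, on our part, one 4-byte float per grid node; the stored precision is not stated in the text), the full-resolution value function already fits comfortably in the 1 GB of RAM mentioned above:

```python
# Storage estimate for V_S on the 301 x 361 x 21 x 31 grid, assuming one
# 4-byte float per node (our assumption).
nodes = 301 * 361 * 21 * 31
mebibytes = nodes * 4 / 2**20
print(nodes, round(mebibytes, 1))  # 70738311 nodes, about 269.8 MiB
```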
In what concerns the numerical solution of the MTTR, the range of the d dimension was defined as [−15, 15]. The threshold was found to be dπ/2 = 9. Therefore, in what concerns the d dimension, we only store data for the range [−9, 9] in the target system.
Figure 4.3 illustrates the effect of omitting the penalization of δr (i.e., of setting Kδr = 0). As expected, higher oscillation is observed in the fin angle (δr).
Figure 4.4 shows one simulation of the hybrid controller (3.1), considering typical measurement errors. Note that the initial cross-track error is larger than dπ/2. The controller starts in mode q0 and switches to q1 after 13 seconds. We assume that the estimated state is affected by an offset whose value is redrawn from a standard normal distribution every 3 seconds. This mimics the behaviour of the acoustic navigation system.
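This measurement-error model can be sketched as follows (a minimal illustration with our own variable names; the seed and horizon are arbitrary):

```python
import numpy as np

# Offset redrawn from N(0, 1) every 3 s and held constant in between,
# mimicking the discrete updates of the acoustic navigation system.
rng = np.random.default_rng(0)
dt, t_end, redraw = 0.05, 30.0, 3.0         # control period, horizon, redraw period (s)
n_steps = int(round(t_end / dt))
steps_per_redraw = int(round(redraw / dt))  # 60 control periods per redraw

offsets = np.empty(n_steps)
for i in range(n_steps):
    if i % steps_per_redraw == 0:
        current = rng.standard_normal()     # new offset every 3 seconds
    offsets[i] = current
```

The resulting piecewise-constant signal is simply added to the simulated state estimate before it is fed to the controller.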
Figure 4.5 shows the results obtained for an underwater operation with the NAUV using the designed DP
control strategy. The operations were performed at the marina of the APDL harbour, Matosinhos, Portugal. The
operation consisted of following a rectangular path. The control system was configured to switch to a new segment
of the path as soon as the vehicle was at a distance of 5 meters from the current segment's endpoint (the vertices of the rectangle, in this case). The marine currents were small (about 10 cm/s) and constant during the operations.
We must remark that most of the disturbances observed in the trajectories of Fig. 4.5(a) and Fig. 4.5(b) are due to navigation errors introduced by the noisy updates of the acoustic system. Nevertheless, it can be observed that the results are similar to those obtained in simulation. The cross-track error only exceeds
Figure 4.4: AUV starting with a large cross-track error and subject to measurement errors. The objective is to follow the x axis. The thick line represents the actual trajectory and the thin line represents the estimated trajectory, as used by the controller. (a) Trajectory on the x-y plane; (b) Fin angle.
the expected maximum (6 meters, as defined above) in some of the sharp turns corresponding to the changes of
segment. The actuation signal was also relatively smooth, except at the sharp turns. This is not surprising since
the controller was not designed to handle such sharp turns. At those instants, the controller switched to the MTTR
control law which, as expected, produces a more aggressive actuation.
4.3 Formation control
4.3.1 Model
We consider the problem of formation control. In this case, the surge velocity will be considered as a control input. The turning radius of AUVs is almost velocity-independent for the typical range of velocities. In order to model that fact, we consider the following slightly modified model:
ẋ = u cos(ψ) + cx
ẏ = u sin(ψ) + cy
ψ̇ = κ u + cr (4.14)

where κ is a control input that determines the vehicle's curvature. Note that κu is the angular velocity r, as before.
We are interested in the problem of keeping a static formation of N vehicles along a reference trajectory (x0(t), y0(t), ψ0(t)). The reference trajectory is generated by a system of the form (4.14); we will designate this system as the virtual vehicle. By rewriting the system equations with respect to the virtual vehicle's body-fixed frame, it is possible to formulate the tracking problem for each vehicle in a 3-dimensional space. This is a common procedure, which can be traced as far back as [55]. With that in mind, we consider the following model for the motion of each vehicle i ∈ {1, . . . , N}:
ṡi = −u0 + ui cos(ψr,i) + r0 di
ḋi = ui sin(ψr,i) + r0 si + cd
ψ̇r,i = κi ui − r0 (4.15)

where xi = [si di ψr,i]′ defines the pose of vehicle i > 0 with respect to the frame fixed on the virtual vehicle; u0 is the speed of the virtual vehicle with added disturbances; cd models the effects of disturbances and model uncertainties; all inputs (ui, κi, u0, r0 and cd) are bounded. We assume the vehicles do not know the current values of u0, r0 and cd.
Define O = {oi : i = 1, . . . , N}. Consider the problem of tracking a reference trajectory (x0(t), y0(t), ψ0(t)) with a static formation of N vehicles. The static formation is defined by offsets oi = (oi cos(ψo,i), oi sin(ψo,i)) with respect to the body-fixed frame of the virtual vehicle. The reference trajectory for each vehicle is:

xd,i(t) = x0(t) + oi(cos(ψo,i) cos(ψ0(t)) − sin(ψo,i) sin(ψ0(t)))
yd,i(t) = y0(t) + oi(cos(ψo,i) sin(ψ0(t)) + sin(ψo,i) cos(ψ0(t))) (4.16)
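Equation (4.16) is simply the rotation of the body-frame offset by the virtual vehicle's heading; a minimal sketch (function name ours):

```python
import numpy as np

# Per-vehicle reference (4.16): rotate the body-frame offset
# (o_i cos(psi_o,i), o_i sin(psi_o,i)) by psi_0(t) and add it to the
# virtual vehicle's position.
def vehicle_reference(x0, y0, psi0, o_i, psi_oi):
    ox, oy = o_i * np.cos(psi_oi), o_i * np.sin(psi_oi)
    xd = x0 + ox * np.cos(psi0) - oy * np.sin(psi0)
    yd = y0 + ox * np.sin(psi0) + oy * np.cos(psi0)
    return xd, yd
```

For instance, with offset magnitude 6 and offset angle π/2, a virtual vehicle at the origin heading along the x axis places the vehicle reference at (0, 6).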
Figure 4.5: Underwater operation at the APDL harbour, Matosinhos, Portugal. (a) Trajectory on the x-y plane, North (m) vs. East (m); the rectangle is the reference path; (b) Cross-track error (m); (c) Fin angle (rad).
Problem: Given O, find the decentralized feedback control laws fi(xi) and the regions Si such that, once xi is inside a region Si, the vehicles will not collide, independently of the bounded actions of the virtual vehicle and the remaining disturbances. The operation inside Si should be optimized with respect to the following running cost:

Li(xi, ai) = ks,i(si − oi cos(ψo,i))² + kd,i(di − oi sin(ψo,i))² + ku,i ui² + kr,i(κi ui)², (4.17)

where ai>0 = [ui κi]′, a0 = [u0 r0 cd]′ and f(xi(t), ai(t), a0(t)) is the right hand side of (4.15).
We do not discuss the problem of safely driving each vehicle into the respective Si. Still, this problem formulation leaves some ambiguity in what concerns the definition of Si. We assume that the (si, di) position of each vehicle will be constrained to a circle centred at oi; therefore, Si will be the MRCIS with respect to this circle.
If no symmetries are found, this problem implies the computation of N value functions of the form:

Vi(x) = sup_{β∈∆0} inf_{ai(·)∈Ai} Ji(xi(·), ai(·), β[ai(·)]), xi(0) = x (4.18)

where ∆0 is the set of nonanticipating strategies for the adversarial input a0 and Ai is the set of piecewise constant input sequences such that ai(t) belongs to a compact set.
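The fixed-point iteration behind a discrete-time approximation of (4.18) can be illustrated on a toy one-dimensional game. This is only a sketch of min-max value iteration, not the solver used in this work; all numbers and names are arbitrary:

```python
import numpy as np

# Toy min-max value iteration: scalar dynamics x+ = x + dt*(a + b), where the
# control a minimizes and the adversary b maximizes a discounted quadratic cost.
dt, gamma = 0.1, 0.95
xs = np.linspace(-2.0, 2.0, 81)        # state grid
A = np.linspace(-1.0, 1.0, 5)          # control set
B = np.linspace(-0.3, 0.3, 3)          # adversarial input set

V = np.zeros_like(xs)
for _ in range(150):                   # contraction: converges geometrically
    Vnew = np.empty_like(V)
    for i, x in enumerate(xs):
        best = np.inf
        for a in A:                    # controller commits first ...
            worst = -np.inf
            for b in B:                # ... adversary responds
                xn = np.clip(x + dt * (a + b), xs[0], xs[-1])
                worst = max(worst, dt * x**2 + gamma * np.interp(xn, xs, V))
            best = min(best, worst)
        Vnew[i] = best
    V = Vnew
```

Because the dynamics, cost and input sets are symmetric, the computed V is symmetric about x = 0 and minimal there; this is the same kind of symmetry that allows reusing one value function for both vehicles in the example below.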
4.3.2 Simulation results
We consider a formation of two identical AUVs moving in parallel with the virtual vehicle. The offsets are o1 = (0, 6) and o2 = (0, −6). The first AUV is allowed to stay in R1 = {x : √(s1² + (d1 − 6)²) ≤ 6}. The second is allowed to stay in R2 = {x : √(s2² + (d2 + 6)²) ≤ 6}. Given the symmetry, in this case we just have to compute the value function for one of the vehicles. By an appropriate change of coordinates, this value function is also used for the second vehicle. The remaining problem data is u0 = 1 m/s, 1 < ui < 2 (m/s), −0.25 < κi < 0.25 and cd = 0. A control rate of 20 Hz is assumed.
Figure 4.6: Sample trajectories on the s−d plane for the tracking of a straight path at constant speed. The initial positions are marked by a cross. The star marks the desired position of the AUV in the s−d plane. The AUV is required to stay inside the circle.
We start with the case of a straight line trajectory (r0 = 0). The running cost was defined as L1(x1, u1) = s1² + (d1 − 6)². A regular grid of 61 × 61 × 73 nodes was used. A discrete space of controls with 11 × 11 distinct values was used. With these settings, the computation of V1(x) takes almost two hours (approximately 300 iterations) on a workstation with an Intel(R) Xeon(R) X3220 CPU (2.40 GHz), using four threads of computation. Fig. 4.6 depicts trajectories on the s−d plane for different starting configurations ((−3, 3, π/4), (3, 3, π/4), (−3, 9, π/4) and (3, 9, π/4)). All trajectories converge to the desired reference and remain inside R1 at all times.
These AUVs have a positive lower bound for velocity. This is because, as seen before, the fin actuation depends on the vehicle's velocity. Without sufficient velocity, the horizontal fins will not be able to maintain the desired depth. This is analogous to the stall speed of airplanes. In this case, the considered velocity of the virtual vehicle equals this lower bound. Therefore, it can be observed that when the AUV starts ahead of the desired position, it follows an oscillatory trajectory. The AUV does this in order to lose speed with respect to the reference trajectory (the actual longitudinal speed remains constant).
The computations were repeated with r0 ∈ {−0.05, 0.05} (rad/s), u0 = 1.5 m/s and L1(x1, u1) = s1² + (d1 − 6)² + 600(u1κ1)². Fig. 4.7 shows the simulation results for a trajectory involving several turns using the maximum curvature. In this case, the vehicles are able to take advantage of the whole range of the input ui. Both vehicles follow the desired trajectory with a bounded error (see Fig. 4.7(b)).
Figure 4.7: Simulation with u0 = 1.5 m/s. (a) Trajectories on the x−y plane (middle trajectory is the reference); (b) Trajectory on the s1−d1 plane; (c) Velocity command u1 (m/s) for AUV 1 (top trajectory on Fig. 4.7(a)); (d) Curvature command for AUV 1 (top trajectory on Fig. 4.7(a)).
Chapter 5
Research Plan
In the case of the constrained finite and infinite horizon problems, it is easy to identify the discontinuities of the
value function. In general, the same is not true for the minimum time problem. We plan to develop an algorithm
for the minimum time problem based on the ideas of single pass algorithms (e.g., [88]). The new algorithm will
still assume a finite control rate. The main challenge is the development of a method to identify and to track the
discontinuities. We plan to use elements from the algorithm developed in section 3.6, namely the concept of well
defined cells and multiple time-steps on each iteration.
The use of the discrete-time dynamic programming principle to handle the control design of the sampled-data system is a sound approach, at least from the theoretical point of view. However, when adversarial inputs are considered, this is no longer true. A straightforward application of the discrete-time dynamic programming principle for differential games will assume the same finite sampling rate for both control and adversarial inputs. Given that the disturbances can change their values at any time, such an approach may not be the most suitable way to model the adversarial input in certain scenarios. Namely, the sampling rate should be higher than the upper frequency of the disturbance; otherwise, such an approach is optimistic rather than conservative. A more suitable formulation of the dynamic programming principle would be as follows:
V(t,x) = inf_{a∈Ua} sup_{b∈Ub} { ∫₀^∆ L(y(x,τ,[0,∆]×a,b), a, b(τ)) dτ + V(y(x,∆,[0,∆]×a,b)) } (5.1)

In this case, the feedback control law would be given by

f(x) ∈ argmin_{a∈Ua} sup_{b∈Ub} { ∫₀^∆ L(y(x,τ,[0,∆]×a,b), a, b(τ)) dτ + V(y(x,∆,[0,∆]×a,b)) } (5.2)
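The gap between the two formulations can be illustrated numerically: with the control held constant over one period ∆, an adversary allowed to switch inside the period can achieve a cost-to-maximize at least as high as one forced to stay constant. A toy sketch (all names, numbers and the particular running cost are ours, chosen so that switching actually pays off for the adversary):

```python
import itertools

def cost_for_schedule(x0, a, Delta, b_pieces, n, L):
    """Euler-integrate the running cost for a piecewise-constant b schedule."""
    m, h = len(b_pieces), Delta / n
    x, total = x0, 0.0
    for i in range(n):
        b = b_pieces[int(i * m / n)]   # adversary switches m times per period
        total += h * L(x)
        x += h * (a + b)
    return total

B = (-1.0, 1.0)
L = lambda x: -x**2                    # this adversary prefers x close to zero
Delta, n = 1.0, 100

# Same-rate adversary: b constant over the whole control period.
sup_1 = max(cost_for_schedule(0.0, 0.0, Delta, (b,), n, L) for b in B)
# Faster adversary: b may switch once inside the period, as in (5.1).
sup_2 = max(cost_for_schedule(0.0, 0.0, Delta, bs, n, L)
            for bs in itertools.product(B, repeat=2))
print(sup_1 < sup_2)  # True: the equal-rate model understates the adversary
```

Here the best constant disturbance yields about −0.328, while a single switch inside the period already achieves about −0.083, so the equal-rate value is optimistic, as argued above.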
When using numerical approaches, it would be desirable to let the corresponding sampling time be as large as possible while remaining physically meaningful for the adversarial input. We propose to study efficient implementations of this approach and to identify scenarios in which it produces results that are clearly more accurate than the ones obtained assuming the same sampling rate for both inputs. We will adapt the developed algorithms to take this formulation into account. We remark that the results obtained in chapter 4 assume the same sampling time for the control and adversarial inputs. We plan to employ the algorithm developed in section 3.6 as the building block for the numerical solution of the infinite horizon differential game.
Let us recall the two-mode control strategy presented in chapter 1. From the theoretical point of view, this
control strategy has nice global properties: the domain of attraction for the closed loop system is the MRCIS.
We plan to characterize the stability of the closed loop system for the considered applications when inside the
MRCIS. This is important since for a general form of the running cost, there is no simple way to predict the large
time behaviour of the controlled system. Moreover, the design methodology provides just an approximation of the
infinite horizon value function. We plan to base such a study on the numerical approximation of the invariant set of the closed loop system. In this case, it will be essential to obtain a good over-approximation of the invariant set.
As described in the previous sections, our design follows a pessimistic approach; during the computation of
the value function, informational advantage is given to the adversarial input. In practice, the disturbances do not
always take the worst case value. Moreover, in certain cases, the control system can estimate tighter bounds for
the disturbances during system operation. One practical example of that would be the robustness of AUV motion
with respect to marine currents c. In this case, the value function is computed assuming a range ‖c‖ ≤ cmax;
however, occasionally, a tighter bound can be estimated for that disturbance during system operation. On-the-
fly computation of a new value function is, in general, infeasible. Storing several value functions, tailored for
different scenarios, can also be expensive. A simpler alternative is the computation of the optimal control taking
into account the gathered information. This will reduce the range of actions that must be considered for the adversarial input. Even with complete information and in the absence of disturbances, this will not ensure that the resulting trajectory is the same as the one that would be obtained had the value function been computed assuming an undisturbed system. The main question is how much can be gained by following such an approach.
We also plan to consider measurement errors. These errors may be due to sensor accuracy, noise and commu-
nication delays. For instance, in the case of networked applications, the delays can be much larger than the period
of each local controller. The problem of delays has been addressed in the context of dynamic programming in [63]
but this is still an active field of research [45]. For the continuous-time case, this becomes an infinite dimensional
problem and few practical results seem to exist. We plan to address some of these aspects in the final design
methodology.
Another research avenue is the optimal control of hybrid systems [20] for which the discrete state variable acts
as a memory. This is different from basic switched systems (see [8,82,101] and references therein) because in this
case the controlled transitions will be allowed only between certain discrete states. Moreover, autonomous (state
dependent) transitions will also be considered. Some preliminary work has already been done on this topic [34,35],
namely on the numerical computation of hybrid value functions. The considered problems combine the following
aspects: a) both controlled (discrete input) and autonomous (state dependent) transitions between discrete states
are allowed; b) controlled transitions are allowed to be triggered only at certain points of the hybrid state space; c)
running cost depends on the discrete state; d) nonlinear continuous-time dynamics are assumed. We are not aware
of any published work on DP for hybrid systems combining all of the above mentioned aspects and offering a
general efficient computation method. In [20], where the author presented a taxonomy for hybrid systems which
subsumed previous models, optimal control of hybrid systems in the setting of DP is discussed. However, no
general efficient computation method is devised there. That line of development is still an open problem today.
Most of the results toward efficient methods are obtained by restricting their range of applicability just to certain
classes of problems. In [81] the authors develop a Hybrid Bellman Equation for systems with regional dynamics
(i.e., only autonomous transitions are considered). The results are demonstrated only for problems featuring two
dimensional continuous dynamics, with quadratic cost. Practical applicability of the method to more complex
systems is not discussed. A similar problem for piecewise affine systems is presented in [10], also using ideas
from DP. There is a vast body of work on dynamic optimization of switched systems and multi-model systems
(switched systems with state jumps) but none seems to meet all the above mentioned aspects (see [8, 82, 101] and
references therein).
Fig. 5.1 presents the Gantt diagram for the thesis research. The research tasks are independent of each other. In case of project slip or unpredicted difficulties, the order of execution of the tasks can be easily changed.
Figure 5.1: Gantt diagram for the thesis research (2011-2012). Tasks: the case of adversarial inputs; applications to hybrid systems; algorithm for the minimum time problem; verification of the closed loop system; sub-optimal behaviour of the adversarial input; measurement errors; thesis redaction.
Bibliography
[1] Pascoal A., Silvestre C., and Oliveira P. Vehicle and mission control of single and multiple autonomous ma-
rine robots. In G. N. Roberts and Robert Sutton, editors, Advances in Unmanned Marine Vehicle. Institution
of Electrical Engineers Press, 2006.
[2] A. Pedro Aguiar and Joao P. Hespanha. Trajectory-tracking and path-following of underactuated au-
tonomous vehicles with parametric modeling uncertainty. IEEE Transactions on Automatic Control,
52(8):1362–1379, 2007.
[3] L. E. Aguilar, T. Hamel, and P. Soueres. Robust path following control for wheeled robots via sliding
mode techniques. In Proceedings of the 1997 IEEE/RSJ Int. Conf. Intell. Robot. Syst., pages 1389–1395,
September 1997.
[4] M. Aicardi, G. Casalino, G. Indiveri, A. Aguiar, P. Encarnacao, and A. Pascoal. A planar path follow-
ing controller for underactuated marine vehicles. In Proc. 9th Mediterranean Conference on Control and
Automation, Dubrovnik, Croatia, 2001.
[5] Alessandro Alessio and Alberto Bemporad. A survey on explicit model predictive control. In Lalo Magni,
Davide Raimondo, and Frank Allgower, editors, Nonlinear Model Predictive Control, volume 384 of Lec-
ture Notes in Control and Information Sciences, pages 345–369. Springer Berlin - Heidelberg, 2009.
[6] Ken Alton and Ian M. Mitchell. Fast marching methods for stationary hamilton-jacobi equations with
axis-aligned anisotropy. SIAM Journal of Numerical Analysis, 47(1):363–385, 2008.
[7] J.P. Aubin. Viability Theory. Systems & Control: Foundations & Applications. Birkhauser Boston Inc.,
Boston, 1991.
[8] H. Axelsson, M. Boccadoro, M. Egerstedt, P. Valigi, and Y. Wardi. Optimal mode-switching for hybrid
systems with varying initial states. Nonlinear Analysis: Hybrid Systems, 2(3):765 – 772, 2008.
[9] Stanley Bak, Joyce McLaughlin, and Daniel Renzi. Some improvements for the fast sweeping method.
SIAM J. Sci. Comput., 32:2853–2874, September 2010.
[10] M. Baotic, F.J. Christophersen, and M. Morari. Constrained optimal control of hybrid systems with a linear
performance index. IEEE Transactions on Automatic Control, 51(12):1903–1919, Dec. 2006.
[11] M. Bardi, T. E. S. Raghavan, and T. Parthasarathy, editors. Stochastic and Differential Games: Theory
and Numerical Methods, volume 4 of Annals of the International Society of Dynamic Games. Birkhauser
Boston, Basel, Switzerland, 1999.
[12] Martino Bardi. On differential games with long-time-average cost. In P. Bernhard, V. Gaitsgory, and
O. Pourtallier, editors, Advances in Dynamic games and their Applications, volume 10 of Annals of the
International Society of Dynamic Games. Birkhauser Boston, 2009.
[13] Martino Bardi and I. Capuzzo-Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-
Bellman equations. Birkhauser Boston, 1997.
[14] Martino Bardi, Maurizio Falcone, and Pierpaolo Soravia. Numerical methods for pursuit-evasion games
via viscosity solutions. In M. Bardi, T. E. S. Raghavan, and T. Parthasarathy, editors, Stochastic and
Differential Games: Theory and Numerical Methods, volume 4 of Annals of the International Society of
Dynamic Games, chapter 3, pages xvi + 380. Birkhauser Boston, Basel, Switzerland, 1999.
[15] Alfredo Bellen and Marino Zennaro. Numerical Methods for Delay Differential Equations. Numerical
Mathematics and Scientific Computation, Oxford Science Publications. Oxford University Press, Oxford,
2003.
[16] R. Bellman. Dynamic programming. Princeton University Press, Princeton, 1957.
[17] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995.
[18] Franco Blanchini and Stefano Miani. Set-Theoretic Methods in Control. Birkhauser, Boston, 2008.
[19] O. Bokanowski, E. Cristiani, and H. Zidani. An efficient data structure and accurate scheme to solve front
propagation problems. J. Sci. Comput., 42:251–273, February 2010.
[20] Michael Branicky. Studies in Hybrid Systems: Modeling, Analysis and Control. PhD thesis, MIT, 1995.
[21] M.S. Branicky, R. Hebbar, and G Zhang. A fast marching algorithm for hybrid systems. In Proceedings of
the IEEE Conference on Decision and Control, pages 4897–4902, Phoenix, AZ, December 1999.
[22] Michele Breton and Krzysztof Szajowski, editors. Advances in Dynamic Games: Theory, Applications,
and Numerical Methods for Differential and Stochastic Games, volume 11 of Annals of the International
Society of Dynamic Games. Birkhauser Boston, 2011.
[23] E.F. Camacho and C. Bordons. Model predictive control. Advanced textbooks in control and signal pro-
cessing. Springer, 2004.
[24] P. Cardaliaguet, M. Quincampoix, and P. Saint-Pierre. Set-valued numerical analysis for optimal control and
differential games. In M. Bardi, T. E. S. Raghavan, and T. Parthasarathy, editors, Stochastic and Differential
Games: Theory and Numerical Methods, volume 4 of Annals of International Society of Dynamic Games.
Birkhauser Boston, 1999.
[25] Elisabetta Carlini, Maurizio Falcone, and Roberto Ferretti. An efficient algorithm for hamilton-jacobi
equations in high dimension. Computing and Visualization in Science, 7(1):15–29, 2004.
[26] Thomas C. Cecil, Stanley J. Osher, and Jianliang Qian. Simplex free adaptive tree fast sweeping and
evolution methods for solving level set equations in arbitrary dimension. Journal of Computational Physics,
213(2):458 – 473, 2006.
[27] Chi-Wang Shu. Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyper-
bolic conservation laws. Technical report, Institute for Computer Applications in Science and Engineering
(ICASE), 1997.
[28] Frank Christophersen, Mato Baotic, and Manfred Morari. Optimal control of piecewise affine systems: A
dynamic programming approach. In Thomas Meurer, Knut Graichen, and Ernst Gilles, editors, Control and
Observer Design for Nonlinear Finite and Infinite Dimensional Systems, volume 322 of Lecture Notes in
Control and Information Sciences, pages 183–198. Springer Berlin - Heidelberg, 2005.
[29] M. Crandall and P.-L. Lions. Viscosity solution of hamilton-jacobi equations. Transactions of the American
Mathematical Society, 277:1–42, 1983.
[30] E. Cristiani and M. Falcone. Fully-discrete schemes for the value function of pursuit-evasion games with
state constraint. In Pierre Bernhard, Vladimir Gaitsgory, and Odile Pourtallier, editors, Advances in Dy-
namic Games and Their Applications: Analytical and Numerical Developments, volume 10 of Annals of
International Society of Dynamic Games, pages 178–205. Birkhauser Boston, 2009.
[31] Emiliano Cristiani. A fast marching method for hamilton-jacobi equations modeling monotone front prop-
agations. J. Sci. Comput., 39(2):189–205, 2009.
[32] Jorge Estrela da Silva and Joao Borges de Sousa. Dynamic programming techniques for feedback control.
In Proceedings of the IFAC 18th World Congress, Milan, August 2011.
[33] Jorge Estrela da Silva, Bruno Terra, Ricardo Martins, and Joao Borges de Sousa. Modeling and simulation
of the lauv autonomous underwater vehicle. In 13th IEEE IFAC International Conference on Methods and
Models in Automation and Robotics, Szczecin, Poland, August 2007.
[34] J. Borges de Sousa, Jorge Estrela da Silva, and Fernando Lobo Pereira. New problems of optimal path
coordination for multi-vehicle systems. In Proceedings of the 10th European Control Conference, Budapest,
Hungary, August 2009.
[35] Joao Borges de Sousa and Jorge Estrela da Silva. Cooperative path planning in the presence of adversarial
behavior. In 4th International Scientific Conference on Physics and Control, Catania, Italy, September
2009.
[36] E. W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik I, pages
269–271, 1959.
[37] K.D. Do and J. Pan. Underactuated ships follow smooth paths with integral actions and without velocity
measurements for feedback: theory and experiments. IEEE Transactions on Control Systems Technology,
14(2):308 – 322, march 2006.
[38] R.J. Elliott and N.J. Kalton. The existence of value in differential games. Memoirs of the American
Mathematical Society. American Mathematical Society, 1972.
[39] P. Encarnacao, A. Pascoal, and M. Arcak. Path following for marine vehicles in the presence of unknown
currents. In Proceedings of SYROCO’2000 - 6th International IFAC Symposium on Robot Control, vol-
ume II, pages 469–474, Vienna, Austria, 2000.
[40] Douglas Enright, Frank Losasso, and Ronald Fedkiw. A fast and accurate semi-lagrangian particle level set
method. Computers & Structures, 83(6-7):479 – 490, 2005.
[41] M. Falcone. A numerical approach to the infinite horizon problem of deterministic control theory. Applied
Mathematics and Optimization, 15:1–13, 1987.
[42] M. Falcone. Numerical methods for differential games via pdes. International Game Theory Review,
8(2):231–272, 2006.
[43] M. Falcone and R. Ferretti. Semi-lagrangian schemes for hamilton-jacobi equations, discrete representation
formulae and godunov methods. J. Comput. Phys., 175:559–575, January 2002.
[44] M. Falcone and P. Stefani. Advances on parallel algorithms for the isaacs equation. In Andrzej S. Nowak
and Krzysztof Szajowski, editors, Advances in Dynamic Games: Applications to Economics, Finance,
Optimization, and Stochastic Control, volume 7 of Annals of the International Society of Dynamic Games,
pages 515–544. Birkhauser Boston, 2005.
[45] Salvatore Federico, Ben Goldys, and Fausto Gozzi. HJB equations for the optimal control of differential
equations with delays and state constraints, i: regularity of viscosity solutions. SIAM Journal of Control
and Optimization, 48(8):4910–4937, 2010.
[46] Wendell H. Fleming and Halil Mete Soner. Controlled Markov Processes and Viscosity Solutions. Springer,
New York, 2006.
[47] T. I. Fossen. Guidance and Control of Ocean Vehicles. John Wiley and Sons Ltd., 1994.
[48] Brian Gough. GNU Scientific Library Reference Manual - 2nd Edition. Network Theory Ltd., 2003.
[49] A. Grancharova and T. A. Johansen. Computation, approximation and stability of explicit feedback min-
max nonlinear model predictive control. Automatica, 45:1134–1143, 2009.
[50] Lars Grune and Jurgen Pannek. Nonlinear Model Predictive Control - Theory and Algorithms. Communi-
cations and Control Engineering. Springer Verlag, London, 2011.
[51] Ami Harten, Bjorn Engquist, Stanley Osher, and Sukumar R. Chakravarthy. Uniformly high order accurate
essentially non-oscillatory schemes, III. J. Comput. Phys., 71:231–303, August 1987.
[52] A. J. Healey and D. B. Marco. Slow speed flight control of autonomous underwater vehicles: Experimen-
tal results with the nps auv ii. In Proceedings of the 2nd International Offshore and Polar Engineering
Conference (ISOPE), San Francisco, CA, 1992.
[53] A.J. Healey and D. Lienard. Multivariable sliding mode control for autonomous diving and steering of
unmanned underwater vehicles. IEEE Journal of Oceanic Engineering, 18(3):327–339, jul 1993.
[54] G. Indiveri, M. Aicardi, and G. Casalino. Nonlinear time-invariant feedback control of an underactuated
marine vehicle along a straight course. In Proceedings of the 5th IFAC Conference on Manoeuvring and
Control of Marine Crafts, MCMC 2000, pages 221–226, Aalborg, Denmark, August 2000.
[55] Rufus Isaacs. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control
and Optimization. John Wiley & Sons, New York, 1965.
[56] Chiu-Yen Kao, Carmeliza Navasca, and Stanley Osher. The Lax-Friedrichs sweeping method for optimal
control problems in continuous and hybrid dynamics. Nonlinear Analysis (Invited Talks from the Fourth
World Congress of Nonlinear Analysts - WCNA 2004), 63:e1561–e1572, 2005.
[57] Chiu-Yen Kao, Stanley Osher, and Jianliang Qian. Legendre-transform-based fast sweeping methods for
static Hamilton-Jacobi equations on triangulated meshes. J. Comput. Phys., 227(24):10209–10225, 2008.
[58] N.N. Krasovskii and A.I. Subbotin. Game-theoretical control problems. Springer-Verlag, New York, 1988.
[59] L. Lapierre and B. Jouvencel. Robust nonlinear path-following control of an AUV. IEEE Journal of Oceanic
Engineering, 33(2):89–102, April 2008.
[60] Shingyu Leung and Hongkai Zhao. A grid based particle method for moving interface problems. Journal
of Computational Physics, 228:2993–3024, 2009.
[61] E. Lewis, editor. Principles of Naval Architecture. Society of Naval Architects and Marine Engineers,
1989. 2nd revision.
[62] Fengyan Li, Chi-Wang Shu, Yong-Tao Zhang, and Hongkai Zhao. A second order discontinuous Galerkin
fast sweeping method for Eikonal equations. J. Comput. Phys., 227(17):8191–8208, 2008.
[63] David G. Luenberger. Cyclic dynamic programming: A procedure for problems with fixed delay. Opera-
tions Research, 19(4):1101–1110, July-August 1971.
[64] Hiroyoshi Mitake. Asymptotic solutions of Hamilton-Jacobi equations with state constraints. Applied Math-
ematics and Optimization, 58(3):393–410, 2008.
[65] Ian Mitchell. Application of Level Set Methods to Control and Reachability Problems in Continuous and
Hybrid Systems. PhD thesis, Stanford University, 2002.
[66] Ian M. Mitchell. A toolbox of level set methods. Technical Report TR-2007-11, UBC Department of
Computer Science, June 2007.
[67] Ian M. Mitchell. The flexible, extensible and efficient toolbox of level set methods. Journal of Scientific
Computing, 35(2-3):300–329, June 2008.
[68] Ian M. Mitchell and Jeremy A. Templeton. A toolbox of Hamilton-Jacobi solvers for analysis of nondeter-
ministic continuous and hybrid systems. In HSCC, pages 480–494, 2005.
[69] I.M. Mitchell, A.M. Bayen, and C.J. Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable
sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, July 2005.
[70] D.R. Nelson, D.B. Barber, T.W. McLain, and R.W. Beard. Vector field path following for miniature air
vehicles. IEEE Transactions on Robotics, 23(3):519–529, June 2007.
[71] D. Nešić, A. Loría, E. Panteley, and A. R. Teel. On stability of sets for sampled-data nonlinear inclusions via
their approximate discrete-time models and summability criteria. SIAM J. Control Optim., 48:1888–1913,
June 2009.
[72] Urbano Nunes and L. Bento. Data fusion and path-following controllers comparison for autonomous vehi-
cles. Nonlinear Dynamics, 49:445–462, 2007.
[73] S. Osher and R. Fedkiw. Level Set Methods and Dynamic Implicit Surfaces. Springer-Verlag, New York,
2002.
[74] S. Osher and R. P. Fedkiw. Level set methods: An overview and some recent results. Journal of Computa-
tional Physics, 169:463–502, 2001.
[75] S. Osher and J. A. Sethian. Fronts propagating with curvature-dependent speed: Algorithms based on
Hamilton-Jacobi formulations. Journal of Computational Physics, 79:12–49, 1988.
[76] Stanley Osher and Nikos Paragios, editors. Geometric Level Set Methods in Imaging, Vision, and Graphics.
Springer-Verlag, 2003.
[77] Stanley Osher and Chi-Wang Shu. High-order essentially nonoscillatory schemes for Hamilton-Jacobi equa-
tions. SIAM Journal on Numerical Analysis, 28(4):907–922, August 1991.
[78] Jianliang Qian, Yong-Tao Zhang, and Hong-Kai Zhao. A fast sweeping method for static convex Hamilton-
Jacobi equations. J. Sci. Comput., 31(1-2):237–271, 2007.
[79] Christian Rasch and Thomas Satzger. Remarks on the O(n) implementation of the fast marching method.
IMA Journal of Numerical Analysis, 29(3), 2007.
[80] Claude Samson. Path following and time-varying feedback stabilization of a wheeled mobile robot. In Int.
Conf. ICARCV’92, Singapore, 1992.
[81] A. Schöllig, P.E. Caines, M. Egerstedt, and R. Malhamé. A hybrid Bellman equation for systems with
regional dynamics. In Proceedings of the 46th IEEE Conference on Decision and Control, pages 3393–
3398, Dec. 2007.
[82] C. Seatzu, D. Corona, A. Giua, and A. Bemporad. Optimal control of continuous-time switched affine
systems. IEEE Transactions on Automatic Control, 51(5):726–741, May 2006.
[83] J. A. Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the
National Academy of Sciences of the United States of America, 93:1591–1595, 1996.
[84] J. A. Sethian. A review of theory, algorithms, and applications of level set methods for propagating inter-
faces. Acta Numerica, 5:309–395, 1996.
[85] J. A. Sethian. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational
Geometry, Fluid Mechanics, Computer Vision and Materials Sciences. Cambridge University Press, 1999.
[86] J. A. Sethian and A. Vladimirsky. Ordered upwind methods for static Hamilton-Jacobi equations. Proceed-
ings of the National Academy of Sciences of the USA, 98(20):11069–11074, September 2001.
[87] J.A. Sethian and A. Vladimirsky. Ordered upwind methods for hybrid control. In C.J. Tomlin and M.R.
Greenstreet, editors, Hybrid Systems: Computation and Control: 5th International Workshop (Proceedings),
volume 2289, pages 393–406, Stanford, CA, USA, March 2002. Springer-Verlag GmbH.
[88] James A. Sethian and Alexander Vladimirsky. Ordered upwind methods for static Hamilton-Jacobi equa-
tions: Theory and algorithms. SIAM J. Numer. Anal., 41(1):325–363, 2003.
[89] Jorge Estrela Silva and João Borges de Sousa. A dynamic programming approach for the control of au-
tonomous vehicles on planar motion. In Proceedings of International Conference on Autonomous and
Intelligent Systems (AIS 2010), Póvoa de Varzim, Portugal, June 2010.
[90] Jorge Estrela Silva and João Borges de Sousa. A dynamic programming approach for the motion control
of autonomous vehicles. In Proceedings of the 49th IEEE Conference on Decision and Control, December
2010.
[91] J.J.E. Slotine and W. Li. Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs, New Jersey 07632,
1991.
[92] D.A. Smallwood and L.L. Whitcomb. Model-based dynamic positioning of underwater robotic vehicles:
theory and experiment. IEEE Journal of Oceanic Engineering, 29(1):169–186, January 2004.
[93] E.D. Sontag. A Lyapunov-like characterization of asymptotic controllability. SIAM J. Control Optim.,
21(3):462–471, 1983.
[94] O.J. Sørdalen and C. Canudas de Wit. Exponential control law for a mobile robot: extension to path
following. pages 2158–2163 vol.3, May 1992.
[95] S. Summers, C.N. Jones, J. Lygeros, and M. Morari. A multiresolution approximation method for fast
explicit model predictive control. IEEE Transactions on Automatic Control, 2011.
[96] Claire Tomlin, Ian Mitchell, Alexandre Bayen, and Meeko Oishi. Computational techniques for the verifi-
cation of hybrid systems. Proceedings of the IEEE, 91(7):986–1001, July 2003.
[97] Yen-Hsi Richard Tsai, Li-Tien Cheng, Stanley Osher, and Hong-Kai Zhao. Fast sweeping algorithms for a
class of Hamilton-Jacobi equations. SIAM Journal on Numerical Analysis, 41:673–694, 2003.
[98] J. N. Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic
Control, 40(9):1528–1538, September 1995.
[99] Diederik Verscheure, Bram Demeulenaere, Jan Swevers, Joris De Schutter, and Moritz Diehl. Time-
optimal path tracking for robots: A convex optimization approach. IEEE Transactions on Automatic Control,
54(10):2318–2327, October 2009.
[100] A. Vladimirsky. Static PDEs for time-dependent control problems. Interfaces and Free Boundaries,
8(3):281–300, 2006.
[101] S. Wei, K. Uthaichana, M. Žefran, R. A. DeCarlo, and S. Bengea. Applications of numerical optimal control
to nonlinear hybrid systems. Nonlinear Analysis: Hybrid Systems, 1(2):264–279, 2007.
[102] Liron Yatziv, Alberto Bartesaghi, and Guillermo Sapiro. O(n) implementation of the fast marching algo-
rithm. J. Comput. Phys., 212(2):393–399, 2006.
[103] Hongkai Zhao. Parallel implementation of the fast sweeping method. Journal of Computational Mathe-
matics, 25(4):421–429, 2007.
Appendix A
Appendix
A.1 Dijkstra’s algorithm
The following description of Dijkstra's algorithm closely follows [88]. All nodes are divided into
three classes (three possible labels): Far (no information about the correct value of V is known), Accepted (the
correct value of V has been computed), and Considered (adjacent to an Accepted node).
1. Start with all the nodes in Far.
2. Move the boundary nodes to Accepted, with V (x) = g(x).
3. Move all the nodes x adjacent to the boundary into Considered and evaluate the tentative values.
4. Find the node x̄ with the smallest value among all the Considered.
5. Move x̄ to Accepted.
6. Move the Far nodes adjacent to x̄ into Considered.
7. Reevaluate the value of all the Considered nodes adjacent to x̄.
8. If Considered is not empty, then go to 4.
The described algorithm has computational complexity O(M log M); the log M factor reflects the need to
maintain a sorted structure of the Considered values in order to determine the next Accepted node.
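The label-setting loop above can be sketched in a few lines of Python. The graph representation and function name below are illustrative (not from the cited works); a binary heap holds the Considered values, and stale heap entries are skipped lazily instead of being removed when a node is re-evaluated:

```python
import heapq

def dijkstra_value(adjacency, boundary):
    """Label-setting computation of V over a graph with nonnegative edge costs.

    adjacency: dict mapping node -> list of (neighbor, edge_cost)
    boundary:  dict mapping boundary node -> boundary value g(x)
    Returns a dict with the Accepted value V(x) for every reachable node.
    """
    V = {}           # Accepted nodes with their final values
    considered = []  # min-heap of (tentative value, node)
    for x, g in boundary.items():
        heapq.heappush(considered, (g, x))   # step 2: boundary nodes
    while considered:                        # step 8: loop while non-empty
        v, x = heapq.heappop(considered)     # step 4: smallest Considered
        if x in V:
            continue                         # stale entry: already Accepted
        V[x] = v                             # step 5: accept x
        for y, cost in adjacency.get(x, []):
            if y not in V:                   # steps 6-7: (re)evaluate neighbors
                heapq.heappush(considered, (v + cost, y))
    return V
```

Re-pushing a neighbor instead of updating its heap entry keeps the code short while preserving the O(M log M) bound up to constant factors, since each edge contributes at most one heap entry.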
A.2 Semi-Lagrangian scheme
As an example of semi-Lagrangian schemes, we describe the method employed in [98], for regular grids. Given a
simplex composed of x, x_i and x_j, where x is the point to be updated, the tentative value of V(x) is given by

min_{θ ∈ [0,1]} [ θ V(x_i) + (1 − θ) V(x_j) + c(x) d(θ) ]

where d(θ) = ‖x − (θ x_i + (1 − θ) x_j)‖_2. In a two-dimensional domain, four simplices are considered:
{x, x+(h,0), x+(0,h)}, {x, x+(−h,0), x+(0,h)}, {x, x+(−h,0), x+(0,−h)} and {x, x+(h,0), x+(0,−h)}, where h
is the spacing of the grid; the value of V(x) is given by the minimum of the respective four tentative values.
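The update above can be illustrated with a small Python sketch (names are hypothetical). For simplicity, the minimization over θ is done by sampling a uniform grid on [0, 1] rather than analytically:

```python
import math

def simplex_update(V_i, V_j, c, x, x_i, x_j, samples=101):
    """Tentative value of V(x) from one simplex (x, x_i, x_j).

    V_i, V_j: known values at x_i and x_j; c: running cost c(x).
    Minimizes theta*V_i + (1-theta)*V_j + c*d(theta) over sampled theta,
    where d(theta) is the distance from x to the interpolated point.
    """
    best = math.inf
    for k in range(samples):
        theta = k / (samples - 1)
        px = theta * x_i[0] + (1 - theta) * x_j[0]
        py = theta * x_i[1] + (1 - theta) * x_j[1]
        d = math.hypot(x[0] - px, x[1] - py)
        best = min(best, theta * V_i + (1 - theta) * V_j + c * d)
    return best

def tentative_value(V, c, x, h):
    """Minimum over the four simplices around grid point x (spacing h).

    V maps grid points (tuples) to already-computed values; simplices
    with unknown neighbor values are skipped.
    """
    pairs = [((h, 0.0), (0.0, h)), ((-h, 0.0), (0.0, h)),
             ((-h, 0.0), (0.0, -h)), ((h, 0.0), (0.0, -h))]
    best = math.inf
    for di, dj in pairs:
        x_i = (x[0] + di[0], x[1] + di[1])
        x_j = (x[0] + dj[0], x[1] + dj[1])
        if x_i in V and x_j in V:
            best = min(best, simplex_update(V[x_i], V[x_j], c, x, x_i, x_j))
    return best
```

In a full solver this update would run inside the Dijkstra-like loop of A.1, with V holding only Accepted values.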
A.3 Upwind difference scheme
The equations below illustrate the application of the Godunov difference scheme to the Eikonal equation in a
two-dimensional domain (see [97], for instance):

[max(max(D^{-x}_{ij} V, 0), −min(D^{+x}_{ij} V, 0))^2 + max(max(D^{-y}_{ij} V, 0), −min(D^{+y}_{ij} V, 0))^2] = c^2(x) (A.1)

with D^{-x}_{ij} V = (V_{i,j} − V_{i−1,j})/Δx and D^{+x}_{ij} V = (V_{i+1,j} − V_{i,j})/Δx, where Δx is the grid spacing
in the x dimension; the definitions for the y dimension are analogous. The upwind concept consists of choosing
the one-sided finite difference corresponding to the proper propagation of the front. In the case of single-pass
methods for the Eikonal equation, this is analogous to picking the appropriate nodes from the already computed
set while avoiding the not yet considered nodes.
A.4 Essentially Non-oscillatory Scheme
We briefly describe the ENO technique [51] for second order unidimensional interpolation (see also [27]).
Consider a stencil of four grid nodes x_{i−3/2}, x_{i−1/2}, x_{i+1/2}, x_{i+3/2} such that x_{i−1/2} ≤ x < x_{i+1/2}.
Consider the interpolation using the 3 leftmost stencil nodes:

S(x) = P_1(x) + V[x_{i−3/2}, x_{i−1/2}, x_{i+1/2}] (x − x_{i−1/2})(x − x_{i+1/2}) (A.2)

where V[x_{i−3/2}, x_{i−1/2}, x_{i+1/2}] is Newton's second degree divided difference. Likewise, consider the
interpolation using the 3 rightmost stencil nodes:

S(x) = P_1(x) + V[x_{i−1/2}, x_{i+1/2}, x_{i+3/2}] (x − x_{i−1/2})(x − x_{i+1/2}) (A.3)

For a regular grid, writing dx = (x − x_{i−1/2})/Δx, we have (x − x_{i−1/2})(x − x_{i+1/2}) = dx(dx − 1)(Δx)^2.
The interpolation formulas change accordingly; we illustrate only the left case:

S(x) = P_1(x) + V⟨x_{i−3/2}, x_{i−1/2}, x_{i+1/2}⟩ dx(dx − 1) (A.4)

where V⟨x_{i−3/2}, x_{i−1/2}, x_{i+1/2}⟩ is Newton's second degree undivided difference (undivided in the sense
that the division by (Δx)^2 is not performed).

The expressions for the left and right second degree undivided differences V^− ≡ V⟨x_{i−3/2}, x_{i−1/2}, x_{i+1/2}⟩
and V^+ ≡ V⟨x_{i−1/2}, x_{i+1/2}, x_{i+3/2}⟩ are as follows:

V^− = (v_{i−3/2} − 2 v_{i−1/2} + v_{i+1/2})/2 (A.5)

V^+ = (v_{i−1/2} − 2 v_{i+1/2} + v_{i+3/2})/2 (A.6)

The ENO scheme uses the difference with smaller absolute value, since, in general, a smaller difference
corresponds to a stencil sampling a smoother part of the function to be interpolated; the idea is to avoid
interpolating across discontinuities.

The scheme can be extended to higher dimensions in a tree-like fashion. For a system of dimension n, the
second order ENO scheme requires (4^n − 1)/3 computations such as the one described for the unidimensional
case.
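The unidimensional selection between V^− and V^+ can be sketched as follows. Integer indices i−1, i, i+1, i+2 stand in for the half-index nodes x_{i−3/2}, x_{i−1/2}, x_{i+1/2}, x_{i+3/2}, with the query point between nodes i and i+1 (illustrative sketch, not from the cited works):

```python
def eno2_interpolate(v, i, dx):
    """Second order ENO interpolation on a regular grid.

    v:  list of samples v[k]; the query point lies between nodes i and i+1
    dx: normalized offset (x - x_i) / grid_spacing, in [0, 1)
    Uses the undivided difference of smaller magnitude, i.e. the stencil
    sampling the smoother part of the function.
    """
    # linear part P_1(x)
    p1 = v[i] + dx * (v[i + 1] - v[i])
    # left and right second degree undivided differences (A.5), (A.6)
    v_minus = (v[i - 1] - 2.0 * v[i] + v[i + 1]) / 2.0
    v_plus = (v[i] - 2.0 * v[i + 1] + v[i + 2]) / 2.0
    V2 = v_minus if abs(v_minus) <= abs(v_plus) else v_plus
    # quadratic correction dx(dx - 1) as in (A.4)
    return p1 + V2 * dx * (dx - 1.0)
```

On a smooth function both differences agree and the scheme reproduces the underlying quadratic exactly; near a jump, the stencil on the smooth side is selected and the correction stays small.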