
FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

DOCTORAL PROGRAMME IN ELECTRICAL AND COMPUTER ENGINEERING

Dynamic programming for computer based feedback control of continuous-time systems

by

Jorge Manuel Estrela da Silva

A thesis proposal submitted in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

(Electrical and Computer Engineering)

in the University of Porto, School of Engineering

October 2011

Doctoral Committee:

Professor Fernando M. F. Lobo Pereira, Chair (FEUP)

Professor Hasnaa Zidani (ENSTA - ParisTech)

Professor Fernando A. C. C. Fontes (FEUP)


Abstract

This document proposes the research directions to be followed by the author in fulfilment of the requirements of the Ph.D. degree in the Doctoral Program in Electrical and Computer Engineering. The focus of this work is on the development of efficient methodologies for the application of dynamic programming techniques to control engineering problems. We work in the framework of deterministic differential games in order to encompass the effect of disturbances and model uncertainty in the control design. The proposal is substantiated by preliminary work, whose main achievements are a survey of numerical methods for the solution of the dynamic programming equations, the development of a numerical approach for the proposed control strategy, and the application of that numerical approach to an engineering problem. The main application is the motion control of an Autonomous Underwater Vehicle (AUV). The AUV motion is modelled as a four-dimensional system of first order nonlinear ordinary differential equations. However, the adversarial input is currently assumed to be generated by a discrete-time system at the same rate as the control cycle. The main line of future research will be the case of mixed discrete-time (control input) and continuous-time (disturbance) inputs. We also plan to extend the design methodology to the case of general Hybrid Systems.


Acknowledgements

There is still some road ahead, so I will leave the complete list of acknowledgements for the final thesis document. However, I would like to start by thanking Professor Fernando Lobo Pereira for all the trust he has placed in me during this work. I would like to thank Professor Hasnaa Zidani and Professor Fernando Fontes for accepting to be members of my Doctoral committee.

I would also like to dedicate a special acknowledgement to Eng. João Borges de Sousa. His support and guidance were of capital importance to some of the results reported in this document. The main incentive to explore the application of numerical methods to optimal control engineering problems came from him, back when I was taking the Hybrid Systems course that he taught during my first year of the PDEEC. He was responsible for arranging the visit of Professor Ian Mitchell to FEUP, in November 2010, and for making possible my presence at the BIRS Workshop on Advancing Numerical Methods for Viscosity Solutions and Applications, Canada, in February 2011 (I also acknowledge the financial support from Università di Roma Tre for the travel expenses and I thank, in particular, Professor Roberto Ferretti for arranging this support). Professor Ian Mitchell gave me precious insights into the numerical methods for HJI PDEs during his stay in Porto. The BIRS workshop gave me the chance to meet some very active members of this field. The suggestion to apply dynamic programming techniques to AUV motion control, in the context of the C4C European project, also came from Eng. João Borges de Sousa. He had the courtesy of providing all the necessary facilities for the experiments with the AUV from the Underwater Systems and Technology Laboratory of FEUP.

I thank the C4C European Project for financial support for the participation in the following conferences: ECC2009, Physcon2009, AIS2010, CDC2010, IFAC2011, Physcon2011.

I thank my institution, Instituto Superior de Engenharia do Porto (ISEP), and Fundação para a Ciência e a Tecnologia for partial funding of this research under the PROTEC program.


Contents

1 Introduction
  1.1 Motivation
    1.1.1 Two mode strategy
  1.2 Summary of preliminary work
  1.3 Outline

2 Background
  2.1 Dynamic Programming: a brief introduction
  2.2 Dynamic programming for deterministic continuous-time systems
  2.3 Receding horizon control
  2.4 Numerical methods for the dynamic programming equations
    2.4.1 Control-theoretic schemes
    2.4.2 Level set methods
    2.4.3 Efficient methods for boundary value problems
    2.4.4 Applications to hybrid systems

3 Synthesis of the Feedback Controller
  3.1 Solution of the two mode strategy by dynamic programming
  3.2 Numerical algorithm
  3.3 The semi-Lagrangian scheme
  3.4 Constrained finite and infinite horizon problems
  3.5 Numerical experiments
    3.5.1 Continuous-time LQR: unconstrained case
    3.5.2 State and input constraints
    3.5.3 Sampled data system
    3.5.4 Minimum time
  3.6 Semi-Lagrangian receding horizon scheme
    3.6.1 The algorithm
    3.6.2 Numerical example

4 A Motivating Application
  4.1 System model
    4.1.1 Six degree of freedom model
    4.1.2 Model simplification
  4.2 Path following problem
    4.2.1 Model
    4.2.2 Control design
    4.2.3 Simulation and experimental results
  4.3 Formation control
    4.3.1 Model
    4.3.2 Simulation results

5 Research Plan

Bibliography

A Appendix
  A.1 Dijkstra's algorithm
  A.2 Semi-Lagrangian scheme
  A.3 Upwind difference scheme
  A.4 Essentially Non-oscillatory Scheme


Chapter 1

Introduction

1.1 Motivation

The importance of dynamic optimization techniques has been recognized in several areas. From chemical to aerospace engineering, from finance to biology, there are several niches where the employment of such techniques has been found to be profitable. In the context of feedback control of dynamic systems, one may be interested in the design of a control law that makes the closed loop system meet a certain optimization criterion while satisfying a set of state and input constraints.

The objectives of the control laws can be roughly divided into two main classes: output regulation, which - in the context of dynamic optimization - may be translated into an infinite horizon problem; and reachability of some given target (the target problem). Common formulations of the reachability problem are in terms of time-optimality or energy-optimality. For certain combinations of system dynamics and optimization criteria, it is possible to obtain analytical expressions for such control laws; one classical example is the linear-quadratic regulator (LQR), an infinite horizon optimal control problem (OCP). However, for general systems with input and state constraints, it is usually impossible to derive an analytical expression for the control law.

The focus of this research is on dynamic programming (DP) techniques [16] for the feedback control of continuous-time systems in a deterministic setting. The main goal is to develop an efficient control design procedure for the deployment of computer based controllers with specific global robustness and performance properties. For this purpose, the proposed control strategy is based on a set of dynamic optimization problems. We use the framework of deterministic differential games [22, 55, 58] in order to ensure the robustness properties. Differential games generalize the dynamic programming techniques to systems with competing inputs. We will consider the zero-sum case with two competing inputs, designated as the control input and the adversarial input. In this sense, the effect of disturbances and model uncertainty can be modelled as a vector-valued adversarial input. Such disturbances and uncertainties will be handled on a worst-case basis. We assume computer controlled systems and, therefore, the usual discrete-time feedback control law.

The DP approach is based on the concept of value function. The value function gives the optimal cost of the optimal control problem as a function of the current state and, possibly, time. The dynamic programming principle (DPP) provides a constructive procedure to compute the value function. Finally, given the value function, the optimal control can be determined as a state feedback.

However, DP has traditionally been considered infeasible for practical problems. This is because, in general, the value function must be determined using numerical methods. This poses two main challenges: the actual computation of the value function, and the efficient storage of the value function on the target system. Both challenges are associated with the so-called curse of dimensionality: for general approaches, the processing and storage requirements grow exponentially with the system dimension. The computation of the value function is the most demanding task in this process. However, it must be remarked that this computation only has to be performed once for each given set of problem data. Assuming the problem data is constant (i.e., no change of state constraints, optimization criteria or system parameters is expected during system operation), this computation can be performed offline and, therefore, state-of-the-art workstations, General Purpose Computing on Graphics Processing Units (GPGPU), clusters of computers and more powerful solutions may be used to minimize the computation time.

On the other hand, given the value function, the computation of the optimal feedback control is many orders of magnitude faster. The main question here is what kind of representation of the value function should be employed in order to allow accurate computation of the optimal controls while minimizing the required storage space. Notice that such optimization is also important for the computation of the value function itself. However, the latter avenue poses more delicate problems: it is harder to find the optimal discretization while still computing the value function, and the loss of accuracy incurred by the discretization can propagate and accumulate during the computation of the value function.

Given these difficulties, many applications of dynamic optimization consist of open-loop solutions. For instance, the target problem is usually solved either as an open-loop problem or by decoupling it into two different sub-problems: optimal trajectory (or path) planning and trajectory (or path) tracking. The open-loop approach consists of computing the optimal control sequence for the given problem data and feeding that sequence to the system. This approach clearly suffers from lack of robustness with respect to model uncertainties and disturbances. The trajectory tracking approach, while in general ensuring target reachability, may require trajectory replanning in order to account for large deviations from the expected trajectory; otherwise, optimality may be severely impaired and state constraints may not be met. Moreover, this approach still requires the design of an appropriate feedback controller. The same applies to path tracking (also known as path following); the main difference from trajectory tracking is that the dependence on time is relaxed in the latter case.

In the past decades, Model Predictive Control (MPC), also known as receding horizon control, has been one of the most active fields of research on dynamic optimization based feedback control. The key idea of MPC is to solve a finite horizon OCP at each control instant and use the first value of the optimal control sequence. This technique requires high processing capabilities from the target system. Moreover, ensuring stability of an MPC regulator is not a straightforward task; the choices of the prediction horizon and terminal cost play an essential role here. The main application of MPC is the optimal regulation of systems with input and state constraints. For piecewise-affine systems with input and state constraints, efficient solutions can be found for certain classes of optimization criteria. For more general systems and optimization criteria, the search for efficient solutions is still an open field, sometimes coined Nonlinear MPC (NMPC) (see, e.g., [50]).

In spite of the steady evolution of computer systems, finding a sufficiently accurate numerical solution of an optimal control problem in real time (even in the finite horizon case) is still infeasible for many applications, even if only for economic reasons. In order to overcome this problem, the MPC community developed the concept of explicit MPC (eMPC) [5, 95]. The main idea of this technique is to perform the optimization effort during the design phase and then store a discrete set of values from which the optimal control can be easily determined (e.g., by interpolation, or by picking a pre-determined gain for the respective region of the state space). However, such an approach suffers from the curse of dimensionality. Moreover, the parallel between the eMPC and DP approaches is obvious: offloading the bulk of the optimization to the design phase; efficient online computation of the state feedback using some stored table of values.

Therefore, in spite of having been dismissed as intractable by many in the past, dynamic programming seems to be gradually becoming a realistic approach for practical applications. Numerical methods for the computation of the value function for problems involving continuous-time and continuous state space systems have been developed in the past two decades (see [19, 24, 30, 67] and references therein). Such methods can take advantage of parallel computing (see, e.g., [44]); of course, this is not enough to defeat the curse of dimensionality, but it can be exploited in order to reduce computation times, to refine the discretization or to make it feasible to handle problems with a few extra dimensions (depending on the available number of machines and the desired accuracy). Dynamic programming methods are deeply connected to Hamilton-Jacobi (HJ) Partial Differential Equations (PDE). In fact, the differential version of the DPP leads either to the Hamilton-Jacobi-Bellman (HJB) PDE, if only the control input is considered, or to the Hamilton-Jacobi-Isaacs (HJI) PDE, if an adversarial input is assumed. The results and techniques for the numerical solution of such PDE can be employed in the computation of value functions.

One major issue of digital control systems is their limited control rate. The main factors behind this limitation are the available actuation rates (i.e., the rates at which the actuators can receive new references) and the available computing power (which may limit the frequency at which the estimation and control algorithms are executed). One common approach to the design of digital controllers for continuous-time systems is to neglect the control rate during the design of the control law, i.e., to assume a pure continuous-time system, and then to design the computational system in a way that allows a control rate that does not hamper the expected performance of the control law. This may lead to economically infeasible designs or, conversely, it may lead the designer to accept worse performance than could be obtained if the control rate were taken into account in the design of the control law. Moreover, in certain situations, this may lead to catastrophic results. Figure 1.1 shows one such scenario. The rectangle represents an obstacle that must be avoided by the system. The initial state of the system is inside the cell with vertices x_{i-1,j-1}, x_{i,j-1}, x_{i-1,j}, x_{i,j}. If a sufficiently high control rate is considered, the finite difference scheme is able to generate a trajectory (composed of the smaller dashed arrows) that successfully avoids the obstacle. However, if a lower control rate is used, the system will eventually collide with the obstacle. This happens because, due to the sample-and-hold mechanism, the same initial optimal control is applied during the entire duration of the larger control period.

Figure 1.1: Different trajectories generated by different sampling rates. The star marks the target and the shaded rectangle is an obstacle to avoid.

This sampled-data control problem has been identified and addressed in the literature (see, e.g., [71] and references therein). One of the goals of this research is to develop methods that explicitly take the finite control rate into account at the design phase.

1.1.1 Two mode strategy

During this research, both infinite horizon differential games and target problems will be considered. This is

motivated by the following control strategy.

Let us consider systems with smooth dynamics described by \dot{x} = f(x,a,b), where x ∈ R^n is the state vector, a is a bounded control input and b is a bounded adversarial input modelling the effect of disturbances and model uncertainty. It is desirable that the system satisfies x ∈ R whenever possible. The set R models the maximum tolerance for system operation. The control design is driven by the set of state constraints defined by x ∈ R and also by a given running cost for regular operation. By regular operation we mean that the system is satisfying the state constraints.

In practice, it cannot be assumed that the system will always meet the state constraints: unforeseen disturbances may drive the system outside R; the system may start outside R; and, even if the system starts inside R, there may be no trajectory that stays inside R for the entire desired time horizon.

The above mentioned problems are related to the concept of invariance. Invariance analysis is a research topic that has been, and still is, extensively studied by the community. We consider the definition of Maximal Robust Controlled Invariant Set (MRCIS) from [18], but we remark that similar definitions exist (e.g., the discriminating kernel [7]). In simple terms, the Maximal Controlled Invariant Set (MCIS) is the set of states from which a system is able to satisfy the state constraints thereafter. The definition of the MCIS does not consider disturbances. The MRCIS is the set of states from which a system is able to satisfy the state constraints thereafter, even when subject to some set of disturbances.

Since the given R is not necessarily invariant, we define the first problem.

Problem 1. Find the MRCIS S ⊂ R.

In order to meet the state constraints at all times, the system's state must be inside S. Whenever the system is outside S, we opt to bring the system to the interior of S in minimal time. In order to accomplish this, two additional problems are considered:

Problem 2. Derive a control law such that, if the system is outside S, it drives the system into S as soon as possible.

Problem 3. Derive a control law such that, if the system is inside S, it makes the system stay inside S while minimizing the time integral of a given smooth function L(x,a) ≥ 0 (the running cost) over a large time. The running cost is assumed to be bounded for x ∈ R, and the optimization horizon is assumed to be much larger than the time for the system to reach an equilibrium or a steady periodic behaviour.


More formally, Problem 2 consists of finding a control law f_mt(x) such that, for any given initial condition x(t_0) = x_0 ∉ S, f_mt(x_0) = a(t_0), where a(.) is the optimal control of an unconstrained minimum time problem with boundary conditions x(t_0) = x_0 and x(t_f) ∈ S. Problem 3 is solved using a constrained infinite horizon differential game. Therefore, Problem 3 consists of finding a control law f_S(x) such that, for any given initial condition x(t_0) = x_0 ∈ S, f_S(x_0) = a(t_0), where a(.) is the optimal control of a differential game with running cost L(x,a,b), terminal cost Ψ(x) = 0, t_f = 0, t_0 → -∞ and state constraints defined by x ∈ R (or x ∈ S, which leads to the same solution, as will be discussed later).
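In implementation terms, the two mode strategy reduces to a simple switching logic between the two feedback laws. A minimal sketch is given below; the names in_mrcis, f_mt and f_S are hypothetical placeholders for the computed MRCIS membership test and the control laws of Problems 2 and 3:

    # Sketch of the two mode switching logic (illustrative only).
    # in_mrcis(x): membership test for the computed MRCIS S (Problem 1)
    # f_mt(x):     minimum time law driving the state into S (Problem 2)
    # f_S(x):      constrained infinite horizon law inside S (Problem 3)
    def two_mode_control(x, in_mrcis, f_mt, f_S):
        if in_mrcis(x):
            return f_S(x)   # regular operation: optimize the running cost
        return f_mt(x)      # recovery mode: reach the interior of S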

1.2 Summary of preliminary work

The work so far has focused on the development of a numerical approach for the implementation of the proposed control strategy. We briefly describe the achievements accomplished to date.

We explored the subject of computing the numerical solution of undiscounted (non-vanishing running cost) constrained infinite horizon problems. Especially for the case of differential games, no general results seem to exist. Theoretical results do exist under certain conditions (e.g., [13, Chapter VII], [64], [12]), but practical applications are not common. Under some controllability assumptions and for some classes of running cost, it can be shown that V(t,x) - (V(x) - ct) → 0 as t → -∞, where V(t,x) : (-∞,0] × R^n → R is the solution of the infinite horizon problem and c is the average cost of the optimal trajectories in the large-time limit (the ergodic cost). We follow the approach of solving the infinite horizon problem as a finite horizon problem with increasing horizon. Each iteration of the numerical algorithm corresponds to a step backward in time. At each iteration, the algorithm checks whether the current solution provides a suitable approximation of the asymptotic solution V(x) - ct. For the envisaged applications, the ergodic cost is assumed to be non-zero and no a priori knowledge is assumed concerning the large-time behaviour. In general, the optimal trajectories may not converge to the minimizer of the value function, which precludes recasting this problem as a target problem. We have some results with level set methods [90], but we found the semi-Lagrangian discretization of the dynamic programming equation more suitable for our approach (see section 3.2).

The typical semi-Lagrangian (SL) solver based on first order interpolation leads to sub-optimal control laws for non-smooth value functions, with excessive or anticipated action near the non-smooth regions. Moreover, the computed MCIS is an over-approximation of the exact MCIS. These effects can be minimized by refining the grid near those regions. However, memory constraints may sometimes prevent such an approach. The alternative technique of avoiding cells near the boundary of the computed MCIS may lead to over-conservative sub-approximations of the MCIS. We propose a new algorithm for the finite and infinite horizon optimal control problems. This algorithm is based on the SL scheme from [30] and uses some elements from receding horizon control. The proposed algorithm gives an accurate sub-approximation of the MCIS up to the grid resolution. The smearing introduced by the discontinuity at the boundary of the MCIS can be adjusted by tuning the algorithm parameters.

We use the SL scheme as an explicit implementation of the discrete-time dynamic programming principle, where the time-step of the SL scheme corresponds to the control period. Therefore, the control rate is taken into account in a straightforward way during the process of deriving the control law. The main requirement of this approach is the employment of an adequate ODE solver. The approach is verified on an instance of the discrete-time LQR problem, where we observe a close correspondence between the control law obtained by the numerical scheme and the one obtained from the solution of the algebraic Riccati equation.

We apply the current developments to an engineering application, namely the motion control of one of the Autonomous Underwater Vehicles (AUVs) developed at the University of Porto. These AUVs are equipped with a low cost embedded computer that runs all the navigation and control software, so they can perform the planned missions without any remote intervention; nowadays, even the deployment, retrieval and recharging operations are being automated in order to keep human intervention at a minimal level. We therefore found it important to use a control methodology that provides a systematic way to optimize the system operation over a large time horizon; one such example is the trade-off between tracking accuracy and hardware wear. The implemented controller consists of the following components [90]: the controlled invariant set, the minimum time feedback control law, the feedback control law associated with the infinite horizon problem, and the switching logic. The AUV is modelled as a 4-dimensional system of first order nonlinear differential equations. The controller was analysed in simulation and has already been used in field operation. We are not aware of any application of this kind in the literature.

Robotic problems typically deal with periodic (e.g., joint angles), bounded (e.g., hydraulic cylinder) and/or unbounded (e.g., distance to destination) state variables. Unbounded state variables, or even state variables with a large range, pose a great difficulty in the discretization of the state space, since it is not computationally feasible to use a grid covering an unbounded set. However, by exploiting the structure of the solution, that problem can be circumvented in certain situations, thus allowing the implementation of a numerical control law on an unbounded working space. This is done by inspection of the value function computed for a sufficiently large bounded set. We use this approach in the above mentioned application.

Some of the material presented in chapters 3 and 4 has already been published in [89], [90] and [32].

1.3 Outline

Chapter 2 presents a brief survey of the dynamic programming equations and related numerical methods. Chapter 3

describes preliminary work regarding the numerical synthesis of the proposed control strategy. Chapter 4 describes

the application of the control design methodology to the motion control of an AUV. Some experimental results are

presented. Chapter 5 describes the topics for future research.


Chapter 2

Background

2.1 Dynamic Programming: a brief introduction

Dynamic programming is a research topic introduced by Richard Bellman during the 1950s [16], motivated by problems of multistage decision processes. The basic idea behind dynamic programming (DP) is captured by the principle of optimality, originally stated as follows [16, p. 83]: "An optimal policy has the property that, whatever the initial state and optimal first decision may be, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision." Another way to interpret the principle of optimality is the following: if the trajectory segments x_{ab}(.) : [t_0, t_1] → R^n and x_{bc}(.) : [t_1, t_2] → R^n compose the optimal trajectory from a to c, then the segment x_{bc}(.) is also the optimal trajectory from b to c. In control theory, the best-known form of the principle of optimality (also known as the dynamic programming principle) is the Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE), to be presented shortly.

Dynamic programming is based on the concept of value function. For a given dynamic optimization problem, the value function V(t,x) gives the cost of the optimal trajectory starting from state x at time t. (The term cost is associated with minimization problems; in his seminal work, Bellman considers a maximization problem and employs the term return.) Bellman showed that the value function, whenever smooth, can be computed as the solution of an HJB PDE. In [55], similar results were proved for systems with competing inputs; this work introduced the Hamilton-Jacobi-Isaacs (HJI) PDE and defined the field of differential games.

The lack of a precise or elegant definition of solution for the non-smooth case remained an open challenge for several years. The most successful answer to this challenge was the theory of viscosity solutions, first formalized in [29] and a major breakthrough for the development of theory and applications based on the HJB and HJI PDE.

2.2 Dynamic programming for deterministic continuous-time systems

Consider the system described by

    \dot{x}(t) = f(x(t), a(t))    (2.1)

and the cost functional

    J(t_0, x_0, a) = \Psi(x(t_f)) + \int_{t_0}^{t_f} L(y(x_0, \tau, a), a(\tau)) \, d\tau    (2.2)

where y(x,t,a) is the state of the system at time t when subject to the input sequence a(.) and initial state x(0) = x; L(x,a) and \Psi(x) are the running and terminal costs, respectively.

Consider the optimal control problem (OCP)

    \min_{a \in \mathcal{U}_a} J(t_0, x_0, a)    (2.3)


subject to:

    a(.) \in \mathcal{U}_a    (2.4)
    x(t) \in X    (2.5)

where \mathcal{U}_a is the set of measurable input sequences such that a(t) \in U_a, with U_a a compact set; and (2.5) defines the state constraints. For the infinite and finite horizon control problems, we set t_f = 0 and define t_0 \in (-\infty, 0]. We consider only the case of bounded inputs, since for most engineering systems unbounded inputs are not physically meaningful.

Consider also the following variation of the OCP presented above, where

    t_f = \min \{ t : x(t) \in \mathcal{T} \}    (2.6)

and \mathcal{T} is a given target set. This is known as the optimal cost-to-reach problem or simply as the target problem. This version of the OCP has a static solution, in the sense that it does not depend on t_0. In that case, the cost functional can be defined as J(x,a).

The fundamental object of the dynamic programming framework is the value function [16]. The value function associated with the OCP (2.3) is defined as follows:

    V(t_0, x_0) = \inf_{a \in \mathcal{U}_a} J(t_0, x_0, a)    (2.7)

For the static OCP, the value function is independent of t and is defined as V(x_0) = \inf_{a \in \mathcal{U}_a} J(x_0, a).

One of the main appeals of the dynamic programming approach resides in the fact that, given the static value function V(x), the optimal control can be determined in state feedback form f_c(x) = a^*(0), where a^*(.) is given by

    a^* \in \arg\min_{a \in \mathcal{U}_a} \left\{ \int_0^{\Delta} L(y(x,\tau,a), a(\tau)) \, d\tau + V(y(x,\Delta,a)) \right\}    (2.8)

with \Delta \in (0, t_f - t_0]. Notice that (2.8) assumes a static value function. The same idea can be applied to time-dependent value functions, leading to a time-dependent control law. However, the dependence on the time variable is undesirable, especially if the objective is to optimize the performance of the system for an undefined period of time or, ultimately, for the infinite horizon case. Since, in general, numerical solutions are employed, the extra dimension incurred by the time variable would lead to increased computational requirements. That approach will not be considered further here.

In what concerns the infinite horizon OCP (IHOCP), we remark that, for a controllable system under suitable cost, V(t,x) converges to V_{ih}(x) - ct as t \to -\infty, where c is the ergodic cost; in those cases, the time-invariant V_{ih}(x) is used in (2.8) in order to compute the optimal control.

The optimal control law for a system with a sample-and-hold scheme can be described by the discrete-time version of (2.8):

    f_c(x) \in \arg\min_{a \in U_a} \left\{ \int_0^{\Delta} L(y_\Delta(x,\tau,a), a) \, d\tau + V(y_\Delta(x,\Delta,a)) \right\}    (2.9)

where y_\Delta(x,t,a) = y(x,t,[0,\Delta] \times \{a\}), i.e., the trajectory generated by holding the input constant at a. Notice that, in spite of the discrete-time designation, the continuous-time system model is still employed.
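As an illustration, the following is a minimal sketch of (2.9) for a finite candidate control set, assuming the value function has already been computed and is available through an interpolation routine V_interp (all names here are hypothetical):

    import numpy as np
    from scipy.integrate import solve_ivp

    # Sketch of the sample-and-hold feedback law (2.9), illustrative only.
    # f(x, a): system dynamics; L(x, a): running cost; V_interp: callable
    # returning the interpolated value function; dt: control period.
    def feedback(x, controls, f, L, V_interp, dt):
        best_a, best_cost = None, np.inf
        for a in controls:
            # integrate the state and the accumulated running cost together
            def rhs(t, z):
                return np.append(f(z[:-1], a), L(z[:-1], a))
            sol = solve_ivp(rhs, (0.0, dt), np.append(np.asarray(x, float), 0.0))
            x_next, run_cost = sol.y[:-1, -1], sol.y[-1, -1]
            total = run_cost + float(V_interp(x_next))  # DPP right-hand side
            if total < best_cost:
                best_a, best_cost = a, total
        return best_a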

Of course, in order to apply either (2.8) or (2.9), the value function must be computed first. The dynamic programming approach defines the value function in a constructive fashion, through the so-called dynamic programming principle (DPP). The DPP for discrete-time systems is expressed as follows:

    V(t,x) = \inf_{a \in U_a} \left\{ \int_0^{\Delta} L(y_\Delta(x,\tau,a), a) \, d\tau + V(t+\Delta, y_\Delta(x,\Delta,a)) \right\}, \quad x \notin \mathcal{T}, \; t < t_f    (2.10)

For the time-dependent case we define the terminal condition

    V(t_f, x) = \Psi(x)    (2.11)

while for the static case we define the boundary condition

    V(x) = \Psi(x), \quad x \in \mathcal{T}    (2.12)


As \Delta goes to zero, V(t,x) is a viscosity solution of the Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE) (see, e.g., [46] or [13]):

    \partial_t V(t,x) + H(\nabla V(t,x), x) = 0    (2.13)

where

    H(p,x) = \inf_{a \in U_a} \{ p \cdot f(x,a) + L(x,a) \}    (2.14)

is the Hamiltonian. The notion of viscosity solution must be introduced because, in general, the (classical) gradient \nabla V(t,x) may be undefined in some regions of the state space. The viscosity solution, introduced in [29], is a type of weak (generalized) solution that corresponds to the "physical" solution of the dynamic optimization problem; accordingly, the HJB PDE formulation assumes some form of compatible generalized gradient. This form of the DPP is more amenable to mathematical analysis. However, it does not take into account the effect of the limited control rate (at least not in a straightforward way, as opposed to the discrete-time DPP).
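As a classical sanity check (a standard special case, included here only for illustration), consider linear dynamics f(x,a) = Ax + Ba and quadratic running cost L(x,a) = x^\top Q x + a^\top R a with R positive definite. The infimum in (2.14) is then attained in closed form:

    H(p,x) = \inf_a \{ p^\top (Ax + Ba) + x^\top Q x + a^\top R a \}, \qquad B^\top p + 2Ra = 0 \;\Rightarrow\; a^* = -\tfrac{1}{2} R^{-1} B^\top p

With the quadratic ansatz V(x) = x^\top P x, so that p = \nabla V(x) = 2Px, this yields the familiar LQR feedback a^* = -R^{-1} B^\top P x, and imposing H(\nabla V(x), x) = 0 for all x recovers the algebraic Riccati equation A^\top P + PA - PBR^{-1}B^\top P + Q = 0.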

In general, it is not possible to find an analytical expression for the value function. The DPP, either in the form of (2.10) or as an HJB PDE (2.13), is then exploited to develop numerical solvers for the computation of the value function.

The dynamic programming formulation can be extended to encompass the existence of adversarial inputs, leading to the framework of differential games [55, 58]. The dynamical system is described by

    \dot{x}(t) = f(x(t), a(t), b(t))    (2.15)

with a(.) defined as before and b(.) \in \mathcal{U}_b \equiv \{ b : [t_0, t_f] \to U_b, \text{ measurable} \}, where U_b is a compact set.

Adversarial inputs are useful to encompass the effects of disturbances and model uncertainty. Intuitively, at each instant, the adversarial input b will try to drive the system to the worst possible result (i.e., maximum cost) while the control input a will do the opposite. In this setting, the optimal control problem (OCP) is defined as follows:

    \sup_{\beta \in \Delta_b} \inf_{a \in \mathcal{U}_a} J(t_0, x_0, a, \beta[a])    (2.16)

where

    J(t_0, x_0, a, b) = \Psi(x(t_f)) + \int_{t_0}^{t_f} L(y(x_0,\tau,a,b), a(\tau), b(\tau)) \, d\tau    (2.17)

and \Delta_b = \{ \beta : \mathcal{U}_a \to \mathcal{U}_b : \forall t > 0, \; a(s) = \hat{a}(s) \ \forall s \le t \Rightarrow \beta[a](s) = \beta[\hat{a}](s) \ \forall s \le t \} is the set of nonanticipating strategies for the adversarial input b. The concept of nonanticipating (or causal) strategy, introduced in [38] for differential games, is necessary so that the corresponding value function complies with the dynamic programming principle. The choice of the sup-inf ordering in (2.16) is not incidental. This formulation corresponds to the so-called upper value of the game. It gives informational advantage to the maximizing input (the adversary), in the sense that, besides the current state of the system, the adversary is assumed to know the current choice of input a at the time it chooses b. The lower value solution can be defined in an analogous way. Since we want to model the worst-case effect of disturbances, we prefer to follow a conservative approach, give the informational advantage to the adversarial input and therefore use the upper value solution. These ideas become clearer in the formulation of the dynamic programming principle for differential games.

The discrete-time DPP corresponding to the upper value solution of the differential game is the following:

    V(t,x) = \inf_{a \in U_a} \sup_{b \in U_b} \left\{ \int_0^{\Delta} L(y_\Delta(x,\tau,a,b), a, b) \, d\tau + V(t+\Delta, y_\Delta(x,\Delta,a,b)) \right\}    (2.18)

The feedback control law is given by

    f_c(x) \in \arg\min_{a \in U_a} \max_{b \in U_b} \left\{ \int_0^{\Delta} L(y_\Delta(x,\tau,a,b), a, b) \, d\tau + V(y_\Delta(x,\Delta,a,b)) \right\}    (2.19)

We assume systems such that V(x) can be obtained as described before for the single player case. The right-hand side of (2.18) can be seen as a static game; since this game may not have a value, the inf-sup order cannot be chosen arbitrarily (even disregarding the fact that it must comply with the formulation of the OCP). As stated above, this formulation leads to a conservative control law: minimize the maximal cost that can be forced by the adversarial input, taking into account that the adversarial input will know the choice of the control input; this corresponds to choosing, among all possible actions, the action for which the maximum cost that can be imposed by the adversary is minimal.
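The fact that the two orderings may differ is easy to see on a finite static game; the following snippet (illustrative only) evaluates both orderings on a small cost matrix:

    import numpy as np

    # Cost J[i, j] when the minimizer picks row i and the maximizer column j.
    # For this matrix the two orderings disagree, i.e., the static game has
    # no value in pure actions.
    J = np.array([[0.0, 2.0],
                  [3.0, 1.0]])

    upper = J.max(axis=1).min()  # min over a of max over b: adversary reacts
    lower = J.min(axis=0).max()  # max over b of min over a: control reacts
    print(upper, lower)          # 2.0 1.0 -> the min-max exceeds the max-min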

The value function corresponding to this problem has also been shown to be a viscosity solution of a first order PDE, the Hamilton-Jacobi-Isaacs (HJI) PDE (see [13] and references therein). The corresponding Hamiltonian is

    H(p,x) = \inf_{a \in U_a} \sup_{b \in U_b} \{ p \cdot f(x,a,b) + L(x,a,b) \}    (2.20)

2.3 Receding horizon control

The receding horizon control approach, also known as Model Predictive Control (MPC), has been extensively researched for the design of feedback controllers (see, e.g., [23]). Conceptually (the usual formulation assumes a discrete-time model), the objective of MPC is to compute a state feedback control law f_c(x) = a^*(0), where

    a^* \in \arg\min_{a \in \mathcal{U}_a} \left\{ \int_0^{\Delta} L(y(x,\tau,a), a(\tau)) \, d\tau + F(y(x,\Delta,a)) \right\}    (2.21)

and F(x) is a given terminal cost. In classical MPC implementations, the right-hand side of (2.21) is computed at each control instant. Notice that (2.21) is a relaxation of the IHOCP. The main idea is to replace the IHOCP value function V(x) by a given terminal cost function F(x), thus avoiding the costly computation of V(x). In order to compensate for this, a sufficiently large optimization horizon \Delta must be considered at each control instant. Therefore, the choice of \Delta is a compromise between the deviation from the optimal solution and the complexity of the computations. Additionally, in general, the stability of the derived control law depends on the choice of F(x). Notice that, in the limit case of \Delta = 0, F(x) must be a control Lyapunov function (in the sense of [93]) in order to ensure asymptotic stability of the closed loop control system.
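Schematically, and assuming a generic finite horizon solver solve_fhocp (a hypothetical name, as are the other helpers), the receding horizon loop reads:

    # Sketch of the receding horizon loop: at each control instant, solve a
    # finite horizon OCP from the measured state and apply only the first
    # control value. All helper names are hypothetical.
    def mpc_loop(solve_fhocp, measure_state, apply_control, horizon, steps):
        for _ in range(steps):
            x = measure_state()
            a_seq = solve_fhocp(x, horizon)  # optimal sequence over the horizon
            apply_control(a_seq[0])          # apply a*(0), discard the rest
            # the loop then re-solves from the new measured state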

In general, the right-hand side of (2.21) is solved using numerical techniques. Such a computation can be infeasible for some applications. Explicit MPC (eMPC) [5] was developed in order to cope with that problem; in eMPC, the right-hand side of (2.21) is computed off-line for a discrete set of states and stored in the control system. Like dynamic programming approaches, eMPC is subject to the curse of dimensionality. Dynamic programming techniques have also been used for the solution of the MPC problem (see, e.g., [28]).

The min-max approach for the design of robust optimal state feedback control laws for nonlinear systems has also been employed in the context of MPC. For instance, [49] employs multi-parametric nonlinear programming in the context of explicit nonlinear MPC; see [5] for a survey of explicit MPC and the connections between the optimal control formulation and the parametric optimization problem for piecewise affine systems.

2.4 Numerical methods for the dynamic programming equations

Numerical computation of the value function is deeply connected with numerical methods for PDE. This is not surprising since, as seen previously, the value function associated with an OCP can be computed as the viscosity solution of a suitable HJB or HJI PDE. On the other hand, applications of numerical methods for PDE can be found in other fields besides control. Front propagation problems, arising, e.g., in physics and computer vision, can also be described by PDE. In a certain sense, the computation of the value function is analogous to a front propagation problem. For instance, in the target problem, the boundary condition would correspond to the initial description of the front. The increasing sequence of level sets of the value function would then correspond to different positions of the front over time, i.e., to the propagation of the initial front (the boundary condition). As will be shown, this analogy has been used in order to extend front propagation (or moving interface) algorithms to the solution of the HJB and HJI PDE.

Numerical schemes for the solution of PDE may be roughly divided into Eulerian, Lagrangian and semi-Lagrangian schemes.

In a Lagrangian scheme, the front is tracked by elements (also called markers or particles) that move along with it. For instance, in a two-dimensional domain, an interface could be approximated by a connected set of segments. The front propagation would then be given by computing the trajectories of the markers, i.e., of the segments' endpoints. The Lagrangian formulation, along with the techniques to maintain an accurate description of the front, is collectively referred to as front tracking [73, pg. 26].

In an Eulerian scheme, the front evolution is captured (i.e., sampled) at fixed grid points (the terms node and grid point are used interchangeably here). The type of grid may range from the Cartesian grid to unstructured grids. In practice, the computation is confined to a bounded partition of the state space. Notice that for a state space of dimension n with a regular grid of k points per dimension, the memory requirements are on the order of N = k^n. In general, these schemes involve the numerical approximation of the spatial derivatives by finite-difference methods. Moreover, in order to ensure stability, the time step for the numerical integration (corresponding to the time discretization) may have to take into account the Courant-Friedrichs-Lewy (CFL) condition. The CFL condition imposes an upper bound on the ratio between the maximum propagation during a time step and the grid resolution (the Courant number, \Delta t \sum_i |v_i| / \Delta x_i, where v_i is the maximum velocity along dimension i - e.g., of f(x,u^*) for a controlled system - and \Delta x_i is the grid spacing along dimension i); in simpler terms, the higher the resolution of the grid, the smaller the integration time step will have to be, and hence the more costly the computation will be. For a first order explicit time discretization, the scheme must comply with the CFL condition and, additionally, use upwind finite differences or add artificial dissipation (using, for instance, the Lax-Friedrichs scheme) in order to ensure numerical stability. The introduction of artificial dissipation to compensate for the lack of an upwind scheme has a negative impact on numerical accuracy in the case of nonsmooth solutions, causing increased numerical diffusion. On the other hand, the implementation of dimension-independent upwind schemes may be a complex task, especially for the HJB and HJI PDE where, due to the local optimization in the Hamiltonian, the choice of the velocity vector depends on the choice of the one-sided derivative along each dimension.
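As a concrete illustration of these ingredients (a minimal sketch, not one of the schemes used later in this document), a first order upwind step for the one-dimensional convection equation \phi_t + v \phi_x = 0 with a CFL-limited time step could look as follows:

    import numpy as np

    # First order upwind step for phi_t + v*phi_x = 0 on a uniform 1-D grid.
    # The time step keeps the Courant number dt*|v|/dx below a safety factor.
    # np.roll gives periodic boundaries, for brevity. Illustrative only.
    def upwind_step(phi, v, dx, cfl=0.9):
        dt = cfl * dx / max(abs(v), 1e-12)      # CFL-limited time step
        if v > 0:   # information travels rightwards: backward difference
            dphi = (phi - np.roll(phi, 1)) / dx
        else:       # information travels leftwards: forward difference
            dphi = (np.roll(phi, -1) - phi) / dx
        return phi - dt * v * dphi, dt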

Semi-Lagrangian (SL) schemes are also grid based (the Eulerian component). However, instead of computing discrete approximations of the spatial derivatives, they evaluate the system trajectories emanating from the grid nodes during a given time-step (the Lagrangian component). In general, the endpoints of these trajectories will not coincide with any grid node, and interpolation methods must be used. In a certain sense, in terms of computational burden, SL schemes replace the computation of the spatial derivatives by the interpolation procedure. The stability of SL schemes does not depend on the CFL condition. This allows a more flexible trade-off between accuracy and convergence rate. However, special care has to be taken in the numerical computation of the trajectories. For large time steps, a simple Euler scheme may introduce significant errors; therefore, a more accurate ODE solver may be required. Moreover, for SL schemes, the local optimization, analogous to the evaluation of the Hamiltonian in finite difference schemes, becomes harder. Notice that, while for finite difference schemes the local optimization only depends on the spatial derivatives and on the velocity equation (the system dynamics, for control systems), SL schemes deal with the integration of the running cost along the trajectories (which must be computed either analytically or numerically) and with the interpolation of the current solution of the PDE at the endpoints of the trajectories. Only in special cases is it possible to obtain a closed form solution for this local optimization, or even a form amenable to the application of efficient numerical static optimization techniques, especially for large Courant numbers; in general, it is a nonconvex problem, requiring a certain degree of exhaustive search to avoid local solutions.
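To make the structure of such a scheme concrete, the following is a minimal semi-Lagrangian value iteration sketch for a discounted problem on a 1-D grid, with a finite control set, an Euler step for the trajectories and linear interpolation (illustrative only; f and L are assumed to accept arrays of states, and the scheme actually adopted in this work is described in chapter 3):

    import numpy as np

    # Minimal semi-Lagrangian value iteration on a 1-D grid (illustrative).
    # Each sweep applies the discrete-time DPP: follow the trajectory from
    # every node for one time step (Euler), interpolate V at the endpoint
    # and minimize over a finite control set.
    def sl_value_iteration(xs, controls, f, L, dt, gamma=0.99, tol=1e-6):
        V = np.zeros_like(xs)
        while True:
            V_new = np.full_like(V, np.inf)
            for a in controls:
                x_next = xs + dt * f(xs, a)              # Lagrangian step
                x_next = np.clip(x_next, xs[0], xs[-1])  # stay on the grid
                V_end = np.interp(x_next, xs, V)         # linear interpolation
                V_new = np.minimum(V_new, dt * L(xs, a) + gamma * V_end)
            if np.max(np.abs(V_new - V)) < tol:          # fixed point reached
                return V_new
            V = V_new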

Other combinations can be found in the literature. For instance, in [60], the authors describe an algorithm based on a Lagrangian description of the front; however, at each iteration of the front propagation, the algorithm assumes the existence of an underlying grid and, for each grid point, only one point of the front (the nearest one) is considered. In [40], elements of SL and particle methods are combined.

This survey focuses mainly on the Eulerian and SL approaches. In both approaches there is the question of whether to use first or higher order methods. In the former, this applies to the finite difference approximation of the derivatives; in the latter, to the interpolation. Second and higher order methods provide higher accuracy for the same grid size in exchange for a higher computational cost; however, in order to avoid introducing oscillations in the solution, the numerical schemes must be able to deal with the potential lack of smoothness of the solution. Essentially Non-Oscillatory (ENO) and Weighted ENO (WENO) schemes [51, 77] are commonly used for that purpose. In what concerns the discretization in time, the order of the integration scheme is relevant for the accuracy of the schemes; for finite difference schemes, it may also contribute to stability [73, pg. 38].

2.4.1 Control-theoretic schemes

In [41], Falcone, building on previous results, developed a fully discrete iterative (Gauss-Seidel) algorithm for the numerical solution of deterministic infinite horizon control problems. The algorithm is based on a semi-Lagrangian scheme. Assuming continuous dynamics, Euler time discretization and first order simplicial interpolation, Falcone proved uniqueness of the solution, error bounds and first order rate of monotone convergence to the viscosity solution (for the discounted case). This scheme was later extended to target problems and differential games, both for unconstrained and state constrained scenarios (see [13, 14] and also [30, 42] for more recent results and surveys). In what concerns implementation aspects, the algorithm was extended to an arbitrary number of dimensions (in [25] the authors use multi-dimensional linear, quadratic and cubic interpolation) and to parallel execution (see, e.g., [44]).

Actually, this scheme computes the solution of the discrete-time dynamic programming equation. Nevertheless, as expected (see, e.g., [46, Theorem 8.1]), as the time-step goes to zero the result converges to the solution of the corresponding HJI PDE. However, the smaller the time-step, the smaller the rate of convergence; in practice, the HJI PDE can be solved with good accuracy for reasonably sized time-steps [43]. Notice that, for Courant numbers less than unity, the scheme is basically solving implicit interpolations. In those cases, an infinite number of iterations would be required to reach the fixed point. In practice, machine precision will be reached in a finite number of iterations, and acceptable results can be achieved with even fewer iterations. A usual approach is to stop the computation when the L1 or L-infinity norm of the difference between the solutions of successive iterations falls below a certain threshold. The algorithm can be accelerated by periodically performing iterations with the currently computed controls, in a policy iteration fashion [17], thus avoiding the costly local optimization.
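A sketch of this stopping rule combined with the acceleration just described, assuming hypothetical helpers bellman_sweep (full local optimization, returning the updated values and the minimizing controls) and policy_sweep (update with frozen controls):

    import numpy as np

    # Sketch of the iteration control: full Bellman sweeps with a norm-based
    # stopping rule, interleaved with cheaper sweeps that reuse the currently
    # computed controls (policy iteration flavour). Helper names hypothetical.
    def solve(V, bellman_sweep, policy_sweep, tol=1e-6, inner=10):
        while True:
            V_new, controls = bellman_sweep(V)      # costly local optimization
            for _ in range(inner):                  # cheap frozen-policy sweeps
                V_new = policy_sweep(V_new, controls)
            if np.max(np.abs(V_new - V)) < tol:     # L-infinity stopping rule
                return V_new
            V = V_new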

In [24], the authors describe numerical algorithms based on the concepts of viability theory [7]. The authors present algorithms for the following problems: over-approximation of the viability and discriminating kernels (the maximal controlled invariant set and the maximal robust controlled invariant set); and the target problem and its associated capture basin (backward reachable set). The numerical schemes can be characterized as SL with zero order interpolation. Concerning the zero order interpolation, the algorithms consider all points in a neighbourhood of the trajectory's endpoint and pick the value of the optimizer of the update expression. This procedure is used in order to ensure that the algorithms compute over-approximations of the solution. However, especially for the target problem, such an interpolation scheme introduces severe accuracy errors. Therefore, the authors describe a grid refinement procedure and show convergence to the exact solution as the grid spacing goes to zero.
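In one dimension, the neighbourhood rule just described could be sketched as follows (illustrative only; this is a schematic reading of the procedure, not the implementation of [24]):

    import numpy as np

    # Zero order "neighbourhood" interpolation: instead of interpolating V
    # at the trajectory endpoint, take the most favourable value among the
    # surrounding grid nodes, biasing the scheme towards over-approximation.
    def neighbourhood_value(V, xs, x_end):
        i = np.searchsorted(xs, x_end)       # endpoint falls in cell [i-1, i]
        lo, hi = max(i - 1, 0), min(i, len(xs) - 1)
        return min(V[lo], V[hi])             # optimizer over the neighbourhood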

2.4.2 Level set methods

Level Set Methods (LSM) were first formally presented in [75] to compute the motion of a hypersurface Γ(t)⊂Rn

(also designated as interface or front). An example of a hypersurface, in R3, is the surface of a sphere. LSM

have many application, specially in image analysis, computer vision and computational physics, as can be seen

in [85], [73] and [76].

LSM are specially adequate for problems that otherwise would involve discontinuous solutions, such as some

instances of the target problem. When using the narrow band technique, which consists of solving the PDE only in

the neighbourhood of the level set at each iteration, very efficient implementations can be obtained. Moreover, for

a target problem, LSM compute the approximate solution for each node of the grid in a finite number of iterations

(if this value is finite). Of course, the accuracy will depend on the Courant number (the accuracy increases as

the Courant number decreases) and grid resolution. The application of LSM for the finite and infinite horizon

problems seems to be less common (in fact, the author is not aware of any previous publication concerning this

subject). One possible reason is the fact that, for smooth dynamics, the value function associated to such problems

is continuous, unless state constraints are considered. Nevertheless, LSM can be used for the solution of this problem by considering an augmented system with time as an extra dimension, i.e., with state space defined by [0,T]×X, where X is the state space of the original system, and boundary condition on {T}×X. We used this approach in [90].

The main idea of LSM is to embed Γ in a nicer function (the implicit function) φ(t,x), x ∈ Rⁿ; by nicer,

it is meant that φ(t,x) will be at least Lipschitz continuous in x. It is assumed that Γ divides the domain into

separate subdomains, such that it is possible to define its interior, Γ− and its exterior, Γ+. Moreover, φ(t,x) has

the following property:

φ(t,x) < 0, x ∈ Γ−(t)

φ(t,x) = 0, x ∈ Γ(t)

φ(t,x) > 0, x ∈ Γ+(t)

The front is then given by the zero level set of φ:

Γ(t) = {x : φ(t,x) = 0} (2.22)


If we take the time derivative of each member of φ(t,x) = 0 we get the convection equation:

φt(t,x) + ∇φ(t,x) · f(t,x) = 0 (2.23)

with f(t,x) = dx(t)/dt and φ(0,x) given as initial condition. One common approach is to consider φ(0,x) as the signed

distance to Γ (distance is negative if x is inside Γ) for t = 0. Nevertheless, even with smooth initial conditions,

φ(t,x) may develop kinks (i.e., discontinuities in the first derivative). Therefore, LSM also rely on the concept

of viscosity solution to pick up the correct solution and, as stated before, the computation of the derivatives must

take into account this lack of smoothness.
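As a concrete instance of the signed distance initialization mentioned above: if Γ(0) is a hypersphere of radius r centred at x_c, then φ(0,x) = ‖x − x_c‖₂ − r, which is negative inside the front and positive outside, as required.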

In [65] (see also [67] for current developments), a toolbox for the numerical solution of the HJI PDE was

developed using LSM. An implementation of this toolbox for Matlab™, the toolboxLS, is publicly available.

The options and several scenarios of application of the toolbox are described in detail in [66]. The toolbox can

deal with continuous nonlinear dynamics. The toolbox provides Total Variation Diminishing (TVD) integration

routines and dimension independent ENO/WENO spatial derivative schemes of varying order. Therefore the user may choose a trade-off between accuracy and computation time. In [69], the authors describe the application

of the toolbox to reachability analysis of continuous-time systems. The toolbox can also be used on the numerical

solution of the target problem (see, e.g., [68] for the double integrator system). In both cases, the user is required

to introduce the expression of the Hamiltonian (either in closed form, if possible, or in an algorithmic way). For

this type of problems, the current version of the toolbox uses central differencing ENO/WENO schemes for the

spatial derivatives and the Lax-Friedrichs scheme for the numerical approximation of the Hamiltonian. State

constraints can also be handled by the toolbox. Notice that, for the HJB PDE, the system flow (velocity) is chosen from a given set in order to minimize ∇φ(t,x) · f(t,x). As seen before, for control systems, the trajectories obey dynamic constraints of the form ẋ = f(t,x,a), where a can be chosen from a given set Ua; that can be written as a differential inclusion ẋ ∈ A(x) ≡ {f(t,x,a) : a ∈ Ua}. Therefore, the level set equation for the time optimal problem becomes

φt(t,x) + inf_{a∈A(x)} ∇φ(t,x) · a = 0 (2.24)

With similar considerations to the ones discussed previously in the formulation of differential games, the level set equation for the time optimal differential game becomes

φt(t,x) + inf_{a∈Ua} sup_{b∈Ub} ∇φ(t,x) · f(t,x,a,b) = 0 (2.25)

This formulation assumes a unity running cost. For the autonomous case, other running costs can be considered by using the modified flow function f(x,a,b)/L(x,a,b) and taking t as cost.

In [19], the authors describe an algorithm for front propagation based on a nonlinear discretization scheme

(Ultra Bee). The Ultra Bee scheme is known to have good anti-diffusive properties. The algorithm is not a LSM

in the classical sense, because the implicit function is a discontinuous function, of the form:

φ(t,x) = −1, x ∈ Γ(t) ∪ Γ−(t)

φ(t,x) = +1, x ∈ Γ+(t)

Instead of tracking φ(t,x) = 0, this algorithm tracks the changes of φ(t,x) from 1 to −1. The authors describe

an efficient dimension independent implementation, using a narrow band approach. The scheme is at most first

order, but numerical experiments showed that it tracks the front propagation much more accurately than a classical

Runge Kutta-ENO (both second order) LSM without reinitialization.

Semi-Lagrangian numerical schemes for LSM can also be found in the literature. This class of schemes is not

subject to the CFL condition. For instance, in [40] the authors combine a semi-Lagrangian scheme with a particle

based method; the particle method is used in order to support the detection and partial correction of distortions on

the front propagation. The authors use first order interpolation methods in order to ensure unconditional stability

and still claim acceptable levels of numerical diffusion.

In what concerns the choice between SL and finite difference schemes as building blocks for LSM, the same

considerations discussed previously for the general schemes apply. Therefore, no easy answer exists to this question.


2.4.3 Efficient methods for boundary value problems

Let us start by showing how level set methods may be related to stationary boundary value problems. Consider, for

instance, V(x) as the minimum time to reach x starting from ∂Ω. In the context of LSM, the position of the front at time t is given by Γ(t) = {x : V(x) = t}, with Γ(0) = ∂Ω. As it was done for the general LSM, if we take the derivatives of both members of V(x) = t we obtain ∇V(x) · ẋ = 1. When studying the motion of the hypersurfaces,

it is usual to consider the velocity normal to the front:

fn(t,x) = f (t,x) ·∇φ(t,x)

‖∇φ(t,x)‖2

where ‖.‖2 is the Euclidean norm. Therefore, after considering the boundary condition we obtain

‖∇V (x)‖2 fn(x) = 1,∀x ∈Ω (2.26)

V (x) = 0,∀x ∈ ∂Ω (2.27)

with x ∈ Rn and Ω ⊂ R

n. Equation (2.26) is designated as Eikonal equation. For physical systems, this equation

may be used to describe the propagation of a front in an isotropic medium, i.e., a medium where the propagation speed is the same in all directions. The propagation speed fn(x) is assumed to be always positive (front expansion)

or always negative (front contraction). Without loss of generality, we will assume the former.

As for the general formulation of LSM, this formulation is of little interest for control applications, unless we assume systems of the form ẋ ∈ B₀ⁿ(r), where B₀ⁿ(r) is a hypersphere of radius r centred on the origin of Rⁿ.

However, in order to explain the basic idea of the algorithms, we will consider this simple case for the time being.

The structure of the Eikonal equation allows the implementation of very efficient algorithms. This is because the characteristic lines of this PDE coincide with the gradient of the solution (a characteristic line can be interpreted as the optimal flow of a given particle). One such class of algorithms was first developed in [98], based on Dijkstra's algorithm [36] (a brief description of Dijkstra's algorithm is presented in Appendix A.1). A similar class of algorithms was developed independently by the Level Set community and coined Fast Marching Method (FMM) [83, 84]. The algorithms compute the solution in a “single pass”

nity and coined Fast Marching Method (FMM) [83, 84]. The algorithms compute the solution in a “single pass”

(i.e., each node is visited at most r times, where r is the number of its neighbours) and in a monotone fashion,

resembling the propagation of a wave. Dijkstra's algorithm, Tsitsiklis's algorithm and the FMM are fundamentally dynamic programming algorithms, differing only in the way that the values of new points are computed (the update procedure), which in the latter two cases is suitable for continuous domains. This difference is of central

importance. In order to appreciate that fact, consider the problem of finding the minimum distance from the origin

to a point (x,y) in R² for a fully actuated particle, assuming an obstacle-free environment. Assuming an orthogonal grid, and independently of how fine the considered discretization of the domain is, Dijkstra's algorithm will always produce the wrong result |x| + |y| (the Manhattan norm), as opposed to the expected Euclidean norm. On

the other hand, both Tsitsiklis’s algorithm and the FMM converge to the exact solution.

The update procedure in [98] differs from the one used in the general FMM. In the former, the update function performs a local minimization over all convex combinations of the neighbouring points (i.e., intermediate points are considered), in a semi-Lagrangian fashion. The latter uses an Eulerian scheme; only grid points

are considered, and the value of each point is computed using an upwind difference scheme (see Appendices A.2

and A.3 for examples of both cases).
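For concreteness, the classical first order upwind update for the 2D Eikonal equation can be sketched as follows (a generic formulation under the assumptions of a uniform grid spacing h and positive speed; it is not copied from [83, 84]):

#include <algorithm>
#include <cmath>

// First order upwind update for the 2D Eikonal equation |grad U| fn = 1.
// a and b are the smallest accepted neighbour values along the x and y
// axes, h is the (uniform) grid spacing and fn > 0 is the local speed.
double eikonal_update_2d(double a, double b, double h, double fn)
{
    if (a > b) std::swap(a, b);        // ensure a <= b
    const double c = h / fn;           // local travel cost across one cell
    if (b - a >= c)                    // only the smaller neighbour matters
        return a + c;
    // Both directions contribute: solve (U-a)^2 + (U-b)^2 = (h/fn)^2.
    return 0.5 * (a + b + std::sqrt(2.0 * c * c - (b - a) * (b - a)));
}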

The computational complexity of these methods is O(N log(N)), where N is the number of grid nodes. The log(N) term is due to the heap-sort data structure used to store the ordered list of

considered nodes. In [102] (see also [79]), the authors develop a FMM with O(N) complexity. They achieve that

by exchanging the heap-sort data structure with the untidy priority queue, which has O(1) complexity on insert

and remove operations.

Fast Sweeping Method

The fast sweeping method (FSM) is a numerical method with O(N) computational complexity. It was first pre-

sented under this designation in [97] (see also [78] for more recent work and additional references). As opposed

to the FMM, it is an iterative method. However, for some problems, the number of iterations is independent of the

grid size, hence the claimed O(N) computational complexity. The designation of the method comes from the fact that each iteration consists of 2ⁿ sweeps of the defined grid, where n is the dimension of the problem. One sweep consists of visiting and updating each grid point in a given order. The idea is to change the travel direction one


dimension at a time until the 2ⁿ combinations are performed (e.g., for a one dimensional problem defined in [a,b] the sweeps would be from a to b and then from b to a). The local update scheme can be chosen as best suits

the problem at hand: upwind finite-differences, finite-differences with artificial dissipation and semi-Lagrangian

schemes can be found in the literature. In [9], the authors propose an extension of the FSM, the locking method, which avoids the processing of nodes whose value would not be changed by the update procedure in the current

sweep.

Obviously, the O(N) complexity does not imply that FSM will always behave better than other methods with

higher computational complexity. First, for problems of practical size, the constant factors not explicitly shown

in the O() classification play a determinant role in the computation time. Moreover, depending on the complexity

of the characteristic lines, the number of required iterations in order to find the exact solution may vary from one

to infinity. FSM have very good results for the Eikonal equation; the basic equation is solved in one iteration and

related problems are solved in a finite number of iteration (see, e.g., [62])). However, in the general case, the

method may not converge to the exact solution in a finite number of iterations.
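The sweeping pattern itself is simple to express; the following sketch (our own code, with the local update scheme left abstract) shows one FSM iteration in 2D:

#include <initializer_list>
#include <vector>

// One FSM iteration in 2D: 2^n = 4 sweeps with alternating node orderings.
// U stores the value grid (nx*ny nodes, row-major); local_update applies
// the chosen local scheme at node (i,j), reading the most recent values
// (Gauss-Seidel fashion).
template <class F>
void fsm_iteration_2d(std::vector<double>& U, int nx, int ny, F local_update)
{
    for (int sx : {+1, -1})            // sweep direction along x
        for (int sy : {+1, -1}) {      // sweep direction along y
            const int i0 = (sx > 0) ? 0 : nx - 1;
            const int j0 = (sy > 0) ? 0 : ny - 1;
            for (int i = i0; i >= 0 && i < nx; i += sx)
                for (int j = j0; j >= 0 && j < ny; j += sy)
                    local_update(U, i, j);
        }
}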

In [103], the author shows several avenues for parallel implementations of FSM. One obvious approach is to

assign each sweep to a different processor. At the end of this first phase we will have 2ⁿ solutions, corresponding to the 2ⁿ sweeps. The second and final phase of the iteration will consist of taking the minimum of the 2ⁿ solutions at each grid point. This can also be done in parallel, by domain decomposition. The decomposition is straightforward

because no overlapping or communication between subdomains is required. However, it can be seen that, in the

sweeping phase, this parallel implementation will only take advantage of 2ⁿ processors at most. The author does not explore the fact that, in some upwind schemes, the local computation can exploit further parallelism; in practice, some of the max and min operations in the formulas correspond to different computation branches that could be done in parallel.

In order to further improve parallelism, domain decomposition must be applied in the sweeping phase. In

this case, communication between subdomains must occur. This is done by defining “thin” boundaries between

the subdomains. The nodes on those boundaries will have tentative values from each domain. In the second

phase, the minimum is taken and that value is used in the following iterations, again on each subdomain. This

way, information propagates naturally between subdomains. The author bases the results only on the Eikonal equation. Numerical results confirm that the greater the number of subdomains, the greater the number of required

iterations. However, in spite of the increase in the number of iterations, significant speedups may be achieved

(e.g., with 25 subdomains, the number of required iterations increased by a factor of 3 in the worst case example).

Anisotropic problems

When the system flow is given by ẋ ∈ A(x) and A(x) is not an origin-centred hypersphere for some x, the problem is designated as anisotropic. In this case, the maximum speed may depend not only on the position but also on the direction of the path. Most control problems are anisotropic; for instance, the computation of minimum time paths for the unicycle, with model (ẋ₁, ẋ₂, ẋ₃) ∈ A(x), where A(x) = {(cos(x₃), sin(x₃), u) : u ∈ [−1,1]}, is an anisotropic problem.

The single pass methods described in the previous section were devised for isotropic problems. They can only

be applied to special cases of anisotropic problems. In [88], the authors show that those algorithms fail when the

characteristic direction is not in the same simplex used to compute the gradient of the solution. In [6], the authors

formally define a set of properties that anisotropic problems should meet in order to be treatable by the FMM.

That work is an extension of the results previously published in [74]. Robotics inspired examples are presented in [6] to show that the FMM can be applied in situations where the Hamiltonian is described by norms other than the Euclidean norm. However, the assumed underlying dynamics are still described by the trivial relation ẋ ∈ B₀ⁿ(r(x)).

It is also shown that under appropriate linear transformation some anisotropic problems may become suitable for

the FMM. A similar idea had already been exposed in [97]. However, these remarks and advances are still of

limited applicability for general control systems.

The same problem affects FSM developed specifically for the Eikonal equation. Extensions of the local update

scheme were developed in order to handle general convex (e.g., [78] and [57]) and nonconvex Hamiltonians

(e.g., [26]). Nonconvex Hamiltonians may arise, for instance, in differential games. However, in general, in the anisotropic case the FSM does not offer the same kind of overwhelming results it provides for the Eikonal equation.

In [88] the authors present convergence proofs for a new method (presented earlier in [86]) that handles a larger

class of anisotropic problems. Two versions of the new method are described (semi-Lagrangian and fully Eulerian)

and they are designated “Ordered Upwind Methods” (OUM). This method is still classified as single-pass, in the

sense that once a node value is accepted as final, it is never computed again; however, as opposed to the classical


FMM, a node may be evaluated more times than the number of adjacent nodes. The limiting assumption of this method is that A(x) must be a compact set with the origin in its interior, for every x in the continuous state space. Moreover, the performance of the OUM degrades as A(x) deviates from a hypersphere centred at the origin, i.e., as the coefficient of anisotropy Υ, the ratio between the maximum and minimum speeds in A(x), deviates from one. This can be understood by the following

analysis of the method. The OUM define the concept of accepted front which is the set of points that define the

boundary between the set of accepted nodes (the nodes with their final value computed) and the set of considered

nodes (the neighbours of the accepted nodes). When computing the value of a node x, instead of using only the direct neighbours (as in the FMM), all nodes from the accepted front within a certain distance of x are considered. The

accepted front is a simplicial surface with vertices composed by all accepted nodes with non-accepted neighbours.

In order to assure the “ordered” acceptance of nodes (i.e., monotone, in the sense of their values) that distance

must be proportional to the coefficient of anisotropy. Hence the degradation of the algorithm as ϒ grows. The

authors describe the application of the method on two-dimensional minimum time problems. However, extension

to higher dimensions seems a difficult task. The main challenge would be the efficient implementation of the

upwinding update formula. This formula must find the trajectory with minimal cost among a set of trajectories

starting from the current point and finishing in the accepted front; as the dimension grows, it will become harder

to find the intersection of the trajectory with the (n−1)-simplices (i.e., simplices with n vertices) of the accepted front. The OUM have O(Υⁿ⁻¹ N log(N)) computational complexity.

In [100], the author extends OUM in order to handle time-dependent problems without adding an extra dimen-

sion to accommodate the time variable. The author considers both cases: the solution when starting from the boundary (easier, because we have the reference t = 0) and the solution when going to the boundary (harder, because the time to reach the boundary is not known a priori).

In [31], the author presents a new method with connections to the FMM. The proposed method, the Buffered Fast Marching Method (BFMM), can handle very general system dynamics, of the form ẋ(t) = f(x(t),a(t),b(t)), with f continuous in both variables and Lipschitz continuous with respect to x uniformly in a. This method can

handle target problems with a single control input and also target problems in the context of zero-sum two player

differential games. Like in the OUM, the value of the considered nodes may have to be recomputed more times

than the number of adjacent nodes. The method considers a set of intermediate nodes (the buffer between the

accepted set and the narrow band). At each iteration, the node with minimum value in the narrow band is moved

to the buffer (instead of being immediately accepted as having the final value) and its neighbours are sent to the

narrow band (considered nodes). Periodically, the algorithm uses the iterative algorithm from [30] to compute the

value of each node in the buffer, taking the narrow band as temporary boundary condition. That computation is

made twice, but on each computation a different constant value is assigned temporarily to the nodes in the narrow

band (the minimum value in the buffer and infinity). If the computed value of a node in the buffer is not affected

by the value of the narrow band then that node is removed from the buffer and marked as accepted. The efficiency

of this method seems to be very application dependent (for instance, for the lunar landing problem, based on 3 different grid sizes, it seems to be O(N²), while for other two dimensional problems it seems to be O(N)) and the author warns that, without careful tuning, the computation times may be higher than with the basic iterative scheme from [30]. The author compares the performance of the BFMM against a FSM with the SL local update

scheme from [30]. The results show that the performance differences are problem dependent. For instance, the

BFMM is slower on a problem with high coefficient of anisotropy while the FSM is slower on problems where the

characteristic directions take a “complex” form (e.g., finding the optimal path on a region with lots of obstacles).

As a final remark, for anisotropic static boundary value problems, none of these methods seem to provide

better (or much better) results than the narrow band LSM (even taking into account the extra computational cost

of the reinitialization step for the narrow band LSM). However, this is only a conjecture because, to the best of our knowledge, there are no published comparisons between these algorithms.

It must also be remarked that, in spite of their limitations, the single pass methods for isotropic problems (as well as the FSM) have several applications. Even in robotics, some applications can be found (e.g., high-level

path-planning).

2.4.4 Applications to hybrid systems

In [21], the authors extend the FMM to solve optimal control problems on some classes of hybrid systems. One

example is the problem of finding the minimum time path to reach any point in a building with several floors

connected by stairs. The authors model each floor as a discrete state. The traversal of a stair is modelled as a

transition between discrete states with a fixed cost and with guard (necessary condition for the transition) defined


by the physical position of the stair. The extension of the algorithm is straightforward: for each discrete state,

define a discretization of the respective continuous state; on each iteration, propagate the front and check if any

transition may be triggered; if the transition implies a change of the discrete state (the usual case, i.e., unless it is

a self-transition) propagate the front only if the destination is still unvisited.

In [87], the authors apply OUM to the solution of hybrid optimal control problems. The examples assume

anisotropic dynamics in R2. Otherwise, the handling of the hybrid dynamics is performed as in [21]. We highlight

two examples presented there. In the first example, state jumps with a predefined cost are allowed at certain

positions of the state space. This is analogous to the possibility of taking a bus at certain points instead of going

on foot. In the second example, the system may switch once between dynamics (e.g., walking versus skating) by

incurring a predefined cost.

In [56], the authors propose an algorithm, based on the fast sweeping method, to compute the value function

corresponding to the optimal control of systems with autonomous and controlled transitions between continuous

dynamics (no guards are considered for the controlled transitions). Switching costs may be considered and the

running cost may be a function of both continuous and discrete states. However, the described method requires the

Hamiltonian to be convex. No input constraints are assumed, and only systems with a one-dimensional input are considered. The input value is artificially limited by considering a quadratic penalization in the cost

input are considered. The input value is artificially limited by considering a quadratic penalization in the cost

function. These assumptions are taken in order to simplify the local minimization of the Hamiltonian. A Lax-

Friedrichs scheme is employed in order to obtain a consistent numerical approximation of the Hamiltonian. The

algorithm is illustrated on a model of a physical system (optimal control of hysteresis in smart actuators). The system can be modelled by the four state automaton of Figure 2.1.

Figure 2.1: Hybrid automaton modeling the benchmark system of [56]; four discrete states a, b, c and d, with continuous dynamics ẋ(t) = f₁(x,u), f₂(x,u) and f₃(x,u) and transitions guarded by the conditions x₂(t) ≥ K and x₂(t) < K.

It can be observed that this example is very similar

to switched systems, in the sense that at any moment the system may switch either between states a and c or b and

d. Therefore, the discrete state is a function of the value of the discrete input (analogous to combinatorial logic

in digital systems). This is different from the decision of taking the bus in the examples above, where the event

of taking the bus triggers an irreversible change to a new discrete state. In the bus example, the discrete state is

actually used as a memory element (analogous to sequential logic in digital systems). The autonomous transitions

follow the logic of regional dynamics, i.e., the dynamics change as a function of the current state; once again, this

is combinatorial logic. Under these assumptions, the authors do not have to explicitly encode information about

the discrete state in the value function.

The application of the basic LSM to optimal control of hybrid systems is not straightforward. In [68], the

authors provide some hints regarding the application of the toolboxLS to such problems. However, that is done

in an ad hoc fashion, as the following comment by the authors summarizes: “Hybrid systems are not directly

supported by the toolbox, because there is no consistent modeling language or notation for general nondetermin-

istic, nonlinear hybrid systems. Until such time as one is available, we hope that the example in section 5 and

those in (edited) [66] make clear that analysis of such systems using the toolbox is still quite feasible even if the

discrete dynamics must coded by hand.” The example consists of a stochastic hybrid model, which will not be

analysed here. The toolbox is used for the reachability analysis of a three mode hybrid system in [66], see Fig. 2.2.

The system switches between states sequentially, with the first transition being controlled, and the second one au-

tonomous. The main idea of the solution approach is to propagate the unsafe or safe sets from discrete state to

discrete state, using the so called reach-avoid operator (“set of initial conditions from which trajectories can reach

one set while avoiding other along the way”). Another example of the same methodology can be found in [96].


Figure 2.2: Example from [65].


Chapter 3

Synthesis of the Feedback Controller

3.1 Solution of the two mode strategy by dynamic programming

In this section we describe the solution approach for the control strategy described in the introduction. Recall that

this approach involves the solution of three problems: computation of the MRCIS, an infinite horizon problem

and a minimum time problem. We assume time-independent (autonomous) systems.

Let us define VS(t,x) as the time dependent value function associated to the infinite horizon problem. We

assume the considered systems will meet the requirements such that V_S(t,x) − (V_S(x) − ct) → 0 as t → −∞. Taking this into account, our objective will be the computation of V_S(x), so that a static feedback control law may be defined. This computation can be approached in two ways:

• First, find S; then, define V_S(x) = ∞, ∀x ∉ S, and compute V_S(x), ∀x ∈ S.

• Define V_S(x) = ∞, ∀x ∉ R, and compute V_S(x) for x ∈ R. As t → −∞, V_S(x) will become infinite for all x ∉ S.

The second approach underlines the idea that prior computation of S is not required, since the MRCIS can be found as a by-product of the solution of the infinite horizon problem, i.e., S = {x : V_S(x) ≠ ∞}. However, in practice, those computations will be performed using numerical techniques. In general, numerical methods will not be able to compute S exactly. The discrete representation of the state space is the main obstacle for this. If the resulting set, S̄, is an over-approximation of S, then S̄ cannot be used as the target when the system is outside S: if S̄ were used as the target, the resulting trajectories could bounce in the band corresponding to S̄ \ S and never go into S. In order to deal with this issue, we define an additional problem:

Problem 4. Find a suitable under-approximation T of the MRCIS S .

By suitable we mean that T should be as close to S as possible while ensuring that the discrete implementation of f_S(x) is able to make the system's state remain inside S̄ for all trajectories departing from T. On the other hand, Problem 1 is recast as follows:

Problem 5. Find a suitable over-approximation S̄ ⊂ R of the MRCIS.

With that in mind, T will be the target for the minimum time problem. We define V_mt(x) as the value function associated to this problem.

Given the value functions, the respective optimal controls are found using (2.19): f_mt(x) is derived from V_mt(x) and f_S(x) is derived from V_S(x). Finally, the following state feedback control strategy is defined:

$f(q,x) = \begin{cases} f_{mt}(x), & q = q_0 \\ f_S(x), & q = q_1 \end{cases}$  (3.1)

with the discrete dynamics defined as

$q' = \zeta(q,x) \equiv \begin{cases} q_0, & x \notin \overline{S} \\ q_0, & q = q_0 \wedge x \notin T \\ q_1, & q = q_0 \wedge x \in T \\ q_1, & q = q_1 \wedge x \in \overline{S} \end{cases}$  (3.2)


Intuitively, the two modes of operation, q₀ and q₁, correspond to “travelling into T (in minimal time)” and “staying inside S̄”, respectively. The control system selects the appropriate feedback control law (either f_mt(x) or f_S(x)) for each mode of operation.
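A direct transcription of the discrete dynamics (3.2) into code makes the hysteresis between the two modes explicit; in this sketch, in_Sbar and in_T are hypothetical membership tests for S̄ and T:

enum Mode { Q0, Q1 };  // q0: travel into T in minimal time; q1: stay inside S-bar

// One evaluation of the discrete dynamics (3.2). in_Sbar and in_T are
// hypothetical membership tests for the over-approximation S-bar and the
// under-approximation T.
template <class X, class InS, class InT>
Mode zeta(Mode q, const X& x, InS in_Sbar, InT in_T)
{
    if (!in_Sbar(x)) return Q0;             // outside S-bar: (re)acquire the target
    if (q == Q0) return in_T(x) ? Q1 : Q0;  // switch modes only once T is reached
    return Q1;                              // q = q1 and x in S-bar: keep station
}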

3.2 Numerical algorithm

The dynamic programming community has been extensively studying the target, the finite horizon and the infinite

horizon problems. For time-independent systems, the target problem is cast as the computation of a static (time-

independent) value function in a straightforward way; the classical minimum time and pursuer-evader problems

fall on this category.

The finite horizon problem can be solved by computing a time-dependent value function for an initial value

problem. In fact, the problem is solved backward in time. Thus the procedure begins with the terminal condition

V (0,x) = Ψ(x), where Ψ(x) is the terminal cost, and the value function is computed for t ∈ [−T,0), where T is

the horizon.

Under very general assumptions, the solution of the discounted (vanishing running cost) infinite horizon prob-

lem is known to converge to a static value function. However, in the context of this work and due to the envisaged

applications, our interest lies on the undiscounted version (non-vanishing running cost) of the constrained infinite

horizon problems. In general, theoretical results for this problem assume some form of compactness of the state

space (see [12]). Since we are considering operation inside the MRCIS, our formulation meets this assumption.

The approximate solution for this problem will be computed as in the finite horizon case, by taking a sufficiently

large horizon.

At the time this research was initiated, the only publicly available numerical solver for problems of arbitrary

dimension we were aware of was the ToolboxLS [67]. Preliminary experiments were performed with this tool [90],

assuming infinite sampling rate. However, no satisfactory results were found for the solution of the infinite horizon

problem. Two approaches were used for the infinite horizon problem. The first approach is based on the LSM but

it requires the consideration of a new state variable, in order to cast time as a space variable. The second approach

consists of using the toolbox as an ordinary finite difference scheme for the time-dependent HJI PDE. The former approach leads to increased processing and memory requirements, due to the additional dimension; the latter leads to very diffusive results due to the Lax-Friedrichs scheme employed by the ToolboxLS.

The numerical experiments of this research are computed using a numerical solver implemented by the author.

This solver is based on the semi-Lagrangian scheme developed by Maurizio Falcone and co-authors (see [30]

and references therein). As stated before, this scheme implements the discrete-time DPP. This is suitable for our

approach to data-sampled systems.

We implemented a dimension independent version of this scheme using the C++ programming language. The

current implementation uses a regular grid for space discretization. Let us denote by Ω the set of grid nodes

covering the desired region of the state space X .

The user must define f (x,a,b), L(x,a), the grid parameters, the time-step and the initial or boundary condi-

tions. The boundary elements are marked in a grid with the same structure as the one used to store the value

function.

In order to perform the local optimization of (2.10), the user must provide a discrete representation of the input

space (e.g., a set of uniformly spaced values). In fact, for many practical applications, the input space is a discrete

one. For instance, computer controlled systems have digital interfaces with the actuators. Therefore this approach

is suitable from the application point of view.

Except for periodic state variables (such as angular position, for instance), there is no simple way to compute

V (y∆(x,∆,a)) if y∆(x,∆,a) lies outside the defined computational space. In general, extrapolation schemes will

introduce large errors. The safer approach consists of defining a sufficiently large computational space such that

its boundaries do not affect the region of interest. In our implementation, the algorithm assumes that the system

must respect x ∈ X. If the underlying problem does not naturally lead to the definition of such a state constraint, the computational space must be enlarged such that the effect of these artificial constraints does not affect the solution

in the region of interest. Additional state constraints can be defined by setting the value of the forbidden nodes to

a special constant defined as infinity.


3.3 The semi-Lagrangian scheme

The solver has two slightly different versions, directed to the target problem and to the finite/infinite horizon problems, respectively. However, the kernel of the algorithm is the same for both versions. The central procedure consists of the application of the DPP (2.10) to every grid node. This procedure returns an approximation of the value function and the corresponding optimal input at every grid node. For each considered input value, the procedure computes

∫₀^Δ L(y_Δ(x,τ,a),a) dτ + V(y_Δ(x,Δ,a))

and then chooses the minimum (a code sketch of this procedure follows the list below). This entails two main sub-procedures:

main sub-procedures:

1. The numerical solution of the augmented set of ordinary differential equations (ẋ, ċ) = (f(x,a), L(x,a)), in order to obtain y_Δ(x,Δ,a) and the cost to go from x to y_Δ(x,Δ,a);

2. Interpolation of V(x) in order to compute V(y_Δ(x,Δ,a)) from the values at the neighbouring nodes. The following interpolation schemes were implemented: multilinear and second and third order ENO/WENO schemes.
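A stripped-down sketch of this per-node procedure (with the ODE step and the interpolation hidden behind the hypothetical helpers euler_step and interp, and the running cost approximated over one step by L(x,a)Δ) is:

#include <limits>
#include <utility>
#include <vector>

// One application of the discrete-time DPP at state x: minimize, over the
// discretized input space, the one-step running cost plus the interpolated
// value at the trajectory endpoint. euler_step and interp are hypothetical
// helpers standing for the ODE solver and the interpolation scheme.
template <class State, class Input, class Step, class RunCost, class Interp>
std::pair<double, Input> dpp_update(const State& x,
                                    const std::vector<Input>& inputs,
                                    Step euler_step, RunCost L,
                                    Interp interp, double dt)
{
    double best = std::numeric_limits<double>::infinity();
    Input best_a = inputs.front();
    for (const Input& a : inputs) {
        const State y = euler_step(x, a, dt);          // y_Delta(x, dt, a)
        const double cost = L(x, a) * dt + interp(y);  // running cost + V(y)
        if (cost < best) { best = cost; best_a = a; }
    }
    return {best, best_a};
}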

In what concerns the solution of the ordinary differential equations (ODE) for the computation of y_Δ(x(t),Δ,a,b), the current implementation provides the Euler scheme, a fixed step fourth order Runge-Kutta scheme and allows the choice of any of the variable step ODE solvers from the GNU Scientific Library (GSL) [48]. As the time step becomes larger, the variable time step ODE solvers become the only acceptable choice; however, the computation time also increases and may become the main contribution to the total CPU time.

As pointed out in [11], it is possible to compute these trajectories just once and to store every y_Δ(x(t),Δ,a,b) and the respective ∫₀^Δ L(y_Δ(x,τ,a),a) dτ in the main memory, thus avoiding repeating the computation in every iteration. In fact, it is even more efficient to store y_Δ(x(t),Δ,a,b) in terms of barycentric coordinates with respect to the nodes used for interpolation. However, either approach may require a prohibitive amount of memory: for a grid with N nodes, a system with M possible input values and n state variables, the extra required memory will be the size of the floating point data type (e.g., 8 bytes) multiplied either by NM(n+1), if only every y_Δ(x(t),Δ,a,b) is stored, or by NM·2ⁿ, if the barycentric coordinates for the multilinear scheme are stored.
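To give an idea of the magnitudes involved: plugging in the 401×401 grid and the 151 input values used later in Section 3.5 (N ≈ 1.6×10⁵, M = 151, n = 2) gives roughly 0.58 GB for the endpoint option and 0.78 GB for the barycentric option; both figures grow quickly with the dimension n.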

The numerical solution for the target problem is obtained as described in [30] and other related publications.

The algorithm uses Gauss-Seidel iterations, i.e., the computation always uses the most recent values as opposed

to relying only on the values from the previous iteration. More specifically, the nodes are visited as in the Fast

Sweeping Method. The main challenge for the target problem, in what concerns numerical methods, resides in

the fact that the respective value function may be discontinuous even for the unconstrained case. In general, even

with ENO/WENO interpolation schemes, the SL scheme will be diffusive near the discontinuities. This will be a

topic of future work. In what follows, we discuss the constrained finite and infinite horizon problems with more

detail.

3.4 Constrained finite and infinite horizon problems

The finite and infinite horizon problems are associated to time-dependent value functions V (t,x). The computation

of V(t,x) starts from the terminal condition V(0,x). Each subsequent iteration corresponds to a progression backward in time. For a time-step Δ, the result of the first iteration, V₁, will be the approximation of V(−Δ,x), the result

of the second iteration, V2, will be the approximation of V (−2∆,x) and so on. In practice, we are only interested

in the final iteration. For the infinite horizon case, we assume that for a sufficiently large number of iterations the value function will converge to V(t,x) = V(x) − ct, where c is a constant. Therefore, for a basic implementation,

only two grids must be kept: one holding the result from the previous iteration (for the first iteration, this will be V(0,x)) and another for the result of the current iteration. The results of iteration i are based exclusively on the

results of iteration i−1 (Jacobi iterations).

The Jacobi iterations facilitate the parallel execution of the algorithm. At present, parallelism is exploited by creating a partition of the computational space and assigning each subset of the partition to a different processor. Notice, however, that each processor may have to access data outside its own subset of nodes, namely near the boundaries. A multi-thread version was implemented in order to take advantage of the now common multiprocessor systems with shared memory (e.g., the typical “dual core” personal computers fall into this classification). The Jacobi iterations eliminate potential data access conflicts between the threads of computation: on each iteration i, the main function creates a new buffer to store V_i, starts the threads and waits for their termination; in order to compute V_i, every thread accesses only the results obtained at iteration i−1, without changing them, and stores the

21

Page 30: Dynamic programming for computer based feedback control of ...dee07006/TRP-jes.pdf · an infinite horizon optimal control problem (OCP). However, for general systems with input and

results on the new buffer (Vi). Since there is a partition of the computational space, each node of Vi can only be

written by a single thread. We remark that, except for eventual bottlenecks on the memory bus, this scheme scales

linearly with the number of processors.
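A minimal sketch of this shared-memory scheme (using std::thread; update_node stands for the per-node DPP procedure of Section 3.3) is:

#include <cstddef>
#include <thread>
#include <vector>

// One Jacobi iteration: each thread writes a disjoint slice of V_next while
// reading only from V_prev, so no synchronization is needed within a pass.
template <class Update>
void jacobi_iteration(const std::vector<double>& V_prev,
                      std::vector<double>& V_next,
                      Update update_node, unsigned n_threads)
{
    const std::size_t N = V_prev.size();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t lo = N * t / n_threads;       // slice boundaries
        const std::size_t hi = N * (t + 1) / n_threads;
        pool.emplace_back([&, lo, hi] {
            for (std::size_t k = lo; k < hi; ++k)
                V_next[k] = update_node(V_prev, k);     // reads iteration i-1 only
        });
    }
    for (auto& th : pool) th.join();  // implicit barrier at the end of the pass
}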

For the unconstrained case with smooth dynamics and running cost, the value function is continuous. However,

the state constraints introduce discontinuities in the value function. By definition, V(t,x) = ∞ for x ∉ X. Moreover,

the introduction of state-constraints leads to the consideration of the maximal controlled invariant set (MCIS).

Depending on the system dynamics, state constraints and input constraints, some of the trajectories beginning on

X cannot be kept inside it for all times. Therefore, the cost associated to the respective initial state is defined

as infinity. This is well captured by the infinite horizon problem. In this sense, the MCIS is defined as {x : lim_{t→−∞} V(t,x) ≠ ∞}.

If one or more vertices of the cell containing y_Δ(x,Δ,a) are defined as infinity, special care has to be taken when performing the interpolation. One possibility is to define V(t,x) = ∞ whenever one or more vertices of the cell containing y_Δ(x,Δ,a) are defined as infinity. This is a conservative approximation. Only trajectories passing through well defined cells (i.e., cells not containing vertices defined as infinity) will be accepted. However, for certain

systems and grid resolutions, this may lead to useless solutions. The forbidden nodes will tend to absorb nodes corresponding to possibly valid trajectories. Moreover, the absorption of nodes will have a cascading effect, i.e., nodes marked as forbidden will tend to absorb the nodes in their neighbourhood. This will happen if the reach set from node

x, over the defined time-step, does not intersect a well defined cell. This effect is more pronounced as the Courant

number decreases. After some iterations, the computed value function may be finite only on a small region or

even infinite everywhere. Therefore, this method will produce a sub-approximation of the MCIS. Whenever the

intersection of the reach set from x with the MCIS is a very narrow region (less than one grid cell), the method

will disregard x and the result will be a sub-approximation of the exact solution. As explained, this will have

a cascading effect. Moreover, some of the resulting trajectories will be sub-optimal in order to escape from the

absorbing grid cells.

On the other hand, we may relax the definition of infinity and, instead, consider a very large value K_inf when performing the interpolation. This will produce an over-approximation of the MCIS. The cost near the boundary of the MCIS will be higher than the optimal one. This is because the trajectories will try to avoid the cells with the artificially high costs. The extent of this diffusive effect (sometimes informally designated as smearing) is analogous to the absorption effect of the previous approach. It will also depend on the reach set from each

node and on the grid resolution. In general, the higher the grid resolution, the smaller the region affected by the

smearing. Analogously to the previous case, the region affected by the smearing will produce sub-optimal results.

This happens because the trajectory tries to escape at all costs from the region with the artificially higher value

introduced by the smearing effect.

This method implicitly retains all the feasible trajectories but, on the other hand, it may exhibit “false positives”. In this context, a false positive is a state x marked as finite (i.e., V_i(x) < K_inf) for which no feasible

trajectory can be generated by the resulting control law. This means that a false positive can either correspond to

a state outside the exact MCIS or to a state belonging to the MCIS for which the synthesized controller cannot

produce a feasible trajectory. If the system is known to have a nonempty equilibrium set Se or a nonempty set

Ss of cells free of smearing effects with finite values at their vertexes, these false positives can be identified by

exhaustive simulation of the system trajectories. The trajectories are simulated until they either reach Ss or Se.

However, this is a time consuming procedure. Moreover, for nonzero ergodic cost and for a sufficiently high number of iterations, the finite values of V(t,x) might start approaching K_inf. This may lead to very inaccurate results (affecting the decision for both inputs) and to slow convergence. In order to avoid that, the algorithm computes the approximation of V(x) = V(t,x) − ct in the following way:

1. Start with i = 0 and no target set. ∀x, set V₀(x) = 0 and f₀(x) = 0.

2. Apply the DPP on each grid node, in order to obtain V′_{i+1}(x) and f_{i+1}(x).

3. Set V_{i+1} = V′_{i+1} − min V′_{i+1}.

4. If ‖f_{i+1}(x) − f_i(x)‖_p < δ_{tol,p,f} and ‖V_{i+1}(x) − V_i(x)‖_p < δ_{tol,p,v}, where δ_{tol,p,f} and δ_{tol,p,v} are given thresholds, stop. Otherwise, set i = i+1 and go to step 2.

We consider the norms 0 (number of nonzero components), 1 and infinity for the stopping condition. Ideally

we would like ‖ fi+1(x)− fi(x)‖p to become zero and ‖Vi+1(x)−Vi(x)‖p to converge to the machine precision.

22

Page 31: Dynamic programming for computer based feedback control of ...dee07006/TRP-jes.pdf · an infinite horizon optimal control problem (OCP). However, for general systems with input and

However, in most cases, that may require an unreasonable amount of time. Additionally, due to the space dis-

cretization, the final solution may not be the exact solution even if those norms go to zero. Therefore, in practice,

it is acceptable to define more relaxed thresholds.
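A compact sketch of the loop in steps 1-4 above (with dpp_pass and norm_diff as hypothetical stand-ins for the solver's update sweep and the chosen p-norm of the difference between iterates) is:

#include <algorithm>
#include <vector>

// Normalized value iteration (steps 1-4 above): after each DPP pass the
// minimum is subtracted, so the iterates approximate V(x) = V(t,x) - ct
// instead of drifting towards K_inf. dpp_pass and norm_diff are
// hypothetical stand-ins for the solver's update sweep and the chosen
// p-norm of the difference between iterates.
template <class Pass, class Norm>
void normalized_value_iteration(std::vector<double>& V, std::vector<double>& f,
                                Pass dpp_pass, Norm norm_diff,
                                double tol_v, double tol_f)
{
    std::vector<double> V_next(V.size()), f_next(f.size());
    for (;;) {
        dpp_pass(V, V_next, f_next);                              // step 2
        const double m = *std::min_element(V_next.begin(), V_next.end());
        for (double& v : V_next) v -= m;                          // step 3
        const bool done = norm_diff(f_next, f) < tol_f &&
                          norm_diff(V_next, V) < tol_v;           // step 4
        V.swap(V_next);
        f.swap(f_next);
        if (done) break;                                          // converged
    }
}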

3.5 Numerical experiments

3.5.1 Continuous-time LQR: unconstrained case

Consider the infinite horizon optimal control of the continuous time linear system (state x ∈ R² and input u ∈ R)

$\dot{x}(t) = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t)$  (3.3)

with cost functional

$\min J(x,u) = \int_0^{\infty} \left( x^T(\tau)\,x(\tau) + u^2(\tau) \right) d\tau$  (3.4)

for any given x(0) = x₀. It is well known that the optimal control for this problem is given by a time-invariant linear state feedback control law, i.e., u(t) = −Kx(t). The associated value function, V(x₀) = min{J(x,u) : x(0) = x₀}, can also be derived in closed form. For the current example, we have

$V(x) = x^T \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} x$  (3.5)

and K = [1 1].

3.5.2 State and input constraints

In what follows, we consider the state constraint x(t) ∈ X, where X = {x ∈ R² : ‖x‖∞ ≤ 10}, and the input constraint |u(t)| ≤ u_max. The input space is composed of 801 equally spaced values between −u_max and u_max. The numerical computations used the Euler scheme and multilinear interpolation.

It is possible to derive an analytical description of the MRCIS for this problem. Analysis of the extremal trajectories shows that the minimum distance to the boundary of X along the x₁ dimension, in the first and third quadrants of the x₁–x₂ phase plane, is given by

$|x_2| - u_{max} \ln\left(\frac{u_{max} + |x_2|}{u_{max}}\right)$.  (3.6)

The boundary of this set is represented in Fig. 3.1(a). Fig. 3.1(b) presents a detail of the boundary of the MCIS for different values of u_max. For the unconstrained case, the optimal control will never exceed u = 20 on the considered domain. Let us define Ω_LQR as the maximal invariant set, with respect to X, of the closed loop system composed of (3.3) with feedback control u = −Kx. We compute this particular set by integration of the trajectories emanating from each grid node on X and elimination of those nodes corresponding to trajectories going through R² \ X. When designing for the constrained case, the controller will use all the available range of controls in order to avoid the forbidden domain. As expected, the MCIS for u_max = 20 is a superset of Ω_LQR. This means that the absolute value of the control input will be higher than |−Kx| when going through Ω_{u_max=20} \ Ω_LQR; for x ∈ Ω_LQR the two optimal controls coincide.

The main challenge of this problem, in terms of numerical computation, resides in the fact that the optimal

trajectories are tangent to the boundary of the MCIS in the curved segments depicted in Fig. 3.1(a) and, in more detail, in Fig. 3.1(b). Therefore, the interpolation procedure uses points near the discontinuity. As expected, this introduces some smearing. This smearing leads to inaccurate computation of the optimal control near the boundary of the MCIS. This effect can be attenuated by using a finer grid in these regions, as illustrated in Fig. 3.2. Fig. 3.2

shows that, at the boundary of the MCIS, the absolute value of the computed control is higher than the optimal as

the resolution decreases. Fig. 3.3 shows the trajectories obtained by integration of the system equations with input

u(t) given by the numerical dynamic programming control law.


Figure 3.1: Boundary of the MCIS for different scenarios. (a) u_max = 40. (b) Detail; from left to right: |u(t)| ≤ 40 (the same as Fig. 3.1(a)); |u(t)| ≤ 20; using the unconstrained LQR; |u(t)| ≤ 10.

Figure 3.2: Difference between the optimal control derived from the 201×201 grid and the one derived from the 401×401 grid. The lower resolution leads to a sub-optimal higher actuation near the boundary of the invariant set.

Figure 3.3: Evolution of x₁(t) (dash-dot line) and x₂(t) (dashed line) with u(t) (solid line) given by the numerical DP control law for the LQR problem. (a) 201×201 grid. (b) 401×401 grid.


3.5.3 Sampled data system

As discussed previously, computer controlled systems are subject to finite control rates. Typically, a sample and

hold control scheme is employed. For linear systems, there are systematic methodologies to obtain an exact

discrete-time equivalent model for the composition of the original continuous-time model with the sample and

hold scheme. However, the same is not true for nonlinear systems. In the following example we obtain the

optimal control for the discrete-time system without using the actual discrete-time model. All computations are

based on the continuous-time model and the time step of the SL scheme is set to the value of the control period.

Let us consider again the example of the LQR problem. It is well known that, if a sample and hold control

scheme is used, the optimal gain K changes according to the control rate. The discrete-time version of the LQR gain can be computed in Matlab using the following command:

lqr(c2d(ss([0 1; 0 -1], [0; 1], [1 1], 0), ts), [1 0; 0 1], 1)

where ts is the control period. Table 3.1 shows the optimal K for different control rates. We remark that using

the gains derived for the continuous-time LQR in the time sampled system may lead to an unstable closed loop

system (in this example, instability would occur for 1/∆ < 0.5 Hz).

Table 3.1: Optimal K for different control rates

Control rate (Hz)    K
0.2                  (0.2193, 0.2183)
2                    (0.7827, 0.7767)
20                   (0.9753, 0.9753)
200                  (0.9975, 0.9975)

The value function and the corresponding optimal control map were computed for each of the control rates indicated in Table 3.1. The computational domain was [−10,10]×[−10,10] and the grid resolution was 301×101; the inputs were chosen from a set of 151 uniformly spaced values in [−20,20]. The differential equations were solved using the Runge-Kutta-Fehlberg method from the GSL, with absolute and relative tolerances of 10⁻¹². All interpolations were performed with the multilinear scheme.

The resulting numerical control map f_c(x) is very similar to the one given by the analytical solution. This comparison was based on the numerical approximation of the gradient of f_c(x). A finite difference method was used for that purpose. We observed that the numerical solution is somewhat noisy, in the sense that the numerical gradient is not constant when computed between adjacent grid nodes. If the numerical gradient is computed using more distant nodes (e.g., 20 nodes apart), the numerical results become similar to those presented in Table 3.1. Those observations are also supported by the simulation of the trajectories of the closed loop system. The trajectories are computed by applying the piecewise constant optimal feedback control over the considered control period. The trajectories obtained with the numerical control law are very similar to those obtained with the exact discrete-time LQR (see Fig. 3.4). As expected, the simulations show degradation of the system performance when the gains of the continuous-time LQR are employed.

3.5.4 Minimum time

We consider again system (3.3) but with a different control objective and input constraints:

$\min J(T,x,u) = \int_0^{T} d\tau$  (3.7)

such that x(0) = x₀, x(T) = 0, x(t) ∈ X (as defined before) and |u(t)| ≤ 10.

As is well known, the optimal control law for this problem is of bang-bang type. The switching surface, corresponding to the states where the control action should change sign, is defined as

$x_1 = -x_2 + u_{max} \ln\left(\frac{x_2}{u_{max}} + 1\right),\ x_2 \ge 0$  (3.8)

$x_1 = -x_2 - u_{max} \ln\left(\frac{-x_2}{u_{max}} + 1\right),\ x_2 \le 0$  (3.9)


Figure 3.4: Numerical solution for Δ = 0.5 s (dashed thick line) vs exact discrete-time LQR solution (thin line) vs solution using the gains from the continuous-time LQR problem (dashed thin line). (a) Trajectory on the x₁–x₂ plane. (b) Accumulated cost $\int_0^t L(x(\tau),u(\tau))\,d\tau$.

The value function on the left side of the switching surface is defined as:

$V(x) = -\frac{x_1 + x_2}{u_{max}} + 2\ln\left(\sqrt{\left(\frac{x_2 - u_{max}}{u_{max}}\right) e^{\frac{x_1 + x_2}{u_{max}}} + 1} + 1\right)$  (3.10)

The value function on the right side of the switching surface is defined as:

$V(x) = \frac{x_1 + x_2}{u_{max}} + 2\ln\left(\sqrt{\left(\frac{-x_2 - u_{max}}{u_{max}}\right) e^{-\frac{x_1 + x_2}{u_{max}}} + 1} + 1\right)$  (3.11)

Figure 3.5 shows several trajectories using the controller derived from the numerical value function on a

201×201 grid. We observe that the trajectories consistently start to “bend” several grid nodes before hitting the

switching surface. This means that the control switch is made before the optimal instant. Once again, this is

caused by smearing near a non-smooth region, the switching surface, in this case. As described for the boundary

of the invariant set (Fig. 3.2), this effect is minimized when we define a finer grid on the region containing the switching surface, i.e., the numerical switching surface converges monotonically to the exact switching surface as the grid is refined.

Figure 3.5: The thin lines represent the system trajectories from different initial states using the control law derived from the numerical solution on a 201×201 grid. It can be seen that the control law produces a change in the actuation before reaching the ideal switching surface (thick line).

The computations were performed on a grid of 801×401 nodes and with time-step Δ = 0.02. The computation of the value function, using FSM-like iterations and assuming bang-bang control (only two possible actuation values), took 17 seconds to converge to a maximum change between iterations of 10⁻⁹ on an Intel T7250 CPU based computer. If we assume no knowledge about the optimal control signal, we must consider a larger number of actuation values. For 101 equally spaced actuation values, the computation time increased to 320 seconds when


using value-iteration and 170 seconds when using policy-iteration. Note that the increase in computation time is

not proportional to the number of input values.

For the simulation of the system trajectories, we also allowed the control law to take intermediate values (see

figure 3.6). Due to the linear interpolation, the DP controller actually chooses intermediate control values near the

discontinuity of the switching surface.

As expected, the numerical control law produces some chattering when the trajectory is near the switching surface. Since the control value is switched ahead of the optimal instant, the trajectory tends to deviate from the neighbourhood of the switching surface; when that deviation becomes too large, with respect to the accuracy and resolution of the numerical value function, the controller takes a corrective action to bring the trajectory nearer the switching surface.


Figure 3.6: Evolution of x1(t) (dash-dot line) and x2(t) (dashed line) with u(t) (solid line) given by the numerical DP control law for the optimal time problem on an 801×401 grid with ∆ = 0.02. On the left figure, the controller is only allowed to use |u(t)| = 10 (bang-bang control action); on the right figure, the controller is allowed to use |u(t)| ≤ 10.

3.6 Semi-Lagrangian receding horizon scheme

3.6.1 The algorithm

In order to overcome the limitations of the approaches described in the previous section, we propose an algorithm (SLRH) based on concepts of receding horizon control. Recall that, for each x, the basic semi-Lagrangian scheme computes the corresponding optimal trajectory over one time-step. The key idea of our algorithm is to compute the optimal trajectory over a horizon of $N_\Delta > 1$ time-steps when the one-step trajectory does not end in a well defined cell. By optimizing over a larger horizon, the endpoint of the trajectory has a better chance of reaching a well defined cell. Let us define $\mathcal{V}_{i,N_\Delta} \equiv \{V_i, V_{i-1}, \ldots, V_{i-N_\Delta+1}\}$ for $i \ge N_\Delta - 1$ and $\mathcal{V}_{i,N_\Delta} \equiv \{V_i, V_{i-1}, \ldots, V_0\}$ for $i < N_\Delta - 1$. For $N_\Delta = 1$, this algorithm is equivalent to the basic SL scheme with the strict definition of infinity.
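For instance, with $N_\Delta = 3$, the window used at iteration $i = 5$ is $\mathcal{V}_{5,3} = \{V_5, V_4, V_3\}$, whereas at iteration $i = 1$ it is just $\{V_1, V_0\}$.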

At present, we only consider the control input (single player). The case of competing inputs (zero-sum two player differential game) will be a topic of future work.

As for the basic scheme, for solving the optimal control problem for a time horizon T, the algorithm starts with $V_0(x) = \Psi(x),\ \forall x \in \Omega$. It then proceeds iteratively, calling Algorithm 1 for $i = 1, \ldots, N_T$, where $N_T$ is the smallest integer not smaller than $T/\Delta$. The main algorithm is based on the procedure ComputeVx, described by Algorithm 2. $K_u$ is a constant that marks the node's value as undefined. All the cells to which such a node belongs will be identified as not well defined.

With a sufficiently large N∆, this algorithm would be able to avoid any distortion (i.e., sub-optimality) intro-

duced by the discontinuity at the boundary of the MCIS. However, the processing time and storage requirements

grow exponentially with N∆. In order to further minimize these requirements, we consider a reduced input space

$U_{a,i}$ for the larger horizons. For many problems, one sensible approach is to consider only extremal controls when optimizing for more than one time-step (e.g., $U_{a,1} = U_a$ and $U_{a,i} = \{\min(U_a), \max(U_a)\}$, $i > 1$). Additionally,

as soon as one valid cell is found, the procedure stops extending the optimization horizon. This accelerates the

computation but, on the other hand, it may prevent the evaluation of better trajectories. In general, these two

optimizations will lead to sub-optimality near the boundary.


Input: $\Omega$, $i$, $\mathcal{V}_{i-1,N_\Delta}$, $N_\Delta$
Output: $\mathcal{V}_{i,N_\Delta}$
foreach $x \in \{x : x \in \Omega \wedge V_{i-1}(x) \neq \infty\}$ do
    $(V_i(x), \mathit{defined}) \leftarrow$ ComputeVx($x$, $\mathcal{V}_{i-1,N_\Delta}$, $N_\Delta$);
    if $\mathit{defined} = \mathit{false}$ then
        if $i \ge N_\Delta$ then
            $j \leftarrow i-1$;
            while $j > i - N_\Delta \wedge j > 0$ do
                $V_j(x) \leftarrow \infty$;
                $j \leftarrow j-1$;
            end
        else
            $V_i(x) \leftarrow K_u$;
        end
    end
end
$\mathcal{V}_{i,N_\Delta} \leftarrow \{V_i\} \cup \mathcal{V}_{i-1,N_\Delta} \setminus \{V_{i-N_\Delta}\}$

Algorithm 1: Compute $V_i$ using the semi-Lagrangian receding horizon scheme.

Input: $x$, $\mathcal{V}_{i-1,N_\Delta}$, $N_\Delta$
Output: $V_i(x)$, $\mathit{defined}$
$S_{\mathrm{considered}} \leftarrow \{x\}$; $V_i(x) \leftarrow \infty$;
$\mathit{level} \leftarrow 1$;
$\mathit{finished} \leftarrow \mathit{false}$;
$\mathit{defined} \leftarrow \mathit{false}$;
repeat
    $S^{\mathrm{new}}_{\mathrm{considered}} \leftarrow \emptyset$;
    foreach $x \in S_{\mathrm{considered}}$ do
        foreach $a \in U_{a,\mathit{level}}$ do
            Calculate $y_\Delta(x,\Delta,a)$;
            $c \leftarrow \int_0^\Delta L(y_\Delta(x,\tau,a),a)\,d\tau$;
            $v \leftarrow V_{i-\mathit{level}}(y_\Delta(x,\Delta,a))$;
            if $v$ is well defined then
                if $V_i(x) > c + v$ then
                    $V_i(x) \leftarrow c + v$;
                    $\mathit{defined} \leftarrow \mathit{true}$;
                end
            else
                $S^{\mathrm{new}}_{\mathrm{considered}} \leftarrow S^{\mathrm{new}}_{\mathrm{considered}} \cup \{y_\Delta(x,\Delta,a)\}$;
            end
        end
    end
    if $\mathit{level} = N_\Delta \vee i - \mathit{level} = 0 \vee \mathit{defined}$ then
        $\mathit{finished} \leftarrow \mathit{true}$;
    else
        $S_{\mathrm{considered}} \leftarrow S^{\mathrm{new}}_{\mathrm{considered}}$;
        $\mathit{level} \leftarrow \mathit{level} + 1$;
        if $S_{\mathrm{considered}} = \emptyset$ then
            $\mathit{finished} \leftarrow \mathit{true}$;
        end
    end
until $\mathit{finished}$;

Algorithm 2: Compute $V_i(x)$ using up to $N_\Delta$ time-steps.
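To fix ideas, the following C++ sketch mirrors the structure of Algorithm 2 (all types and helper functions are hypothetical placeholders; in addition, the sketch carries the accumulated running cost along each partial trajectory, which we take to be the intent of the multi-step lookahead):

```cpp
#include <limits>
#include <utility>
#include <vector>

// Hypothetical interfaces; names are illustrative assumptions.
struct State { double v[2]; };
State integrate(const State& x, double dt, double a);      // y_Delta(x, dt, a)
double step_cost(const State& x, double a, double dt);     // integral of L over one step
double value_lookup(int k, const State& x, bool& ok);      // V_k at x; ok = "well defined"
const std::vector<double>& input_set(int level);           // U_{a,level}

std::pair<double, bool> compute_vx(const State& x0, int i, int n_delta, double dt) {
    double vi = std::numeric_limits<double>::infinity();
    bool defined = false;
    // Frontier of partial trajectories: endpoint plus accumulated running cost.
    std::vector<std::pair<State, double>> frontier{{x0, 0.0}};
    for (int level = 1; level <= n_delta && i - level >= 0; ++level) {
        std::vector<std::pair<State, double>> next;
        for (const auto& [x, acc] : frontier) {
            for (double a : input_set(level)) {
                State y = integrate(x, dt, a);
                double c = acc + step_cost(x, a, dt);
                bool ok = false;
                double v = value_lookup(i - level, y, ok);
                if (ok) {                        // endpoint fell in a well defined cell
                    if (vi > c + v) { vi = c + v; defined = true; }
                } else {
                    next.emplace_back(y, c);     // keep extending this trajectory
                }
            }
        }
        if (defined || next.empty()) break;      // stop as soon as a valid cell is found
        frontier = std::move(next);
    }
    return {vi, defined};
}
```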


The memory requirements for this implementation may seem higher than for the basic implementation. In fact, the basic implementation only requires the storage of two sets of values: the values computed at each grid node at the current iteration, $V_i$, and the values from the previous iteration, $V_{i-1}$. The proposed algorithm requires the storage of $N_\Delta + 1$ sets of values. However, by careful tuning of the maximum optimization horizon, $N_\Delta$, and of the reduced input spaces $U_{a,i}$, the grid resolution can be lower than in the basic semi-Lagrangian scheme while still providing a better sub-approximation of the MCIS. Nevertheless, for the real-time implementation of the control law, the worst case optimization time should be kept within the processing capabilities. Therefore, the value of $N_\Delta$ must be chosen with that in mind.

3.6.2 Numerical example

Consider the following system:

$$\dot{x} = \begin{bmatrix} \sin(x_2) \\ a \end{bmatrix} \tag{3.12}$$

with $x = [x_1\ x_2]'$ and input $a \in U_a$, where $U_a$ is a set of 31 equally spaced values from −0.25 to 0.25. We define $U_{a,i} = \{-0.25, 0.25\}$, $i > 1$. The system is constrained to $[-2,2]\times[-\pi/2,\pi/2]$. The grid resolution is 201×201. All the experiments are performed with ∆ = 0.1.

Fig. 3.7 shows the contour of the approximated MCIS for different values of $N_\Delta$. For $N_\Delta = 1$, a significant contraction of the MCIS is observed. The results improve significantly even with small values of $N_\Delta$. However, we remark that this is highly dependent on the Courant number: a change in the grid resolution or time-step leads to different results. In this case, the classical SL scheme (using the relaxed definition of infinity) produces a noticeably rough over-approximation of the MCIS.


Figure 3.7: Computation of the MCIS. The thicker line corresponds to the exact MCIS. The inner contours cor-

respond to the results of the SLRH with different N∆: 1, 2 and 4 (from the inner to the outer contour). The outer

contour corresponds to the over-approximation obtained with the classical SL scheme.

Table 3.2 shows the number of nodes in the approximation of the MCIS for different values of N∆. It also

shows the CPU time for 68 iterations. This is the number of iterations required by the algorithm to converge to

the approximation of the MCIS for N∆ = 1. For N∆ ≥ 40, the computation times do not differ significantly. This

is because the extra computation time near the boundary is negligible when compared to the computation of the

remaining nodes. Moreover, the number of nodes for which the entire optimization horizon $N_\Delta\Delta$ must be used gets smaller as $N_\Delta$ increases.


Table 3.2: Approximation of the MCIS with the SLRH algorithm

N∆   Number of nodes   CPU time (s)
 1        17197             32
 2        23351             38
 3        23715             39
 4        24303             39
10        25726             41
20        26035             42
40        26108             44
50        26120             44
60        26121             44
70        26121             45


Chapter 4

A Motivating Application

In this chapter we describe the application of the dynamic programming based two-mode control approach to the motion control of an Autonomous Underwater Vehicle (AUV) in two different settings: path following and trajectory tracking.

The developments are targeted at the small-sized AUVs (Fig. 4.1) developed at the Underwater Systems and Technologies Laboratory of Porto University. These AUVs have a torpedo shaped hull about 1.6 m long, with a diameter of 20 cm, and weigh about 35 kg in air. Their maximum forward speed is 2 m/s.

The control design is based on simplified models. The simplification consists of assuming weakly coupled motions and modelling some dynamical terms as disturbances. In order to clarify the assumptions and simplifications, we start by presenting a brief description of a full six degree of freedom model for AUVs.

4.1 System model

4.1.1 Six degree of freedom model

The motion of autonomous underwater vehicles is best described by a nonlinear model. In what follows, we adopt, with minor variations, the formulation of [47] and the notation from the Society of Naval Architects and Marine Engineers (SNAME) [61]. In order to define the model, two coordinate frames are considered: body-fixed and earth-fixed. The motions in the body-fixed frame are described by the 6 velocity components $\nu = [\nu_1^T, \nu_2^T]^T = [u, v, w, p, q, r]^T$ (respectively surge, sway, heave, roll, pitch and yaw), relative to a constant velocity coordinate frame moving with the ocean current. The six components of position (x, y, z, typically in the north, east, down convention) and attitude (roll, pitch, yaw) in the earth-fixed frame are $\eta = [\eta_1^T, \eta_2^T]^T = [x, y, z, \phi, \theta, \psi]^T$. The earth-fixed reference frame can be considered inertial for the AUV.

The velocities in the two reference frames are related through the Euler angle transformation

$$\dot{\eta} = J(\eta_2)\nu + c \tag{4.1}$$

Figure 4.1: NAUV autonomous underwater vehicle from University of Porto, waiting to be deployed.


where c is the velocity of slowly varying currents and

$$J(\eta_2) = \begin{bmatrix} J_1(\eta_2) & 0 \\ 0 & J_2(\eta_2) \end{bmatrix}$$

$$J_1(\eta_2) = \begin{bmatrix} c\psi\, c\theta & c\psi\, s\theta\, s\phi - s\psi\, c\phi & s\psi\, s\phi + c\psi\, c\phi\, s\theta \\ s\psi\, c\theta & c\psi\, c\phi + s\phi\, s\theta\, s\psi & s\theta\, s\psi\, c\phi - c\psi\, s\phi \\ -s\theta & c\theta\, s\phi & c\theta\, c\phi \end{bmatrix}$$

$$J_2(\eta_2) = \begin{bmatrix} 1 & s\phi\tan\theta & c\phi\tan\theta \\ 0 & c\phi & -s\phi \\ 0 & s\phi/c\theta & c\phi/c\theta \end{bmatrix}$$

where $s\cdot$ and $c\cdot$ abbreviate $\sin(\cdot)$ and $\cos(\cdot)$.
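For illustration only, a direct C++ transcription of $J_1(\eta_2)$ could read as follows (the function name and matrix type are ours):

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Body-to-earth velocity rotation J1(eta2) for Euler angles (phi, theta, psi).
Mat3 J1(double phi, double theta, double psi) {
    const double cph = std::cos(phi), sph = std::sin(phi);
    const double cth = std::cos(theta), sth = std::sin(theta);
    const double cps = std::cos(psi), sps = std::sin(psi);
    Mat3 m = {{{cps * cth, cps * sth * sph - sps * cph, sps * sph + cps * cph * sth},
               {sps * cth, cps * cph + sph * sth * sps, sth * sps * cph - cps * sph},
               {-sth,      cth * sph,                   cth * cph}}};
    return m;
}
```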

The equations of motion are composed of the standard terms for the motion of an ideal rigid body and, ad-

ditionally, the terms due to hydrodynamic forces and moments. We skip the details regarding the hydrodynamic

terms; they can be found in [47]. The main relevant aspect is that most of those terms are hard to identify. The

identification requires either expensive in-water tests or very specialized software packages. Heuristic formulas

are available in the literature but, in practice, they lead to significant modelling errors.

The dynamics of the AUV are described in the body-fixed frame by the following nonlinear equations of motion:

$$M\dot{\nu} + C(\nu)\nu + D(\nu)\nu + L(\nu)\nu + g(\eta_2) = \tau \tag{4.2}$$

where M is the constant inertia and added mass matrix (hydrodynamic effect) of the vehicle, $C(\nu)$ is the Coriolis and centripetal matrix, $D(\nu)$ is the damping matrix, $L(\nu)$ is the lift matrix, $g(\eta_2)$ is the vector of restoring forces and moments and $\tau$ is the vector of body-fixed forces and moments from the actuators.

The considered AUVs are not fully actuated. They use a propeller for actuation in the longitudinal direction and 4 tail fins, arranged in a cruciform fashion, for attitude control. The thrust produced by the propeller is approximately proportional to the square of its angular speed; the steady state surge is practically proportional to the propeller angular speed. Each fin produces a lift force perpendicular to the fluid flow and a quadratic damping force in the direction of the fluid flow; the lift force is approximately proportional to the fin's deflection angle with respect to the fluid flow, for small deflection angles (less than 30 degrees). For instance, the formulas for the top rudder fin's force and moment are the following:

$$Y_f = \rho\, C_{LF}\, S_{fin}\,(u^2\delta_r - uv - x_{fin}ur) \tag{4.3}$$

$$N_f = x_{fin}Y_f \tag{4.4}$$

where $\rho$ is the water density, $S_{fin}$ is the fin's face area, $C_{LF}$ depends essentially on the geometrical aspect of the fin, $x_{fin}$ is the position of the fin relative to the origin of the body-fixed coordinate frame and $\delta_r$ is the fin's deflection angle with respect to the plane generated by the x and z axes of the body-fixed coordinate frame.

For safety reasons, the AUVs are in general slightly buoyant. The center of gravity is slightly below the center of buoyancy, providing a restoring moment in pitch and roll which is useful for these underactuated vehicles. The restoring term $g(\eta_2)$ is described by the following expression:

$$g(\eta_2) = \begin{bmatrix} (W-B)\sin\theta \\ -(W-B)\cos\theta\sin\phi \\ -(W-B)\cos\theta\cos\phi \\ z_G W\cos\theta\sin\phi \\ z_G W\sin\theta \\ 0 \end{bmatrix}$$

where W is the vehicle’s weight and B is the buoyancy force.

In what concerns parameter identification, the most challenging parameters are those in $D(\nu)$ and $L(\nu)$, the added mass parameters and those related to the actuation system. In general, the elements of the damping matrix are defined so that linear and quadratic components arise (e.g., $X_u + X_{u|u|}|u|$ for $D_{11}$). For an AUV with port/starboard, top/bottom and fore/aft symmetries, M, $D(\nu)$ and $L(\nu)$ can be assumed to be diagonal. However, that is often not the case. The AUVs considered in this work can be assumed to be port/starboard symmetric. On the other

hand, significant asymmetries are found both from the fore to the aft and from top to bottom; this is due to

the actuation system in the tail, the sensors on the nose, the antennas on top and the internal disposition of the

hardware. In [33], we compute the numerical parameters for a model of one of the AUVs from the University of

Porto using basic measurements and empirical formulas from the literature. The procedure can be easily adapted

to AUVs of similar shape.


4.1.2 Model simplification

In [52, 53], Healey and co-authors first reported the approach of decoupling AUV control design into loosely

interacting control modes. Since then, this practice has become common (see, e.g., [92] and [1]). Three modes

of control are generally considered: (forward) speed control, steering (or lateral control) and diving (or depth

control); additionally, roll control is sometimes also considered.

Let us consider operation on the horizontal plane (constant depth) with constant speed. Let us start by considering a reduced model, with state described by $(x,y,\psi,v,r)$; u is assumed to be a nonnegative constant $u_{\mathrm{ref}}$ and the

remaining variables, (z,φ ,θ ,w, p,q), are assumed to be 0. However, in practice, even with active compensation,

the variables (z,φ ,θ ,u,w, p,q) are not constant; depending on the disturbances and on the reference trajectory,

those states will have more or less pronounced variations. In order to encompass the effect of such variations,

(z,φ ,θ ,u∆ ≡ u− uref,w, p,q) can be considered as bounded disturbances. Moreover, a low cost AUV may not

have sensors for measuring v. Since an accurate model for the v dynamics may not be available, we opt to model v

as an additional bounded disturbance. The z variable has no effect on the considered type of motion; even so, we

are left with a disturbance vector composed of seven inputs (φ ,θ ,u∆,v,w, p,q). In order to simplify the control

design, we consider the following model:

$$\dot{x} = u\cos(\psi) + c_x \tag{4.5}$$
$$\dot{y} = u\sin(\psi) + c_y \tag{4.6}$$
$$\dot{\psi} = r + c_\psi \tag{4.7}$$
$$\dot{r} = k_1 r + k_2 r|r| + k_3\delta_r + c_r \tag{4.8}$$

where $c_x$, $c_y$, $c_\psi$ and $c_r$ are inputs used to model the effect of environmental disturbances and model uncertainty. Note that $c_x$ and $c_y$ include the effect of currents. The term $k_3\delta_r$ models the torque produced by the top and bottom fins of the AUV with respect to the body-fixed z axis.

Notice that this simplified model is also applicable to wheeled and aerial vehicles operating on the horizontal plane.
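For simulation purposes, a single integration step of this model is straightforward; the sketch below uses an explicit Euler step (a choice made for brevity; higher order schemes may be preferable in practice, and all names are ours):

```cpp
#include <cmath>

// State of the simplified horizontal-plane model (4.5)-(4.8).
struct PlanarState { double x, y, psi, r; };

// One explicit Euler step; k1..k3 are the yaw dynamics coefficients and
// cx, cy, cpsi, cr the bounded disturbance inputs of (4.5)-(4.8).
PlanarState euler_step(const PlanarState& s, double u, double delta_r, double dt,
                       double k1, double k2, double k3,
                       double cx, double cy, double cpsi, double cr) {
    PlanarState n;
    n.x   = s.x   + dt * (u * std::cos(s.psi) + cx);
    n.y   = s.y   + dt * (u * std::sin(s.psi) + cy);
    n.psi = s.psi + dt * (s.r + cpsi);
    n.r   = s.r   + dt * (k1 * s.r + k2 * s.r * std::fabs(s.r) + k3 * delta_r + cr);
    return n;
}
```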

4.2 Path following problem

Operations with autonomous vehicles often entail the following of prescribed paths such as roads, aerial corridors

or simple concatenations of way-points. The basic idea of path following is to follow some given geometric path

with no specific time parametrization. In its simplest setting, a path following controller must ensure forward

motion along the given path while making the cross-track error converge to zero. For paths with zero curvature,

the cross-track error is the shortest distance between the vehicle and the path. The main characteristic of path following control is its independence with respect to time, as opposed to trajectory tracking problems, where the desired reference is a function of the time variable. This time independence provides additional freedom which can be exploited to provide a control strategy with increased robustness. The change of variables corresponding

to the formalization of the path following problem may lead either to a system of reduced dimension, while

maintaining the same number of inputs, (as in [54], for instance) or to a system with the same dimension, but with

an additional input corresponding to the free time parametrization (see [2], for instance). After this formalization,

the path following problem can be seen as an ordinary nonlinear (or even linear) control problem; however, due

to its practical applicability, this problem has been targeted as an application for several control approaches in the most diverse settings (e.g., [99], [72], [3]).

We will consider the case of single vehicle operation on the horizontal plane and the problem of following a

path within a given maximum cross-track error. Additionally, the motion will be optimized with respect to a given

cost function. That means that the objective is to maintain the vehicle inside a neighbourhood of the reference

path; this neighbourhood is sometimes designated as tube. The idea of this control strategy is to enable a more

relaxed control for systems subject to persistent measurement errors and disturbances: the control effort will be

stronger when the system is near the boundary of the tube and more relaxed when it is near the reference path.

For instance, autonomous underwater vehicles have strong limitations in what concerns the estimation of their

exact position. Given that uncertainty, it does not make sense to be very stringent in what concerns following a

path that the vehicle cannot even determine very accurately. Moreover, persistent disturbances may also prevent

stabilization to the reference path, as will be verified later. Finally, in practice, if the control system is implemented

in a computer based platform, limitations due to the finite control rate and discretization of the actuator commands

can make stabilization to a single point (in the Lyapunov sense, e.g., [91]) an impossible task.


Figure 4.2: Coordinate system (North-East-Down convention).

The path following problem has been studied for wheeled mobile robots (see [80] and [94] for early ap-

proaches), for under-actuated marine vehicles (see [39] and [37] for instance) and aerial vehicles (see [70] for

instance). Some tube based approaches can be found in the literature for the path following problem (see,

e.g., [4], [2]); however, those approaches do not provide straightforward procedures for the definition of the

tube and for tuning the performance of the system when inside the tube. In the context of path-following for ma-

rine vehicles, robustness with respect to model uncertainty was approached in [59]; however, robustness against

disturbances is not handled by those results.

4.2.1 Model

Consider a frame with its origin at the point of the path nearest to the vehicle. We assume there is some scheme to disambiguate the choice in case of multiple candidates. This frame (the Frenet frame) has a T axis tangent to the path and an N

axis normal to the path. The orientation of the T axis with respect to the earth fixed frame is ψt and we define the

vehicle’s angle relative to the path as ψr = ψ −ψt (see Fig. 4.2). The cross-track error d is the shortest distance

between the vehicle (i.e., the origin of the body fixed frame) and the path. Thus, the cross-track error dynamics

may be described by the following model:

$$\dot{x} = f(x,a,b) = \begin{bmatrix} u\sin(\psi_r) + c_n \\ r + c_{\psi_r} \\ k_1 r + k_2 r|r| + k_3\delta_r + c_r \end{bmatrix} \tag{4.9}$$

where $x = [d\ \psi_r\ r]'$, $a = [\delta_r]$ and $b = [c_n\ c_{\psi_r}\ c_r]'$. This model incorporates the following terms: $c_n = -c_x\sin(\psi_t) + c_y\cos(\psi_t)$, $c_t = c_x\cos(\psi_t) + c_y\sin(\psi_t)$ and $c_{\psi_r} = -(u\cos(\psi_r) + c_t)\kappa + c_\psi$, where $\kappa$ is the path's curvature at the origin of the Frenet frame (for a straight path, $\kappa = 0$). The bounds for $c_n$, $c_t$ and $c_{\psi_r}$ are defined accordingly.

Furthermore, in order to encompass the penalization of chattering in the design of the control laws, we consider

another model with δr as an additional state variable. In this case, the reference for the servomotors, δr,d , becomes

the system input; δr,d is modelled as a discrete variable, with the same resolution as the one provided by the actual

target system. We assume first order linear dynamics for the servomotors. Therefore, the system model with actuator dynamics comes as follows:

$$\dot{x} = f(x,a,b) = \begin{bmatrix} u\sin(\psi_r) + c_n \\ r + c_{\psi_r} \\ k_1 r + k_2 r|r| + k_3\delta_r + c_r \\ k_4(-\delta_r + \delta_{r,d}) \end{bmatrix} \tag{4.10}$$

where $x = [d\ \psi_r\ r\ \delta_r]'$, $a = [\delta_{r,d}]$ and $b = [c_n\ c_{\psi_r}\ c_r]'$.


4.2.2 Control design

We consider the two-mode control strategy. For normal operation, the control law $f_S(x)$ should be derived taking into account the following state constraints:

• $|d(t)| \le d_{\max}$, where $d_{\max}$ is the maximum acceptable cross-track error.

• $|\psi_r(t)| \le \psi_{\max} \equiv \arccos(c_{t,\max}/u)$; this constraint avoids backward motion along the path.

• $|r(t)| \le r_{\max}$, where $r_{\max}$ is the maximum attainable yaw rate solely under the influence of the control input.

Therefore we have

$$x(t) \in \mathcal{R} \equiv [-d_{\max}, d_{\max}]\times[-\psi_{\max}, \psi_{\max}]\times[-r_{\max}, r_{\max}] \tag{4.11}$$

Additionally, for some inspection applications (e.g., when using a camera), it is important to minimize the occurrence of fast changes in the vehicle's heading. Therefore, we define the following expression for the running cost:

$$L(x,a,b) = K_d d^2 + K_{\psi_r}\psi_r^2 + K_r r^2 \tag{4.12}$$

where $x = [d\ \psi_r\ r]'$, $a = [\delta_r]$ and $b = [c_n\ c_{\psi_r}\ c_r]'$.

In order to derive the two control laws, two value functions are defined:

• $V_S(x)$, associated with the infinite horizon DG defined by (2.16) with $t_0 = -\infty$, $t_f = 0$, $L(x,a,b)$ given by (4.12), system dynamics given by (4.10) and state constraints given by (4.11);

• $V_{mt}(x)$, associated with a minimal time to reach (MTTR) DG defined by (2.16) with $t_0 = 0$, $t_f = \min\{t : x(t) \in \mathcal{T}\}$, $L(x,a,b) = 1$, system dynamics given by (4.9) and target set $\mathcal{T}$.

At this point, the control laws $f_S(x)$ and $f_{mt}(x)$ are obtained by applying (2.19) to $V_S(x)$ and $V_{mt}(x)$, respectively. It can be seen that the actuator dynamics are neglected for the MTTR OCP. Recall that the actuator dynamics were modelled in order to allow the minimization of chattering; otherwise, they do not have a strong impact on the solution of the OCPs. However, the minimization of chattering conflicts with the minimization of the time to reach. Therefore, taking into account that the system is not expected to enter the q0 mode very often, we give preference to the latter objective. As a side effect, we get a solution more amenable to computational implementation (one dimension less).

Since the control law will not be defined in analytical form, its domain should be carefully defined. Obviously, it is not possible to compute or store a numerical control law for an unbounded domain. However, by exploiting the structure of the solution and observing practical aspects of the problem, that problem can be avoided. In this application, this only applies to $f_{mt}(x)$, since the domain of $f_S(x)$, $\mathcal{R}$, is a bounded set.

Therefore, let us consider the domain of fmt(x). For the periodic ψr ∈ S1 state variable, only the range [−π,π]

has to be considered. For the r state variable, any measurements exceeding rmax in absolute value will be assumed

as spurious and truncated to sign(r(t))rmax. Frequent occurrences of this truncation will point to abnormal con-

ditions and, ultimately, will trigger safety measures not discussed here (e.g., vehicle surfacing). Thus, the only

remaining unbounded state variable is the cross-track error (state d).

Physical insight tells us that, for absolute values of the cross-track error above a certain threshold, the optimal action will always be to make the vehicle approach a direction perpendicular to the path. Therefore, it is expected that the projection of the computed control map on the subspace defined by $\psi_r\times r$ will be constant for all $|d| > d_{\pi/2}$, where $d_{\pi/2}$ is some constant whose value depends on the model parameters. Thus, the optimal control may be obtained by truncating $d(t)$ to $\mathrm{sign}(d(t))\,d_{\pi/2}$ whenever $|d(t)| > d_{\pi/2}$. This means that the optimal control may be determined for the unbounded domain using only the values stored on the grid covering $[-d_{\pi/2}, d_{\pi/2}]\times[-\pi,\pi]\times[-r_{\max}, r_{\max}]$.
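In implementation terms, this amounts to clamping the measured state before indexing the stored control map. The sketch below illustrates the idea (the data layout and the nearest-node lookup are simplifications of ours; the actual controller interpolates between grid nodes):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Stored numerical control law f_mt on a regular grid over
// [-d_pi2, d_pi2] x [-pi, pi] x [-r_max, r_max].
struct ControlMap {
    std::vector<double> data;   // flattened Nd x Npsi x Nr array of control values
    int Nd, Npsi, Nr;
    double d_pi2, r_max;

    double lookup(double d, double psi_r, double r) const {
        // Truncate unbounded/spurious measurements to the stored domain.
        const double pi = std::acos(-1.0);
        d = std::clamp(d, -d_pi2, d_pi2);
        r = std::clamp(r, -r_max, r_max);
        psi_r = std::atan2(std::sin(psi_r), std::cos(psi_r));   // wrap to [-pi, pi]
        auto index = [](double v, double lo, double hi, int n) {
            return static_cast<int>(std::lround((v - lo) / (hi - lo) * (n - 1)));
        };
        const int i = index(d, -d_pi2, d_pi2, Nd);
        const int j = index(psi_r, -pi, pi, Npsi);
        const int k = index(r, -r_max, r_max, Nr);
        return data[(static_cast<std::size_t>(i) * Npsi + j) * Nr + k];
    }
};
```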

The constant $d_{\pi/2}$ will be found by inspection of the control law; however, in order to verify this, the computational space along the d dimension must at least include $[-d_{\pi/2}, d_{\pi/2}]$. Therefore, an estimate of $d_{\pi/2}$ must be known. A rough estimate can be found by considering the 2-dimensional reduced model:

$$d_{\pi/2} = \frac{u + c_{n,\max}\,\pi/2}{r_{\max} - c_{r,\max}} \tag{4.13}$$

This is the minimum radius of curvature, $u/(r_{\max} - c_{r,\max})$, expanded to account for the drift imposed by the worst case disturbance.
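As an illustrative check (using the NAUV parameters listed later in section 4.2.3, and assuming that the bound $c_{v,\max} = 0.5$ m/s plays the role of $c_{n,\max}$ while $c_{r,\max} = 0$), (4.13) gives $d_{\pi/2} \approx (1.5 + 0.5\cdot\pi/2)/0.4 \approx 5.7$ m; this rough estimate is indeed below the value $d_{\pi/2} = 9$ m later found by inspection of the numerical control law.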


Figure 4.3: Fin angle with (thick line) and without (thin line) penalization of fin angular velocity.

4.2.3 Simulation and experimental results

The controllers were implemented in C++ and integrated in the AUV control software. During the development

phase, the control software was also tested in the simulation environment described in [33]. This simulation envi-

ronment consists of replacing the sensor and actuator drivers by a simulation task offering the exact same software

interfaces. The CPU time consumed by the simulation task is sufficiently low so that the whole application can

be tested on the target CPU while standing on the bench. This allows us to detect potential problems in the

implementation of the control system before deploying the vehicles on water.

The model parameters for the NAUV are the following: u = 1.5, k1 = −3, k2 = −3, k3 = 5.5, k4 = 20, $r_{\max}$ = 0.4, $\delta_{r,\max}$ = 0.26, $c_{v,\max}$ = 0.5, $r_{v,\max}$ = 0 (straight paths). The input $\delta_{r,d}(t)$ is chosen from a set of 31 uniformly spaced values ranging from $-\delta_{r,\max}$ to $\delta_{r,\max}$; $c_v(t)$ is chosen from a set of 21 uniformly spaced values ranging from $-c_{v,\max}$ to $c_{v,\max}$; the control period is ∆ = 0.05 seconds; the maximum acceptable cross-track error is defined as $d_{\max}$ = 6 meters. After some experiments, we found that $K_d = 1$, $K_r = 300$ and $K_{\delta_r} = 300$ provided satisfactory results.

The value function $V_S(x)$ was computed on a 301×361×21×31 grid. The choice of this grid size was a trade-off between: a) computation time and memory requirements; b) accuracy. In general, the grid resolution has a direct impact on the accuracy of the final solution of the numerical computation. This is because errors may propagate and even accumulate during the computation of the value function. However, after its computation, the value function may be resampled on a less dense grid or on a multiresolution grid, such as in [95]. The objective of such resampling is to reduce the storage requirements in the target control system and, eventually, to enable loading of all data into main memory; for instance, NAUV's computer system has 1 GB of RAM and therefore allows that approach. In cases where not enough main memory is available, the data can be loaded as needed. This is a reasonable alternative, especially if the data is stored on a solid-state drive (SSD). In general, the seek time for this type of device is small enough to meet the real-time requirements of most applications.
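To give an order of magnitude: the 301×361×21×31 grid used here has roughly $7.1\times 10^{7}$ nodes which, assuming one single-precision value per node, amounts to about 280 MB; this already fits the NAUV's 1 GB of RAM, but it also shows why resampling or on-demand loading becomes necessary for finer grids or for storing several value functions.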

In what concerns the numerical solution of the MTTR problem, the range of the d dimension was defined as [−15,15]. The threshold was found to be $d_{\pi/2} = 9$. Therefore, in the target system we only store data for the range [−9,9] of the d dimension.

Figure 4.3 illustrates the effect of omitting the penalization of the fin's angular velocity (i.e., of setting $K_{\delta_r} = 0$). As expected, higher oscillation is observed in the fin angle ($\delta_r$).

Figure 4.4 shows one simulation of the hybrid controller (3.1), considering typical measurement errors. Note that the initial cross-track error is larger than $d_{\pi/2}$. The controller starts in mode q0 and switches to q1 after 13 seconds. We assume that the estimated state is affected by an offset whose value is taken from a standard normal distribution every 3 seconds. This mimics the behaviour of the acoustic navigation system.

Figure 4.5 shows the results obtained for an underwater operation with the NAUV using the designed DP control strategy. The operations were performed at the marina of the APDL harbour, Matosinhos, Portugal. The operation consisted of following a rectangular path. The control system was configured to switch to a new segment of the path as soon as the vehicle was at a distance of 5 meters from the current segment's endpoint (the vertexes of the rectangle, in this case). The marine currents were small (about 10 cm/s) and constant during the operations. We must remark that most of the disturbances observed in the trajectories of Fig. 4.5(a) and Fig. 4.5(b) are mainly due to navigation errors introduced by the noisy updates of the acoustic system. Nevertheless, it can be observed that the results are similar to those obtained in simulation. The cross-track error only exceeds


(a) Trajectory on the x–y plane. (b) Fin angle.

Figure 4.4: AUV starting with a large cross-track error and subject to measurement errors. The objective is to follow the x axis. The thick line represents the actual trajectory and the thin line represents the estimated trajectory, as used by the controller.

the expected maximum (6 meters, as defined above) in some of the sharp turns corresponding to the changes of segment. The actuation signal was also relatively smooth, except at the sharp turns. This is not surprising, since the controller was not designed to handle such sharp turns. At those instants, the controller switched to the MTTR control law which, as expected, produces a more aggressive actuation.

4.3 Formation control

4.3.1 Model

We consider the problem of formation control. In this case, the surge velocity will be considered as a control input. The turning radius of AUVs is almost velocity independent for the typical range of velocities. In order to model that fact, we consider the following slightly modified model:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\psi} \end{bmatrix} = \begin{bmatrix} u\cos(\psi) + c_x \\ u\sin(\psi) + c_y \\ \kappa u + c_r \end{bmatrix} \tag{4.14}$$

where κ is a control input that determines the vehicle curvature. Note that κu is the angular velocity r as before.

We are interested in the problem of keeping a static formation of N vehicles along a reference trajectory

(x0(t),y0(t),ψ0(t)). The reference trajectory is generated by a system of the form (4.14); we will designate this

system as virtual vehicle. By rewriting the system equations with respect to the virtual vehicle’s body fixed frame,

it is possible to formulate the tracking problem for each vehicle in a 3-dimensional space. This is a common

procedure that can be traced as far back as [55]. With that in mind, we consider the following model for the motion of each vehicle $i \in \{1, \ldots, N\}$:

$$\begin{bmatrix} \dot{s}_i \\ \dot{d}_i \\ \dot{\psi}_{r,i} \end{bmatrix} = \begin{bmatrix} -u_0 + u_i\cos(\psi_{r,i}) + r_0 d_i \\ u_i\sin(\psi_{r,i}) + r_0 s_i + c_d \\ \kappa_i u_i - r_0 \end{bmatrix} \tag{4.15}$$

where $x_i = [s_i\ d_i\ \psi_{r,i}]'$ defines the pose of vehicle $i > 0$ with respect to the frame fixed on the virtual vehicle;

u0 is the speed of the virtual vehicle with added disturbances; cd models the effects of disturbances and model

uncertainties; all inputs - ui, κi, u0, r0, and cd - are bounded. We assume the vehicles do not know the current

values of u0, r0 and cd .

Define $O = \{o_i : i = 1, \ldots, N\}$. Consider the problem of tracking a reference trajectory $(x_0(t), y_0(t), \psi_0(t))$ with a static formation of N vehicles. The static formation is defined by offsets $(o_i) = (o_i\cos(\psi_{o,i}), o_i\sin(\psi_{o,i}))$ with respect to the body-fixed frame of the virtual vehicle. The reference trajectory for each vehicle is:

$$\begin{bmatrix} x_{d,i}(t) \\ y_{d,i}(t) \end{bmatrix} = \begin{bmatrix} x_0(t) + o_i(\cos(\psi_{o,i})\cos(\psi_0(t)) - \sin(\psi_{o,i})\sin(\psi_0(t))) \\ y_0(t) + o_i(\cos(\psi_{o,i})\sin(\psi_0(t)) + \sin(\psi_{o,i})\cos(\psi_0(t))) \end{bmatrix} \tag{4.16}$$


(a) Trajectory on the x–y plane (the rectangle is the reference path). (b) Cross-track error. (c) Fin angle.

Figure 4.5: Underwater operation at the APDL harbour, Matosinhos, Portugal.

Problem: Given $O$, find the decentralized feedback control laws $f_i(x_i)$ and the regions $\mathcal{S}_i$ such that, once $x_i$ is inside a region $\mathcal{S}_i$, the vehicles will not collide, independently of the bounded actions of the virtual vehicle and remaining disturbances. The operation inside $\mathcal{S}_i$ should be optimized with respect to the following running cost:

$$L_i(x_i,a_i) = k_{s,i}(s_i - o_i\cos(\psi_{o,i}))^2 + k_{d,i}(d_i - o_i\sin(\psi_{o,i}))^2 + k_{u,i}u_i^2 + k_{r,i}(\kappa_i u_i)^2 \tag{4.17}$$

where $a_i = [u_i\ \kappa_i]'$ for $i > 0$, $a_0 = [u_0\ r_0\ c_d]'$ and $f(x_i(t), a_i(t), a_0(t))$ is the right hand side of (4.15).

We do not discuss the problem of safely driving each vehicle into the respective $\mathcal{S}_i$. Still, this problem formulation leaves some ambiguity in what concerns the definition of $\mathcal{S}_i$. We assume that the $(s_i, d_i)$ position of each vehicle will be constrained to a circle centred at $o_i$; therefore, $\mathcal{S}_i$ will be the MRCIS with respect to this circle.

If no symmetries are found, this problem implies the computation of N value functions of the form:

$$V_i(x) = \sup_{\beta\in\Delta_0}\ \inf_{a_i(\cdot)\in\mathcal{A}_i} J_i(x_i(\cdot), a_i(\cdot), \beta[a_i(\cdot)]),\quad x_i(0) = x \tag{4.18}$$

where $\Delta_0$ is the set of nonanticipating strategies for the adversarial input $a_0$ and $\mathcal{A}_i$ is the set of piecewise constant input sequences such that $a_i(t)$ belongs to a compact set.

4.3.2 Simulation results

We consider a formation of two identical AUV moving in parallel with the virtual vehicle. The offsets are o1 =

(0,6) and o2 = (0,−6). The first AUV is allowed to stay in R1 = x :

s21 +(d1−6)2 ≤ 6. The second is

allowed to stay in R2 = x :

s21 +(d1 +6)2 ≤ 6. Given the symmetry, in this case we just have to compute the

value function for one of the vehicles. By appropriate change of coordinates, this value function is also used for

the second vehicle. The remaining problem data is u0 = 1 m/s, 1 < ui < 2 (m/s), −0.25 < κi < 0.25 and cd = 0.

A control rate of 20 Hz is assumed.


Figure 4.6: Sample trajectories on the s–d plane for the tracking of a straight path at constant speed. The initial positions are marked by a cross. The star marks the desired position of the AUV in the s–d plane. The AUV is required to stay inside the circle.

We start with the case of a straight line trajectory ($r_0 = 0$). The running cost was defined as $L_1(x_1,u_1) = s_1^2 + (d_1-6)^2$. A regular grid of 61×61×73 nodes was used, together with a discrete space of controls with 11×11 distinct values. With these settings, the computation of $V_1(x)$ takes almost two hours (approximately 300 iterations) on a workstation with an Intel(R) Xeon(R) X3220 CPU (2.40 GHz), using four threads of computation. Fig. 4.6 depicts trajectories on the s–d plane for different starting configurations ((−3,3,π/4), (3,3,π/4), (−3,9,π/4) and (3,9,π/4)). All trajectories converge to the desired reference and remain inside $\mathcal{R}_1$ at all times.

These AUVs have a positive lower bound on velocity. This is because, as seen before, the fin actuation depends on the vehicle's velocity: without sufficient velocity, the horizontal fins are not able to maintain the desired depth. This is analogous to the stall speed of airplanes. In this case, the considered velocity of the virtual vehicle equals this lower bound. Therefore, it can be observed that when the AUV starts ahead of the desired position, it follows an oscillatory trajectory. The AUV does this in order to lose speed with respect to the reference trajectory (the actual longitudinal speed remains constant).

The computations were repeated with $r_0 \in \{-0.05, 0.05\}$ (rad/s), $u_0 = 1.5$ m/s and $L_1(x_1,u_1) = s_1^2 + (d_1-6)^2 + 600(u_1\kappa_1)^2$. Fig. 4.7 shows the simulation results for a trajectory involving several turns using the maximum curvature. In this case, the vehicles are able to take advantage of the whole range of the input $u_i$. Both vehicles follow the desired trajectory with a bounded error (see Fig. 4.7(b)).


(a) Trajectories on the x–y plane (the middle trajectory is the reference). (b) Trajectory on the s1–d1 plane. (c) Velocity command for AUV 1 (top trajectory on Fig. 4.7(a)). (d) Curvature command for AUV 1 (top trajectory on Fig. 4.7(a)).

Figure 4.7: Simulation with u0 = 1.5 m/s.


Chapter 5

Research Plan

In the case of the constrained finite and infinite horizon problems, it is easy to identify the discontinuities of the

value function. In general, the same is not true for the minimum time problem. We plan to develop an algorithm

for the minimum time problem based on the ideas of single pass algorithms (e.g., [88]). The new algorithm will

still assume a finite control rate. The main challenge is the development of a method to identify and to track the

discontinuities. We plan to use elements from the algorithm developed in section 3.6, namely the concept of well

defined cells and multiple time-steps on each iteration.

The use of the discrete-time dynamic programming principle to handle the control design of the sampled-data system is a sound approach, at least from the theoretical point of view. However, when adversarial inputs are considered, this is no longer true. A straightforward application of the discrete-time dynamic programming principle for differential games will assume the same finite sampling rate for both control and adversarial inputs. Given that the disturbances can change their value at any time, such an approach may not be the most suitable to model the adversarial input in certain scenarios. Namely, the sampling rate should be higher than the upper frequency of the disturbance; otherwise, such an approach is optimistic rather than conservative. A more suitable formulation of the dynamic programming principle would be as follows:

$$V(t,x) = \inf_{a\in U_a}\ \sup_{b\in U_b}\left\{\int_0^{\Delta} L(y(x,\tau,[0,\Delta]\times\{a\},b),a,b(\tau))\,d\tau + V(y(x,\Delta,[0,\Delta]\times\{a\},b))\right\} \tag{5.1}$$

In this case, the feedback control law would be given by

$$f(x) \in \arg\min_{a\in U_a}\ \sup_{b\in U_b}\left\{\int_0^{\Delta} L(y(x,\tau,[0,\Delta]\times\{a\},b),a,b(\tau))\,d\tau + V(y(x,\Delta,[0,\Delta]\times\{a\},b))\right\} \tag{5.2}$$
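As a first step in that direction, the inner sup in (5.1) can be approximated numerically by letting the adversarial input switch on a grid of $m$ sub-intervals of $[0,\Delta]$ while the control is held constant. The C++ sketch below is only meant to fix ideas (all types, callbacks and names are placeholders of ours; the enumeration is exponential in $m$, so in practice it would be replaced by a sub-step dynamic programming recursion):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Sketch of the inner sup in (5.1): the control a is held constant over
// [0, Delta], while the adversarial input b may switch every Delta/m.
// Step advances the state over one sub-interval; Cost integrates L over it;
// V evaluates the (numerical) value function at the endpoint.
template <typename State, typename Step, typename Cost, typename Value>
double one_step_game_value(const State& x0, double a,
                           const std::vector<double>& Ub, int m, double Delta,
                           Step step, Cost cost, Value V) {
    const double h = Delta / m;
    double worst = -std::numeric_limits<double>::infinity();
    std::vector<int> choice(m, 0);  // odometer over Ub^m (piecewise constant b)
    while (true) {
        State x = x0;
        double c = 0.0;
        for (int k = 0; k < m; ++k) {
            const double b = Ub[choice[k]];
            c += cost(x, a, b, h);
            x = step(x, a, b, h);
        }
        worst = std::max(worst, c + V(x));
        int k = 0;
        while (k < m && ++choice[k] == static_cast<int>(Ub.size()))
            choice[k++] = 0;
        if (k == m) break;          // all disturbance sequences enumerated
    }
    return worst;
}
```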

When using numerical approaches, it would be desirable to make the sampling rate associated with the adversarial input as high as possible, or at least physically meaningful. We propose to study efficient implementations of this approach and to identify scenarios in which it produces results that are clearly more accurate than the ones obtained assuming the same sampling rate for both inputs. We will adapt the developed algorithms to take this formulation into account. We remark that the results obtained in chapter 4 assume the same sampling time for the control and adversarial inputs. We plan to employ the algorithm developed in section 3.6 as the building block for the numerical solution of the infinite horizon differential game.

Let us recall the two-mode control strategy presented in chapter 1. From the theoretical point of view, this

control strategy has nice global properties: the domain of attraction for the closed loop system is the MRCIS.

We plan to characterize the stability of the closed loop system for the considered applications when inside the

MRCIS. This is important since for a general form of the running cost, there is no simple way to predict the large

time behaviour of the controlled system. Moreover, the design methodology provides just an approximation of the

infinite horizon value function. We plan to base such study on the numerical approximation of the invariant set of

the closed loop system. In this case, it will be essential to obtain a good over-approximation of the invariant set.

As described in the previous sections, our design follows a pessimistic approach; during the computation of

the value function, informational advantage is given to the adversarial input. In practice, the disturbances do not

always take the worst case value. Moreover, in certain cases, the control system can estimate tighter bounds for

the disturbances during system operation. One practical example of that would be the robustness of AUV motion

with respect to marine currents c. In this case, the value function is computed assuming a range ‖c‖ ≤ cmax;


however, occasionally, a tighter bound can be estimated for that disturbance during system operation. On-the-

fly computation of a new value function is, in general, infeasible. Storing several value functions, tailored for

different scenarios, can also be expensive. A simpler alternative is the computation of the optimal control taking

into account the gathered information. This will reduce the range of actions that must be considered for the

adversarial input. Even with complete information and in the absence of disturbances, this will not ensure that the resulting trajectory is the same as the one that would be obtained had the value function been computed assuming an undisturbed system. The main question is how much can be gained by following such an approach.

We also plan to consider measurement errors. These errors may be due to sensor accuracy, noise and commu-

nication delays. For instance, in the case of networked applications, the delays can be much larger than the period

of each local controller. The problem of delays has been addressed in the context of dynamic programming in [63]

but this is still an active field of research [45]. For the continuous-time case, this becomes an infinite dimensional

problem and few practical results seem to exist. We plan to address some of these aspects in the final design

methodology.

Another research avenue is the optimal control of hybrid systems [20] for which the discrete state variable acts

as a memory. This is different from basic switched systems (see [8,82,101] and references therein) because in this

case the controlled transitions will be allowed only between certain discrete states. Moreover, autonomous (state

dependent) transitions will also be considered. Some preliminary work has already been done on this topic [34,35],

namely on the numerical computation of hybrid value functions. The considered problems combine the following

aspects: a) both controlled (discrete input) and autonomous (state dependent) transitions between discrete states

are allowed; b) controlled transitions are allowed to be triggered only at certain points of the hybrid state space; c)

running cost depends on the discrete state; d) nonlinear continuous-time dynamics are assumed. We are not aware

of any published work on DP for hybrid systems combining all of the above mentioned aspects and offering a

general efficient computation method. In [20], where the author presented a taxonomy for hybrid systems which

subsumed previous models, optimal control of hybrid systems in the setting of DP is discussed. However, no

general efficient computation method is devised there. That line of development is still an open problem today.

Most of the results toward efficient methods are obtained by restricting their range of applicability just to certain

classes of problems. In [81] the authors develop a Hybrid Bellman Equation for systems with regional dynamics

(i.e., only autonomous transitions are considered). The results are demonstrated only for problems featuring two

dimensional continuous dynamics, with quadratic cost. Practical applicability of the method to more complex

systems is not discussed. A similar problem for piecewise affine systems is presented in [10], also using ideas

from DP. There is a vast body of work on dynamic optimization of switched systems and multi-model systems

(switched systems with state jumps) but none seems to meet all the above mentioned aspects (see [8, 82, 101] and

references therein).

Fig. 5.1 presents the Gantt diagram for the thesis research. The research tasks are independent from each other. In case of project slip or unpredicted difficulties, the order of execution of the tasks can easily be changed.


Figure 5.1: Gantt diagram for the thesis research (2011–2012). Tasks: the case of adversarial inputs; applications to hybrid systems; algorithm for the minimum time problem; verification of the closed loop system; sub-optimal behaviour of the adversarial input; measurement errors; thesis redaction.


Bibliography

[1] A. Pascoal, C. Silvestre, and P. Oliveira. Vehicle and mission control of single and multiple autonomous marine robots. In G. N. Roberts and Robert Sutton, editors, Advances in Unmanned Marine Vehicles. Institution of Electrical Engineers Press, 2006.

[2] A. Pedro Aguiar and Joao P. Hespanha. Trajectory-tracking and path-following of underactuated au-

tonomous vehicles with parametric modeling uncertainty. IEEE Transactions on Automatic Control,

52(8):1362–1379, 2007.

[3] L. E. Aguilar, T. Hamel, and P. Soueres. Robust path following control for wheeled robots via sliding

mode techniques. In Proceedings of the 1997 IEEE/RSJ Int. Conf. Intell. Robot. Syst., pages 1389–1395,

September 1997.

[4] M. Aicardi, G. Casalino, G. Indiveri, A. Aguiar, P. Encarnacao, and A. Pascoal. A planar path follow-

ing controller for underactuated marine vehicles. In Proc. 9th Mediterranean Conference on Control and

Automation, Dubrovnik, Croatia, 2001.

[5] Alessandro Alessio and Alberto Bemporad. A survey on explicit model predictive control. In Lalo Magni,

Davide Raimondo, and Frank Allgower, editors, Nonlinear Model Predictive Control, volume 384 of Lec-

ture Notes in Control and Information Sciences, pages 345–369. Springer Berlin - Heidelberg, 2009.

[6] Ken Alton and Ian M. Mitchell. Fast marching methods for stationary hamilton-jacobi equations with

axis-aligned anisotropy. SIAM Journal of Numerical Analysis, 47(1):363–385, 2008.

[7] J.P. Aubin. Viability Theory. Systems & Control: Foundations & Applications. Birkhauser Boston Inc.,

Boston, 1991.

[8] H. Axelsson, M. Boccadoro, M. Egerstedt, P. Valigi, and Y. Wardi. Optimal mode-switching for hybrid

systems with varying initial states. Nonlinear Analysis: Hybrid Systems, 2(3):765 – 772, 2008.

[9] Stanley Bak, Joyce McLaughlin, and Daniel Renzi. Some improvements for the fast sweeping method.

SIAM J. Sci. Comput., 32:2853–2874, September 2010.

[10] M. Baotic, F.J. Christophersen, and M. Morari. Constrained optimal control of hybrid systems with a linear

performance index. IEEE Transactions on Automatic Control, 51(12):1903–1919, Dec. 2006.

[11] M. Bardi, T. E. S. Raghavan, and T. Parthasarathy, editors. Stochastic and Differential Games: Theory

and Numerical Methods, volume 4 of Annals of the International Society of Dynamic Games. Birkhauser

Boston, Basel, Switzerland, 1999.

[12] Martino Bardi. On differential games with long-time-average cost. In P. Bernhard, V. Gaitsgory, and

O. Pourtallier, editors, Advances in Dynamic games and their Applications, volume 10 of Annals of the

International Society of Dynamic Games. Birkhauser Boston, 2009.

[13] Martino Bardi and I. Capuzzo-Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-

Bellman equations. Birkhauser Boston, 1997.

[14] Martino Bardi, Maurizio Falcone, and Pierpaolo Soravia. Numerical methods for pursuit-evasion games

via viscosity solutions. In M. Bardi, T. E. S. Raghavan, and T. Parthasarathy, editors, Stochastic and

Differential Games: Theory and Numerical Methods, volume 4 of Annals of the International Society of

Dynamic Games, chapter 3, pages xvi + 380. Birkhauser Boston, Basel, Switzerland, 1999.


[15] Alfredo Bellen and Marino Zennaro. Numerical Methods for Delay Differential Equations. Numerical

Mathematics and Scientific Computation, Oxford Science Publications. Oxford University Press, Oxford,

2003.

[16] R. Bellman. Dynamic programming. Princeton University Press, Princeton, 1957.

[17] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995.

[18] Franco Blanchini and Stefano Miani. Set-Theoretic Methods in Control. Birkhauser, Boston, 2008.

[19] O. Bokanowski, E. Cristiani, and H. Zidani. An efficient data structure and accurate scheme to solve front

propagation problems. J. Sci. Comput., 42:251–273, February 2010.

[20] Michael Branicky. Studies in Hybrid Systems: Modeling, Analysis and Control. PhD thesis, MIT, 1995.

[21] M.S. Branicky, R. Hebbar, and G Zhang. A fast marching algorithm for hybrid systems. In Proceedings of

the IEEE Conference on Decision and Control, pages 4897–4902, Phoenix, AZ, December 1999.

[22] Michele Breton and Krzysztof Szajowski, editors. Advances in Dynamic Games: Theory, Applications,

and Numerical Methods for Differential and Stochastic Games, volume 11 of Annals of the International

Society of Dynamic Games. Birkhauser Boston, 2011.

[23] E.F. Camacho and C. Bordons. Model predictive control. Advanced textbooks in control and signal pro-

cessing. Springer, 2004.

[24] P. Cardaliaguet, M. Quincampoix, and P. Saint-Pierre. Set-valued numerical analysis for optimal control and

differential games. In M. Bardi, T. E. S. Raghavan, and T. Parthasarathy, editors, Stochastic and Differential

Games: Theory and Numerical Methods, volume 4 of Annals of International Society of Dynamic Games.

Birkhauser Boston, 1999.

[25] Elisabetta Carlini, Maurizio Falcone, and Roberto Ferretti. An efficient algorithm for hamilton-jacobi

equations in high dimension. Computing and Visualization in Science, 7(1):15–29, 2004.

[26] Thomas C. Cecil, Stanley J. Osher, and Jianliang Qian. Simplex free adaptive tree fast sweeping and

evolution methods for solving level set equations in arbitrary dimension. Journal of Computational Physics,

213(2):458 – 473, 2006.

[27] Chi-Wang Shu. Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws. Technical report, Institute for Computer Applications in Science and Engineering (ICASE), 1997.

[28] Frank Christophersen, Mato Baotic, and Manfred Morari. Optimal control of piecewise affine systems: A

dynamic programming approach. In Thomas Meurer, Knut Graichen, and Ernst Gilles, editors, Control and

Observer Design for Nonlinear Finite and Infinite Dimensional Systems, volume 322 of Lecture Notes in

Control and Information Sciences, pages 183–198. Springer Berlin - Heidelberg, 2005.

[29] M. Crandall and P.-L. Lions. Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American Mathematical Society, 277:1–42, 1983.

[30] E. Cristiani and M. Falcone. Fully-discrete schemes for the value function of pursuit-evasion games with

state constraint. In Pierre Bernhard, Vladimir Gaitsgory, and Odile Pourtallier, editors, Advances in Dy-

namic Games and Their Applications: Analytical and Numerical Developments, volume 10 of Annals of

International Society of Dynamic Games, pages 178–205. Birkhauser Boston, 2009.

[31] Emiliano Cristiani. A fast marching method for hamilton-jacobi equations modeling monotone front prop-

agations. J. Sci. Comput., 39(2):189–205, 2009.

[32] Jorge Estrela da Silva and Joao Borges de Sousa. Dynamic programming techniques for feedback control.

In Proceedings of the IFAC 18th World Congress, Milan, August 2011.

[33] Jorge Estrela da Silva, Bruno Terra, Ricardo Martins, and Joao Borges de Sousa. Modeling and simulation

of the lauv autonomous underwater vehicle. In 13th IEEE IFAC International Conference on Methods and

Models in Automation and Robotics, Szczecin, Poland, August 2007.


[34] J. Borges de Sousa, Jorge Estrela da Silva, and Fernando Lobo Pereira. New problems of optimal path coordination for multi-vehicle systems. In Proceedings of the 10th European Control Conference, Budapest, Hungary, August 2009.

[35] João Borges de Sousa and Jorge Estrela da Silva. Cooperative path planning in the presence of adversarial behavior. In 4th International Scientific Conference on Physics and Control, Catania, Italy, September 2009.

[36] E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.

[37] K.D. Do and J. Pan. Underactuated ships follow smooth paths with integral actions and without velocity measurements for feedback: Theory and experiments. IEEE Transactions on Control Systems Technology, 14(2):308–322, March 2006.

[38] R.J. Elliott and N.J. Kalton. The existence of value in differential games. Memoirs of the American Mathematical Society. American Mathematical Society, 1972.

[39] P. Encarnação, A. Pascoal, and M. Arcak. Path following for marine vehicles in the presence of unknown currents. In Proceedings of SYROCO'2000 - 6th International IFAC Symposium on Robot Control, volume II, pages 469–474, Vienna, Austria, 2000.

[40] Douglas Enright, Frank Losasso, and Ronald Fedkiw. A fast and accurate semi-Lagrangian particle level set method. Computers & Structures, 83(6-7):479–490, 2005.

[41] M. Falcone. A numerical approach to the infinite horizon problem of deterministic control theory. Applied Mathematics and Optimization, 15:1–13, 1987.

[42] M. Falcone. Numerical methods for differential games via PDEs. International Game Theory Review, 8(2):231–272, 2006.

[43] M. Falcone and R. Ferretti. Semi-Lagrangian schemes for Hamilton-Jacobi equations, discrete representation formulae and Godunov methods. J. Comput. Phys., 175:559–575, January 2002.

[44] M. Falcone and P. Stefani. Advances on parallel algorithms for the Isaacs equation. In Andrzej S. Nowak and Krzysztof Szajowski, editors, Advances in Dynamic Games: Applications to Economics, Finance, Optimization, and Stochastic Control, volume 7 of Annals of the International Society of Dynamic Games, pages 515–544. Birkhäuser Boston, 2005.

[45] Salvatore Federico, Ben Goldys, and Fausto Gozzi. HJB equations for the optimal control of differential equations with delays and state constraints, I: Regularity of viscosity solutions. SIAM Journal on Control and Optimization, 48(8):4910–4937, 2010.

[46] Wendell H. Fleming and Halil Mete Soner. Controlled Markov Processes and Viscosity Solutions. Springer, New York, 2006.

[47] T. I. Fossen. Guidance and Control of Ocean Vehicles. John Wiley and Sons Ltd., 1994.

[48] Brian Gough. GNU Scientific Library Reference Manual - 2nd Edition. Network Theory Ltd., 2003.

[49] A. Grancharova and T. A. Johansen. Computation, approximation and stability of explicit feedback min-max nonlinear model predictive control. Automatica, 45:1134–1143, 2009.

[50] Lars Grüne and Jürgen Pannek. Nonlinear Model Predictive Control - Theory and Algorithms. Communications and Control Engineering. Springer-Verlag, London, 2011.

[51] Ami Harten, Björn Engquist, Stanley Osher, and Sukumar R. Chakravarthy. Uniformly high order accurate essentially non-oscillatory schemes, III. J. Comput. Phys., 71:231–303, August 1987.

[52] A. J. Healey and D. B. Marco. Slow speed flight control of autonomous underwater vehicles: Experimental results with the NPS AUV II. In Proceedings of the 2nd International Offshore and Polar Engineering Conference (ISOPE), San Francisco, CA, 1992.


[53] A.J. Healey and D. Lienard. Multivariable sliding mode control for autonomous diving and steering of unmanned underwater vehicles. IEEE Journal of Oceanic Engineering, 18(3):327–339, July 1993.

[54] G. Indiveri, M. Aicardi, and G. Casalino. Nonlinear time-invariant feedback control of an underactuated marine vehicle along a straight course. In Proceedings of the 5th IFAC Conference on Manoeuvring and Control of Marine Crafts, MCMC 2000, pages 221–226, Aalborg, Denmark, August 2000.

[55] Rufus Isaacs. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. John Wiley & Sons, New York, 1965.

[56] Chiu-Yen Kao, Carmeliza Navasca, and Stanley Osher. The Lax-Friedrichs sweeping method for optimal control problems in continuous and hybrid dynamics. Nonlinear Analysis (Invited Talks from the Fourth World Congress of Nonlinear Analysts - WCNA 2004), 63:e1561–e1572, 2005.

[57] Chiu-Yen Kao, Stanley Osher, and Jianliang Qian. Legendre-transform-based fast sweeping methods for static Hamilton-Jacobi equations on triangulated meshes. J. Comput. Phys., 227(24):10209–10225, 2008.

[58] N.N. Krasovskii and A.I. Subbotin. Game-Theoretical Control Problems. Springer-Verlag, New York, 1988.

[59] L. Lapierre and B. Jouvencel. Robust nonlinear path-following control of an AUV. IEEE Journal of Oceanic Engineering, 33(2):89–102, April 2008.

[60] Shingyu Leung and Hongkai Zhao. A grid based particle method for moving interface problems. Journal of Computational Physics, 228:2993–3024, 2009.

[61] E. Lewis, editor. Principles of Naval Architecture. Society of Naval Architects and Marine Engineers, 1989. 2nd revision.

[62] Fengyan Li, Chi-Wang Shu, Yong-Tao Zhang, and Hongkai Zhao. A second order discontinuous Galerkin fast sweeping method for Eikonal equations. J. Comput. Phys., 227(17):8191–8208, 2008.

[63] David G. Luenberger. Cyclic dynamic programming: A procedure for problems with fixed delay. Operations Research, 19(4):1101–1110, July-August 1971.

[64] Hiroyoshi Mitake. Asymptotic solutions of Hamilton-Jacobi equations with state constraints. Applied Mathematics and Optimization, 58(3):393–410, 2008.

[65] Ian Mitchell. Application of Level Set Methods to Control and Reachability Problems in Continuous and Hybrid Systems. PhD thesis, Stanford University, 2002.

[66] Ian M. Mitchell. A toolbox of level set methods. Technical Report TR-2007-11, UBC Department of Computer Science, June 2007.

[67] Ian M. Mitchell. The flexible, extensible and efficient toolbox of level set methods. Journal of Scientific Computing, 35(2-3):300–329, June 2008.

[68] Ian M. Mitchell and Jeremy A. Templeton. A toolbox of Hamilton-Jacobi solvers for analysis of nondeterministic continuous and hybrid systems. In HSCC, pages 480–494, 2005.

[69] I.M. Mitchell, A.M. Bayen, and C.J. Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, July 2005.

[70] D.R. Nelson, D.B. Barber, T.W. McLain, and R.W. Beard. Vector field path following for miniature air vehicles. IEEE Transactions on Robotics, 23(3):519–529, June 2007.

[71] D. Nešić, A. Loría, E. Panteley, and A. R. Teel. On stability of sets for sampled-data nonlinear inclusions via their approximate discrete-time models and summability criteria. SIAM J. Control Optim., 48:1888–1913, June 2009.

[72] Urbano Nunes and L. Bento. Data fusion and path-following controllers comparison for autonomous vehicles. Nonlinear Dynamics, 49:445–462, 2007.


[73] S. Osher and R. Fedkiw. Level Set Methods and Dynamic Implicit Surfaces. Springer-Verlag, New York, 2002.

[74] S. Osher and R. P. Fedkiw. Level set methods: An overview and some recent results. Journal of Computational Physics, 169:463–502, 2001.

[75] S. Osher and J. A. Sethian. Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics, 79:12–49, 1988.

[76] Stanley Osher and Nikos Paragios, editors. Geometric Level Set Methods in Imaging, Vision, and Graphics. Springer-Verlag, 2003.

[77] Stanley Osher and Chi-Wang Shu. High-order essentially nonoscillatory schemes for Hamilton-Jacobi equations. SIAM Journal on Numerical Analysis, 28(4):907–922, August 1991.

[78] Jianliang Qian, Yong-Tao Zhang, and Hong-Kai Zhao. A fast sweeping method for static convex Hamilton-Jacobi equations. J. Sci. Comput., 31(1-2):237–271, 2007.

[79] Christian Rasch and Thomas Satzger. Remarks on the O(n) implementation of the fast marching method. IMA Journal of Numerical Analysis, 29(3), 2007.

[80] Claude Samson. Path following and time-varying feedback stabilization of a wheeled mobile robot. In Int. Conf. ICARCV'92, Singapore, 1992.

[81] A. Schöllig, P.E. Caines, M. Egerstedt, and R. Malhamé. A hybrid Bellman equation for systems with regional dynamics. In Proceedings of the 46th IEEE Conference on Decision and Control, pages 3393–3398, December 2007.

[82] C. Seatzu, D. Corona, A. Giua, and A. Bemporad. Optimal control of continuous-time switched affine systems. IEEE Transactions on Automatic Control, 51(5):726–741, May 2006.

[83] J. A. Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences of the United States of America, 93:1591–1595, 1996.

[84] J. A. Sethian. A review of theory, algorithms, and applications of level set methods for propagating interfaces. Acta Numerica, 5:309–395, 1996.

[85] J. A. Sethian. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision and Materials Sciences. Cambridge University Press, 1999.

[86] J. A. Sethian and A. Vladimirsky. Ordered upwind methods for static Hamilton-Jacobi equations. Proceedings of the National Academy of Sciences of the USA, 98(20):11069–11074, September 2001.

[87] J.A. Sethian and A. Vladimirsky. Ordered upwind methods for hybrid control. In C.J. Tomlin and M.R. Greenstreet, editors, Hybrid Systems: Computation and Control: 5th International Workshop (Proceedings), volume 2289, pages 393–406, Stanford, CA, USA, March 2002. Springer-Verlag GmbH.

[88] James A. Sethian and Alexander Vladimirsky. Ordered upwind methods for static Hamilton-Jacobi equations: Theory and algorithms. SIAM J. Numer. Anal., 41(1):325–363, 2003.

[89] Jorge Estrela Silva and João Borges de Sousa. A dynamic programming approach for the control of autonomous vehicles on planar motion. In Proceedings of the International Conference on Autonomous and Intelligent Systems (AIS 2010), Póvoa de Varzim, Portugal, June 2010.

[90] Jorge Estrela Silva and João Borges de Sousa. A dynamic programming approach for the motion control of autonomous vehicles. In Proceedings of the 49th IEEE Conference on Decision and Control, December 2010.

[91] J.J.E. Slotine and W. Li. Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs, New Jersey, 1991.

[92] D.A. Smallwood and L.L. Whitcomb. Model-based dynamic positioning of underwater robotic vehicles: Theory and experiment. IEEE Journal of Oceanic Engineering, 29(1):169–186, January 2004.


[93] E.D. Sontag. A Lyapunov-like characterization of asymptotic controllability. SIAM J. Control Optim., 21(3):462–471, 1983.

[94] O.J. Sørdalen and C. Canudas de Wit. Exponential control law for a mobile robot: Extension to path following. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2158–2163 vol. 3, May 1992.

[95] S. Summers, C.N. Jones, J. Lygeros, and M. Morari. A multiresolution approximation method for fast explicit model predictive control. IEEE Transactions on Automatic Control, 2011.

[96] Claire Tomlin, Ian Mitchell, Alexandre Bayen, and Meeko Oishi. Computational techniques for the verification of hybrid systems. Proceedings of the IEEE, 91(7):986–1001, July 2003.

[97] Yen-Hsi Richard Tsai, Li-Tien Cheng, Stanley Osher, and Hong-Kai Zhao. Fast sweeping algorithms for a class of Hamilton-Jacobi equations. SIAM Journal on Numerical Analysis, 41:673–694, 2003.

[98] J. N. Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control, 40(9):1528–1538, September 1995.

[99] Diederik Verscheure, Bram Demeulenaere, Jan Swevers, Joris De Schutter, and Moritz Diehl. Time-optimal path tracking for robots: A convex optimization approach. IEEE Transactions on Automatic Control, 54(10):2318–2327, October 2009.

[100] A. Vladimirsky. Static PDEs for time-dependent control problems. Interfaces and Free Boundaries, 8(3):281–300, 2006.

[101] S. Wei, K. Uthaichana, M. Žefran, R. A. DeCarlo, and S. Bengea. Applications of numerical optimal control to nonlinear hybrid systems. Nonlinear Analysis: Hybrid Systems, 1(2):264–279, 2007.

[102] Liron Yatziv, Alberto Bartesaghi, and Guillermo Sapiro. O(n) implementation of the fast marching algorithm. J. Comput. Phys., 212(2):393–399, 2006.

[103] Hongkai Zhao. Parallel implementation of the fast sweeping method. Journal of Computational Mathematics, 25(4):421–429, 2007.


Appendix A

Appendix

A.1 Dijkstra’s algorithm

The following description of Dijkstra's algorithm is essentially quoted from [88]. All nodes are divided into three classes (three possible labels): Far (no information about the correct value of V is known), Accepted (the correct value of V has been computed), and Considered (adjacent to Accepted).

1. Start with all the nodes in Far.

2. Move the boundary nodes to Accepted, with V (x) = g(x).

3. Move all the nodes x adjacent to the boundary into Considered and evaluate the tentative values.

4. Find the node $\bar{x}$ with the smallest value among all the Considered.

5. Move $\bar{x}$ to Accepted.

6. Move the Far nodes adjacent to $\bar{x}$ into Considered.

7. Reevaluate the value of all the Considered nodes $x$ adjacent to $\bar{x}$.

8. If Considered is not empty, then go to 4.

The described algorithm has computational complexity $O(M \log M)$, where $M$ is the number of nodes; the $\log M$ factor reflects the need to maintain a sorted list of the Considered values in order to determine the next Accepted node. A minimal sketch of this procedure is given below.
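For concreteness, the following Python sketch implements the labeling procedure above on a generic graph. It is an illustrative sketch rather than code from this work; the names (`neighbors`, `cost`, `boundary`, `g`) are placeholders. Instead of reevaluating Considered entries in place (step 7), it pushes updated tentative values onto a binary heap and discards stale entries on extraction, a standard equivalent realization of the sorted list.

```python
import heapq

def dijkstra(neighbors, cost, boundary, g):
    """Label-setting computation of V on a graph with positive edge costs.

    neighbors: dict mapping each node to its adjacent nodes;
    cost(x, y): positive transition cost between adjacent nodes;
    boundary:   iterable of boundary nodes, where V(x) = g(x).
    """
    V = {}               # Accepted nodes and their final values
    considered = []      # min-heap of (tentative value, node) pairs
    for x in boundary:   # step 2: boundary nodes are Accepted
        V[x] = g(x)
    for x in boundary:   # step 3: seed the Considered set
        for y in neighbors[x]:
            if y not in V:
                heapq.heappush(considered, (V[x] + cost(x, y), y))
    while considered:    # steps 4-8
        v, x = heapq.heappop(considered)
        if x in V:       # stale heap entry: node already Accepted
            continue
        V[x] = v         # step 5: the smallest tentative value is final
        for y in neighbors[x]:
            if y not in V:   # steps 6-7: (re)evaluate the neighbors
                heapq.heappush(considered, (v + cost(x, y), y))
    return V
```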

A.2 Semi-Lagrangian scheme

As an example of semi-Lagrangian schemes, we describe the method employed in [98], for regular grids. Given a simplex composed of $x$, $x_i$ and $x_j$, where $x$ is the point to be updated, the tentative value of $V(x)$ is given by

$$\min_{\theta \in [0,1]} \; \theta V(x_i) + (1-\theta) V(x_j) + c(x)\,d(\theta)$$

where $d(\theta) = \|x - (\theta x_i + (1-\theta) x_j)\|_2$. In a two-dimensional domain, four simplices are considered: $\{x,\, x+(h,0),\, x+(0,h)\}$, $\{x,\, x+(-h,0),\, x+(0,h)\}$, $\{x,\, x+(-h,0),\, x+(0,-h)\}$ and $\{x,\, x+(h,0),\, x+(0,-h)\}$, where $h$ is the grid spacing; the value of $V(x)$ is given by the minimum of the respective four tentative values.
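The sketch below illustrates this update. It is not taken from [98]; the helper names and the use of SciPy's bounded scalar minimizer for the minimization over $\theta$ are choices of this example, and handling of indices falling outside the grid is ignored.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def simplex_update(x, xi, xj, Vi, Vj, cx):
    """Tentative value of V at x computed from the simplex (x, xi, xj)."""
    def objective(theta):
        y = theta * xi + (1.0 - theta) * xj   # point on the opposite edge
        return theta * Vi + (1.0 - theta) * Vj + cx * np.linalg.norm(x - y)
    res = minimize_scalar(objective, bounds=(0.0, 1.0), method='bounded')
    return res.fun

def update_node(V, c, i, j, h):
    """Minimum of the four tentative values around grid node (i, j)."""
    x = np.array([i * h, j * h])
    pairs = [((1, 0), (0, 1)), ((-1, 0), (0, 1)),
             ((-1, 0), (0, -1)), ((1, 0), (0, -1))]
    values = []
    for (a, b), (p, q) in pairs:   # the four simplices listed above
        xi = x + h * np.array([a, b])
        xj = x + h * np.array([p, q])
        values.append(simplex_update(x, xi, xj,
                                     V[i + a, j + b], V[i + p, j + q], c(x)))
    return min(values)
```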

A.3 Upwind difference scheme

The equations below illustrate the application of the Godunov difference scheme to the Eikonal equation in a two-dimensional domain (see [97], for instance):

$$\Big[\max\big(\max(D^{-x}_{ij}V,\,0),\, -\min(D^{+x}_{ij}V,\,0)\big)^2 + \max\big(\max(D^{-y}_{ij}V,\,0),\, -\min(D^{+y}_{ij}V,\,0)\big)^2\Big] = c^2(x) \qquad (A.1)$$


with $D^{-x}_{ij}V = (V_{i,j} - V_{i-1,j})/\Delta x$ and $D^{+x}_{ij}V = (V_{i+1,j} - V_{i,j})/\Delta x$, where $\Delta x$ is the grid spacing in the $x$ dimension; the definitions for the $y$ dimension are analogous. The upwind concept consists of choosing the one-sided finite difference $D_{ij}$ corresponding to the proper propagation of the front. In the case of single-pass methods for the Eikonal equation, this is analogous to picking the appropriate nodes from the already computed set while avoiding the not yet considered nodes.
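To make the scheme concrete, the vectorized sketch below evaluates the left-hand side of (A.1) minus $c^2$ on the interior nodes of a regular grid. It is an illustrative fragment (the array layout and names are assumptions of this example), of the kind that would appear inside a fast sweeping or time-marching iteration.

```python
import numpy as np

def godunov_residual(V, c, dx, dy):
    """Residual of the Godunov discretization (A.1) at interior nodes.

    V, c: 2-D arrays (axis 0 is the x index i, axis 1 the y index j).
    Returns an array that is zero wherever (A.1) holds exactly.
    """
    Dmx = (V[1:-1, 1:-1] - V[:-2, 1:-1]) / dx   # D^{-x}: backward difference
    Dpx = (V[2:, 1:-1] - V[1:-1, 1:-1]) / dx    # D^{+x}: forward difference
    Dmy = (V[1:-1, 1:-1] - V[1:-1, :-2]) / dy   # D^{-y}
    Dpy = (V[1:-1, 2:] - V[1:-1, 1:-1]) / dy    # D^{+y}
    # Upwind selection: keep the one-sided difference pointing into the front.
    Hx = np.maximum(np.maximum(Dmx, 0.0), -np.minimum(Dpx, 0.0)) ** 2
    Hy = np.maximum(np.maximum(Dmy, 0.0), -np.minimum(Dpy, 0.0)) ** 2
    return Hx + Hy - c[1:-1, 1:-1] ** 2
```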

A.4 Essentially Non-oscillatory Scheme

We briefly describe the ENO technique [51] for second order unidimensional interpolation (see also [27]). Consider a stencil of four grid nodes $x_{i-3/2}, x_{i-1/2}, x_{i+1/2}, x_{i+3/2}$ such that $x_{i-1/2} \le x < x_{i+1/2}$, and let $P_1(x)$ denote the first order (linear) interpolation between $x_{i-1/2}$ and $x_{i+1/2}$. Consider the interpolation using the 3 leftmost stencil nodes:

$$S(x) = P_1(x) + V[x_{i-3/2}, x_{i-1/2}, x_{i+1/2}]\,(x - x_{i-1/2})(x - x_{i+1/2}) \qquad (A.2)$$

where $V[x_{i-3/2}, x_{i-1/2}, x_{i+1/2}]$ is Newton's second degree divided difference. Likewise, consider the interpolation using the 3 rightmost stencil nodes:

$$S(x) = P_1(x) + V[x_{i-1/2}, x_{i+1/2}, x_{i+3/2}]\,(x - x_{i-1/2})(x - x_{i+1/2}) \qquad (A.3)$$

For a regular grid with spacing $\Delta x$, defining $d_x = (x - x_{i-1/2})/\Delta x$, we have $(x - x_{i-1/2})(x - x_{i+1/2}) = d_x(d_x - 1)(\Delta x)^2$. The interpolation formulas change accordingly; we illustrate only the left case:

$$S(x) = P_1(x) + V\langle x_{i-3/2}, x_{i-1/2}, x_{i+1/2}\rangle\, d_x(d_x - 1) \qquad (A.4)$$

where $V\langle x_{i-3/2}, x_{i-1/2}, x_{i+1/2}\rangle$ is Newton's second degree undivided difference (undivided in the sense that the division by $(\Delta x)^2$ is not performed).

The expressions for the left and right Newton's second degree undivided differences, $V^- \equiv V\langle x_{i-3/2}, x_{i-1/2}, x_{i+1/2}\rangle$ and $V^+ \equiv V\langle x_{i-1/2}, x_{i+1/2}, x_{i+3/2}\rangle$, are as follows:

$$V^- = (v_{i-3/2} - 2v_{i-1/2} + v_{i+1/2})/2 \qquad (A.5)$$

$$V^+ = (v_{i-1/2} - 2v_{i+1/2} + v_{i+3/2})/2 \qquad (A.6)$$

The ENO scheme uses the difference with the smaller absolute value. This is because, in general, a smaller difference corresponds to a stencil sampling a smoother part of the function to be interpolated; the idea is to avoid interpolating across discontinuities.

The scheme can be extended to higher dimensions in a tree-like fashion. For a system of dimension $n$, the second order ENO scheme requires $(4^n - 1)/3$ unidimensional interpolations such as the one described above; a sketch of the unidimensional step is given below.
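The following sketch (illustrative only; the function name and calling convention are assumptions of this presentation) implements the unidimensional second order ENO interpolation of (A.2)-(A.6) on a regular grid, with the half-integer stencil nodes stored at integer array offsets.

```python
def eno2_interpolate(v, i, d):
    """Second order ENO interpolation between nodes i and i+1.

    v: sequence of nodal values; entries i-1, i, i+1, i+2 play the role
       of x_{i-3/2}, x_{i-1/2}, x_{i+1/2}, x_{i+3/2} in (A.2)-(A.6).
    d: normalized coordinate d_x in [0, 1) inside the cell [i, i+1].
    """
    p1 = v[i] + d * (v[i + 1] - v[i])              # linear interpolant P_1(x)
    vm = (v[i - 1] - 2.0 * v[i] + v[i + 1]) / 2.0  # left undivided difference (A.5)
    vp = (v[i] - 2.0 * v[i + 1] + v[i + 2]) / 2.0  # right undivided difference (A.6)
    V2 = vm if abs(vm) < abs(vp) else vp           # pick the smoother stencil
    return p1 + V2 * d * (d - 1.0)                 # quadratic ENO correction (A.4)
```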
