
Page 1

Numerical Optimal Control – Part 3: Function space methods

SADCO Summer School and Workshop on Optimal and Model Predictive Control – OMPC 2013, Bayreuth

Matthias Gerdts

Institute of Mathematics and Applied Computing
Department of Aerospace Engineering
Universität der Bundeswehr München (UniBw M)
[email protected]
http://www.unibw.de/lrt1/gerdts

Photos: http://de.wikipedia.org/wiki/München – Magnus Manske (Panorama), Luidger (Theatinerkirche), Kurmis (Chin. Turm), Arad Mojtahedi (Olympiapark), Max-k (Deutsches Museum), Oliver Raupach (Friedensengel), Andreas Praefcke (Nationaltheater)

Page 2

Numerical Optimal Control – Part 3: Function space methods – Matthias Gerdts

Schedule and Contents

Time            Topic

9:00 – 10:30    Introduction, overview, examples, indirect method

10:30 – 11:00   Coffee break

11:00 – 12:30   Discretization techniques, structure exploitation, calculation of gradients, extensions: sensitivity analysis, mixed-integer optimal control

12:30 – 14:00   Lunch break

14:00 – 15:30   Function space methods: gradient and Newton type methods

15:30 – 16:00   Coffee break

16:00 – 17:30   Numerical experiments

Page 3

Contents

Introduction

Necessary Conditions

Adjoint Formalism

Gradient Method
- Gradient Method in Finite Dimensions
- Gradient Method for Optimal Control Problems
- Extensions
- Gradient Method for Discrete Problem
- Examples

Lagrange-Newton Method
- Lagrange-Newton Method in Finite Dimensions
- Lagrange-Newton Method in Infinite Dimensions
- Application to Optimal Control
- Search Direction
- Examples
- Extensions


Page 5

Overview on Solution Methods

Starting point is the DAE Optimal Control Problem, from which two routes depart (shown as a flowchart on the original slide):

Discretization: approximation by a finite dimensional problem
- Indirect approach based on the finite dimensional optimality system: methods for finite dimensional complementarity problems and variational inequalities (semismooth Newton, Josephy-Newton, fixed-point iteration, projection methods)
- Direct approach: methods for the discretized optimization problem, either as reduced approach (direct shooting) or as full discretization (collocation): SQP methods, interior-point methods, gradient methods, penalty methods, multiplier methods, dynamic programming

Function space approach
- Indirect approach based on the infinite dimensional optimality system: semi-analytical methods (indirect method, boundary value problems); methods for infinite dimensional complementarity problems and variational inequalities (semismooth Newton, Josephy-Newton, fixed-point iteration, projection methods)
- Direct approach: methods for the infinite dimensional optimization problem, either as reduced approach or as full approach: SQP methods, interior-point methods, gradient methods, penalty methods, multiplier methods, dynamic programming

Page 6

Function Space Methods

Paradigm
Analyze and develop methods for some infinite dimensional optimization problem (e.g. an optimal control problem) of type

Minimize J(z) subject to G(z) ∈ K, H(z) = 0

in the same Banach or Hilbert spaces where z, G and H live.

Why is this useful?
- no immediate approximation error: algorithms work in the same spaces as the problem
- massive exploitation of structure possible
- subtle requirements can get lost in discretizations (e.g. smoothing operator for semismooth methods)
- methods can be very fast

What are the difficulties?
- detailed functional analytic background necessary (cannot be expected in an industrial context)
- discretizations become necessary at a lower level anyway; so, why not discretize right away?
- theoretical difficulties with, e.g., state constraints (multipliers are measures; how to handle them numerically?)


Page 14

Functional Analysis

- Banach space: complete normed vector space (X, ‖·‖_X)
- Hilbert space: complete vector space X with inner product ⟨·, ·⟩_X
- The dual space X* of a vector space X is defined by

  X* := {f : X → R | f linear and continuous},   ‖f‖_{X*} := sup_{‖x‖_X = 1} |f(x)|.

  (X*, ‖·‖_{X*}) is a Banach space if (X, ‖·‖_X) is a Banach space.
- For Banach spaces X and Y, the Fréchet derivative of f : X → Y at x is a linear and continuous operator f′(x) : X → Y with the property

  ‖f(x + h) − f(x) − f′(x)h‖_Y = o(‖h‖_X).

  The Fréchet derivative of f at x in direction d is denoted by f′(x)d.
- The Fréchet derivative of a function f : X × U → Y at (x, u) is given by

  f′(x, u)(x̂, û) = f′_x(x, u)(x̂) + f′_u(x, u)(û),

  where f′_x and f′_u are the partial derivatives of f w.r.t. x and u, respectively.
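The defining remainder property can be checked numerically in finite dimensions. A minimal sketch, assuming a made-up smooth map f : R^2 → R^2 with a hand-coded Jacobian (illustrative only, not from the slides): the ratio ‖f(x + h) − f(x) − f′(x)h‖ / ‖h‖ must tend to 0 as ‖h‖ → 0.

```python
import numpy as np

# Hypothetical smooth map f : R^2 -> R^2 (illustrative choice)
def f(x):
    return np.array([np.sin(x[0]) * x[1], x[0] ** 2 + x[1]])

def fprime(x):
    # Jacobian of f at x, playing the role of the Frechet derivative f'(x)
    return np.array([[np.cos(x[0]) * x[1], np.sin(x[0])],
                     [2.0 * x[0], 1.0]])

x = np.array([0.7, -1.3])
d = np.array([0.6, 0.8])           # unit direction

# o(||h||): the remainder divided by ||h|| must vanish as ||h|| -> 0
ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * d
    rem = np.linalg.norm(f(x + h) - f(x) - fprime(x) @ h)
    ratios.append(rem / t)
print(ratios)                      # decreasing towards 0
```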


Page 19

Functional Analysis

Lebesgue spaces
For 1 ≤ p ≤ ∞ the Lebesgue spaces are defined by

L^p(I, R^n) := {f : I → R^n | ‖f‖_p < ∞}

with

‖f‖_p := ( ∫_I ‖f(t)‖^p dt )^{1/p}   (1 ≤ p < ∞),      ‖f‖_∞ := ess sup_{t ∈ I} ‖f(t)‖.

Properties:
- L^2(I, R^n) is a Hilbert space with inner product ⟨f, g⟩ = ∫_I f(t)^T g(t) dt
- L^∞(I, R^n) is the dual space of L^1(I, R^n), but not vice versa
- L^p(I, R^n) and L^q(I, R^n) are dual to each other if 1/p + 1/q = 1 (1 < p, q < ∞)

Page 20

Functional Analysis

Sobolev spaces
For 1 ≤ p ≤ ∞ and q ∈ N the Sobolev spaces are defined by

W^{q,p}(I, R^n) := {f : I → R^n | ‖f‖_{q,p} < ∞}

with

‖f‖_{q,p} := ( ∑_{j=0}^{q} ‖f^{(j)}‖_p^p )^{1/p}   (1 ≤ p < ∞),      ‖f‖_{q,∞} := max_{0 ≤ j ≤ q} ‖f^{(j)}‖_∞.

Properties:
- W^{1,2}([t0, tf], R^n) is a Hilbert space with inner product

  ⟨f, g⟩ = f(t0)^T g(t0) + ∫_{t0}^{tf} f′(t)^T g′(t) dt

- W^{1,1}(I, R^n) is the space of absolutely continuous functions, i.e. functions satisfying

  f(t) = f(t0) + ∫_{t0}^{t} f′(τ) dτ   in I = [t0, tf].
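The absolute-continuity identity on the last bullet can be illustrated numerically. A sketch with a made-up smooth function f(t) = t² + sin t (any absolutely continuous representative would do): reconstruct f from f(t0) and the cumulative integral of f′, and check the two agree up to quadrature error.

```python
import numpy as np

# Illustrative f(t) = t^2 + sin(t) on I = [t0, tf]; its derivative is f'(t) = 2t + cos(t)
t0, tf = 0.0, 2.0
tau = np.linspace(t0, tf, 20001)
f_exact = tau ** 2 + np.sin(tau)
fprime = 2.0 * tau + np.cos(tau)

# cumulative trapezoidal quadrature: integral of f' from t0 up to each grid point
steps = np.diff(tau)
cumint = np.concatenate(([0.0], np.cumsum(0.5 * (fprime[1:] + fprime[:-1]) * steps)))

# f(t) = f(t0) + integral of f' from t0 to t, up to the quadrature error
err = np.max(np.abs(f_exact[0] + cumint - f_exact))
print(err)   # small discretization error
```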


Page 22

Infinite Optimization Problem

Throughout we restrict the discussion to the following class of optimization problems.

Problem (Infinite Optimization Problem (NLP))
Given:
- Banach spaces (Z, ‖·‖_Z) and (Y, ‖·‖_Y)
- mappings J : Z → R, H : Z → Y
- a convex set S ⊆ Z

Minimize J(z) subject to z ∈ S, H(z) = 0

Page 23

Infinite Optimization Problem

Theorem (KKT Conditions)
Assumptions:
- z̄ is a local minimum of NLP
- J is Fréchet-differentiable at z̄ and H is continuously Fréchet-differentiable at z̄
- S is convex with non-empty interior
- Mangasarian-Fromowitz Constraint Qualification (MFCQ): H′(z̄) is surjective and there exists d ∈ int(S − z̄) with H′(z̄)(d) = 0

Then there exists a multiplier λ* ∈ Y* such that

J′(z̄)(z − z̄) + λ*(H′(z̄)(z − z̄)) ≥ 0   for all z ∈ S.

For a proof see [1, Theorems 3.1, 4.1].

[1] S. Kurcyusz. On the Existence and Nonexistence of Lagrange Multipliers in Banach Spaces. Journal of Optimization Theory and Applications, 20(1):81–110, 1976.
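In finite dimensions with S = Z the variational inequality collapses to classical stationarity, J′(z̄) + λ*(H′(z̄)(·)) = 0 together with H(z̄) = 0. A minimal made-up instance (J and H chosen purely for illustration): minimize J(z) = ‖z‖² subject to H(z) = z1 + z2 − 1 = 0; here the KKT system is linear and can be solved directly.

```python
import numpy as np

# Stationarity 2 z + lambda * a = 0 together with a^T z = 1, a = (1, 1)^T,
# written as one symmetric linear system in (z1, z2, lambda).
a = np.array([1.0, 1.0])
K = np.block([[2.0 * np.eye(2), a[:, None]],
              [a[None, :], np.zeros((1, 1))]])
rhs = np.array([0.0, 0.0, 1.0])
z1, z2, lam = np.linalg.solve(K, rhs)
print(z1, z2, lam)   # z = (0.5, 0.5), lambda = -1

# verify the KKT conditions at the computed point
z = np.array([z1, z2])
assert np.allclose(2.0 * z + lam * a, 0.0)   # J'(z) + lambda H'(z) = 0
assert np.isclose(a @ z, 1.0)                # H(z) = 0
```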

Page 24

Infinite Optimization Problem

Lagrange function:

L(z, λ*) := J(z) + λ*(H(z))

Special cases:
- If S = Z, then

  J′(z̄)(z) + λ*(H′(z̄)(z)) = 0   for all z ∈ Z,

  or equivalently L′_z(z̄, λ*) = 0.
- If H is not present, then

  J′(z̄)(z − z̄) ≥ 0   for all z ∈ S.

  If Z is a Hilbert space, this condition is equivalent to the nonsmooth equation

  z̄ = Π_S(z̄ − αJ′(z̄))   (α > 0, Π_S projection onto S).
- If S = Z and H is not present, then

  J′(z̄)(z) = 0   for all z ∈ Z, i.e. J′(z̄) = 0.
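The fixed-point characterization z̄ = Π_S(z̄ − αJ′(z̄)) directly suggests a projected-gradient iteration. A finite dimensional sketch with made-up data (not from the slides): minimize J(z) = ‖z − c‖² over the box S = [0, 1]², where Π_S is coordinatewise clipping.

```python
import numpy as np

c = np.array([1.5, -0.3])          # illustrative target lying outside the box

def grad_J(z):                     # J(z) = ||z - c||^2  =>  J'(z) = 2 (z - c)
    return 2.0 * (z - c)

def proj_S(z):                     # projection onto the box S = [0,1]^2
    return np.clip(z, 0.0, 1.0)

alpha = 0.25                       # step size; convergent here since alpha < 2/L with L = 2
z = np.array([0.5, 0.5])
for _ in range(100):               # fixed-point iteration z <- Pi_S(z - alpha J'(z))
    z = proj_S(z - alpha * grad_J(z))

residual = np.linalg.norm(z - proj_S(z - alpha * grad_J(z)))
print(z, residual)                 # converges to the projection of c onto S
```

The limit (1, 0) is exactly the point of S closest to c, and the fixed-point residual vanishes there, matching the optimality condition above.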



Page 29

Adjoint Formalism

Equality constrained NLP (EQ-NLP)

Minimize J(x, u) subject to Ax = Bu.

A ∈ R^{n×n}, B ∈ R^{n×m}, J : R^n × R^m → R differentiable

Solution operator: Let A be nonsingular. Then:

Ax = Bu ⇔ x = A^{-1}Bu =: Su

S : R^m → R^n, u ↦ x = Su is called the solution operator.

Reduced objective functional:

J(x, u) = J(Su, u) =: j(u),   j : R^m → R.

Reduced NLP (R-NLP)

Minimize j(u) = J(Su, u) w.r.t. u ∈ R^m.


Page 33

Adjoint Formalism

Task
Compute the gradient of j at a given u, for instance in a gradient based optimization method or for the evaluation of necessary optimality conditions.

Differentiation yields

j′(u) = J′_x(Su, u)S + J′_u(Su, u) = J′_x(Su, u)A^{-1}B + J′_u(Su, u).

Computation of A^{-1} is expensive. Try to avoid it!

To this end define the adjoint vector λ by

λ^T := J′_x(Su, u)A^{-1}.


Page 37: Numerical Optimal Control Part 3: Function space methodsnum.math.uni-bayreuth.de/en/conferences/ompc_2013/program/dow… · Numerical Optimal Control – Part 3: Function space methods

Numerical Optimal Control – Part 3: Function space methods –Matthias Gerdts

Adjoint Formalism

Adjoint equation

A⊤λ = ∇_x J(Su, u)

The gradient of j at u reads

j′(u) = λ⊤B + J′_u(Su, u)

Connection to KKT conditions: Lagrange function of EQ-NLP

L(x, u, λ) := J(x, u) + λ⊤(Bu − Ax)

If λ solves the adjoint equation, then

L′_x(x, u, λ) = J′_x(x, u) − λ⊤A = 0

This choice of λ automatically satisfies one part of the KKT conditions. The second part

0 = L′_u(x, u, λ) = J′_u(x, u) + λ⊤B = j′(u)

is only satisfied at a stationary point u of the reduced problem R-NLP!
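As a finite-dimensional illustration of this slide (the quadratic J, the matrices A, B, and all numerical values below are made-up assumptions, not from the lecture), the adjoint trick replaces forming A⁻¹ by a single linear solve with A⊤, and the result can be checked against finite differences:

```python
import numpy as np

# Hypothetical problem data: J(x, u) = 0.5*||x - x_ref||^2 + 0.5*alpha*||u||^2,
# constraint A x = B u. Sizes and values are illustrative only.
rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned, nonsingular
B = rng.standard_normal((n, m))
x_ref = rng.standard_normal(n)
alpha = 0.1

def reduced_objective(u):
    x = np.linalg.solve(A, B @ u)                  # x = Su = A^{-1} B u
    return 0.5 * np.sum((x - x_ref) ** 2) + 0.5 * alpha * np.sum(u ** 2)

def reduced_gradient(u):
    x = np.linalg.solve(A, B @ u)
    grad_x = x - x_ref                             # ∇_x J(Su, u)
    lam = np.linalg.solve(A.T, grad_x)             # adjoint equation: A^T λ = ∇_x J
    return B.T @ lam + alpha * u                   # j'(u)^T = B^T λ + ∇_u J

# Check against a central finite-difference approximation.
u = rng.standard_normal(m)
g = reduced_gradient(u)
eps = 1e-6
g_fd = np.array([(reduced_objective(u + eps * e) - reduced_objective(u - eps * e)) / (2 * eps)
                 for e in np.eye(m)])
print(np.allclose(g, g_fd, atol=1e-5))
```

Note the design payoff: one adjoint solve with A⊤ yields all m components of the gradient at once, so the cost is independent of the number of controls.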


Let X and Y be Banach spaces and X∗ and Y∗ their topological dual spaces.

Adjoint operator: Let A : X −→ Y be a bounded linear operator. The operator A∗ : Y∗ −→ X∗ with the property

(A∗y∗)(x) = y∗(Ax) ∀x ∈ X, y∗ ∈ Y∗

is called the adjoint operator (it is itself a bounded linear operator). Notation: (A∗y∗, x) = (y∗, Ax) "dual pairing".

Equality constrained NLP (EQ-NLP)

Minimize J(x, u) subject to Ax = Bu.

A : X −→ Y, B : U −→ Y bounded linear operators, J : X × U −→ R Fréchet differentiable, X, U, Y Banach spaces.


Solution operator: Let A be nonsingular. Then:

Ax = Bu ⇐⇒ x = A−1Bu =: Su

with solution operator S : U −→ X , u 7→ x = Su.

Reduced objective functional:

J(x, u) = J(Su, u) =: j(u), j : U −→ R.

Reduced NLP (R-NLP)

Minimize j(u) = J(Su, u) w.r.t. u ∈ U.


Differentiation at u in direction ū yields

j′(u)ū = J′_x(Su, u) Sū + J′_u(Su, u)ū = J′_x(Su, u) A⁻¹Bū + J′_u(Su, u)ū

Computation of A⁻¹ is expensive. Try to avoid it!

Define the adjoint y∗ ∈ Y∗ by

y∗ := J′_x(Su, u) A⁻¹, i.e. y∗(Ax) = (A∗y∗)(x) = J′_x(Su, u)x for all x ∈ X.

Adjoint equation

A∗y∗ = J′_x(Su, u) (operator equation in X∗)

The derivative of j at u in direction ū reads

j′(u)ū = y∗(Bū) + J′_u(Su, u)ū


Connection to KKT conditions:

• Lagrange function of EQ-NLP:

L(x, u, y∗) := J(x, u) + y∗(Bu − Ax)

• If y∗ solves the adjoint equation, then

L′_x(x, u, y∗)x = J′_x(x, u)x − y∗(Ax) = J′_x(x, u)x − (A∗y∗)(x) = 0

This choice of y∗ automatically satisfies one part of the KKT conditions.

• The second part

0 = L′_u(x, u, y∗) = J′_u(x, u) + y∗B = j′(u)

is only satisfied at a stationary point u of the reduced problem R-NLP!


How to compute the adjoint operator?

Theorem (Riesz). Given: Hilbert space X, inner product 〈·, ·〉, dual space X∗. For every f∗ ∈ X∗ there exists a unique f ∈ X such that ‖f∗‖_X∗ = ‖f‖_X and

f∗(x) = 〈f, x〉 ∀x ∈ X.

Example (linear ODE). Given: I := [0, T] and the linear mapping A : W^{1,2}(I, R^n) =: X −→ Y := L²(I, R^n) × R^n,

(Ax)(·) = ( x′(·) − A(·)x(·), x(0) ).

X and Y are Hilbert spaces. According to Riesz' theorem, for y∗ ∈ Y∗ and x∗ ∈ X∗ there exist (λ, σ) ∈ Y and μ ∈ X with

y∗(h, k) = ∫_I λ(t)⊤h(t) dt + σ⊤k  ∀(h, k) ∈ Y,

x∗(x) = μ(0)⊤x(0) + ∫_I μ′(t)⊤x′(t) dt  ∀x ∈ X.


Example (linear ODE, continued). We intend to identify λ, σ, and μ such that y∗(Ax) = (A∗y∗)(x) holds for all x ∈ X and y∗ ∈ Y∗. Note that A∗y∗ ∈ X∗.

By partial integration we obtain

y∗(Ax) = ∫_I λ(t)⊤( x′(t) − A(t)x(t) ) dt + σ⊤x(0)

= −[ ( −∫_t^T λ(s)⊤A(s) ds ) x(t) ]_{t=0}^{t=T} + σ⊤x(0) + ∫_I ( λ(t)⊤ − ∫_t^T λ(s)⊤A(s) ds ) x′(t) dt

= ( σ⊤ − ∫_0^T λ(s)⊤A(s) ds ) x(0) + ∫_I ( λ(t)⊤ − ∫_t^T λ(s)⊤A(s) ds ) x′(t) dt

= (A∗y∗)(x),

where μ(0)⊤ := σ⊤ − ∫_0^T λ(s)⊤A(s) ds and μ′(t)⊤ := λ(t)⊤ − ∫_t^T λ(s)⊤A(s) ds identify the Riesz representative μ of A∗y∗. The latter equation defines the adjoint operator, since y∗(Ax) = (A∗y∗)(x) holds for all x ∈ X, y∗ ∈ Y∗.


How to compute the adjoint equation?

Example (OCP and adjoint equation). Optimal control problem (assumption: ϕ Fréchet differentiable):

Minimize J(x, u) := ϕ(x(T)) s.t.

A(x)(t) := ( x′(t) − A(t)x(t), x(0) ) = ( B(t)u(t), 0 ) =: B(u)(t)

Adjoint equation: For every x ∈ W^{1,2}(I, R^n) we have

0 = (A∗y∗)(x) − J′(x, u)(x)

= ( σ⊤ − ∫_0^T λ(s)⊤A(s) ds ) x(0) + ∫_I ( λ(t)⊤ − ∫_t^T λ(s)⊤A(s) ds ) x′(t) dt − ϕ′(x(T))x(T)

Application of the variation lemma (DuBois-Reymond) yields

λ(t)⊤ = λ(T)⊤ + ∫_t^T λ(s)⊤A(s) ds

and λ(0) = σ, λ(T)⊤ = ϕ′(x(T)).


Example (OCP and adjoint equation, continued)

• Adjoint equation:

λ′(t) = −A(t)⊤λ(t), λ(T) = ∇ϕ(x(T)).

• Gradient of the reduced objective functional j(u) = J(x(u), u):

j′(u)(ū) = y∗(Bū) + J′_u(x, u)ū = ∫_I λ(t)⊤B(t)ū(t) dt + σ⊤0 = ∫_I λ(t)⊤B(t)ū(t) dt  (ū ∈ L²(I, R^m))

• Stationary point:

0 = j′(u)(ū) = ∫_I λ(t)⊤B(t)ū(t) dt ∀ū ∈ L²(I, R^m)

implies

0 = B(t)⊤λ(t) a.e. in I
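A numerical sketch of this example (the matrices A, B, the terminal cost ϕ(x) = 0.5‖x‖², and the explicit-Euler grid are illustrative assumptions, not from the slides): integrate the state forward, integrate the adjoint ODE λ′ = −A(t)⊤λ backward from λ(T) = ∇ϕ(x(T)), and read off the gradient density B(t)⊤λ(t):

```python
import numpy as np

# Illustrative problem: x' = A x + B u on [0, T], J(u) = phi(x(T)) = 0.5*||x(T)||^2.
T, N = 1.0, 1000
h = T / N
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 0.0])
u = np.zeros((N, 1))                       # control values on the time grid

def objective(u):
    xk = x0.copy()
    for k in range(N):                     # forward sweep (explicit Euler)
        xk = xk + h * (A @ xk + B @ u[k])
    return 0.5 * xk @ xk                   # phi(x(T))

# Forward sweep: store the state trajectory.
x = np.zeros((N + 1, 2)); x[0] = x0
for k in range(N):
    x[k + 1] = x[k] + h * (A @ x[k] + B @ u[k])

# Backward sweep: adjoint ODE lambda' = -A^T lambda, lambda(T) = grad phi(x(T)) = x(T).
lam = np.zeros((N + 1, 2)); lam[N] = x[N]
for k in range(N, 0, -1):
    lam[k - 1] = lam[k] + h * (A.T @ lam[k])

# Gradient density B(t)^T lambda(t); discrete-adjoint pairing: cell k uses lam[k+1].
grad = lam[1:] @ B                         # shape (N, 1)

# Finite-difference check of one directional derivative of the discretized objective.
eps, k0 = 1e-5, N // 2
du = np.zeros((N, 1)); du[k0, 0] = 1.0
fd = (objective(u + eps * du) - objective(u - eps * du)) / (2 * eps)
print(abs(fd - h * grad[k0, 0]) < 1e-6)
```

One forward and one backward sweep yield the whole gradient density, regardless of how finely u is discretized.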


Contents

Introduction

Necessary Conditions

Adjoint Formalism

Gradient Method
  Gradient Method in Finite Dimensions
  Gradient Method for Optimal Control Problems
  Extensions
  Gradient Method for Discrete Problem
  Examples

Lagrange-Newton Method
  Lagrange-Newton Method in Finite Dimensions
  Lagrange-Newton Method in Infinite Dimensions
  Application to Optimal Control
  Search Direction
  Examples
  Extensions


Gradient Method in Finite Dimensions

Unconstrained minimization problem

Minimize J(u) w.r.t. u ∈ Rn

J : Rn −→ R continuously differentiable

Gradient Method for Finite Dimensional Problems

(0) Let u(0) ∈ Rn, β ∈ (0, 1), σ ∈ (0, 1), and k := 0.

(1) Compute d (k) := −∇J(u(k)).

(2) If ‖d (k)‖ ≈ 0, STOP.

(3) Perform line search: Find the smallest j ∈ {0, 1, 2, . . .} with

J(u(k) + β^j d(k)) ≤ J(u(k)) − σβ^j ‖∇J(u(k))‖₂²

and set αk := β^j.

(4) Set u(k+1) := u(k) + αk d (k), k := k + 1, and go to (1).
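A direct transcription of steps (0)-(4) as a sketch; the quadratic test objective is an illustrative assumption, not from the slides:

```python
import numpy as np

def gradient_method(J, gradJ, u0, beta=0.5, sigma=1e-4, tol=1e-8, max_iter=500):
    """Steepest descent with Armijo line search, steps (0)-(4) of the slide."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        d = -gradJ(u)                                     # (1) steepest-descent direction
        if np.linalg.norm(d) < tol:                       # (2) stopping test
            break
        alpha, Ju, dd = 1.0, J(u), d @ d                  # alpha = beta^0 = 1
        while J(u + alpha * d) > Ju - sigma * alpha * dd: # (3) Armijo condition
            alpha *= beta
        u = u + alpha * d                                 # (4) update
    return u

# Illustrative test problem: J(u) = 0.5 * u^T Q u - b^T u, minimizer Q^{-1} b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
u_star = gradient_method(lambda u: 0.5 * u @ Q @ u - b @ u,
                         lambda u: Q @ u - b,
                         u0=np.zeros(2))
print(np.allclose(u_star, np.linalg.solve(Q, b), atol=1e-6))
```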


Gradient Method

Pros:
• requires only first derivatives
• easy to implement
• global convergence achieved by Armijo line search
• extension: projected gradient method for simple constraints

Cons:
• only linear convergence rate
• handles only unconstrained minimization (except simple bounds)


Gradient Method for Optimal Control Problems

Optimal control problem (OCP). Given: I := [t0, tf], x̄ ∈ R^nx, and continuously differentiable functions

ϕ : R^nx −→ R,
f0 : R^nx × R^nu −→ R,
f : R^nx × R^nu −→ R^nx.

Minimize

Γ(x, u) := ϕ(x(tf)) + ∫_I f0(x(t), u(t)) dt

w.r.t. x ∈ W^{1,∞}(I, R^nx), u ∈ L^∞(I, R^nu) subject to the constraints

x′(t) = f(x(t), u(t)) a.e. in I,  x(t0) = x̄.

Define X := W^{1,∞}(I, R^nx) and U := L^∞(I, R^nu).


Assumptions (existence of a solution operator):

• For every u ∈ U the initial value problem

x′(t) = f(x(t), u(t)) a.e. in I,  x(t0) = x̄

has a unique solution x = S(u) ∈ X.

• The solution mapping S : U −→ X is continuously Fréchet differentiable.

Reduced optimal control problem (R-OCP)

Minimize J(u) := Γ(S(u), u) subject to u ∈ U.

The gradient method requires the gradient of the reduced objective function J at some u ∈ U.

What does the gradient look like?


Computation of Gradient

• In R^n:

∇J(u) := J′(u)⊤ = ( ∂J/∂u1(u), …, ∂J/∂un(u) )⊤,  J′(u)(ū) = ∇J(u)⊤ū = 〈∇J(u), ū〉_{R^n}.

• In a Hilbert space setting, i.e. J : U −→ R with U a Hilbert space:

J′(u) ∈ U∗, and by Riesz' theorem there exists η(u) ∈ U with J′(u)(ū) = 〈η(u), ū〉_U.

Hence ∇J(u) := η(u) ∈ U is the gradient of J at u.

• But in our case U = L^∞(I, R^nu) is not a Hilbert space, and hence the functional J′(u) ∈ U∗ does not a priori have such a nice representation as in the above cases.

How to define the gradient in this case? −→ Use the formal Lagrange technique (see also the metric gradient in [1]).

[1] M. Golomb and R. A. Tapia. The metric gradient in normed linear spaces. Numerische Mathematik, 20:115–124, 1972.


Computation of Gradient by Formal Lagrange Technique

Hamilton function:

H(x, u, λ) := f0(x, u) + λ⊤f(x, u)

Auxiliary functional (u and x = S(u) given; λ ∈ X to be specified later):

J̃(u) := J(u) + 〈λ, f(x, u) − x′〉_{L²(I,R^nx)}

= ϕ(x(tf)) + ∫_I H(x(t), u(t), λ(t)) − λ(t)⊤x′(t) dt

= ϕ(x(tf)) − [λ(t)⊤x(t)]_{t0}^{tf} + ∫_I H(x(t), u(t), λ(t)) + λ′(t)⊤x(t) dt  (partial integration)

Fréchet derivative (exploit S′(u)(ū)(t0) = 0):

J̃′(u)(ū) = ( ϕ′(x(tf)) − λ(tf)⊤ ) S′(u)(ū)(tf) + ∫_I ( H′_x[t] + λ′(t)⊤ ) S′(u)(ū)(t) + H′_u[t]ū(t) dt.

Computation of S′(u) is expensive! Eliminate the terms containing S′(u) by choosing λ as the solution of the

λ′(t) = −H′_x(x(t), u(t), λ(t))⊤, λ(tf) = ϕ′(x(tf))⊤

adjoint ODE
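A minimal numerical sketch of this forward/backward recipe (the scalar dynamics, the costs, and the explicit-Euler grid are illustrative assumptions; the slides do not prescribe a discretization): integrate the state forward, the adjoint ODE backward, and take H′_u as the gradient density.

```python
import numpy as np

# Illustrative scalar problem: x' = f(x, u) = -x + u, x(0) = 1 on [0, 1],
# Gamma = phi(x(T)) + ∫ f0 dt with phi(x) = 0.5*x^2 and f0(x, u) = 0.5*u^2.
T, N = 1.0, 200
h = T / N

f    = lambda x, u: -x + u
f_x  = lambda x, u: -1.0          # df/dx
f_u  = lambda x, u: 1.0           # df/du
f0_x = lambda x, u: 0.0           # df0/dx
f0_u = lambda x, u: u             # df0/du
phi_grad = lambda x: x            # phi'(x)

def objective(u):
    x, c = 1.0, 0.0
    for k in range(N):            # explicit Euler + rectangle rule
        c += h * 0.5 * u[k] ** 2
        x += h * f(x, u[k])
    return 0.5 * x * x + c

def reduced_gradient(u):
    x = np.empty(N + 1); x[0] = 1.0
    for k in range(N):            # forward sweep: state
        x[k + 1] = x[k] + h * f(x[k], u[k])
    lam = np.empty(N + 1); lam[N] = phi_grad(x[N])
    for k in range(N, 0, -1):     # backward sweep: lambda' = -H'_x = -(f0_x + lam*f_x)
        lam[k - 1] = lam[k] + h * (f0_x(x[k - 1], u[k - 1]) + lam[k] * f_x(x[k - 1], u[k - 1]))
    # gradient density H'_u[t] = f0_u + lam * f_u (discrete pairing: cell k with lam[k+1])
    return f0_u(x[:-1], u) + lam[1:] * f_u(x[:-1], u)

u = np.zeros(N)
g = reduced_gradient(u)

# Finite-difference check of one directional derivative of the discretized objective.
eps, k = 1e-5, N // 2
du = np.zeros(N); du[k] = 1.0
fd = (objective(u + eps * du) - objective(u - eps * du)) / (2 * eps)
print(abs(fd - h * g[k]) < 1e-7)
```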


Page 82

Computation of Gradient

"Gradient form":

  Ĵ'(u)(δu) = ∫_I H'_u[t] δu(t) dt = ⟨∇_u H, δu⟩_{L²(I,R^nu) × L²(I,R^nu)}.

Theorem (steepest descent)
The direction
  d(t) := −(1 / ‖H'_u‖₂) H'_u[t]^T
solves: Minimize Ĵ'(u)(δu) w.r.t. δu subject to ‖δu‖₂ = 1.

Proof: For every δu with ‖δu‖₂ = 1 we have, by the Cauchy–Schwarz inequality,
  |Ĵ'(u)(δu)| ≤ ‖H'_u‖₂ · ‖δu‖₂ = ‖H'_u‖₂,
and for d we have Ĵ'(u)(d) = −‖H'_u‖₂.
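The theorem can be checked numerically on a grid: among all unit-norm directions, the normalized negative of H'_u attains the smallest directional derivative. A small sketch in which the vector g is a hypothetical stand-in for grid values of H'_u:

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=50)                 # stand-in for H'_u sampled on a grid

# normalized steepest-descent direction from the theorem
d = -g / np.linalg.norm(g)

best = g @ d                            # directional derivative of d: -||g||_2

# random unit-norm comparison directions: none does better than d
samples = rng.normal(size=(1000, 50))
vals = samples @ g / np.linalg.norm(samples, axis=1)
print(best, vals.min())
```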

Page 83

Computation of Gradient

Relation between Ĵ'(u) and J'(u); for a proof see [5, Chapter 8]:

Theorem
Let u ∈ U and x = S(u) be given. Let λ satisfy the adjoint ODE
  λ' = −H'_x(x, u, λ)^T,   λ(tf) = ϕ'(x(tf))^T.
Then: Ĵ'(u)(δu) = J'(u)(δu) for all δu ∈ U.

Owing to the theorem we may define the gradient of J at u as follows.

Definition (Gradient of reduced objective functional)
Let u ∈ U and x = S(u) be given, and let λ satisfy the adjoint ODE. The gradient ∇J(u) ∈ U of J at u is defined by
  ∇J(u)(·) := H'_u(x(·), u(·), λ(·))^T.

Page 84

Gradient Method

Gradient method for R-OCP

(0) Choose u(0) ∈ U, β ∈ (0, 1), σ ∈ (0, 1), and set k := 0.
(1) Solve the ODE
      x' = f(x, u(k)),  x(t0) = x̄,
    and the adjoint ODE
      λ' = −H'_x(x, u(k), λ)^T,  λ(tf) = ϕ'(x(tf))^T.
    Denote the solutions by x(k) = S(u(k)) and λ(k).
(2) If ‖H'_u‖₂ ≈ 0, STOP.
(3) Set d(k)(t) := −H'_u(x(k)(t), u(k)(t), λ(k)(t))^T.
(4) Perform an Armijo line search: find the smallest j ∈ {0, 1, 2, …} with
      J(u(k) + β^j d(k)) ≤ J(u(k)) − σβ^j ‖H'_u(x(k), u(k), λ(k))‖₂²
    and set α_k := β^j.
(5) Set u(k+1) := u(k) + α_k d(k), k := k + 1, and go to (1).

Page 85

Gradient Method

Theorem (Convergence)
Suppose that the gradient method does not terminate. Let u* be an accumulation point of the sequence (u(k))_{k∈N} generated by the gradient method and x* := S(u*). Then:
  ‖∇J(u*)‖₂ = 0.

Page 86

Gradient Method – Examples

Example
Minimize x2(1) subject to the constraints
  x1'(t) = −x1(t) + √3 u(t),          x1(0) = 2,
  x2'(t) = (1/2)(x1(t)² + u(t)²),     x2(0) = 0.

Output of gradient method: (u(0) ≡ 0, β = 0.9, σ = 0.1, symplectic Euler, N = 100)

  k    α_k              J(u(k))          ‖H'_u‖∞          ‖H'_u‖₂²
  0    0.00000000E+00   0.87037219E+00   0.14877655E+01   0.65717322E+00
  1    0.10000000E+01   0.72406641E+00   0.61765831E+00   0.21168343E+00
  2    0.10000000E+01   0.68017548E+00   0.35175249E+00   0.72633493E-01
  3    0.10000000E+01   0.66515486E+00   0.20519966E+00   0.24977643E-01
  ...
  23   0.10000000E+01   0.65728265E+00   0.47781223E-05   0.13406009E-10
  24   0.10000000E+01   0.65728265E+00   0.28158031E-05   0.46203574E-11
  25   0.10000000E+01   0.65728265E+00   0.16459056E-05   0.15877048E-11
  26   0.10000000E+01   0.65728265E+00   0.97422070E-06   0.54326016E-12

Page 87

Gradient Method – Examples

Example (continued)
Some iterates (red) and converged solution (blue):

[Figure: three plots over time t ∈ [0, 1] — state x1, control u, and adjoint lambda1.]

Page 88

Gradient Method – Examples

Example
Minimize x2(1) + 2.5(x1(1) − 1)² subject to the constraints
  x1'(t) = u(t) − 15 exp(−2t),        x1(0) = 4,
  x2'(t) = (1/2)(u(t)² + x1(t)³),     x2(0) = 0.

Output of gradient method: (u(0) ≡ 0, β = 0.9, σ = 0.1, symplectic Euler, N = 100)

  k    α_k              J(u(k))          ‖H'_u‖∞          ‖H'_u‖₂²
  0    0.00000000E+00   0.32263509E+02   0.17750257E+02   0.23786744E+03
  1    0.31381060E+00   0.21450571E+02   0.16994586E+02   0.18987670E+03
  2    0.25418658E+00   0.15604667E+02   0.91485607E+01   0.78563930E+02
  3    0.28242954E+00   0.11414005E+02   0.75417222E+01   0.39835695E+02
  4    0.25418658E+00   0.97695774E+01   0.42526367E+01   0.15677990E+02
  ...
  61   0.28242954E+00   0.84261386E+01   0.22532128E-05   0.28170368E-11
  62   0.28242954E+00   0.84261386E+01   0.16626581E-05   0.17267323E-11
  63   0.28242954E+00   0.84261386E+01   0.16639237E-05   0.13398784E-11
  64   0.22876792E+00   0.84261386E+01   0.83196204E-06   0.21444710E-12

Page 89

Gradient Method – Examples

Example (continued)
Some iterates (red) and converged solution (blue):

[Figure: three plots over time t ∈ [0, 1] — state x1, control u, and adjoint lambda1.]

Page 90

Projected Gradient Method

We add "simple" constraints u ∈ Uad with a convex set Uad ⊂ U to the reduced problem.

Reduced optimal control problem (R-OCP)
Minimize J(u) := Γ(S(u), u) subject to u ∈ Uad ⊂ U.

Assumption:
• the projection Π_Uad : U −→ Uad is easy to compute

Example: For box constraints
  Uad = {u ∈ R | a ≤ u ≤ b}
the projection computes to
  Π_Uad(u) = max{a, min{b, u}} = { a, if u < a;  u, if a ≤ u ≤ b;  b, if u > b }.

Optimality:
  J'(ū)(u − ū) ≥ 0 for all u ∈ Uad  ⇐⇒  ū = Π_Uad(ū − α ∇J(ū))   (α > 0)
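For box constraints the projection is componentwise clipping, and the fixed-point characterization of optimality can be verified directly. A small sketch with an illustrative quadratic objective (the bounds and data below are not from the slides):

```python
import numpy as np

a, b = 1.0, 3.0                     # box bounds (illustrative values)

def proj(u):                        # Pi(u) = max{a, min{b, u}}, componentwise
    return np.clip(u, a, b)

# toy objective J(u) = 0.5*||u - c||^2 with unconstrained minimizer c
c = np.array([0.2, 2.0, 4.7])
grad = lambda u: u - c

u_star = proj(c)                    # constrained minimizer for this J
alpha = 0.5

# fixed-point characterization: u* = Pi(u* - alpha * grad J(u*))
print(proj(u_star - alpha * grad(u_star)))
```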


Page 93

Projected Gradient Method

The projected gradient method requires a feasible initial guess u(0) ∈ Uad and differs from the gradient method in one of the following two components:

Version 1: In iteration k compute
  ū(k) := Π_Uad(u(k) + d(k))
and use the direction d̄(k) := ū(k) − u(k) instead of d(k) in steps (4) and (5), i.e.
  J(u(k) + β^j d̄(k)) ≤ J(u(k)) + σβ^j J'(u(k))(d̄(k)),   u(k+1) = u(k) + α_k d̄(k).

Version 2: In iteration k use the projection within the Armijo line search in step (4), i.e.
  J(Π_Uad(u(k) + β^j d(k))) ≤ J(u(k)) + σβ^j J'(u(k))(d(k)),
and set u(k+1) := Π_Uad(u(k) + α_k d(k)) in step (5).
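Version 1 can be sketched on a finite-dimensional toy problem (illustrative data, not from the slides): a quadratic objective over a box, where the projected direction d̄(k) and the Armijo test appear exactly as above:

```python
import numpy as np

# Toy problem: J(u) = 0.5*||u - c||^2 over the box [1, 3]^n (illustrative).
a, b = 1.0, 3.0
c = np.array([0.2, 2.0, 4.7])                  # unconstrained minimizer

proj = lambda u: np.clip(u, a, b)
J = lambda u: 0.5 * np.sum((u - c) ** 2)
gradJ = lambda u: u - c

u = proj(np.zeros(3))                          # feasible initial guess
beta, sigma = 0.9, 0.1
for k in range(100):
    d = -gradJ(u)                              # plain gradient direction
    dbar = proj(u + d) - u                     # projected direction (Version 1)
    if np.linalg.norm(dbar) < 1e-12:
        break
    slope = gradJ(u) @ dbar                    # J'(u)(dbar) <= 0
    j = 0                                      # Armijo line search on dbar
    while J(u + beta ** j * dbar) > J(u) + sigma * beta ** j * slope:
        j += 1
    u = u + beta ** j * dbar

print(u)   # converges to the projected minimizer proj(c)
```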


Page 95

Projected Gradient Method – Examples

Example
Minimize x2(1) + 2.5(x1(1) − 1)² subject to the constraints
  x1'(t) = u(t) − 15 exp(−2t),        x1(0) = 4,
  x2'(t) = (1/2)(u(t)² + x1(t)³),     x2(0) = 0,
  u(t) ∈ Uad := {u ∈ R | 1 ≤ u ≤ 3}.

Output of projected gradient method (version 1): (u(0) ≡ 1, β = 0.9, σ = 0.1, symplectic Euler, N = 100)

  k    α_k              J(u(k))          ‖u(k) − Π_Uad(u(k) − J'(u(k)))‖∞   J'(u(k))(d̄(k))
  0    0.00000000E+00   0.19145273E+02   0.20000000E+01                     -0.21446656E+02
  1    0.10000000E+01   0.87483967E+01   0.20000000E+01                     -0.10674827E+01
  2    0.47829690E+00   0.86918573E+01   0.95659380E+00                     -0.39783867E+00
  3    0.10000000E+01   0.85622908E+01   0.16165415E+01                     -0.34588659E+00
  ...
  56   0.81000000E+00   0.84783285E+01   0.13795810E-05                     -0.18974642E-12
  57   0.72900000E+00   0.84783285E+01   0.13795797E-05                     -0.98603024E-13
  58   0.81000000E+00   0.84783285E+01   0.90526344E-06                     -0.25693860E-13

Page 96

Projected Gradient Method – Examples

Example (continued)
Some iterates (red) and converged solution (blue):

[Figure: three plots over time t ∈ [0, 1] — state x1, control u, and adjoint lambda1.]

Page 97

Gradient Method for the Discretized Problem

Instead of applying the "function space" gradient method to OCP, we can first discretize OCP and apply the standard gradient method to the discretized problem.

Problem (Discretized optimal control problem (D-OCP))
Given: x̄ ∈ R^nx, the grid
  G_N := {t_i | t_i = t0 + ih, i = 0, 1, …, N},  h = (tf − t0)/N,  N ∈ N,
and continuously differentiable functions
  ϕ : R^nx −→ R,  f0 : R^nx × R^nu −→ R,  f : R^nx × R^nu −→ R^nx.

Minimize
  Γ_N(x, u) := ϕ(x_N) + h Σ_{i=0}^{N−1} f0(x_i, u_i)
w.r.t. x = (x_0, x_1, …, x_N)^T ∈ R^{(N+1)nx} and u = (u_0, …, u_{N−1})^T ∈ R^{N nu} subject to the constraints
  (x_{i+1} − x_i)/h = f(x_i, u_i),  i = 0, …, N − 1,
  x_0 = x̄.

Page 98

Gradient Method for the Discretized Problem

Denote by
  S : R^{N nu} −→ R^{(N+1)nx},  u ↦ x = S(u),
the solution operator that maps the control input u to the solution x of the discrete dynamics.

Problem (Reduced Discretized Problem (RD-OCP))
Minimize J_N(u) := Γ_N(S(u), u) w.r.t. u ∈ R^{N nu}.

Page 99

Gradient Method for the Discretized Problem

Auxiliary functional: (u and x = S(u) given; λ ∈ R^{(N+1)nx} to be defined later)

  Ĵ_N(u) := ϕ(x_N) + h Σ_{i=0}^{N−1} ( H(x_i, u_i, λ_{i+1}) − λ_{i+1}^T (x_{i+1} − x_i)/h )

Derivative: (S'_i(u) denotes the sensitivity matrix ∂x_i/∂u; exploit S'_0(u) = 0)

  Ĵ'_N(u) = ϕ'(x_N) S'_N(u) + h Σ_{i=0}^{N−1} ( H'_x[t_i] S'_i(u) + H'_u[t_i] ∂u_i/∂u − λ_{i+1}^T (S'_{i+1}(u) − S'_i(u))/h )
          = (ϕ'(x_N) − λ_N^T) S'_N(u) + (h H'_x[t0] + λ_1^T) S'_0(u)
            + h Σ_{i=1}^{N−1} ( H'_x[t_i] + (1/h)(λ_{i+1}^T − λ_i^T) ) S'_i(u) + h Σ_{i=0}^{N−1} H'_u[t_i] ∂u_i/∂u

Avoid calculation of the sensitivities S'_i! Choose λ such that the terms multiplying the S'_i vanish.

Page 100

Gradient Method for the Discretized Problem

Discrete adjoint ODE:
  (λ_{i+1} − λ_i)/h = −H'_x(x_i, u_i, λ_{i+1})^T,  i = 0, …, N − 1,
  λ_N = ϕ'(x_N)^T.

Gradient of auxiliary functional:
  Ĵ'_N(u) = h Σ_{i=0}^{N−1} H'_u[t_i] ∂u_i/∂u = h ( H'_u[t0]  H'_u[t1]  ⋯  H'_u[t_{N−1}] ).
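For an explicit Euler discretization, the backward sweep above delivers the exact gradient of J_N, which can be checked against central finite differences. A sketch for the scalar example x' = −x + √3·u with f0 = ½(x² + u²) and ϕ ≡ 0 (the explicit Euler choice and the data are assumptions for illustration):

```python
import numpy as np

N, h = 20, 1.0 / 20
s3 = np.sqrt(3.0)

def forward(u):
    x = np.empty(N + 1)
    x[0] = 2.0
    for i in range(N):                       # explicit Euler dynamics
        x[i + 1] = x[i] + h * (-x[i] + s3 * u[i])
    return x

def J_N(u):                                  # discretized objective
    x = forward(u)
    return h * np.sum(0.5 * (x[:-1] ** 2 + u ** 2))

def grad_adjoint(u):
    x = forward(u)
    lam = np.zeros(N + 1)                    # lam_N = phi'(x_N)^T = 0 here
    g = np.empty(N)
    for i in range(N - 1, -1, -1):
        g[i] = h * (u[i] + s3 * lam[i + 1])          # h * H'_u(x_i, u_i, lam_{i+1})
        lam[i] = lam[i + 1] + h * (x[i] - lam[i + 1])  # discrete adjoint step
    return g

u = np.linspace(-1.0, 0.5, N)
g = grad_adjoint(u)

eps = 1e-6                                   # central finite differences
e = np.eye(N)
g_fd = np.array([(J_N(u + eps * e[i]) - J_N(u - eps * e[i])) / (2 * eps)
                 for i in range(N)])
print(np.max(np.abs(g - g_fd)))              # agreement up to rounding error
```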

Page 101

Gradient Method for the Discretized Problem

Link to the gradient of the reduced objective functional J_N:

Theorem
Let u ∈ R^{N nu} be given and let λ ∈ R^{(N+1)nx} satisfy the discrete adjoint ODE. Then
  ∇J_N(u) = Ĵ'_N(u)^T.

Consequences:
• The gradient method for RD-OCP uses in iteration k the search direction
    d(k) = −∇J_N(u(k)) = −h ( H'_u[t0]^T, …, H'_u[t_{N−1}]^T )^T.
• This is the same search direction as in the "function space" gradient method, except that it is scaled by h. Slower convergence is to be expected!

Page 102

Gradient Method for the Discretized Problem

Gradient method for RD-OCP

(0) Choose u(0) ∈ R^{N nu}, β ∈ (0, 1), σ ∈ (0, 1), and set k := 0.
(1) Solve
      (x_{i+1} − x_i)/h = f(x_i, u(k)_i),  i = 0, …, N − 1,  x_0 = x̄,
      (λ_{i+1} − λ_i)/h = −H'_x(x_i, u(k)_i, λ_{i+1}),  i = 0, …, N − 1,  λ_N = ϕ'(x_N)^T.
(2) Set
      d(k) := −∇J_N(u(k)) = −h ( H'_u[t0]^T, …, H'_u[t_{N−1}]^T )^T.
(3) If ‖d(k)‖₂ ≈ 0, STOP.
(4) Perform an Armijo line search: find the smallest j ∈ {0, 1, 2, …} with
      J_N(u(k) + β^j d(k)) ≤ J_N(u(k)) − σβ^j ‖d(k)‖₂²
    and set α_k := β^j.
(5) Set u(k+1) := u(k) + α_k d(k), k := k + 1, and go to (1).

Page 103

Gradient Method for the Discretized Problem – Example

Example (compare the corresponding "function space" example)
Minimize x2(1) subject to the constraints
  x1'(t) = −x1(t) + √3 u(t),          x1(0) = 2,
  x2'(t) = (1/2)(x1(t)² + u(t)²),     x2(0) = 0.

Output of gradient method for RD-OCP: (u(0) = 0, N = 100, β = 0.9, σ = 0.1)

  k     α_k              J(u(k))          ‖H'_u‖∞          ‖H'_u‖₂²
  0     0.00000000E+00   0.87037219E+00   0.14877677E-01   0.65717185E-02
  1     0.10000000E+01   0.86385155E+00   0.14671320E-01   0.63689360E-02
  ...
  813   0.10000000E+01   0.65728265E+00   0.10356781E-05   0.76292003E-11
  814   0.10000000E+01   0.65728265E+00   0.10655524E-05   0.77345706E-11
  815   0.10000000E+01   0.65728265E+00   0.98588398E-06   0.74390524E-11

Observation: Since the search direction in RD-OCP is scaled by h, many more iterations are needed than with the "function space" gradient method, which required only 26 iterations for the same accuracy.

Page 104

Contents

Introduction

Necessary Conditions

Adjoint Formalism

Gradient Method
  Gradient Method in Finite Dimensions
  Gradient Method for Optimal Control Problems
  Extensions
  Gradient Method for Discrete Problem
  Examples

Lagrange-Newton Method
  Lagrange-Newton Method in Finite Dimensions
  Lagrange-Newton Method in Infinite Dimensions
  Application to Optimal Control
  Search Direction
  Examples
  Extensions

Page 105

Lagrange-Newton Method

Pros:
• locally quadratic convergence rate (fast convergence)
• can handle nonlinear equality constraints
• global convergence achieved by Armijo line search

Cons:
• requires second derivatives
• higher implementation effort compared to the gradient method

Page 106

Lagrange-Newton Method in Finite Dimensions

Consider the equality constrained nonlinear optimization problem:

Equality constrained optimization problem (E-NLP)
Minimize J(x, u) subject to H(x, u) = 0,
with J : R^nx × R^nu −→ R and H : R^nx × R^nu −→ R^nH twice continuously differentiable.

Lagrange function:
  L(x, u, λ) := J(x, u) + λ^T H(x, u)

Theorem (KKT conditions)
Let (x, u) be a local minimum of E-NLP and let H'(x, u) have full rank. Then there exists a multiplier λ ∈ R^nH such that
  ∇_{(x,u)} L(x, u, λ) = 0.

Page 107

Lagrange-Newton Method in Finite Dimensions

Idea of the Lagrange-Newton method: apply Newton's method to the optimality system

  T(x, u, λ) = 0  with  T(x, u, λ) := ( ∇_{(x,u)} L(x, u, λ) ; H(x, u) ) = ( ∇_x L(x, u, λ) ; ∇_u L(x, u, λ) ; H(x, u) ).

Newton system:
  T'(x(k), u(k), λ(k)) d(k) = −T(x(k), u(k), λ(k))
with
                ( L''_xx(x, u, λ)   L''_xu(x, u, λ)   H'_x(x, u)^T )
  T'(x, u, λ) = ( L''_ux(x, u, λ)   L''_uu(x, u, λ)   H'_u(x, u)^T )
                ( H'_x(x, u)        H'_u(x, u)        0            )
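The Newton iteration on T(x, u, λ) = 0 can be sketched numerically on a toy E-NLP (the problem data are illustrative, not from the slides): minimize x² + u² subject to xu − 1 = 0, which has the KKT point (x, u, λ) = (1, 1, −2).

```python
import numpy as np

def T(z):                       # optimality system: grad L and constraint
    x, u, lam = z
    return np.array([2*x + lam*u, 2*u + lam*x, x*u - 1.0])

def Tprime(z):                  # KKT matrix of the toy problem
    x, u, lam = z
    return np.array([[2.0, lam, u],
                     [lam, 2.0, x],
                     [u,   x,  0.0]])

z = np.array([1.5, 0.5, 0.0])   # starting guess
for k in range(20):
    Fz = T(z)
    if np.linalg.norm(Fz) < 1e-12:
        break
    z = z + np.linalg.solve(Tprime(z), -Fz)   # full Newton step

print(z)   # should converge to the KKT point (1, 1, -2)
```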

Page 108

Lagrange-Newton Method in Finite Dimensions

Theorem (Nonsingularity)
T'(x, u, λ) is nonsingular if the following conditions hold:
• L''_{(x,u),(x,u)}(x, u, λ) is positive definite on the nullspace of H'(x, u), that is,
    v^T L''_{(x,u),(x,u)}(x, u, λ) v > 0 for all v ≠ 0 with H'(x, u)v = 0;
• H'(x, u) has full rank.

Note: These conditions are in fact sufficient optimality conditions if (x, u, λ) is a stationary point of L.

Page 109

Lagrange-Newton Method in Infinite Dimensions

Consider the equality constrained nonlinear optimization problem:

Equality constrained optimization problem (E-NLP)
Minimize J(x, u) subject to H(x, u) = 0,
with J : X × U −→ R and H : X × U −→ Λ twice continuously Fréchet differentiable, X, U, Λ Banach spaces.

Lagrange function: L : X × U × Λ* −→ R with
  L(x, u, λ*) := J(x, u) + λ*(H(x, u))

Theorem (KKT conditions)
Let (x, u) be a local minimum of E-NLP and let H'(x, u) be surjective. Then there exists a multiplier λ* ∈ Λ* such that
  L'_{(x,u)}(x, u, λ*) = 0.

Page 110

Lagrange-Newton Method in Infinite Dimensions

Idea of the Lagrange-Newton method: use Newton's method to find a zero z = (x, u, λ*) ∈ Z := X × U × Λ* of the operator T : Z −→ Y defined by

  T(x, u, λ*) := ( L'_x(x, u, λ*) ; L'_u(x, u, λ*) ; H(x, u) ).

Observe:
  L'_x(x, u, λ*)(δx) = J'_x(x, u)(δx) + λ*(H'_x(x, u)(δx)) = J'_x(x, u)(δx) + (H'_x(x, u)* λ*)(δx),
  L'_u(x, u, λ*)(δu) = J'_u(x, u)(δu) + λ*(H'_u(x, u)(δu)) = J'_u(x, u)(δu) + (H'_u(x, u)* λ*)(δu),
where
  H'_x(x, u)* : Λ* −→ X*  and  H'_u(x, u)* : Λ* −→ U*
denote the respective adjoint operators.

Page 111

Local Lagrange-Newton Method

(0) Choose z(0) ∈ Z and set k := 0.
(1) If ‖T(z(k))‖_Y ≈ 0, STOP.
(2) Compute the search direction d(k) from
      T'(z(k))(d(k)) = −T(z(k)).
(3) Set z(k+1) := z(k) + d(k), k := k + 1, and go to (1).

Page 112

Lagrange-Newton Method in Infinite Dimensions

Theorem (Nonsingularity)
T'(x, u, λ*) is nonsingular if the following conditions hold:
• L''_{(x,u),(x,u)}(x, u, λ*) is uniformly positive definite on the nullspace of H'(x, u), i.e. there exists C > 0 such that
    L''_{(x,u),(x,u)}(x, u, λ*)(v, v) ≥ C ‖v‖²_{X×U} for all v ∈ X × U with H'(x, u)v = 0;
• H'(x, u) is surjective.

Note: These conditions are in fact sufficient optimality conditions if (x, u, λ*) is a stationary point of L, see [1, Theorem 5.6], [2, Theorem 2.3].

[1] H. Maurer and J. Zowe. First and Second-Order Necessary and Sufficient Optimality Conditions for Infinite-Dimensional Programming Problems. Mathematical Programming, 16:98–110, 1979.

[2] H. Maurer. First and Second Order Sufficient Optimality Conditions in Mathematical Programming and Optimal Control. Mathematical Programming Study, 14:163–177, 1981.

Page 113

Local Lagrange-Newton Method

Theorem (local convergence)
Let z* be a zero of T. Suppose there exist constants ∆ > 0 and C > 0 such that for every z ∈ B_∆(z*) the derivative T'(z) is nonsingular and
  ‖T'(z)⁻¹‖_{L(Y,Z)} ≤ C.

(a) If ϕ, f0, f, ψ are twice continuously differentiable, then there exists δ > 0 such that the local Lagrange-Newton method is well-defined for every z(0) ∈ B_δ(z*) and the sequence (z(k))_{k∈N} converges superlinearly to z*.
(b) If the second derivatives of ϕ, f0, f, ψ are locally Lipschitz continuous, then the convergence in (a) is quadratic.
(c) If, in addition to the assumptions in (a), T(z(k)) ≠ 0 for all k, then the residual values converge superlinearly:
  lim_{k→∞} ‖T(z(k+1))‖_Y / ‖T(z(k))‖_Y = 0.

Page 114

Global Lagrange-Newton Method

Merit function for globalization:
  γ(z) := (1/2) ‖T(z)‖₂²

Globalized Lagrange-Newton Method

(0) Choose z(0) ∈ Z, β ∈ (0, 1), σ ∈ (0, 1/2), and set k := 0.
(1) If γ(z(k)) ≈ 0, STOP.
(2) Compute the search direction d(k) from T'(z(k))(d(k)) = −T(z(k)).
(3) Find the smallest j ∈ {0, 1, 2, …} with
      γ(z(k) + β^j d(k)) ≤ γ(z(k)) + σβ^j γ'(z(k))(d(k))
    and set α_k := β^j.
(4) Set z(k+1) := z(k) + α_k d(k), k := k + 1, and go to (1).

Note: γ : Z −→ R is Fréchet differentiable with
  γ'(z(k))(d(k)) = −2γ(z(k)) = −‖T(z(k))‖₂².
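A minimal sketch of the globalized method on the same kind of toy E-NLP (illustrative data, not from the slides): minimize x² + u² subject to xu − 1 = 0, with merit function γ(z) = ½‖T(z)‖².

```python
import numpy as np

def T(z):                       # optimality system of the toy problem
    x, u, lam = z
    return np.array([2*x + lam*u, 2*u + lam*x, x*u - 1.0])

def Tprime(z):
    x, u, lam = z
    return np.array([[2.0, lam, u],
                     [lam, 2.0, x],
                     [u,   x,  0.0]])

gamma = lambda z: 0.5 * np.dot(T(z), T(z))   # merit function

z = np.array([1.5, 0.5, 0.0])
beta, sigma = 0.9, 0.1
for k in range(50):
    if gamma(z) < 1e-24:
        break
    d = np.linalg.solve(Tprime(z), -T(z))    # Newton direction, step (2)
    slope = -2.0 * gamma(z)                  # gamma'(z)(d) = -||T(z)||^2
    j = 0                                    # Armijo line search, step (3)
    while gamma(z + beta ** j * d) > gamma(z) + sigma * beta ** j * slope:
        j += 1
    z = z + beta ** j * d                    # step (4)

print(z)   # approaches the KKT point (1, 1, -2)
```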

Page 115

Lagrange-Newton Method – Application to Optimal Control

Problem (Optimal control problem (OCP))
Given: I := [t0, tf] and twice continuously differentiable functions
  ϕ : R^nx × R^nx −→ R,  f0 : R^nx × R^nu −→ R,  f : R^nx × R^nu −→ R^nx,  ψ : R^nx × R^nx −→ R^nψ.

Minimize
  J(x, u) := ϕ(x(t0), x(tf)) + ∫_I f0(x(t), u(t)) dt
with respect to x ∈ X := W^{1,∞}(I, R^nx) and u ∈ U := L^∞(I, R^nu) subject to the constraints
  x'(t) = f(x(t), u(t)) a.e. in I,
  0 = ψ(x(t0), x(tf)).

Remark:
• A partially reduced approach is possible, in which x = x(·; u, x0) is expressed as a function of the initial value x0 and the control u. The constraint ψ(x0, x(tf; u, x0)) = 0 remains. This is the function space equivalent of the direct shooting method.

Page 116

Lagrange-Newton Method – Application to Optimal Control

Hamilton function: H(x, u, λ) := f0(x, u) + λ^T f(x, u)

Define
  κ := ϕ + σ^T ψ.

Theorem (Minimum principle, KKT conditions)
Let (x*, u*) be a local minimizer of OCP and let the constraint operator be surjective at (x*, u*). Then there exist multipliers λ* ∈ W^{1,∞}(I, R^nx) and σ* ∈ R^nψ with
  x*'(t) − f(x*(t), u*(t)) = 0,
  λ*'(t) + H'_x(x*(t), u*(t), λ*(t))^T = 0,
  ψ(x*(t0), x*(tf)) = 0,
  λ*(t0) + κ'_x0(x*(t0), x*(tf), σ*)^T = 0,
  λ*(tf) − κ'_xf(x*(t0), x*(tf), σ*)^T = 0,
  H'_u(x*(t), u*(t), λ*(t))^T = 0.

Root finding problem:
  T(z*) = 0,  T : Z −→ Y.

Page 117

Lagrange-Newton Method – Application to Optimal Control

Definition of the operator T:

  T(z)(·) := (  x'(·) − f(x(·), u(·))
                λ'(·) + H'_x(x(·), u(·), λ(·))^T
                ψ(x(t0), x(tf))
                λ(t0) + κ'_x0(x(t0), x(tf), σ)^T
                λ(tf) − κ'_xf(x(t0), x(tf), σ)^T
                H'_u(x(·), u(·), λ(·))^T  )

with
  z := (x, u, λ, σ),
  Z := X × U × X × R^nψ,
  Y := L^∞(I, R^nx) × L^∞(I, R^nx) × R^nψ × R^nx × R^nx × L^∞(I, R^nu).

Page 118

Lagrange-Newton Method – Computation of Search Direction

Newton direction:
  T'(z(k)) d = −T(z(k)),  d = (δx, δu, δλ, δσ) ∈ Z.

Fréchet derivative (evaluated at z(k)):

  T'(z(k))(d) = (  δx' − f'_x δx − f'_u δu
                   δλ' + H''_xx δx + H''_xu δu + H''_xλ δλ
                   ψ'_x0 δx(t0) + ψ'_xf δx(tf)
                   δλ(t0) + κ''_x0x0 δx(t0) + κ''_x0xf δx(tf) + κ''_x0σ δσ
                   δλ(tf) − κ''_xfx0 δx(t0) − κ''_xfxf δx(tf) − κ''_xfσ δσ
                   H''_ux δx + H''_uu δu + H''_uλ δλ  ).


Lagrange-Newton Method – Computation of Search Direction

The Newton direction is equivalent to the linear DAE boundary value problem

( x′ )   (  f′x      0       f′u   ) ( x )       ( (x(k))′ − f       )
( λ′ ) − ( −H′′xx  −H′′xλ  −H′′xu ) ( λ )  =  − ( (λ(k))′ + (H′x)ᵀ )
(  0 )   ( −H′′ux  −H′′uλ  −H′′uu ) ( u )       ( (H′u)ᵀ           )

with boundary conditions

(  ψ′x0     0    0      ) ( x(t0) )     (  ψ′xf     0   0 ) ( x(tf) )       ( ψ                  )
(  κ′′x0x0  Id   κ′′x0σ ) ( λ(t0) )  +  (  κ′′x0xf  0   0 ) ( λ(tf) )  =  − ( λ(k)(t0) + (κ′x0)ᵀ )
( −κ′′xfx0  0   −κ′′xfσ ) (  σ    )     ( −κ′′xfxf  Id  0 ) (  σ    )       ( λ(k)(tf) − (κ′xf)ᵀ )

Hence: in each Newton iteration, we need to solve the above linear BVP.


Lagrange-Newton Method – Computation of Search Direction

Theorem
The differential-algebraic equation (DAE) in the BVP has index one (i.e. the last equation can be solved for u) if the matrix function

M(t) := H′′uu[t]

is non-singular for almost every t ∈ [0, 1] and ‖M(t)⁻¹‖ ≤ C for some constant C and almost every t ∈ [0, 1].

If M(·) is singular:
- the BVP contains a differential-algebraic equation of higher index, which is numerically unstable;
- the boundary conditions may become infeasible.
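In practice the index-1 condition can be monitored numerically along the current iterate. A hedged sketch (the 2×2 Hessian below is hypothetical, since the slides give no concrete data): sample M(t) = H′′uu[t] on a grid and check that its smallest singular value stays uniformly bounded away from zero.

```python
import numpy as np

# Numerical check of the index-1 condition on a grid over [0, 1].
# M(t) below is a hypothetical stand-in for an evaluation of H''_uu
# along the current iterate of a two-control problem.
t_grid = np.linspace(0.0, 1.0, 101)

def M(t):
    # placeholder Hessian H''_uu at time t (made up for illustration)
    return np.array([[2.0 + np.sin(t), 0.1],
                     [0.1,             1.0]])

# smallest singular value over the grid; index 1 requires it to be
# uniformly bounded away from zero
smin = min(np.linalg.svd(M(t), compute_uv=False)[-1] for t in t_grid)
print(smin > 1e-8)  # then ||M(t)^-1|| <= C with C ~ 1/smin
```

If smin is tiny at some grid points, the warning cases above apply: the linearized BVP degenerates to a higher-index DAE there.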


Lagrange-Newton Method – Examples

Example (Trolley)

[Figure: trolley on a rail at height z, cart position x1 (coordinates (x1, z)), pendulum deflection angle x2, rod length ℓ, load weight −m2g, control force u]


Lagrange-Newton Method – Examples

Example (Trolley, continued)
Dynamics:

x′1 = x3
x′2 = x4
x′3 = [ m2²ℓ³ sin(x2) x4² − m2ℓ²u + m2 Iy ℓ x4² sin(x2) − Iy u + m2²ℓ²g cos(x2) sin(x2) ] / [ −m1m2ℓ² − m1Iy − m2²ℓ² − m2Iy + m2²ℓ² cos(x2)² ]
x′4 = m2ℓ [ m2ℓ cos(x2) x4² sin(x2) − cos(x2) u + g sin(x2)(m1 + m2) ] / [ −m1m2ℓ² − m1Iy − m2²ℓ² − m2Iy + m2²ℓ² cos(x2)² ]

Parameters:

g = 9.81, m1 = 0.3, m2 = 0.5, ℓ = 0.75, r = 0.1, Iy = 0.002.
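The right-hand side above can be transcribed directly into code. As a plausibility check (our addition, not from the slides), the accelerations must vanish at the rest position x2 = x3 = x4 = 0 with u = 0:

```python
import numpy as np

# Trolley dynamics transcribed from the formulas above (a sketch; the
# parameter names follow the slide). Sanity check: at rest with u = 0
# the right-hand side must vanish.
g, m1, m2, l, Iy = 9.81, 0.3, 0.5, 0.75, 0.002

def rhs(x, u):
    x1, x2, x3, x4 = x
    den = (-m1*m2*l**2 - m1*Iy - m2**2*l**2 - m2*Iy
           + m2**2*l**2*np.cos(x2)**2)                     # common denominator
    dx3 = ((m2**2*l**3*np.sin(x2)*x4**2 - m2*l**2*u
            + m2*Iy*l*x4**2*np.sin(x2) - Iy*u
            + m2**2*l**2*g*np.cos(x2)*np.sin(x2)) / den)   # cart acceleration
    dx4 = (m2*l*(m2*l*np.cos(x2)*x4**2*np.sin(x2) - np.cos(x2)*u
                 + g*np.sin(x2)*(m1 + m2)) / den)          # angular acceleration
    return np.array([x3, x4, dx3, dx4])

print(np.allclose(rhs(np.zeros(4), 0.0), np.zeros(4)))  # equilibrium at rest
```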


Lagrange-Newton Method – Examples

Example (Trolley, continued)
Minimize

(1/2) ∫₀^tf ( u(t)² + 5 x4(t)² ) dt

subject to the ODE, the initial conditions

x1(0) = x2(0) = x3(0) = x4(0) = 0,

and the terminal conditions

x1(tf) = 1, x2(tf) = x3(tf) = x4(tf) = 0

within the fixed time tf = 2.7.


Lagrange-Newton Method – Examples

Example (Trolley, continued)
Output of the Lagrange-Newton method (N = 1000, Euler):

k   αk              ‖T(z(k))‖₂²     ‖d(k)‖∞
0   0.000000E+00    0.100000E+01    0.451981E+01
1   0.100000E+01    0.688773E-03    0.473501E-02
2   0.100000E+01    0.809983E-12    0.118366E-06
3   0.100000E+01    0.160897E-24    0.141058E-11

Iterations for different mesh sizes N (mesh independence; CPU time grows roughly linearly in N):

N       CPU time [s]   Iterations
100     0.022          3
200     0.050          3
400     0.093          3
800     0.174          3
1600    0.622          3
3200    0.822          3
6400    1.900          4
12800   3.771          4
25600   7.939          4
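The mesh-independence behaviour can be reproduced qualitatively on a small nonlinear toy problem (our example, not the trolley): minimize 1/2 ∫₀¹ u² dt subject to x′ = sin(x) + u, x(0) = 0, x(1) = 1, discretized by the explicit Euler scheme. The Newton iteration count should stay essentially constant as the grid is refined.

```python
import numpy as np

# Count Newton iterations on successively finer Euler grids for the toy
# problem: minimize 1/2 * int u^2 dt, x' = sin(x) + u, x(0)=0, x(1)=1.
def newton_iters(N, tol=1e-8, maxit=25):
    h = 1.0 / N

    def T(z):
        x, u, lam = z[:N+1], z[N+1:2*N+1], z[2*N+1:3*N+2]
        s1, s2 = z[-2], z[-1]
        return np.concatenate([
            (x[1:] - x[:-1]) / h - np.sin(x[:-1]) - u,             # dynamics
            (lam[1:] - lam[:-1]) / h + lam[:-1] * np.cos(x[:-1]),  # adjoint
            u + lam[:-1],                                          # H'_u = 0
            [x[0], x[-1] - 1.0],                                   # boundary
            [lam[0] + s1, lam[-1] - s2],                           # transversality
        ])

    z = np.zeros(3*N + 4)
    for k in range(maxit):
        r = T(z)
        if np.linalg.norm(r) < tol:
            return k
        J = np.empty((r.size, z.size))                             # FD Jacobian
        for j in range(z.size):
            e = np.zeros_like(z); e[j] = 1e-7
            J[:, j] = (T(z + e) - r) / 1e-7
        z = z + np.linalg.solve(J, -r)
    return maxit

iters = [newton_iters(N) for N in (25, 50, 100)]
print(iters)  # iteration counts stay (nearly) constant under refinement
```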


Lagrange-Newton Method – Examples

Example (Trolley, continued)

[Figures: computed states x1, x2, x3, x4, adjoints λ1, λ2, λ3, λ4, and control u over time t ∈ [0, tf]]


Lagrange-Newton Method – Example from Chemical Engineering

DAE index-1 optimal control problem (chemical reaction of substances A, B, C, and D):

Minimize

−MC(tf) + 10⁻² ∫₀^tf ( FB(t)² + Q(t)² ) dt

w.r.t. the controls FB (feed rate of substance B) and cooling power Q, subject to the index-1 DAE

M′A = −V · A1 · e^(−E1/TR) · CA · CB
M′B = FB − V ( A1 · e^(−E1/TR) · CA · CB + A2 · e^(−E2/TR) · CB · CC )
M′C = V ( A1 · e^(−E1/TR) · CA · CB − A2 · e^(−E2/TR) · CB · CC )
M′D = V · A2 · e^(−E2/TR) · CB · CC
H′ = 20 FB − Q − V ( −A1 · e^(−E1/TR) · CA · CB − 75 A2 · e^(−E2/TR) · CB · CC )
0 = H − Σ_{i=A,B,C,D} Mi ( αi (TR − Tref) + (βi/2)(TR² − Tref²) )

where

V = Σ_{i=A,B,C,D} Mi/ρi,   Ci = Mi/V, i = A, B, C, D.

Source: V. S. Vassiliadis, R. W. H. Sargent, and C. C. Pantelides. Solution of a class of multistage dynamic optimization problems. 2. Problems with path constraints. Industrial & Engineering Chemistry Research, 33:2123–2133, 1994.
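Since the DAE has index 1, the algebraic equation determines the reactor temperature TR pointwise from H and the masses Mi. A hedged sketch (the coefficients αi, βi, Tref and the state values below are made up for illustration; the real data are in the cited paper) recovers TR by a scalar Newton iteration:

```python
# Solve the algebraic equation of the index-1 DAE for T_R:
#   0 = H - sum_i M_i * (alpha_i*(T_R - Tref) + beta_i/2*(T_R^2 - Tref^2))
# All coefficient and state values below are hypothetical placeholders.
alpha = {'A': 0.1723, 'B': 0.2,    'C': 0.16,  'D': 0.155}   # made-up values
beta  = {'A': 6.6e-4, 'B': 7.0e-4, 'C': 6.0e-4, 'D': 6.5e-4} # made-up values
M     = {'A': 8500.0, 'B': 20.0,   'C': 300.0, 'D': 0.5}     # made-up state
Tref, H = 300.0, 160000.0

def g(TR):
    return H - sum(M[i]*(alpha[i]*(TR - Tref) + 0.5*beta[i]*(TR**2 - Tref**2))
                   for i in 'ABCD')

TR = 350.0                     # initial guess near the expected temperature
for _ in range(20):            # scalar Newton with a finite-difference slope
    dg = (g(TR + 1e-6) - g(TR)) / 1e-6
    TR -= g(TR) / dg
print(abs(g(TR)) < 1e-6)       # algebraic equation satisfied
```

In a Lagrange-Newton implementation this recovery is not done separately; the algebraic equation simply enters the residual T(z), but the scalar view makes the index-1 structure tangible.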


Lagrange-Newton Method – Example from Chemical Engineering

Lagrange-Newton method (N = 20000 intervals):

k    αk              ‖T(z(k))‖₂²     ‖d(k)‖∞
0    0.000000E+00    0.465186E+12    0.599335E+05
1    0.100000E+01    0.759821E+10    0.523150E+07
2    0.147809E-01    0.755228E+10    0.127262E+07
3    0.423912E-01    0.745716E+10    0.835212E+05
4    0.100000E+01    0.351002E+09    0.344908E+06
5    0.121577E+00    0.340325E+09    0.667305E+05
6    0.100000E+01    0.131555E+08    0.370395E+05
7    0.100000E+01    0.295389E+07    0.169245E+04
8    0.100000E+01    0.114958E+01    0.336606E+00
9    0.100000E+01    0.852576E-11    0.104227E-02
10   0.100000E+01    0.125527E-11    0.658317E-03
11   0.100000E+01    0.283055E-14    0.605453E-05

[Figures: computed controls F_B and Q over time t ∈ [0, 20]]


Lagrange-Newton Method – Example from Chemical Engineering

[Figures: computed states M_A, M_B, M_C, M_D, H and algebraic variable T_R over time t ∈ [0, 20]]


Lagrange-Newton Method – Example from Chemical Engineering

[Figures: computed adjoints λ1, λ2, λ3, λ4, λ5 and λ_g over time t ∈ [0, 20]]


Lagrange-Newton Method – Navier-Stokes Example

Optimal control problem:

Minimize

(1/2) ∫_Q ‖y(t, x) − yd(t, x)‖² dx dt + (δ/2) ∫_Q ‖u(t, x)‖² dx dt

w.r.t. velocity y, pressure p, and control u subject to the 2D Navier-Stokes equations

yt = (1/Re) Δy − (y·∇)y − ∇p + u in Q := (0, tf) × Ω,
0 = div(y) in Q,
0 = y(0, x) for x ∈ Ω := (0, 1) × (0, 1),
0 = y(t, x) for (t, x) ∈ (0, tf) × ∂Ω.

Given: desired velocity field

yd(t, x1, x2) = ( −q(t, x1) q′x2(t, x2), q(t, x2) q′x1(t, x1) )ᵀ,   q(t, z) = (1 − z)²(1 − cos(2πzt))

M. Gerdts and M. Kunkel. A globally convergent semi-smooth Newton method for control-state constrained DAE optimal control problems. Computational Optimization and Applications, 48(3):601–633, 2011.


Lagrange-Newton Method – Navier-Stokes Example

Discretization by the method of lines (details omitted):

Minimize

(1/2) ∫₀^tf ‖yh(t) − yd,h(t)‖² dt + (δ/2) ∫₀^tf ‖uh(t)‖² dt

subject to the index-2 DAE

y′h(t) = (1/Re) Ah yh(t) − (1/2) ( yh(t)ᵀ Hh,1 yh(t), ..., yh(t)ᵀ Hh,2(N−1)² yh(t) )ᵀ − Bh ph(t) + uh(t),
0 = Bhᵀ yh(t),
yh(0) = 0.


Lagrange-Newton Method – Navier-Stokes Example

Pressure p at t = 0.6, t = 1.0, t = 1.4, and t = 1.967:

(Parameters: tf = 2, δ = 10⁻⁵, Re = 1, N = 31, Nt = 60, nx = 2(N − 1)² = 1800, ny = (N − 1)² = 900, nu = 1800 controls)


Lagrange-Newton Method – Navier-Stokes Example

Desired flow (left), controlled flow (middle), and control (right) at t = 0.6, t = 1.0, t = 1.4, and t = 1.967:


Lagrange-Newton Method – Navier-Stokes Example

Output of the Lagrange-Newton method:

Solve the Stokes problem to obtain an initial guess:

k   ∫₀¹ f0[t] dt       αk−1             ‖T(z(k))‖₂
0   1.763432802e+03                     5.938741958e+01
1   3.986109778e+02    1.0000000000000  7.255927209e-10

Solve the Navier-Stokes problem:

k   ∫₀¹ f0[t] dt       αk−1             ‖T(z(k))‖₂
0   3.986109778e+02                     1.879632471e+04
1   3.988553051e+02    1.0000000000000  1.183777963e+01
2   3.988549264e+02    1.0000000000000  4.521653291e-04
3   3.988549264e+02    1.0000000000000  9.576226032e-10


Lagrange-Newton Method – Extensions

Treatment of inequality constraints:
- Sequential quadratic programming (SQP). Idea: solve linear-quadratic optimization problems to obtain a search direction.
- Interior-point methods (IP). Idea: solve a sequence of barrier problems (equivalently: perturb the complementarity conditions in the KKT conditions).
- Semismooth Newton methods. Idea: transform the complementarity conditions into an equivalent (nonsmooth!) equation.
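The semismooth-Newton idea can be illustrated on a toy KKT system (our example, not from the slides): for minimize (x − 2)² subject to x ≤ 1, the complementarity conditions g(x) ≥ 0, μ ≥ 0, μ·g(x) = 0 with g(x) = 1 − x are replaced by the Fischer-Burmeister equation φ(a, b) = √(a² + b²) − a − b = 0, which is nonsmooth but semismooth, and the resulting system is solved by a Newton-type iteration.

```python
import numpy as np

# Semismooth Newton sketch for: minimize (x-2)^2  s.t.  x <= 1.
# KKT solution: x* = 1 (constraint active), mu* = 2.
def F(w):
    x, mu = w
    grad_lagr = 2.0*(x - 2.0) + mu       # stationarity of (x-2)^2 + mu*(x-1)
    a, b = 1.0 - x, mu                   # g(x) and multiplier
    fb = np.sqrt(a*a + b*b) - a - b      # Fischer-Burmeister reformulation
    return np.array([grad_lagr, fb])

w = np.array([0.0, 0.0])
for _ in range(30):
    r = F(w)
    # generalized Jacobian via finite differences (adequate away from kinks)
    J = np.column_stack([(F(w + 1e-7*e) - r) / 1e-7
                         for e in (np.array([1.0, 0.0]), np.array([0.0, 1.0]))])
    w = w + np.linalg.solve(J, -r)
print(np.allclose(w, [1.0, 2.0], atol=1e-6))  # active constraint recovered
```

Away from the kink of φ at the origin the iteration behaves like an ordinary Newton method, which is why locally superlinear convergence can be retained.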


References

[1] W. Alt. The Lagrange-Newton method for infinite dimensional optimization problems. Numerical Functional Analysis and Optimization, 11:201–224, 1990.
[2] W. Alt. Sequential quadratic programming in Banach spaces. In W. Oettli and D. Pallaschke, editors, Advances in Optimization, pages 281–301, Berlin, 1991. Springer.
[3] W. Alt and K. Malanowski. The Lagrange-Newton method for nonlinear optimal control problems. Computational Optimization and Applications, 2:77–100, 1993.
[4] W. Alt and K. Malanowski. The Lagrange-Newton method for state constrained optimal control problems. Computational Optimization and Applications, 4:217–239, 1995.
[5] M. Gerdts. Optimal Control of ODEs and DAEs. Walter de Gruyter, Berlin/Boston, 2012.
[6] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich. Optimization with PDE Constraints. Mathematical Modelling: Theory and Applications 23, Springer, Dordrecht, 2009.
[7] K. Ito and K. Kunisch. Lagrange Multiplier Approach to Variational Problems and Applications. Advances in Design and Control 15, SIAM, Philadelphia, PA, 2008.
[8] K. C. P. Machielsen. Numerical Solution of Optimal Control Problems with State Constraints by Sequential Quadratic Programming in Function Space. Volume 53 of CWI Tract, Centrum voor Wiskunde en Informatica, Amsterdam, 1988.
[9] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Vieweg, Wiesbaden, 2005.


Resources

Software (available for registered users; free for academic use):
- OCPID-DAE1 (optimal control and parameter identification with differential-algebraic equations of index 1): http://www.optimal-control.de
- sqpfiltertoolbox (SQP method for dense NLPs): http://www.optimal-control.de
- WORHP (Büskens/Gerdts; SQP method for sparse large-scale NLPs): http://www.worhp.de
- QPSOL (interior-point and nonsmooth Newton methods for sparse large-scale convex quadratic programs): available upon request

Robotics lab at UniBw M (research stays with use of lab equipment upon request):
- KUKA youBot robot (2-arm robot on a platform); 3 scale RC cars; LEGO Mindstorms robots; quarter-car test bench; quadcopter


More Resources

Further optimal control software:
- CasADi, ACADO: M. Diehl et al.; http://casadi.org; http://sourceforge.net/p/acado/
- NUDOCCCS: C. Büskens, University of Bremen
- SOCS: J. Betts, The Boeing Company, Seattle; http://www.boeing.com/boeing/phantom/socs/
- DIRCOL: O. von Stryk, TU Darmstadt; http://www.sim.informatik.tu-darmstadt.de/res/sw/dircol
- MUSCOD-II: H.G. Bock et al., IWR Heidelberg; http://www.iwr.uni-heidelberg.de/∼agbock/RESEARCH/muscod.php
- MISER: K.L. Teo et al., Curtin University, Perth; http://school.maths.uwa.edu.au/∼les/miser/
- PSOPT: http://www.psopt.org/
- ...

Further optimization software:
- NPSOL (dense problems), SNOPT (sparse large-scale problems): Stanford Business Software; http://www.sbsi-sol-optimize.com
- KNITRO (sparse large-scale problems): Ziena Optimization; http://www.ziena.com/knitro.htm
- IPOPT (sparse large-scale problems): A. Wächter; https://projects.coin-or.org/Ipopt
- filterSQP: R. Fletcher, S. Leyffer; http://www.mcs.anl.gov/∼leyffer/solvers.html
- OOQP: M. Gertz, S. Wright; http://pages.cs.wisc.edu/∼swright/ooqp/
- qpOASES: H.J. Ferreau, A. Potschka, C. Kirches; http://homes.esat.kuleuven.be/∼optec/software/qpOASES/
- ...

Software for boundary value problems:
- BOUNDSCO: H.J. Oberle, University of Hamburg; http://www.math.uni-hamburg.de/home/oberle/software.html
- COLDAE: U. Ascher; www.cs.ubc.ca/∼ascher/coldae.f
- ...

Links:
- Decision Tree for Optimization Software: http://plato.la.asu.edu/guide.html
- CUTEr (large collection of optimization test problems): http://www.cuter.rl.ac.uk/
- COPS (large-scale optimization test problems): http://www.mcs.anl.gov/∼more/cops/
- MINTOC (test cases for mixed-integer optimal control): http://mintoc.de/
- ...


Announcement: youBot Robotics Hackathon

Description:
- student programming contest
- addresses students and PhD students who would like to realize projects with the KUKA youBot robot
- 12 participants from 5 universities (UniBw M, Bayreuth, TUM, TU Berlin, Maastricht)


Thanks for your Attention!

Questions?

Further information:

[email protected]
http://www.unibw.de/lrt1/gerdts
www.optimal-control.de

Photos: http://de.wikipedia.org/wiki/München; Magnus Manske (Panorama), Luidger (Theatinerkirche), Kurmis (Chin. Turm), Arad Mojtahedi (Olympiapark), Max-k (Deutsches Museum), Oliver Raupach (Friedensengel), Andreas Praefcke (Nationaltheater)