
  • 4SC000 Q2 2017-2018

    Optimal Control and Dynamic Programming

    Duarte Antunes

  • Part III

    Continuous-time optimal control problems

  • Recap

    1

    Formulation:
      Discrete optimization problems: transition diagram
      Stage decision problems: dynamic system & additive cost function

    DP algorithm:
      Discrete optimization problems: graphical DP algorithm & DP equation
      Stage decision problems: DP equation

    Partial information:
      Stage decision problems: Bayesian inference & decisions based on prob. distribution; Kalman filter and separation principle

    Alternative algorithms:
      Discrete optimization problems: Dijkstra's algorithm
      Stage decision problems: static optimization

  • 2

    Goals of part III

    Introduce optimal control concepts for continuous-time optimal control problems,
    and analyze frequency-domain properties of continuous-time LQR/LQG.

    Formulation:
      Discrete optimization problems: transition diagram
      Stage decision problems: discrete-time system & additive cost function
      Continuous-time control problems: differential equations & additive cost function

    DP algorithm:
      Discrete optimization problems: graphical DP algorithm & DP equation
      Stage decision problems: DP equation
      Continuous-time control problems: Hamilton-Jacobi-Bellman equation

    Partial information:
      Stage decision problems: Bayesian inference & decisions based on prob. distribution; Kalman filter and separation principle
      Continuous-time control problems: continuous-time Kalman filter and separation principle

    Alternative algorithms:
      Discrete optimization problems: Dijkstra's algorithm
      Stage decision problems: static optimization
      Continuous-time control problems: Pontryagin's maximum principle

  • Outline

    • Problem formulation and approach
    • Hamilton-Jacobi-Bellman equation
    • Linear quadratic regulator

  • 3

    Continuous-time optimal control problems

    Dynamic model
    $\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T]$

    Cost function
    $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

    The goal is to find an optimal path and an optimal policy.

    Assumptions
    • The differential equation has a unique solution in $t \in [0, T]$
    • We assume that $f, g$ do not explicitly depend on time for simplicity; we could consider $f(t, x(t), u(t))$, $g(t, x(t), u(t))$
    • $x(t) \in \mathbb{R}^n$ and $u(t) \in U \subseteq \mathbb{R}^m$

  • 4

    Optimal path

    • A path $(u(t), x(t))$, $t \in [0, T]$, consists of a control input $u(t)$ and a corresponding solution $x(t)$ to the differential equation
    $\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T]$

    • A path is said to be optimal if there is no other path with a smaller cost
    $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

    • Choosing the control input can be seen as making decisions in infinitesimal time intervals which shape the derivative of the state (and thus determine its evolution up to the terminal time $t = T$ and terminal state $x(T)$)

  • 5

    Optimal policy

    • A policy is a function $\mu$ which maps states into actions at every time:
    $u(t) = \mu(t, x(t)), \quad t \in [0, T]$

    • A policy $\mu$ is said to be optimal if for every state $x(t) = \bar{x}$ at every time $t$,
    $\int_t^T g(x(s), \mu(s, x(s)))\,ds + g_T(x(T))$
    coincides with the cost of the optimal path for the problem
    $\min \int_t^T g(x(s), u(s))\,ds + g_T(x(T)) \quad \text{s.t.} \quad \dot{x}(s) = f(x(s), u(s)), \quad x(t) = \bar{x}, \quad s \in [t, T]$

    • We denote the cost of the latter problem by $J(t, \bar{x})$ and call it the optimal cost-to-go.

  • 6

    Approach

    • Dynamic programming (DP) shall allow us to compute optimal policies and optimal paths, and Pontryagin's maximum principle (PMP) shall allow us to compute optimal paths.

    • However, obtaining these results in continuous-time (CT) is mathematically involved.

    • To gain intuition, in both cases we will first discretize the problem as a function of the discretization step (previously the sampling period), apply DP, and take the limit as the discretization step converges to zero.

    [Diagram: the CT control problem, discretized with step $\tau$, becomes a stage decision problem; DT DP yields an optimal path and policy; taking the limit $\tau \to 0$ gives CT DP and the optimal path and policy of the CT control problem.]

  • 7

    Example

    How to charge the capacitor in an RC circuit with minimum energy loss in the resistor?

    [Figure: RC circuit driven by a voltage source $u$, with resistor $R$, current $i$, and capacitor $C$ whose voltage is $x$]

    $\dot{x}(t) = \frac{1}{RC}(u(t) - x(t))$

    $\min_{u(t)} \int_0^T \frac{(x(t) - u(t))^2}{R}\,dt \quad \text{subject to} \quad x(0) = 0, \quad x(T) = x_{\text{desired}}$

    Let us consider $R = C = T = x_{\text{desired}} = 1$.

  • 8

    Discretization

    Discretization times $t_k = k\tau$, $h\tau = T$, with discretization step $\tau$. For $t \in [t_k, t_{k+1})$ and a constant input $u(t_k)$,
    $x(t) = e^{-(t - t_k)} \underbrace{x(t_k)}_{x_k} + (1 - e^{-(t - t_k)}) \underbrace{u(t_k)}_{u_k}$

    Dynamic model
    $x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k$

    Cost function
    $\int_0^1 (x(t) - u(t))^2\,dt = \sum_{k=0}^{h-1} \int_{t_k}^{t_{k+1}} \big(e^{-(t - t_k)} x_k + (1 - e^{-(t - t_k)}) u_k - u_k\big)^2\,dt$
    $= \sum_{k=0}^{h-1} \int_{t_k}^{t_{k+1}} e^{-2(t - t_k)}\,dt \,(x_k - u_k)^2$
    $= \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$

  • 9

    From terminal constraint to terminal cost

    The framework of stage decision problems does not take into account terminal constraints.

    Thus we apply a trick: we consider that a final control input $u(1)$ is applied at the terminal time, setting the state to the desired terminal value after $\Delta$ seconds, $x(1 + \Delta) = 1$.

    [Figure: state trajectory $x(t)$ reaching 1 on the extended interval $[1, 1 + \Delta]$]

    Since $x(1 + \Delta) = e^{-\Delta} x(1) + (1 - e^{-\Delta}) u(1)$, this terminal control input is given by
    $u(1) = \frac{1 - e^{-\Delta} x(1)}{1 - e^{-\Delta}}$

  • 10

    From terminal constraint to terminal cost

    The following cost approximates the original one that we are interested in:

    $\int_0^{1+\Delta} (x(t) - u(t))^2\,dt = \int_0^1 (x(t) - u(t))^2\,dt + \int_1^{1+\Delta} (x(t) - u(t))^2\,dt$
    $= \Big(\sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2\Big) + \frac{1 - e^{-2\Delta}}{2}(x_h - u_h)^2$
    $= \Big(\sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2\Big) + \underbrace{\beta(\Delta)(x_h - 1)^2}_{\text{terminal cost}}$

    where $u_h = \frac{1 - e^{-\Delta} x_h}{1 - e^{-\Delta}}$ and
    $\beta(\Delta) = \frac{1 - e^{-2\Delta}}{2(1 - e^{-\Delta})^2}$

    Note that $\beta(\Delta) \to \infty$ as $\Delta \to 0$, but $\beta(\Delta)(x_h - 1)^2 \to 0$ if $x_h \to 1$ (see the small numerical illustration below).
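As a quick sanity check on this trade-off, the following short Python snippet (my illustration, not part of the slides) evaluates $\beta(\Delta)$ for a few values of $\Delta$ and shows how it blows up as $\Delta \to 0$.

```python
import numpy as np

# Evaluate the terminal-cost weight beta(Delta) = (1 - e^{-2 Delta}) / (2 (1 - e^{-Delta})^2)
# for a few extension times Delta; it grows roughly like 1/Delta as Delta -> 0.
for Delta in (0.1, 0.01, 0.001):
    beta = (1 - np.exp(-2 * Delta)) / (2 * (1 - np.exp(-Delta)) ** 2)
    print(f"Delta = {Delta:g}: beta(Delta) = {beta:.1f}")
```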

  • 11

    Dynamic programming

    Applying DP
    $J_h(x_h) = \beta(\Delta)(x_h - 1)^2$
    $J_k(x_k) = \min_{u_k} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2 + J_{k+1}\big(e^{-\tau} x_k + (1 - e^{-\tau}) u_k\big)$

    Results in
    $u_k = K_k x_k + \alpha_k, \qquad J_k(x_k) = \theta_k x_k^2 + \beta_k x_k + \gamma_k$
    obtained from Riccati equations.

    Example: $\tau = 0.2$, $\Delta = 0.01$

    [Plots of the resulting $x(t)$ and $u(t)$ on $[0, 1]$]

    (A numerical sketch of this backward recursion follows below.)
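The following Python sketch (mine, not from the slides) implements this backward recursion for the discretized capacitor problem, keeping the quadratic cost-to-go $J_k(x) = \theta_k x^2 + \beta_k x + \gamma_k$ and the affine policy $u_k = K_k x_k + \alpha_k$ explicitly; all variable names are my own.

```python
import numpy as np

# Backward DP for the discretized RC-charging problem (a sketch, not the slides' code).
tau, Delta, T = 0.2, 0.01, 1.0
h = int(round(T / tau))
a, b = np.exp(-tau), 1.0 - np.exp(-tau)              # x_{k+1} = a x_k + b u_k
c = (1.0 - np.exp(-2.0 * tau)) / 2.0                 # stage cost c (x_k - u_k)^2
beta_D = (1.0 - np.exp(-2.0 * Delta)) / (2.0 * (1.0 - np.exp(-Delta)) ** 2)

theta, beta, gamma = beta_D, -2.0 * beta_D, beta_D    # J_h(x) = beta_D (x - 1)^2
gains = []
for k in reversed(range(h)):
    # minimize c (x - u)^2 + J_{k+1}(a x + b u) over u: a scalar quadratic in u
    K = (c - theta * a * b) / (c + theta * b ** 2)
    alpha = -beta * b / (2.0 * (c + theta * b ** 2))
    gains.append((k, K, alpha))
    p1, p0 = 1.0 - K, -alpha                          # x - u = p1 x + p0
    q1, q0 = a + b * K, b * alpha                     # next state = q1 x + q0
    theta, beta, gamma = (c * p1 ** 2 + theta * q1 ** 2,
                          2 * c * p1 * p0 + 2 * theta * q1 * q0 + beta * q1,
                          c * p0 ** 2 + theta * q0 ** 2 + beta * q0 + gamma)

# simulate the optimal path from x_0 = 0
x = 0.0
for k, K, alpha in reversed(gains):
    u = K * x + alpha
    print(f"k={k}  x={x:.3f}  u={u:.3f}")
    x = a * x + b * u
```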

  • 12

    Taking the limit $\tau \to 0$

    Seems to be converging to $u(t) = 1 + t$, $x(t) = t$. Later we will prove this.

    [Plots of $x(t)$ and $u(t)$ on $[0, 1]$ for discretization steps $\tau \in \{0.05, 0.01\}$ and terminal-cost parameters $\Delta \in \{0.01, 0.001\}$]

  • 13

    Static optimization

    $\min_{u_0, \dots, u_{h-1}} \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$
    $\text{s.t. } x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k, \quad x_0 = 0, \quad x_h = 1$

    This is a static optimization problem, which can handle constraints (a numerical sketch follows below).

    Lagrangian
    $L(x_1, u_0, \lambda_1, \dots, x_{h-1}, u_{h-1}, \lambda_h) = \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2 + \sum_{k=0}^{h-1} \lambda_{k+1}\big(e^{-\tau} x_k + (1 - e^{-\tau}) u_k - x_{k+1}\big)$

    Necessary optimality conditions amount to solving a linear system:
    $\frac{\partial L}{\partial x_k} = 0: \quad \lambda_k = (1 - e^{-2\tau})(x_k - u_k) + \lambda_{k+1} e^{-\tau}, \quad k \in \{1, \dots, h-1\}$
    $\frac{\partial L}{\partial u_k} = 0: \quad 0 = -(1 - e^{-2\tau})(x_k - u_k) + \lambda_{k+1}(1 - e^{-\tau}), \quad k \in \{0, \dots, h-1\}$
    $\frac{\partial L}{\partial \lambda_{k+1}} = 0: \quad x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k, \quad k \in \{0, \dots, h-1\}$
    together with $x_0 = 0$, $x_h = 1$.
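A minimal Python sketch (my own, not from the slides) of this static formulation: instead of assembling the linear KKT system by hand, it hands the same cost, dynamics constraints, and boundary conditions to a generic constrained solver.

```python
import numpy as np
from scipy.optimize import minimize

# Solve the discretized capacitor problem as a constrained static optimization
# over (x_1..x_h, u_0..u_{h-1}).
tau = 0.05
h = int(round(1.0 / tau))
a, b = np.exp(-tau), 1.0 - np.exp(-tau)
c = (1.0 - np.exp(-2.0 * tau)) / 2.0

def split(z):
    return z[:h], z[h:]                        # states x_1..x_h, inputs u_0..u_{h-1}

def cost(z):
    x, u = split(z)
    xk = np.concatenate(([0.0], x[:-1]))       # prepend x_0 = 0
    return np.sum(c * (xk - u) ** 2)

def dynamics(z):                               # residuals of x_{k+1} = a x_k + b u_k
    x, u = split(z)
    xk = np.concatenate(([0.0], x[:-1]))
    return x - (a * xk + b * u)

cons = [{"type": "eq", "fun": dynamics},
        {"type": "eq", "fun": lambda z: split(z)[0][-1] - 1.0}]   # x_h = 1
sol = minimize(cost, np.zeros(2 * h), constraints=cons)
x_opt, u_opt = split(sol.x)
print("u_0 ..:", np.round(u_opt[:5], 3))       # should approach u(t) = 1 + t as tau -> 0
```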

  • 14

    Taking the limit $\tau \to 0$

    Again, seems to be converging to $u(t) = 1 + t$, $x(t) = t$.

    [Plots of $x(t)$ and $u(t)$ on $[0, 1]$ for discretization steps $\tau \in \{0.2, 0.05, 0.01\}$]

  • 15

    Discussion

    • In this lecture we follow this discretization approach (the more formal continuous-time approach can be found in Bertsekas' book) to derive the counterpart of DP for continuous-time control problems, which is the Hamilton-Jacobi-Bellman equation.

    • Later we will use both the discretization approach and the continuous-time approach to derive Pontryagin's maximum principle.

    • With such tools we will be able to establish the optimal solution for charging the capacitor, and solve many other problems.

    [Diagram: the CT control problem, discretized with step $\tau$, becomes a stage decision problem; DT DP and DT PMP yield an optimal path and policy; taking the limit $\tau \to 0$ gives CT DP and CT PMP and the optimal path and policy of the CT control problem.]

  • Outline

    • Problem formulation and approach
    • Hamilton-Jacobi-Bellman equation
    • Linear quadratic regulator

  • 16

    Discretization approach

    Discretization times $t_k = k\tau$, $h\tau = T$, with discretization step $\tau$.

    Dynamic model
    $\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T]$
    is discretized as
    $x_{k+1} = x_k + \tau f(x_k, u_k), \quad x_k = x(k\tau), \quad u_k = u(k\tau)$

    Cost function
    $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$
    is discretized as
    $\sum_{k=0}^{h-1} g(x_k, u_k)\tau + g_h(x_h), \qquad g_h(x) = g_T(x), \ \forall x$

    • Note that these are approximate discretizations. We could have considered exact discretization, as in the linear case, but this approximation will suffice.

  • 17

    Dynamic programming

    DP equations for the resulting stage decision problem:
    $J_h(x_h) = g_h(x_h)$
    $J_k(x_k) = \min_{u_k \in U} g(x_k, u_k)\tau + J_{k+1}\big(x_k + \tau f(x_k, u_k)\big), \quad k \in \{h-1, \dots, 0\}$

    For convenience let us define
    $\bar{J}(t, x) = J_k(x), \quad t \in [k\tau, (k+1)\tau), \qquad \bar{J}(h\tau, x) = J_h(x)$

    Then the dynamic programming algorithm can be written as
    $\bar{J}(h\tau, x) = g_h(x), \quad \forall x$
    $\bar{J}(k\tau, x) = \min_{u \in U} g(x, u)\tau + \bar{J}\big((k+1)\tau, x + \tau f(x, u)\big), \quad k \in \{h-1, \dots, 0\}, \ \forall x$

    (A small grid-based sketch of this recursion follows below.)
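A small grid-based Python sketch (mine, with made-up grid sizes) of this discretized DP recursion for a scalar state; it tabulates $\bar J$ on a state grid and interpolates $J_{k+1}$ at the successor states.

```python
import numpy as np

# Discretized DP recursion J_k(x) = min_u [ g(x,u) tau + J_{k+1}(x + tau f(x,u)) ]
# on a 1-D state grid, with linear interpolation of J_{k+1}.
def dp_grid(f, g, gT, T=1.0, tau=0.01, xs=np.linspace(-2, 2, 201),
            us=np.linspace(-1, 1, 41)):
    h = int(round(T / tau))
    J = gT(xs)                                    # J_h(x) = g_T(x)
    policy = []
    for _ in range(h):
        # candidate costs for every (x, u) pair on the grid
        xn = xs[:, None] + tau * f(xs[:, None], us[None, :])
        Q = g(xs[:, None], us[None, :]) * tau + np.interp(xn, xs, J)
        policy.append(us[np.argmin(Q, axis=1)])   # minimizing u for each grid point
        J = Q.min(axis=1)
    policy.reverse()                              # policy[k] approximates mu(k*tau, .)
    return J, policy

# example: the problem of the HJB example later on, xdot = u, cost 0.5 x(T)^2
J0, mu = dp_grid(f=lambda x, u: u, g=lambda x, u: 0.0 * x * u,
                 gT=lambda x: 0.5 * x ** 2)
```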

  • 18

    Taking the limit $\tau \to 0$

    Using a first-order Taylor series expansion
    $\bar{J}\big((k+1)\tau, x + \tau f(x, u)\big) = \bar{J}(k\tau, x) + \tau\Big(\frac{\partial}{\partial t}\bar{J}(k\tau, x) + \frac{\partial}{\partial x}\bar{J}(k\tau, x) f(x, u)\Big) + o(\tau)$

    and replacing in the DP algorithm, we obtain
    $\bar{J}(k\tau, x) = \min_{u \in U} g(x, u)\tau + \bar{J}(k\tau, x) + \tau\Big(\frac{\partial}{\partial t}\bar{J}(k\tau, x) + \frac{\partial}{\partial x}\bar{J}(k\tau, x) f(x, u)\Big) + o(\tau)$

    Assuming (wishful thinking...) that, as $\tau \to 0$, $\bar{J}(t, x)$ converges to a continuously differentiable function, then canceling $\bar{J}(k\tau, x)$ on both sides, dividing by $\tau$, and letting $\tau \to 0$ gives
    $0 = \min_{u \in U}\Big[g(x, u) + \frac{\partial}{\partial t}\bar{J}(t, x) + \frac{\partial}{\partial x}\bar{J}(t, x) f(x, u)\Big]$

  • 19

    Theorem (HJB)

    Suppose that $V(t, x)$ is continuously differentiable in $t$ and $x$, and is such that it satisfies the Hamilton-Jacobi-Bellman equation:
    $0 = \min_{u \in U}\Big[g(x, u) + \frac{\partial}{\partial t}V(t, x) + \frac{\partial}{\partial x}V(t, x) f(x, u)\Big], \quad \forall t, x$
    $V(T, x) = g_T(x)$

    Suppose also that $u = \mu(t, x)$ attains the minimum in the HJB equation for all $t, x$.

    Then $V(t, x)$ coincides with the optimal cost-to-go $J(t, x)$ and $\mu(t, x)$ coincides with the optimal policy.

  • 20

    Discussion

    • The HJB equation is a partial differential equation.
    • The intuitive arguments provided before show that this partial differential equation is just an extension of the DP algorithm.
    • The bottleneck of such intuitive arguments is how to establish that the cost-to-go is differentiable.
    • The formal proof uses a different argument, following a continuous-time approach. It can be found in Bertsekas' book, p. 111.
    • Partial differential equations are in general very hard to solve analytically.
    • We are going to apply the HJB equation first to a simple example, then to linear systems, and solve the previous problem of charging a capacitor.

  • 21

    Example

    For the simple problem*

    dynamics: $\dot{x}(t) = u(t), \quad u(t) \in U := [-1, 1], \quad t \in [0, T]$
    cost: $\frac{1}{2}(x(T))^2$

    the HJB equation is
    $0 = \min_{u \in [-1, 1]}\Big[\frac{\partial}{\partial t}V(t, x) + \frac{\partial}{\partial x}V(t, x)\,u\Big]$
    with the terminal condition
    $V(T, x) = \frac{1}{2}x^2$

    Approach: find a candidate for optimality and check that it satisfies HJB.

    * example taken from Bertsekas' book, p. 112

  • 22

    Example

    There is an obvious candidate for optimality: move the state towards zero as quickly as possible,
    $\mu^*(t, x) = -\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x < 0,\\ 0 & \text{if } x = 0,\\ -1 & \text{if } x > 0 \end{cases}$

    and for an initial time $t$ and initial state $x$, the cost is given by
    $J^*(t, x) = \frac{1}{2}\big(\max\{0, |x| - (T - t)\}\big)^2$

  • 23

    Example

    This function satisfies the terminal condition of the HJB theorem,
    $J^*(T, x) = \frac{1}{2}x^2$

    and, since
    $\frac{\partial}{\partial t}J^*(t, x) = \max\{0, |x| - (T - t)\}, \qquad \frac{\partial}{\partial x}J^*(t, x) = \mathrm{sign}(x)\max\{0, |x| - (T - t)\},$

    it satisfies the HJB equation
    $0 = \min_{u \in [-1, 1]}\big[1 + \mathrm{sign}(x)u\big]\max\{0, |x| - (T - t)\}$

    where the minimum in the HJB equation is achieved by $u = -\mathrm{sign}(x) = \mu^*(t, x)$ (not unique when $|x| \le T - t$).

    Then this is an optimal policy. (A numerical check of the HJB equation for this example follows below.)
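The following Python check (mine, not from the slides) evaluates the HJB residual of $J^*$ numerically on a grid, using finite differences for the partial derivatives and a fine grid over $u \in [-1, 1]$ for the minimization.

```python
import numpy as np

# Verify on a grid that J*(t,x) = 0.5 * max(0, |x| - (T - t))^2 satisfies
# 0 = min_{u in [-1,1]} [ dJ/dt + dJ/dx * u ]  for xdot = u, terminal cost 0.5 x(T)^2.
T = 1.0
def Jstar(t, x):
    return 0.5 * np.maximum(0.0, np.abs(x) - (T - t)) ** 2

ts = np.linspace(0.0, 0.9, 10)
xs = np.linspace(-2.0, 2.0, 41)
us = np.linspace(-1.0, 1.0, 201)
eps = 1e-6
worst = 0.0
for t in ts:
    for x in xs:
        dJdt = (Jstar(t + eps, x) - Jstar(t - eps, x)) / (2 * eps)
        dJdx = (Jstar(t, x + eps) - Jstar(t, x - eps)) / (2 * eps)
        hjb = np.min(dJdt + dJdx * us)       # minimum over the admissible inputs
        worst = max(worst, abs(hjb))
print("max |HJB residual| on the grid:", worst)   # ~0 up to finite-difference error
```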

  • Outline

    • Problem formulation and approach
    • Hamilton-Jacobi-Bellman equation
    • Linear quadratic regulator

  • 24

    Linear systems, quadratic cost

    Dynamic model
    $\dot{x}(t) = Ax(t) + Bu(t), \quad x(0) = x_0$

    Cost function
    $x(T)^\top Q_T x(T) + \int_0^T \big(x(t)^\top Q x(t) + 2 x(t)^\top S u(t) + u(t)^\top R u(t)\big)\,dt, \qquad \begin{bmatrix} Q & S\\ S^\top & R \end{bmatrix} > 0$

    HJB
    $0 = \min_{u \in \mathbb{R}^m}\Big[x^\top Q x + 2 x^\top S u + u^\top R u + \frac{\partial V(t, x)}{\partial t} + \frac{\partial V(t, x)}{\partial x}(Ax + Bu)\Big]$
    $V(T, x) = x^\top Q_T x$

    Inspired by the fact that a discretization-based approach would result in quadratic costs-to-go, let us try $V(t, x) = x^\top P(t) x$. If such a function satisfies the HJB equations, it is the cost-to-go!

  • 25

    Linear systems, quadratic cost

    The HJB equation then takes the form
    $0 = \min_{u \in \mathbb{R}^m}\big[x^\top Q x + 2 x^\top S u + u^\top R u + x^\top \dot{P}(t) x + 2 x^\top P(t) A x + 2 x^\top P(t) B u\big]$

    To obtain the minimum, differentiate and equate to zero:
    $2(B^\top P(t) + S^\top)x + 2Ru = 0 \quad \Rightarrow \quad u = \underbrace{-R^{-1}(B^\top P(t) + S^\top)}_{K(t)}x$

    which leads to
    $0 = x^\top\big(\dot{P}(t) + P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\big)x$

    which is only satisfied for all $x$ if
    $\dot{P}(t) = -\big(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\big), \qquad P(T) = Q_T$

    We have concluded that if $P(t)$ satisfies this Riccati equation, then $J(t, x) = x^\top P(t) x$ is the cost-to-go and $\mu(t, x) = K(t)x$ is the optimal policy.

  • 26

    Finite-horizon quadratic control

    Finite horizon
    The optimal control policy for the following problem
    $\min_u \int_0^T \big(x(t)^\top Q x(t) + 2 x(t)^\top S u(t) + u(t)^\top R u(t)\big)\,dt + x(T)^\top Q_T x(T)$
    $\dot{x}(t) = Ax(t) + Bu(t), \quad x(0) = x_0$
    is $u(t) = K(t)x(t)$, $K(t) = -R^{-1}(B^\top P(t) + S^\top)$, where $P(t)$ is the unique solution of the Riccati equation
    $\dot{P}(t) = -\big(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\big), \qquad P(T) = Q_T$

    Moreover, the optimal cost-to-go is given by $x_0^\top P(0) x_0$.

    (A numerical sketch for integrating this Riccati equation follows below.)
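A hedged Python sketch (mine) of how one might integrate this matrix Riccati equation backwards in time with a generic ODE solver; the function and parameter names are assumptions, not from the slides.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate Pdot = -(P A + A^T P - (P B + S) R^{-1} (B^T P + S^T) + Q), P(T) = Q_T,
# from t = T down to t = 0.
def riccati_ode(t, p_flat, A, B, Q, R, S):
    n = A.shape[0]
    P = p_flat.reshape(n, n)
    M = (P @ B + S) @ np.linalg.solve(R, B.T @ P + S.T)
    return (-(P @ A + A.T @ P - M + Q)).ravel()

def solve_finite_horizon(A, B, Q, R, S, QT, T, num=101):
    n = A.shape[0]
    ts = np.linspace(T, 0.0, num)                      # decreasing time grid (backward)
    sol = solve_ivp(riccati_ode, (T, 0.0), QT.ravel(), t_eval=ts,
                    args=(A, B, Q, R, S), rtol=1e-8, atol=1e-10)
    Ps = sol.y.T.reshape(-1, n, n)                     # P(t) along the backward grid
    Ks = [-np.linalg.solve(R, B.T @ P + S.T) for P in Ps]   # K(t) = -R^{-1}(B^T P + S^T)
    return ts, Ps, Ks
```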

  • 27

    Linear Quadratic Regulator

    Infinite horizon
    The optimal policy for the following problem
    $\min_u \int_0^\infty \big(x(t)^\top Q x(t) + 2 x(t)^\top S u(t) + u(t)^\top R u(t)\big)\,dt$
    $\dot{x}(t) = Ax(t) + Bu(t), \quad x(0) = x_0$
    with $\begin{bmatrix} Q & S\\ S^\top & R \end{bmatrix} > 0$ and $(A, B)$ controllable,
    is $u(t) = Kx(t)$, $K = -R^{-1}(B^\top P + S^\top)$, where $P$ is the unique positive definite solution to the algebraic Riccati equation
    $0 = PA + A^\top P - (PB + S)R^{-1}(B^\top P + S^\top) + Q$

    Moreover the closed-loop matrix $(A + BK)$ has all its eigenvalues in the left-half complex plane and the optimal cost-to-go is given by $x_0^\top P x_0$.

    The reasoning follows from similar arguments used in the context of stage decision problems. (A short sketch using a standard Riccati solver follows below.)
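A short Python sketch (mine) using SciPy's standard algebraic Riccati solver to compute the LQR gain; the double-integrator data at the end is an illustrative assumption, not an example from the slides.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# LQR gain from the ARE 0 = PA + A^T P - (PB + S) R^{-1} (B^T P + S^T) + Q.
def lqr(A, B, Q, R, S=None):
    S = np.zeros((A.shape[0], B.shape[1])) if S is None else S
    P = solve_continuous_are(A, B, Q, R, s=S)          # positive definite solution
    K = -np.linalg.solve(R, B.T @ P + S.T)             # u = K x
    return K, P

# toy double-integrator example (illustrative values only)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K, P = lqr(A, B, Q=np.eye(2), R=np.array([[1.0]]))
print("closed-loop eigenvalues:", np.linalg.eigvals(A + B @ K))  # should lie in the left half-plane
```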

  • 28

    Charging a capacitor

    Applying a trick allows us to cast our problem in the standard LQR formulation: augment the state with a constant $y(t)$, $\dot{y}(t) = 0$, $y(0) = 1$.

    Dynamic model
    $\dot{x}(t) = -x(t) + u(t)$
    $\begin{bmatrix} \dot{x}(t)\\ \dot{y}(t) \end{bmatrix} = \underbrace{\begin{bmatrix} -1 & 0\\ 0 & 0 \end{bmatrix}}_{A}\begin{bmatrix} x(t)\\ y(t) \end{bmatrix} + \underbrace{\begin{bmatrix} 1\\ 0 \end{bmatrix}}_{B}u(t), \qquad \begin{bmatrix} x(0)\\ y(0) \end{bmatrix} = \begin{bmatrix} x_0\\ 1 \end{bmatrix}$

    Cost function
    $\int_0^1 (x(t) - u(t))^2\,dt + \beta(x(1) - 1)^2$
    $= \int_0^1 \Big(\begin{bmatrix} x(t) & y(t) \end{bmatrix}\underbrace{\begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}}_{Q}\begin{bmatrix} x(t)\\ y(t) \end{bmatrix} + 2\begin{bmatrix} x(t) & y(t) \end{bmatrix}\underbrace{\begin{bmatrix} -1\\ 0 \end{bmatrix}}_{S}u(t) + \underbrace{1}_{R}\,u(t)^2\Big)dt + \begin{bmatrix} x(1) & y(1) \end{bmatrix}\underbrace{\begin{bmatrix} \beta & -\beta\\ -\beta & \beta \end{bmatrix}}_{Q_T}\begin{bmatrix} x(1)\\ y(1) \end{bmatrix}$

    (The matrices are collected in the sketch below.)
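For concreteness, here is the augmented-state data of this slide in Python form (mine), ready to be passed to a finite-horizon Riccati solver such as the sketch after the finite-horizon slide; the numerical value of $\beta$ is an assumption corresponding to a small $\Delta$.

```python
import numpy as np

# Augmented-state LQR data for the capacitor problem on [0, 1].
beta = 100.0                                   # terminal weight beta(Delta); assumption, Delta ~ 0.01
A  = np.array([[-1.0, 0.0], [0.0, 0.0]])
B  = np.array([[1.0], [0.0]])
Q  = np.array([[1.0, 0.0], [0.0, 0.0]])
S  = np.array([[-1.0], [0.0]])
R  = np.array([[1.0]])
QT = beta * np.array([[1.0, -1.0], [-1.0, 1.0]])
T  = 1.0
```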

  • 29

    Riccati equations

    The Riccati equations
    $\dot{P}(t) = -\big(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\big), \qquad P(T) = Q_T$
    with
    $P(t) = \begin{bmatrix} p_1(t) & p_2(t)\\ p_2(t) & p_3(t) \end{bmatrix}$
    boil down to
    $\begin{bmatrix} \dot{p}_1(t) & \dot{p}_2(t)\\ \dot{p}_2(t) & \dot{p}_3(t) \end{bmatrix} = -\Big(\begin{bmatrix} p_1(t) & p_2(t)\\ p_2(t) & p_3(t) \end{bmatrix}\begin{bmatrix} -1 & 0\\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -1 & 0\\ 0 & 0 \end{bmatrix}\begin{bmatrix} p_1(t) & p_2(t)\\ p_2(t) & p_3(t) \end{bmatrix} - \begin{bmatrix} p_1(t) - 1\\ p_2(t) \end{bmatrix}\begin{bmatrix} p_1(t) - 1 & p_2(t) \end{bmatrix} + \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}\Big)$

    or equivalently to the non-linear differential equations
    $\dot{p}_1(t) = 2p_1(t) + (p_1(t) - 1)^2 - 1 = p_1(t)^2$
    $\dot{p}_2(t) = p_2(t) + p_2(t)(p_1(t) - 1) = p_1(t)p_2(t)$
    $\dot{p}_3(t) = p_2(t)^2$
    with $p_1(1) = -p_2(1) = p_3(1) = \beta$,

    whose solution is (solution method not addressed here)
    $p_1(t) = -p_2(t) = p_3(t) = \frac{1}{1 + \frac{1}{\beta} - t}$

    (A quick numerical check of $p_1(t)$ follows below.)
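A quick Python check (mine) that the closed-form $p_1(t) = 1/(1 + 1/\beta - t)$ indeed solves $\dot p_1 = p_1^2$ with $p_1(1) = \beta$, by integrating the ODE backwards numerically.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate pdot_1 = p_1^2 backwards from p_1(1) = beta and compare with the closed form.
beta = 10.0
ts = np.linspace(1.0, 0.0, 11)
sol = solve_ivp(lambda t, p: p ** 2, (1.0, 0.0), [beta], t_eval=ts, rtol=1e-10)
closed_form = 1.0 / (1.0 + 1.0 / beta - ts)
print(np.max(np.abs(sol.y[0] - closed_form)))   # should be ~0
```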

  • 30

    Optimal policy and optimal path

    Optimal policy
    $u(t) = -R^{-1}(B^\top P(t) + S^\top)\begin{bmatrix} x(t)\\ y(t) \end{bmatrix} = \begin{bmatrix} -(p_1(t) - 1) & -p_2(t) \end{bmatrix}\begin{bmatrix} x(t)\\ 1 \end{bmatrix} = -(p_1(t) - 1)x(t) + p_1(t) = -p_1(t)(x(t) - 1) + x(t)$

    Optimal path for $x(0) = 0$
    $\dot{x}(t) = -x(t) + u(t) = -p_1(t)(x(t) - 1), \qquad p_1(t) = \frac{1}{1 + \frac{1}{\beta} - t}$
    $x(t) = \frac{t - (1 + \frac{1}{\beta})}{1 + \frac{1}{\beta}} + 1 = \frac{t}{1 + \frac{1}{\beta}}$

    Letting the parameter $\Delta$ of the artificial terminal cost converge to zero ($\Delta \to 0$, so $\beta \to \infty$) we obtain
    $u(t) = 1 + t, \qquad x(t) = t$

  • 31

    Discussion

    • The HJB equation is a partial differential equation and an analytical solution is very hard to find.

    • For problems with linear models and quadratic costs, computing the optimal policy and optimal paths involves solving non-linear differential equations (Riccati equations).

    • We were able to solve these Riccati equations since the dimension of the state-space in our example was small.

    • The approach based on Pontryagin’s maximum principle will lead to different conditions which can be applied to more cases.

    • We will later consider stochastic disturbances, but the advantages of having a policy are exactly the same as for stage decision problems.

  • 32

    Concluding remarks

    • The counterpart, for continuous-time control problems, of DP for stage-decision problems is the HJB equation.

    • This is a partial differential equation that is very hard to solve in general.

    • However, for linear systems we can solve it, and this leads to the Riccati equations.

    • As for discrete-time optimal control problems, this leads to an algebraic Riccati equation (continuous-time LQR) when the horizon is infinite.

    Summary:

    After this lecture you should be able to:

    • Compute optimal policy and optimal path for problems with linear model and finite-horizon quadratic cost (Riccati equations).

    • Compute the optimal policy for problems with linear models and infinite-horizon quadratic cost.

    • Solve the algebraic Riccati equation analytically when the dimension of the state-space is small.