optimal real-time landing using deep networks
![Page 1: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/1.jpg)
Optimal real-time landing
using deep networks
Carlos Sánchez and Dario Izzo
Advanced Concepts Team (ESA)
Daniel Hennes
Robotics Innovation Center (DFKI)
Learning the optimal state-feedback using deep networks
![Page 2: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/2.jpg)
Rocket Landing Simulation
Watch the video clip
“Deep Learning in Deep Space”
https://youtu.be/8sDldAK4O0E
![Page 3: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/3.jpg)
Goal: to make an on-board real-time optimal control system
![Page 4: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/4.jpg)
• The mathematical model of a system usually leads to a system of equations
describing how the parts of the system interact.
• These equations are commonly known as the governing laws or model
equations of the system.
• The model equations can be:
   time-independent (steady-state) model equations
   time-dependent (dynamic) model equations
Systems that evolve with time are known as dynamic systems.
In this course, we are mainly interested in dynamic systems.
Dynamic Systems
![Page 5: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/5.jpg)
Examples of Dynamic models – RLC Circuit
• Linear Differential Equations
• Example: RLC circuit (Ohm's and Kirchhoff's laws)
$\dot{x} = Ax + Bu$
• Nonlinear differential equations: general case
$\dot{x} = f(x(t), u(t), t)$
![Page 6: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/6.jpg)
In mathematical systems theory, simulation is done by solving the
governing equations of the system for various input scenarios.
This requires algorithms that match the type of the
system's model equations:
numerical methods for the solution of systems of
equations and of differential equations.
Simulation ...
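As a minimal sketch of simulation by numerical integration, the snippet below applies the explicit Euler method to a linear model $\dot{x} = Ax + Bu$ of a series RLC circuit. The component values, the step size, and the function name `simulate_linear` are assumptions chosen for illustration; they do not come from the slides.

```python
def simulate_linear(A, B, x0, u, h, steps):
    """Integrate x' = A x + B u(t) with the explicit Euler method."""
    x = list(x0)
    n = len(x)
    for k in range(steps):
        uk = u(k * h)
        # Right-hand side of the model equations at the current state
        dx = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * uk for i in range(n)]
        # Euler update: x_{k+1} = x_k + h * f(x_k, u_k)
        x = [x[i] + h * dx[i] for i in range(n)]
    return x

# Series RLC circuit, state x = [current, capacitor voltage] (hypothetical values)
R, L_ind, C = 1.0, 1.0, 1.0
A = [[-R / L_ind, -1.0 / L_ind],
     [1.0 / C, 0.0]]
B = [1.0 / L_ind, 0.0]

# Step input of 1 V applied for 20 s
i_end, v_end = simulate_linear(A, B, [0.0, 0.0], lambda t: 1.0, h=0.01, steps=2000)
print(i_end, v_end)
```

With a constant input the capacitor voltage settles at the source voltage and the current decays to zero, which gives a quick sanity check of the integration.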
![Page 7: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/7.jpg)
• A system with degrees of freedom can always be
manipulated to display certain useful behavior.
• The possibility of manipulation gives the possibility of control.
• Control variables are usually the system's degrees of freedom.
We ask:
What is the best control strategy that forces a system to
display the required characteristics or output, follow a trajectory,
etc.?
Optimal Control
Methods of Numerical Optimization
Optimization of Dynamic Systems
![Page 8: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/8.jpg)
Ex.: Optimal Control of Landing
(Figure labels: $x_1$; $x_1^s$, $x_2^s$)
Model Equations:
![Page 9: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/9.jpg)
Ex.: Optimal Control of Landing
Objectives of the optimal control:
• Minimization of the error: $(x_1^s - x_1(t))$, $(x_2^s - x_2(t))$ (cost @ terminal)
• Minimization of energy: $\int_0^t u(t)\,dt$ (integration of the cost rate during flight)
![Page 10: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/10.jpg)
Problem formulation:
Cost function:
Model (state) equations:
Initial states:
Desired final states:
How to solve the above optimal control problem in order to achieve
the desired goal? That is, how to determine the optimal trajectories
$x_1^*(t)$, $x_2^*(t)$ that give a minimum energy consumption $\int_0^t u(t)\,dt$ so
that the rocket can be halted at the desired position?
![Page 11: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/11.jpg)
Optimization of Dynamic Systems
Consider a simplified optimal control problem:
Cost rate @ t
![Page 12: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/12.jpg)
Euler Discretization
![Page 13: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/13.jpg)
Dynamic Programming for Euler Scheme
Cost @ t = t_k: $J(x, t_k) = J_k(x)$
Cost @ t=0:
$$
\begin{aligned}
J_0(x) &= \sum_{k=0}^{N-1} h L(x, u, t_k) + E(u_N) \\
       &= h L(x, u, t_0) + \sum_{k=1}^{N-1} h L(x, u, t_k) + E(u_N) \\
       &= h L(x, u, t_0) + J_1(x) \\
       &= h L(x, u, t_0) + h L(x, u, t_1) + \cdots + h L(x, u, t_{N-1}) + J_N(x)
\end{aligned}
$$
Cost @ t=1: $J_1(x) = h L(x, u, t_1) + J_2(x)$
$\vdots$
Cost @ t=N-1: $J_{N-1}(x) = h L(x, u, t_{N-1}) + J_N(x)$
Cost @ t=N: $J_N(x) = E(u_N)$
So given $u(\cdot)$ we can solve inductively backwards in time for the objective $J(t, x, u(\cdot))$, starting at $t = t_N$.
This is called dynamic programming (DP).
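The backward recursion can be sketched in a few lines for a scalar state. This is an illustration under assumptions, not the authors' code: the helper names `euler_rollout` and `cost_to_go`, the example dynamics and the cost terms are hypothetical, and the terminal cost E is evaluated on the final state, which is a common convention rather than the slide's notation.

```python
def euler_rollout(f, x0, u, h):
    """Euler-discretized trajectory x_0..x_N for a given control sequence u_0..u_{N-1}."""
    xs = [x0]
    for k, uk in enumerate(u):
        xs.append(xs[-1] + h * f(xs[-1], uk, k * h))
    return xs

def cost_to_go(L, E, xs, u, h):
    """Solve J_k = h*L(x_k, u_k, t_k) + J_{k+1} backwards from J_N = E(x_N)."""
    N = len(u)
    J = [0.0] * (N + 1)
    J[N] = E(xs[N])                      # terminal cost starts the recursion
    for k in range(N - 1, -1, -1):       # k = N-1, ..., 0
        J[k] = h * L(xs[k], u[k], k * h) + J[k + 1]
    return J

# Hypothetical example: x' = u, running cost u^2, terminal cost x^2
h = 0.5
u = [1.0, 1.0]
xs = euler_rollout(lambda x, u, t: u, 0.0, u, h)
J = cost_to_go(lambda x, u, t: u * u, lambda x: x * x, xs, u, h)
print(xs, J)
```

Here `J[0]` is the total discretized cost of the given control sequence, and each `J[k]` is the cost remaining from step k onwards.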
![Page 14: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/14.jpg)
Dynamic Programming for Euler Scheme
Optimal control
![Page 15: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/15.jpg)
Hamilton-Jacobi-Bellman (HJB) Equation
![Page 16: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/16.jpg)
Continuous Time Optimal Control
Optimal feedback control for state x at time t is obtained from the
Hamilton-Jacobi-Bellman equation, which yields the optimal control policy
• The HJB equation is a set of extremely challenging partial differential equations
• Solving it directly is impractical for an on-board real-time controller
![Page 17: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/17.jpg)
1. Pre-compute many optimal trajectories
2. Train an artificial neural network to approximate the optimal behaviour
3. Use the network to drive the spacecraft
Proposed Approach
Learning the optimal control policy using deep networks
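The three steps can be illustrated on a toy problem. Everything in this sketch is an assumption made for illustration: the system is $\dot{x} = u$ with cost $\int (x^2 + u^2)\,dt$, whose optimal feedback is known in closed form to be $u^* = -x$; the closed-form trajectories stand in for the pre-computed optimal trajectories; and a one-parameter linear model fitted by least squares stands in for the deep network.

```python
import math
import random

def optimal_trajectory(x0, h=0.1, steps=50):
    """Step 1: state-action pairs along the known optimum x(t) = x0*exp(-t), u = -x."""
    pairs = []
    for k in range(steps):
        x = x0 * math.exp(-k * h)
        pairs.append((x, -x))
    return pairs

def fit_linear_policy(pairs):
    """Step 2: least-squares fit of u = w*x (a stand-in for training a network)."""
    sxx = sum(x * x for x, _ in pairs)
    sxu = sum(x * u for x, u in pairs)
    return sxu / sxx

def drive(policy, x0, h=0.01, steps=1000):
    """Step 3: closed-loop Euler simulation using the learned policy."""
    x = x0
    for _ in range(steps):
        x = x + h * policy(x)
    return x

random.seed(0)
data = []
for _ in range(100):
    # Initial states sampled from a hypothetical training area [-2, 2]
    data.extend(optimal_trajectory(random.uniform(-2.0, 2.0)))

w = fit_linear_policy(data)
x_final = drive(lambda x: w * x, x0=3.0)   # start outside the training area
print(w, x_final)
```

The fitted gain recovers the optimal feedback, and the closed loop drives the state towards zero even from an initial state outside the sampled range. A deep network plays the same role when the optimal policy has no closed form.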
![Page 18: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/18.jpg)
Landing models
![Page 19: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/19.jpg)
Training data generation
• The training data is generated using the Hermite-Simpson transcription and a
non-linear programming (NLP) solver
• The initial state of each trajectory is randomly selected from a training area
• 150,000 trajectories are generated for each one of the problems
(9,000,000 - 15,000,000 data points)
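In a Hermite-Simpson transcription, the dynamics are enforced by requiring a collocation defect to vanish on every interval, and the NLP solver treats these defects as equality constraints. The sketch below computes the defects in their compressed form for a given trajectory. The double integrator, the grid, and the choice of taking the midpoint control as the average of the node controls are assumptions for illustration; the landing models of the talk are not reproduced here.

```python
def hermite_simpson_defects(f, xs, us, h):
    """Collocation defects per interval; f(x, u) returns the state derivative as a list."""
    defects = []
    for k in range(len(xs) - 1):
        xk, xn = xs[k], xs[k + 1]
        fk, fn = f(xk, us[k]), f(xn, us[k + 1])
        # Hermite interpolation of the state at the interval midpoint
        xc = [0.5 * (a + b) + h / 8.0 * (da - db)
              for a, b, da, db in zip(xk, xn, fk, fn)]
        uc = 0.5 * (us[k] + us[k + 1])   # assumption: midpoint control = node average
        fc = f(xc, uc)
        # Simpson quadrature of the dynamics over the interval
        defects.append([b - a - h / 6.0 * (da + 4.0 * dc + db)
                        for a, b, da, dc, db in zip(xk, xn, fk, fc, fn)])
    return defects

def double_integrator(x, u):
    """Hypothetical stand-in model: position' = velocity, velocity' = u."""
    pos, vel = x
    return [vel, u]

h = 0.5
ts = [k * h for k in range(5)]
xs = [[0.5 * t * t, t] for t in ts]      # exact solution for constant u = 1
us = [1.0] * 5
defects = hermite_simpson_defects(double_integrator, xs, us, h)
max_defect = max(abs(d) for row in defects for d in row)
print(max_defect)
```

For this trajectory the defects vanish up to rounding, because the exact solution is a low-order polynomial in time. For a trajectory that violates the dynamics they are nonzero, which is what the solver drives to zero.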
![Page 20: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/20.jpg)
Approximate state-action with a DNN
![Page 21: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/21.jpg)
Approximate state-action with a DNN
![Page 22: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/22.jpg)
Approximate state-action with a DNN
![Page 23: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/23.jpg)
How good is it?
Very accurate results
+
The DNNs can be used as an on-board reactive system (32,000x faster
than the optimization methods used to generate the data)
![Page 24: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/24.jpg)
Generalization?
• Successful landings from states outside of the training initial conditions
• This suggests that an approximation to the solution of the HJB equations is
learned
![Page 25: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/25.jpg)
![Page 26: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/26.jpg)
CNN for state estimation from camera
1. Train a neural network to guess the state from an on-board camera
2. Use it together with the previous DNNs to get fully automated visual
landing
![Page 27: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/27.jpg)
A simple setup is used: a 3D model (Blender) of a rocket landing on
a sea platform (Falcon 9 inspired)
CNN for state estimation from camera
![Page 28: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/28.jpg)
CNN for state estimation from camera
![Page 29: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/29.jpg)
CNN for state estimation from camera
![Page 30: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/30.jpg)
CNN for state estimation from camera
![Page 31: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/31.jpg)
Watch the video clip
https://youtu.be/8sDldAK4O0E
![Page 32: Optimal real-time landing using deep networks](https://reader031.vdocuments.us/reader031/viewer/2022022820/621c147de889a51d97057d56/html5/thumbnails/32.jpg)
• Deep networks trained with large datasets successfully approximate
the optimal control, even for regions outside the training data
• DNNs can be used as an on-board reactive control system, whereas
current methods cannot compute optimal trajectories efficiently
Conclusions