TRANSCRIPT
Development of a Deep Recurrent Neural Network
Controller for Flight Applications
American Control Conference (ACC), May 26, 2017
Scott A. Nivison, Pramod P. Khargonekar
Department of Electrical and Computer Engineering University of Florida
Distribution A: Approved for public release; distribution is unlimited.
Outline
• Inspiration
• DLC Limitations and Research Goals
• Background - Direct Policy Search
• Flight Vehicle (Plant) Model
• Architecture - Deep Learning Flight Control
• Optimization - Deep Learning Controller
• Simulation Results
• Conclusions
• Future Work
Inspiration
Deep Learning (DL):
• Dominant performance in multiple machine learning areas: speech, language, vision, face recognition, etc.
• Replaces time-consuming, hand-tuned machine learning methods
Key breakthroughs in establishing deep learning:
• Optimization methods: stochastic gradient descent, Hessian-free, Nesterov's momentum, etc.
• Deep recurrent neural network (RNN) architectures: deep stacked RNN, deep bidirectional RNN, etc.
• RNN modules: Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM)
• Automatic gradient computation
• Parallel computing
Connecting deep learning to control:
• Guided Policy Search - uses trajectory optimization to assist policy learning (S. Levine and V. Koltun, 2013)
• Hessian-free optimization - deep RNN learning control with uncertainties (I. Sutskever, 2013)
DLC Limitations and Research Goals
Policy Search/Deep Learning (DL) Control Limitations:
• Large number of computations at each time step
• No standard training procedures for control tasks
• Few analysis tools for deep neural network architectures
• Non-convex optimization provides few guarantees
• Lack of research on optimization with regard to robustness
• Most RL approaches tune the policy with a combination of modeled dynamics and real-life trials (not useful for flight control)
Research Goals
Develop a robust deep recurrent neural network (RNN) controller with gated recurrent unit (GRU) modules for a highly agile, high-speed flight vehicle, trained using a set of sample trajectories that contain disturbances, aerodynamic uncertainties, and significant control attenuation/amplification in the plant dynamics during optimization.
Direct Policy Search Background
Reinforcement Learning (RL): develop methods to sufficiently train an agent by maximizing a long-term reward through repeated interactions with its environment.

[Figure: adaptive controller block diagram; longitudinal short-period dynamics for a high-speed flight vehicle]

Markovian dynamics:
$x_{t+1} = f(x_t, u_t) + w, \qquad x_0 \sim p(x_0)$

Find a parametrized policy $\pi_\theta^*$ for the finite-horizon problem:
$\pi_\theta^* = \arg\max_{\pi_\theta} J_\theta = \arg\max_{\pi_\theta} \sum_{t=1}^{t_f} \gamma^t E[r(x_t, u_t) \mid \pi_\theta], \qquad \gamma \in [0, 1]$

Certainty-Equivalence (CE) Assumption: the optimal policy for the learned model corresponds to the optimal policy for the true dynamics.

How to use models for long-term predictions:
1. Stochastic inference (i.e., trajectory sampling)
2. Deterministic approximate inference

*M. P. Deisenroth, et al., "A survey on policy search for robotics," Foundations and Trends in Robotics, vol. 2, 2013.
Direct Policy Search Background
Stochastic sampling:

Expected long-term reward:
$J_\theta = \sum_{t=1}^{t_f} \gamma^t E[r(x_t, u_t) \mid \pi_\theta]$

Approximation of $J_\theta$ from $N$ sampled trajectories:
$\hat{J}_\theta = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{t_f} \gamma^t r(x_t^i), \qquad \lim_{N \to \infty} \hat{J}_\theta = J_\theta$
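To make the sampling approximation concrete, here is a minimal Python sketch of the estimator above; the policy `pi`, dynamics `f`, reward `r`, and initial-state sampler are hypothetical placeholders rather than the paper's models.

```python
import numpy as np

def estimate_long_term_reward(pi, f, r, x0_sampler, N=100, t_f=50, gamma=0.98):
    """Monte Carlo estimate of the expected discounted long-term reward:
    average over N sampled trajectories of sum_t gamma^t * r(x_t, u_t)."""
    J_hat = 0.0
    for _ in range(N):
        x = x0_sampler()                     # x_0 ~ p(x_0)
        total = 0.0
        for t in range(1, t_f + 1):
            u = pi(x)                        # policy action u_t
            total += gamma**t * r(x, u)      # discounted instantaneous reward
            # w: additive process noise (placeholder scale)
            x = f(x, u) + np.random.normal(scale=0.01, size=np.shape(x))
        J_hat += total / N                   # running average over trajectories
    return J_hat
```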
Deterministic approximations:
$E[r(x_t, u_t) \mid \pi_\theta] = \int r(x_t)\, p(x_t)\, dx_t$
$p(x_t) \approx \mathcal{N}(x_t \mid \mu_t^x, \Sigma_t^x)$

Approximation of $p(x_t)$: e.g., unscented transformation or moment matching.
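For the deterministic route, a sketch of the unscented transformation named above: it propagates a Gaussian approximation $\mathcal{N}(\mu, \Sigma)$ through a nonlinearity by moment matching over sigma points. The scaling constants below are common defaults, not values from the paper.

```python
import numpy as np

def unscented_transform(mu, Sigma, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Approximate the mean/covariance of y = f(x) for x ~ N(mu, Sigma)
    using 2n+1 sigma points (moment matching)."""
    n = mu.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * Sigma)          # matrix square root
    pts = [mu] + [mu + S[:, i] for i in range(n)] + [mu - S[:, i] for i in range(n)]
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))     # mean weights
    Wc = Wm.copy()                                     # covariance weights
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    Y = np.array([f(p) for p in pts])                  # propagate sigma points
    mu_y = Wm @ Y
    dY = Y - mu_y
    Sigma_y = (Wc * dY.T) @ dY                         # weighted outer products
    return mu_y, Sigma_y
```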
Deep Learning based Flight Control
Deep Learning Training Architecture
Generalized Form of the Discrete Plant Dynamics
$x_t$ are the states of the plant
$u_t^{act}$ is the actuator output
$y_t$ is the output vector
$n_p$ is the plant noise
$\lambda_u$ is the control effectiveness
$d_u$ is an input disturbance
$\nu_u, \nu_\alpha, \nu_q$ are uncertainty parameters
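Since the slide's equation itself is not captured in this transcript, the following sketch only illustrates one plausible form of such an uncertain plant step, with the control effectiveness $\lambda_u$, input disturbance $d_u$, and plant noise $n_p$ entering as described above; the structure `f(x, u, nu)` is an assumption.

```python
import numpy as np

def plant_step(x, u_act, f, nu, lam_u=1.0, d_u=0.0, noise_std=0.0, rng=np.random):
    """One step of a generalized uncertain discrete plant (assumed form):
    lam_u attenuates/amplifies the actuator output, d_u is an additive
    input disturbance, nu are aerodynamic uncertainty parameters, and
    n_p is additive plant noise."""
    u_eff = lam_u * u_act + d_u                          # perturbed effective input
    n_p = rng.normal(scale=noise_std, size=np.shape(x))  # plant noise
    return f(x, u_eff, nu) + n_p
```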
Flight Vehicle Model
Longitudinal Rigid Body Dynamics: Force and Moment Equations:
Aerodynamic Coefficients (Longitudinal):
Deep Learning Controller - Architecture
Stacked Recurrent Neural Network (S-RNN)
Controller input vector:
$o_t = [e_I, \alpha, q, \dot{e}], \qquad e = y_{ref} - y_{cmd}, \qquad e_I = \int_0^{t} e\, d\tau$

$\theta_j$ is the matrix of parameters for each GRU module: $\theta_j = [W_u, U_u, W_r, U_r, W_h, U_h, b_1, b_2, b_3]$
$\theta$ is the total parameter set of the controller: $\theta = [\theta_1, \theta_2, \ldots, \theta_L, V, c]$
$L$ is the total number of layers of GRU modules
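A minimal numpy sketch of one step of a stacked-GRU controller consistent with this architecture: $L$ GRU modules in series followed by a linear readout $(V, c)$. The gate equations are the standard GRU formulation; the weight names are illustrative, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h, p):
    """Standard GRU update: update gate z, reset gate r, candidate state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1 - z) * h + z * h_tilde

def controller_step(o_t, hidden, layers, V, c):
    """One control step: the input vector o_t = [e_I, alpha, q, e_dot]
    feeds L stacked GRU modules; a linear readout (V, c) produces the
    actuator command."""
    x = o_t
    for j, p in enumerate(layers):
        hidden[j] = gru_cell(x, hidden[j], p)
        x = hidden[j]                 # output of layer j feeds layer j+1
    return V @ x + c, hidden
```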
Deep Learning Controller - Optimization
Estimated expected long-term reward:
$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} J_i(\theta)$

Cost function for each trajectory $i$:
$J_i(\theta) = \sum_{t=0}^{t_f} \gamma^t r(x_t, u_t)$

$model = p$: only uncertainties
$model = r$: uncertainties, noise, and disturbances

Instantaneous measurement:
$r(x_t, u_t) = \begin{cases} k_1 e_t^2 + k_2 \dot{u}_t^2 & \text{if } model = p \\ k_3 e_f^2 + k_4 \dot{u}_f^2 & \text{if } model = r \end{cases}$

Time-varying funnel:
$e_f = \max(|e_t| - b_e, 0), \qquad \dot{u}_f = \max(|\dot{u}_t| - b_{\dot{u}}, 0)$
$\mathcal{F}_e(t) = \{ x \in \mathbb{R}^n : |e(x, t)| \le b_e(t) \}, \qquad \mathcal{F}_{\dot{u}}(t) = \{ x \in \mathbb{R}^n : |\dot{u}(x, t)| \le b_{\dot{u}}(t) \}$

where:
$\nu_\alpha, \nu_q, \nu_u, \lambda_u \sim \mathrm{Uniform}(min, max)$, and $[\nu_\alpha^i, \nu_q^i, \nu_u^i, \lambda_u^i]$ are the uncertainties for the $i$th trajectory
$N$ is the number of sampled trajectories
$t_f$ is the duration of each sampled trajectory
$k_1, k_2, k_3, k_4$ are positive constant gains
$b_e, b_{\dot{u}}$ are static constant bounds of the funnel
$e_t = y_{ref} - y_{cmd}$
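The piecewise cost transcribes directly into code; the gains and funnel bounds below are placeholders.

```python
def reward(e_t, u_dot, model, k=(1.0, 1.0, 1.0, 1.0), b_e=0.1, b_udot=0.5):
    """Piecewise cost: quadratic tracking/control-rate penalty on the
    performance model ('p'); on the robust model ('r') only excursions
    outside the funnel bounds b_e, b_udot are penalized."""
    k1, k2, k3, k4 = k
    if model == "p":                          # only uncertainties
        return k1 * e_t**2 + k2 * u_dot**2
    e_f = max(abs(e_t) - b_e, 0.0)            # funnel-violation terms
    udot_f = max(abs(u_dot) - b_udot, 0.0)
    return k3 * e_f**2 + k4 * udot_f**2
```

The funnel terms zero out the penalty whenever the trajectory stays inside the bounds, so the robust-model trajectories are shaped only by their violations rather than by exact tracking.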
DLC - Incremental Training
Optimization Specifications:
• Architectures: RNN/GRU, RNN, TD-FNN
• L-BFGS (quasi-Newton) optimization
• N = 2,970 sample trajectories: 540 performance trajectories, 2,430 robust trajectories
• 3.1 GHz PC with 256 GB RAM and 10 cores, parallel processing
• ~40 hours for 1,000 sample trajectories for optimization
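A sketch of how the sampled objective could be driven through a quasi-Newton optimizer using SciPy's L-BFGS-B; `simulate_cost` and the flattened parameter vector are hypothetical stand-ins for the paper's training pipeline (the slides note automatic gradient computation; SciPy falls back to finite differences when no gradient is supplied).

```python
import numpy as np
from scipy.optimize import minimize

def total_cost(theta_flat, trajectories, simulate_cost):
    """Average J_i over the sampled performance and robust trajectories."""
    return np.mean([simulate_cost(theta_flat, traj) for traj in trajectories])

def train_controller(theta0, trajectories, simulate_cost):
    """Quasi-Newton optimization of the flattened controller parameters;
    L-BFGS-B builds a low-rank Hessian approximation from gradient history."""
    res = minimize(total_cost, theta0, args=(trajectories, simulate_cost),
                   method="L-BFGS-B", options={"maxiter": 500})
    return res.x
```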
DLC - Results
[Plots: simulation results; initial conditions, augmented polynomial short-period model, and flight condition annotated on slide]
DLC - Results
Performance Metrics:
$CTE_j = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{t_f} |e_t|, \qquad CCR_j = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{t_f} |\dot{u}_t|$

Totals over the $M$ analysis models:
$CTE = \sum_{j=1}^{M} CTE_j, \qquad CCR = \sum_{j=1}^{M} CCR_j$

$N$ is the number of trajectories for each analysis model
$t_f$ is the duration of each sampled trajectory
CTE is the cumulative tracking error
CCR is the cumulative control rate

[Plots: simulation results; initial conditions, augmented polynomial short-period model, and flight condition annotated on slide]

DLC performance: 66% reduction in CTE, 91.5% reduction in CCR
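As defined above, both metrics reduce to sums over the sampled trajectories; a sketch, assuming per-trajectory arrays of tracking error and control rate:

```python
import numpy as np

def cumulative_metrics(e, u_dot):
    """CTE/CCR for one analysis model: average over the N trajectories of
    the per-trajectory summed |tracking error| and |control rate|.
    e, u_dot: arrays of shape (N, t_f)."""
    cte = np.mean(np.sum(np.abs(e), axis=1))
    ccr = np.mean(np.sum(np.abs(u_dot), axis=1))
    return cte, ccr

# Totals then aggregate the per-model values across the M analysis models:
# CTE = sum of cte over models, CCR = sum of ccr over models.
```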
DLC - Results
[Plots: gain-scheduled controller vs. deep learning controller; initial conditions, augmented polynomial short-period model, and flight condition annotated on slide]
DLC - Results
[Plots: gain-scheduled controller vs. deep learning controller; gain-scheduled controller with no uncertainty or disturbances]
Contribution and Conclusion
• Created a novel training procedure focused on bringing deep learning benefits to flight control.
• Trained the controller using a set of sample trajectories that contain disturbances, aerodynamic uncertainties, and significant control attenuation/amplification in the plant dynamics during optimization.
• Found that a piecewise cost function allows the designer to address robustness and performance criteria simultaneously.
• Utilized an incremental initialization training procedure for deep recurrent neural networks.
Future Work
• Pursue a vehicle model with flexible-body effects, time delays, control effectiveness variations, center-of-gravity changes, and aerodynamic parameter variations.
• Explore improving parameter convergence and analytic guarantees: Kullback-Leibler (KL) divergence, importance sampling, etc.
• Pursue development/use of robustness analysis tools for deep learning controllers to provide region-of-attraction estimates and time-delay margins: sum-of-squares programming, etc.
Questions?