decision making architectures for safe planning and

Decision Making Architectures for Safe

Planning and Control of Agile Autonomous Vehicles

Evangelos A. TheodorouAutonomous Control and Decision Systems Lab

Perceptual Decision Making

Reinforcement Learning

Model PredictiveControl

Pri

or

Kn

ow

led

ge

High

Low Time Scale Time Steps Trajectories

Fully

Partially

Perceptio

nPlant

WPlantControl

W

Policies

Costs

TerrestrialNavigation

Outline

Intro & Motivation

Control Architectures & Perception

Conclusions and Future

Control Architectures & Uncertainty

What happens when uncertainty is not considered?

Nominal DynRe-Optimization

Online Model Learning

Planned Trajectory

Safety Controller

Low LevelAdaptive Control

Nominal DynRe-Optimization

Actual Dyn Re-Optimization

Information Processing Architectures

How would you architect your stack?

Where should learning be incorporated?

What notions of robustness we have?

Tubes Full Blown ModelLearning

Adaptive + Predictive

Tube-MPPI Robust MPPI

z

MPPI

Model Predictive Path Integral (MPPI) Control

(-) Importance Sampler may get stuck to a local minima.

(-) Nominal State is chosen independent of Actual State.

(-) Importance Sampling is unaware of the underlying ancillary controller.

(+) Augmented Importance Sampling.

(+) Nice Trade-off between agility and robustness.

(-) Robustness issues when Large disturbances.

✓Performance near dynamic limits ✓Constraint satisfaction ✓Real-Time Performance

FastRe-optimization

GPU

Low Level Re-optimization

Fast re-planning on GPU on nominal dynamics/Fast Tracking on a CPU

Robust MPPI

Free Energy Diff < Levels Constraint Satisfaction + Tracking/Uncertainty + Sampling Error

Learning Deep Tubes for Robust MPC

Learning Deep Tubes for Robust MPC

Sully Miracle on the HudsonAirbus 320 lost both engines shortly after takeoff due to bird strike.

High LevelOptimization

Medium LevelOptimization

Low LevelOptimization

Time-ScaleFrequency

Slow

Fast

Courtesy: NASA Langley Aerodrome

Planned Trajectory

Safety Controller


Stochastic Control Barrier Functions

Stochastic Control Lyapunov Functions

Bayesian Neural Networks

Adaptation and Online Learning

0 10 20 30 40 50 60X Position

−3

−2

−1

0

1

2

Y Po

sitio

n

ref ad qp pd rob balsa

−5

0

5No Learning

ref 0-60s 60-120sGP

−2.5 0.0 2.5−5

0

5Dropout NN

−2.5 0.0 2.5

ALPaCA

Algorithm 1: BAyesian Learning-based Safety and Adap-tation (BALSA)

1 Require: Prior model f(x), known g(x), referencetrajectory xrm, choice of modeling algorithmΔi(x)∼N (mi(x),σi(x)), dt, A, Hu≤b.

2 Initialize: i=0, Dataset D0=∅, t=0, solve P3 while true do4 Obtain μrm= x2rm(t) and compute μpd

5 Compute model error and uncertaintyμad=mi(x(t)), and σi(x(t))

6 μqp← Solve QP (17)7 Set u(t)=g(x)−1(μrm+μpd+μqp−μad−f(x))8 Apply control u(t) to system.9 Step forward in time t← t+dt.

10 Append new data point to database:11 Xt=[x(t)], Yt=

(x2(t+dt)−x2(t))/dt−(f(x(t))+g(x(t)u(t)).12 Di←Di∪{Xt,Yt}13 if updateModel then14 Update model Δi(x,μ) with database Di

15 Di+1←Di, i← i+1 Stochastic Control Barrier Functions

Stochastic Control Lyapunov Functions

Bayesian Neural Networks

How do we bring adaptation?

Planned Trajectory

Safety Controller



MPPI

TrackingController


Case L1 off L1 on1) � �2) � �3) � �4) � �5) � �

1) known dynamics model (since drag is not modeledin the nominal dynamics, some drag compensation isexpected with L1 augmentation);

2) mass increase by 50%;3) moment of inertia increase by 100% in all axes;4) constant nose-up pitching moment disturbance of

0.1 Nm (equivalent to center of gravity offset);5) reduction in motor thrust control power by 40% (reduc-

tion in both TδT and MδM ).


MPPI

TrackingController


Case L1 off L1 on1) � �2) � �3) � �4) � �5) � �

1) known dynamics model (since drag is not modeledin the nominal dynamics, some drag compensation isexpected with L1 augmentation);

2) mass increase by 50%;3) moment of inertia increase by 100% in all axes;4) constant nose-up pitching moment disturbance of

0.1 Nm (equivalent to center of gravity offset);5) reduction in motor thrust control power by 40% (reduc-

tion in both TδT and MδM ).

x ∼ PID(X ) x ∼ PL(X )System ID Distribution: Local Distribution:

θi+1 = θi − γ (αGL(θi) +GID(θi))

α = maxa∈[0,1]

s.t 〈aGL(θi) +GID(θi), GID(θi)〉 ≥ 0

Proposed Scheme:

θi+1 = θi − γGL(θi)Update Scheme: θi+1 = θi − γ(GL(θi) +GID(θi))

LWPR: y =

L∑i=1

wi · fi(x− ci), wi =exp

(− 12 (x− cI)

TDi(x− ci))

∑Lj=1 exp

(− 12 (x− cj)TDj(x− cj)

)

G. Williams et all, arXiv:1905.05162, Submitted


Adaptive Model Predictive Control

ComputationSize FLOPs/Prediction

LWPR 5,645 (Receptive Fields) > 141, 125Neural Network 1,412 (Weights and Biases) 2, 688

Base SGD LW-PR2 LWPRRoll Rate 0.01 0.01 0.01 0.01

Long. Acc. 2.73 2.28 2.30 2.06Lat. Acc. 1.71 1.29 1.24 1.28

Head. Acc. 8.28 4.48 4.87 4.54Total MSE 3.18 2.10 2.11 1.97

Active MSE N/A N/A 2.54 N/A

Performance

Outline

Motivation & Intro


Conclusiona and Future


Outline

Motivation & Intro


Conclusion and Future


What is the optimal IPA for perceptual control?

Is the design of IPA imposed by the nature of the data?

Do we have any priors for designing IPAs?

How important is the structure of IPAs for safety in AI?

Questions:

Information Processing Architecture (IPA) for Perceptual control

Decision Making Control

Perception/ML

Information Processing Architectures : IPA

Plant

World

Policies

End to EndArchitectures

PlantControl

World

Costs

StructuredArchitectures

Filtering MPCSensors Systems

Teacher - Fully observable MPC Learner

Y. Pan et all RSS 2018.

IPA-I

Y. Pan et all RSS 2018.

IPA-I

IPA-I & Uncertainty Quantification

Aleatoric - Incomplete dataEpistemic- Incomplete knowledge of the environment.

Types of Uncertainty in ML Models

K. Lee et all ICRA 2019.

��

��2

IPA-I & Uncertainty Quantification

At Training Time Minimize the Loss:

Total Uncertainty:

At Test Time Sample the structure of the Network:

Uncertainty Quantification & Redundancy

IPA-II: The Macula-Net

3D

(A) (B) (C) (D) (E)

ModelPredictionNetwork

MaculaNetwork

u

μlog(σ2)

3D max pooling3D convolution+ReLU+Dropout+BatchNorm

fully connected+ReLU+Dropout+BatchNormlinear

(4, 32, 32, 3)

(4, 32, 32, 64)

(4, 16, 16, 128)

(4, 8, 8, 256)

(4, 4, 4, 512)(4, 2, 2, 512)

(4, 1, 1, 512)

4096 4096

1000

3D

MPC

SystemModel

Control Trajectory

StateTrajectory

Control

K. Lee et all, arXiv:1904.11898, Submitted

IPA-II: The Macula-Net

3D

(A) (B) (C) (D) (E)


MaculaNetwork

u

New Objects DropoutVGG [9] PAPC [Ours]

Min: 0.37 mAvg: 0.39 mMax: 0.42 m

Min: 4.28 mAvg: 6.87 mMax: 9.22 m

Min: 2.20 mAvg: 2.54 mMax: 2.86 m

Min: 4.81 mAvg: 5.48 mMax: 6.25 m

Min: 0.00 mAvg: 0.00 mMax: 0.00 m

Min: 6.80 mAvg: 7.25 mMax: 7.83 m

Min: 2.12 mAvg: 2.25 mMax: 2.44 m

Min: 7.62 mAvg: 6.87 mMax: 8.33 m

Min: 1.28 mAvg: 2.06 mMax: 2.44 m

Min: 6.55 mAvg: 7.51 mMax: 8.17 m

Min: 0.00 mAvg: 0.63 mMax: 2.51 m

Min: 10.58 mAvg: 11.28 mMax: 14.96 m

Min: 0.00 mAvg: 0.26 mMax: 1.29 m

Min: 6.91 mAvg: 12.55 mMax: 14.63 m

Min: 0.00 mAvg: 0.67 mMax: 4.11 m

Min: 6.17 mAvg: 10.09 mMax: 13.25 m

IPA-II

3D

(A) (B) (C) (D) (E)


MaculaNetwork

u

Filtering MPCSensors SystemsDynamics

Cost

Optimize

IPA-III

IPA-III

AI in Aerospace Systems

IPA-IV: PixelMPC

p = v

v = g +m−1(Rωb fT+fD +wf )

q =1

2

⎡⎢⎢⎣−qx −qy −qzqw −qz qyqz qw −qx−qy qx qw

⎤⎥⎥⎦⎡⎣ωx

ωy

ωz

⎤⎦

Xpixel = Fpixel(q,Xpixel,U)

= PolarToEuler(DOF (q,Xpixel,U))

Drone Dynamics Pixel Dynamics

Augmented Dynamics

IPA-IV: PixelMPC

Outline

Motivation & Intro




Outline

Motivation & Intro




Decision Making Architectures

Partial Differential Equations

Stochastic DifferentialEquations

Stochastic Optimal Control Forward/Backward Stochastic Differential Equations (FBSDEs)

Perceptual Decision Making Risk Measures and Stochastic Differential Games

Perceptual Decision Making Control Barrier Functions & Barrier Certificates

Deep Neural NetworkArchitectures

Perceptual Decision Making Adaptive Control & Contraction Theory

Safety & Deep Learning Theory

xt+1 = ft(xt,ut)

minu

J(u;x0) = minu

[φ(xT ) +

T−1∑t=0

�t(xt,ut)

]State Output Activation

Controls Weights

Time Horizon Number of Layers

Terminal Cost Loss Functions

Optimal Control Deep Learning

Cost Function

Dynamics

Autonomous Control and Decision Systems Lab

Vertical Lift Research Center of Excellence

Students:Collaborators:

Jim Regh - Georgia Tech

Naira Hovakimyan - UIUC

Ali-akbar Agha-mohammadiJPL - NASA

decision making architectures for safe planning and

Documents