probabilistic safety constraints for learned high relative

Probabilistic safety constraints for learned high relative degreesystem dynamics

Mohammad Javad Khojasteh∗1 Vikas Dhiman∗2 Massimo Franceschetti2 Nikolay Atanasov2

2University of California San Diego, 1California Institute of Technology

Objective

We study the problem of enforcing probabilistic safety when sys-tem dynamics are unknown and being learned from the samples,

minuk

Task cost (xk,uk)s.t. P( Safety |xk,uk) ≥ 1− risk tolerance (1)

Contributions

Matrix Variate Gaussian ProcessWe derive inferenceequations for Matrix Variate Gaussian Processes thatpreserve structure.

Safe-controller for higher relative degree systemsWe deriveCantelli-inequality based safety bound for higherrelative degree systems and use it create QCQP basedsafe controller.

Inter-triggering time safety analysisWe derive conditions toensure safety between control computation times.

Notation

Symbol Meaningxk ∈ Rn System state at discrete time kx(t) System state at cont. time tu ∈ Rm Control signalu , (1; u)f (x) drift term of system dynamicsg(x) input gain term of system dynamicsF (x) , [f (x), g(x)]vec(M) Column-major vectorization of a matrix Mh(x) Control barrier function defining the safety

region as h(x) ≥ 0πε(x) Reference controller whose trajectory we want

to follow as close as possible

Problem formulation

For a control-affine system dynamics,

x = f (x) + g(x)u = [f (x) + g(x)][

1u

]= F (x)u,

assume the system dynamics F (x) to be a Gaussian Processvec(F (x)) ∼ GP(vec(M0(x)),K0(x,x′)), (2)

where vec(F (x)) is column-major vectorization of F (x). Designa safe controller with safe probability pk,

minuk∈U

‖uk − πε(xk)‖

s.t. P(safety at all times) ≥ pk (3)where the safety condition can be simply h(xk) > 0 or a ControlBarrier Condition [2], CBC(x,u) , h(x) + αh(x) ≥ 0 withα > 0.

Matrix Variate Gaussian Process

We define Matrix Variate Gaussian ProcessMVGP(M(x),A,B(x,x′)) [4, 3].

vec(F (x)) ∼ GP(vec(M0(x)),B0(x,x′)⊗A)⇔ F (x) ∼MVGP(M0(x),A,B0(x,x′)) (4)

Advantages to alternative approaches:Fewer parameters Only (1 + m)2 + n2 parameters when learning

B0 and A as compared to (1 + m)2n2 parameters for K0.(m is control dimensions, n is state dimensions)

Captures correlation across output dimensions As compared tolearning a GP per dimension, we capture correlation acrossoutput dimensions without excessive computational cost:O((1 + m)3k2) + O(k3) vs O((1 + m)k2) + O(k3) where kis number of samples.

Preserves structure across inference Inference with k datasamples of {xi,xi,ui}ki=1, leads to another MVGP,

Fk(x∗) ∼MVGP(Mk(x∗),A,Bk(x∗,x∗)) (5)where Mk and Bk can be computed from the data samples.

Stochastic Control Barrier Condition

We consider the safety condition for system of relative degreer (defined as LgLr−1

f h(x) 6= 0, but LgLjfh(x) = 0 for all j ={0, . . . , r−2}) as the exponential control barrier condition CBC(r)

defined as [1],CBC(r)(x,u) := L(r)

f h(x) + LgL(r−1)f h(x)u + k>αη(x),

where η(x) , (h(x);Lfh(x); . . . ;L(r−1)f h(x)) (6)

We show that the mean and variance of CBC(r)(x,u) are affineand quadratic in u.

E[CBC(r)] =(E[F (x)>∇L(r−1)

f h(x)] + E[[k>αη(x),0>]>])>

u(7)

Var[CBC(r)] = u>Var[∇L(r−1)

f h(x)>F (x) + [k>αη(x),0>]]

u(8)

Hence the safety condition P(CBC(r)(xk,uk) ≥ ζ > 0) ≥ pk canbe written as quadratic constraints using Cantelli’s inequalityand the controller thus becomes,

minuk∈U‖uk − πε(xk)‖

s.t. (E[CBC(r)k ]− ζ)2 ≥ pk

1− pkVar[CBC(r)

k ]

E[CBC(r)k ]− ζ ≥ 0 (9)

While we derive closed form expression for E[CBC(r)] andVar[CBC(r)] for r = 1 and r = 2, in general for r ≥ 3 the meanand variance can be estimated using Monte Carlo estimators.

Inter-triggering time safety analysis

We assume sample trajectories from Gaussian Process dynamicsare locally Lk-Lipchitz with large probability qk, then we establishthatP(CBC(xk,uk) ≥ ζ > 0|xk,uk) ≥ pk

and τk ≤1Lk

ln(1 + Lkζ

(χkLk + Lα·h)‖xk‖)

=⇒ P(CBC(x(t),uk) ≥ 0) ≥ pk = pkqk ∀t ∈ [tk, tk + τk)(10)

Results

Figure 1:Bottom row: Learned vs true pendulum dynamics using matrix variate Gaussian Process regression

Experiment and Results

We evaluate the proposed approach on a pendulum with massm and length l with state x = [θ, ω] and control-affine dynamicsf (x) = [ω,−g

l sin(θ)] and g(x) = [0, 1ml] as depicted in Fig 2. The

control barrier function is chosen as h(x) = cos(∆col)−cos(θ−θc).

0 50 100

71

72

73

74

75

θ(d

egre

es)

0 50 100

−0.4

−0.2

0.0

ω(r

ad/s

)

0 50 100−5

0

5

10

15

20

u

Pendulum

Figure 2:Top left: Pendulum simulation (left) with an unsafe (red) region.Top right: The pendulum trajectory (middle) resulting from the applicationof safe control inputs (right) is shown.

Conclusion and Ongoing work

•More experiments (closer to the Motivation).•What if QCQP is non-convex?•Entropy objective to pick optimal actions for reducinguncertainity.

References

[1] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, andP. Tabuada.Control barrier functions: Theory and applications.In 2019 18th European Control Conference (ECC), pages 3420–3431,June 2019.

[2] Aaron D Ames, Xiangru Xu, Jessy W Grizzle, and Paulo Tabuada.Control barrier function based quadratic programs for safety criticalsystems.IEEE TAC, 62(8):3861–3876, 2016.

[3] Shengyang Sun, Changyou Chen, and Lawrence Carin.Learning Structured Weight Uncertainty in Bayesian Neural Networks.In AISTATS, pages 1283–1292, 2017.

[4] Christopher KI Williams and Carl Edward Rasmussen.Gaussian processes for machine learning, volume 2.MIT press Cambridge, MA, 2006.

Acknowledgements

We gratefully acknowledge support from NSF awards CNS-1446891 and ECCS-1917177, and support from ARL DCIST CRA W911NF-17-2-0181.

Contact Information

Paper URL: https://arxiv.org/abs/1912.10116

Mohammad Javad Khojasteh [email protected] Dhiman [email protected]

https://arxiv.org/abs/1912.10116

http://www.its.caltech.edu/~mjkhojas/

mailto:[email protected]

http://vikasdhiman.info

mailto:[email protected]

probabilistic safety constraints for learned high relative

Documents