probabilistic safety constraints for learned high relative

1
Probabilistic safety constraints for learned high relative degree system dynamics Mohammad Javad Khojasteh *1 Vikas Dhiman *2 Massimo Franceschetti 2 Nikolay Atanasov 2 2 University of California San Diego, 1 California Institute of Technology Objective We study the problem of enforcing probabilistic safety when sys- tem dynamics are unknown and being learned from the samples, min u k Task cost (x k , u k ) s.t. P( Safety |x k , u k ) 1 - risk tolerance (1) Contributions Matrix Variate Gaussian Process We derive inference equations for Matrix Variate Gaussian Processes that preserve structure. Safe-controller for higher relative degree systems We derive Cantelli-inequality based safety bound for higher relative degree systems and use it create QCQP based safe controller. Inter-triggering time safety analysis We derive conditions to ensure safety between control computation times. Notation Symbol Meaning x k R n System state at discrete time k x(t) System state at cont. time t u R m Control signal u , (1; u) f (x) drift term of system dynamics g (x) input gain term of system dynamics F (x) , [ f (x),g (x)] vec(M ) Column-major vectorization of a matrix M h(x) Control barrier function defining the safety region as h(x) 0 π (x) Reference controller whose trajectory we want to follow as close as possible Problem formulation For a control-affine system dynamics, ˙ x = f (x)+ g (x)u =[ f (x)+ g (x)] " 1 u # = F (x)u , assume the system dynamics F (x) to be a Gaussian Process vec(F (x)) ∼ GP (vec(M 0 (x)), K 0 (x, x )), (2) where vec(F (x)) is column-major vectorization of F (x). Design a safe controller with safe probability p k , min u k ∈U u k - π (x k ) s.t. P(safety at all times) p k (3) where the safety condition can be simply ˙ h(x k ) > 0 or a Control Barrier Condition [2], CBC(x, u) , ˙ h(x)+ αh(x) 0 with α> 0. Matrix Variate Gaussian Process We define Matrix Variate Gaussian Process MVGP (M(x), A, B(x, x )) [4, 3]. vec(F (x)) ∼ GP (vec(M 0 (x)), B 0 (x, x ) A) F (x) ∼ MVGP (M 0 (x), A, B 0 (x, x )) (4) Advantages to alternative approaches: Fewer parameters Only (1 + m) 2 + n 2 parameters when learning B 0 and A as compared to (1 + m) 2 n 2 parameters for K 0 . (m is control dimensions, n is state dimensions) Captures correlation across output dimensions As compared to learning a GP per dimension, we capture correlation across output dimensions without excessive computational cost: O ((1 + m) 3 k 2 )+ O (k 3 ) vs O ((1 + m)k 2 )+ O (k 3 ) where k is number of samples. Preserves structure across inference Inference with k data samples of { ˙ x i , x i , u i } k i=1 , leads to another MVGP, F k (x * ) ∼ MVGP (M k (x * ), A, B k (x * , x * )) (5) where M k and B k can be computed from the data samples. Stochastic Control Barrier Condition We consider the safety condition for system of relative degree r (defined as L g L r -1 f h(x) = 0, but L g L j f h(x) = 0 for all j = {0,...,r - 2}) as the exponential control barrier condition CBC (r ) defined as [1], CBC (r ) (x, u) := L (r ) f h(x)+ L g L (r -1) f h(x)u + k α η (x), where η (x) , (h(x); L f h(x); ... ; L (r -1) f h(x)) (6) We show that the mean and variance of CBC (r ) (x, u) are affine and quadratic in u. E[CBC (r ) ]= E[ F (x) ∇L (r -1) f h(x)] + E[[ k α η (x), 0 ] ] u (7) Var[CBC (r ) ]= u Var h ∇L (r -1) f h(x) F (x)+[ k α η (x), 0 ] i u (8) Hence the safety condition P(CBC (r ) (x k , u k ) ζ> 0) ˜ p k can be written as quadratic constraints using Cantelli’s inequality and the controller thus becomes, min u k ∈U u k - π (x k ) s.t. (E[CBC (r ) k ] - ζ ) 2 ˜ p k 1 - ˜ p k Var[CBC (r ) k ] E[CBC (r ) k ] - ζ 0 (9) While we derive closed form expression for E[ CBC (r ) ] and Var[ CBC (r ) ] for r = 1 and r = 2, in general for r 3 the mean and variance can be estimated using Monte Carlo estimators. Inter-triggering time safety analysis We assume sample trajectories from Gaussian Process dynamics are locally L k -Lipchitz with large probability q k , then we establish that P(CBC(x k , u k ) ζ> 0|x k , u k ) ˜ p k and τ k 1 L k ln(1 + L k ζ (χ k L k + L α·h ) ˙ x k ) = P(CBC(x(t), u k ) 0) p k p k q k t [ t k ,t k + τ k ) (10) Results Figure 1:Bottom row: Learned vs true pendulum dynamics using matrix variate Gaussian Process regression Experiment and Results We evaluate the proposed approach on a pendulum with mass m and length l with state x =[ θ, ω ] and control-affine dynamics f (x)=[ ω, - g l sin(θ )] and g (x) = [0, 1 ml ] as depicted in Fig 2. The control barrier function is chosen as h(x)= coscol ) - cos(θ - θ c ). 0 50 100 71 72 73 74 75 θ (degrees) 0 50 100 -0.4 -0.2 0.0 ω (rad/s) 0 50 100 -5 0 5 10 15 20 u Pendulum Figure 2:Top left: Pendulum simulation (left) with an unsafe (red) region. Top right: The pendulum trajectory (middle) resulting from the application of safe control inputs (right) is shown. Conclusion and Ongoing work More experiments (closer to the Motivation). What if QCQP is non-convex? Entropy objective to pick optimal actions for reducing uncertainity. References [1] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In 2019 18th European Control Conference (ECC), pages 3420–3431, June 2019. [2] Aaron D Ames, Xiangru Xu, Jessy W Grizzle, and Paulo Tabuada. Control barrier function based quadratic programs for safety critical systems. IEEE TAC, 62(8):3861–3876, 2016. [3] Shengyang Sun, Changyou Chen, and Lawrence Carin. Learning Structured Weight Uncertainty in Bayesian Neural Networks. In AISTATS, pages 1283–1292, 2017. [4] Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes for machine learning, volume 2. MIT press Cambridge, MA, 2006. Acknowledgements We gratefully acknowledge support from NSF awards CNS-1446891 and ECCS- 1917177, and support from ARL DCIST CRA W911NF-17-2-0181. Contact Information Paper URL: https://arxiv.org/abs/1912.10116 Mohammad Javad Khojasteh [email protected] Vikas Dhiman [email protected]

Upload: others

Post on 03-May-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Probabilistic safety constraints for learned high relative

Probabilistic safety constraints for learned high relative degreesystem dynamics

Mohammad Javad Khojasteh∗1 Vikas Dhiman∗2 Massimo Franceschetti2 Nikolay Atanasov2

2University of California San Diego, 1California Institute of Technology

Objective

We study the problem of enforcing probabilistic safety when sys-tem dynamics are unknown and being learned from the samples,

minuk

Task cost (xk,uk)s.t. P( Safety |xk,uk) ≥ 1− risk tolerance (1)

Contributions

Matrix Variate Gaussian ProcessWe derive inferenceequations for Matrix Variate Gaussian Processes thatpreserve structure.

Safe-controller for higher relative degree systemsWe deriveCantelli-inequality based safety bound for higherrelative degree systems and use it create QCQP basedsafe controller.

Inter-triggering time safety analysisWe derive conditions toensure safety between control computation times.

Notation

Symbol Meaningxk ∈ Rn System state at discrete time kx(t) System state at cont. time tu ∈ Rm Control signalu , (1; u)f (x) drift term of system dynamicsg(x) input gain term of system dynamicsF (x) , [f (x), g(x)]vec(M) Column-major vectorization of a matrix Mh(x) Control barrier function defining the safety

region as h(x) ≥ 0πε(x) Reference controller whose trajectory we want

to follow as close as possible

Problem formulation

For a control-affine system dynamics,

x = f (x) + g(x)u = [f (x) + g(x)][

1u

]= F (x)u,

assume the system dynamics F (x) to be a Gaussian Processvec(F (x)) ∼ GP(vec(M0(x)),K0(x,x′)), (2)

where vec(F (x)) is column-major vectorization of F (x). Designa safe controller with safe probability pk,

minuk∈U

‖uk − πε(xk)‖

s.t. P(safety at all times) ≥ pk (3)where the safety condition can be simply h(xk) > 0 or a ControlBarrier Condition [2], CBC(x,u) , h(x) + αh(x) ≥ 0 withα > 0.

Matrix Variate Gaussian Process

We define Matrix Variate Gaussian ProcessMVGP(M(x),A,B(x,x′)) [4, 3].

vec(F (x)) ∼ GP(vec(M0(x)),B0(x,x′)⊗A)⇔ F (x) ∼MVGP(M0(x),A,B0(x,x′)) (4)

Advantages to alternative approaches:Fewer parameters Only (1 + m)2 + n2 parameters when learning

B0 and A as compared to (1 + m)2n2 parameters for K0.(m is control dimensions, n is state dimensions)

Captures correlation across output dimensions As compared tolearning a GP per dimension, we capture correlation acrossoutput dimensions without excessive computational cost:O((1 + m)3k2) + O(k3) vs O((1 + m)k2) + O(k3) where kis number of samples.

Preserves structure across inference Inference with k datasamples of {xi,xi,ui}ki=1, leads to another MVGP,

Fk(x∗) ∼MVGP(Mk(x∗),A,Bk(x∗,x∗)) (5)where Mk and Bk can be computed from the data samples.

Stochastic Control Barrier Condition

We consider the safety condition for system of relative degreer (defined as LgLr−1

f h(x) 6= 0, but LgLjfh(x) = 0 for all j ={0, . . . , r−2}) as the exponential control barrier condition CBC(r)

defined as [1],CBC(r)(x,u) := L(r)

f h(x) + LgL(r−1)f h(x)u + k>αη(x),

where η(x) , (h(x);Lfh(x); . . . ;L(r−1)f h(x)) (6)

We show that the mean and variance of CBC(r)(x,u) are affineand quadratic in u.

E[CBC(r)] =(E[F (x)>∇L(r−1)

f h(x)] + E[[k>αη(x),0>]>])>

u(7)

Var[CBC(r)] = u>Var[∇L(r−1)

f h(x)>F (x) + [k>αη(x),0>]]

u(8)

Hence the safety condition P(CBC(r)(xk,uk) ≥ ζ > 0) ≥ pk canbe written as quadratic constraints using Cantelli’s inequalityand the controller thus becomes,

minuk∈U‖uk − πε(xk)‖

s.t. (E[CBC(r)k ]− ζ)2 ≥ pk

1− pkVar[CBC(r)

k ]

E[CBC(r)k ]− ζ ≥ 0 (9)

While we derive closed form expression for E[CBC(r)] andVar[CBC(r)] for r = 1 and r = 2, in general for r ≥ 3 the meanand variance can be estimated using Monte Carlo estimators.

Inter-triggering time safety analysis

We assume sample trajectories from Gaussian Process dynamicsare locally Lk-Lipchitz with large probability qk, then we establishthatP(CBC(xk,uk) ≥ ζ > 0|xk,uk) ≥ pk

and τk ≤1Lk

ln(1 + Lkζ

(χkLk + Lα·h)‖xk‖)

=⇒ P(CBC(x(t),uk) ≥ 0) ≥ pk = pkqk ∀t ∈ [tk, tk + τk)(10)

Results

Figure 1:Bottom row: Learned vs true pendulum dynamics using matrix variate Gaussian Process regression

Experiment and Results

We evaluate the proposed approach on a pendulum with massm and length l with state x = [θ, ω] and control-affine dynamicsf (x) = [ω,−g

l sin(θ)] and g(x) = [0, 1ml] as depicted in Fig 2. The

control barrier function is chosen as h(x) = cos(∆col)−cos(θ−θc).

0 50 100

71

72

73

74

75

θ(d

egre

es)

0 50 100

−0.4

−0.2

0.0

ω(r

ad/s

)

0 50 100−5

0

5

10

15

20

u

Pendulum

Figure 2:Top left: Pendulum simulation (left) with an unsafe (red) region.Top right: The pendulum trajectory (middle) resulting from the applicationof safe control inputs (right) is shown.

Conclusion and Ongoing work

•More experiments (closer to the Motivation).•What if QCQP is non-convex?•Entropy objective to pick optimal actions for reducinguncertainity.

References

[1] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, andP. Tabuada.Control barrier functions: Theory and applications.In 2019 18th European Control Conference (ECC), pages 3420–3431,June 2019.

[2] Aaron D Ames, Xiangru Xu, Jessy W Grizzle, and Paulo Tabuada.Control barrier function based quadratic programs for safety criticalsystems.IEEE TAC, 62(8):3861–3876, 2016.

[3] Shengyang Sun, Changyou Chen, and Lawrence Carin.Learning Structured Weight Uncertainty in Bayesian Neural Networks.In AISTATS, pages 1283–1292, 2017.

[4] Christopher KI Williams and Carl Edward Rasmussen.Gaussian processes for machine learning, volume 2.MIT press Cambridge, MA, 2006.

Acknowledgements

We gratefully acknowledge support from NSF awards CNS-1446891 and ECCS-1917177, and support from ARL DCIST CRA W911NF-17-2-0181.

Contact Information

Paper URL: https://arxiv.org/abs/1912.10116

Mohammad Javad Khojasteh [email protected] Dhiman [email protected]