Probabilistic ODE Solvers with Runge-Kutta Means
Michael Schober∗, David Duvenaud‡, Philipp Hennig∗
∗Research Group Elementary Intelligence, Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, Germany
‡Computational and Biological Learning Lab, Department of Engineering, Cambridge University
Can we assign a probability distribution over the solution to an ordinary differential equation (initial value problem)?

$$x(t_0) = x_0, \qquad x'(t) = f(x(t), t)$$
The Probabilistic View on Computation
computing as the collection of information [Poincaré, 1896, Diaconis, 1988, O’Hagan, 1992]

A numerical method estimates a function’s latent property given the result of computations.

| method | estimates | given |
|---|---|---|
| quadrature | $\int_a^b f(x)\,dx$ | $\{f(x_i)\}$ |
| linear algebra | $x$ s.t. $Ax = b$ | $\{As = y\}$ |
| optimization | $x$ s.t. $\nabla f(x) = 0$ | $\{\nabla f(x_i)\}$ |
| analysis | $x(t)$ s.t. $x' = f(x, t)$ | $\{f(x_i, t_i)\}$ |

▸ computations yield “data” / “observations”
▸ non-analytic quantities are “latent”
▸ even deterministic quantities can be uncertain.
Numerical Methods and Statistical Estimators
several classic numerical algorithms identified precisely as maximum a-posteriori estimators

quadrature [Diaconis, 1988, O’Hagan, 1991]
Gaussian quadrature ↔ Gaussian process regression

linear algebra [Hennig, 2015]
conjugate gradients ↔ Gaussian conditioning

nonlinear optimization [Hennig & Kiefel, 2013]
BFGS ↔ autoregressive filtering

ordinary differential equations [Schober et al., 2014]
Runge-Kutta ↔ Gauss-Markov extrapolation
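Runge-Kutta methods
are linear extrapolators of high convergence order [Hairer et al., 1987]

[Figure: solution x(t) plotted against t, starting at t0]

$$
\begin{array}{c|cccc}
0 & 1 \\
c_1 & 1 & w_{11} \\
c_2 & 1 & w_{21} & w_{22} \\
\hline
h & 1 & b_1 & b_2 & b_3
\end{array}
$$

$$
\begin{aligned}
Y_1 &= f(1\,x_0,\ t_0 + 0) \\
Y_2 &= f(1\,x_0 + w_{11} Y_1,\ t_0 + c_1) \\
Y_{s+1} &= f\Big(1\,x_0 + \sum_{i=1}^{s} w_{si} Y_i,\ t_0 + c_s\Big) \\
\hat{x}(t_0 + h) &= 1\,x_0 + \sum_i b_i Y_i
\end{aligned}
$$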
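The recursion above can be sketched in Python as a generic explicit Runge-Kutta step driven by a Butcher tableau. This is a minimal illustration, not code from the talk: the function name `rk_step`, its argument layout, and the choice of Heun's second-order tableau in the usage lines are assumptions. Following the slide's convention, the nodes and weights are taken to be already scaled by the step size h.

```python
import numpy as np

def rk_step(f, x0, t0, c, W, b):
    """One explicit Runge-Kutta step in the slide's notation (hypothetical helper).

    c : nodes with c[0] = 0, already scaled by the step size
    W : strictly lower-triangular weights; row i combines Y_1..Y_i
    b : output weights for the final extrapolation
    """
    s = len(b)
    Y = np.zeros((s,) + np.shape(x0))
    for i in range(s):
        # Linear extrapolation from x0 using the slopes gathered so far.
        xi = x0 + sum(W[i][j] * Y[j] for j in range(i))
        Y[i] = f(xi, t0 + c[i])
    return x0 + sum(b[i] * Y[i] for i in range(s))

# Usage: Heun's method for x' = x, x(0) = 1, with step h.
h = 0.1
c = [0.0, h]
W = [[0.0, 0.0], [h, 0.0]]
b = [h / 2, h / 2]
x1 = rk_step(lambda x, t: x, 1.0, 0.0, c, W, b)
```

Every stage and the final estimate are linear combinations of x0 and the collected slopes, which is the sense in which the method is a linear extrapolator.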
Gaussian process solvers
are also linear extrapolators

▸ Linear extrapolation suggests Gaussian process model
▸ Gaussian process solvers previously studied [Skilling (1991), Chkrebtii et al. (2014), Hennig & Hauberg (2014)]
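Some properties of Gaussian measures
The only two equations you really need (in this group)

▸ closure under affine transformations ($x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$)

$$
p(x) \sim \mathcal{N}(m, P),\quad p(y \mid x) \sim \mathcal{N}(Hx + \nu, R)
\;\Rightarrow\;
p\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) \sim \mathcal{N}\left(\begin{bmatrix} m \\ Hm + \nu \end{bmatrix}, \begin{bmatrix} P & PH^{\top} \\ HP & HPH^{\top} + R \end{bmatrix}\right)
$$

▸ inference involves only linear algebra operations

$$
p\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) \sim \mathcal{N}\left(\begin{bmatrix} m_1 \\ m_2 \end{bmatrix}, \begin{bmatrix} P_1 & C \\ C^{\top} & P_2 \end{bmatrix}\right)
$$

$$
p(x \mid y) \sim \mathcal{N}\left(m_1 + C P_2^{-1}(y - m_2),\; P_1 - C P_2^{-1} C^{\top}\right)
$$

⇒ sequential Gaussian inference at linear cost (‘filtering’)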
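The two identities above translate almost line by line into NumPy. The sketch below is illustrative only; the function names `joint_from_affine` and `condition` and the small 2-d example are assumptions, not part of the talk.

```python
import numpy as np

def joint_from_affine(m, P, H, nu, R):
    """Joint mean and covariance of (x, y) for y | x ~ N(Hx + nu, R)."""
    mean = np.concatenate([m, H @ m + nu])
    cov = np.block([[P, P @ H.T], [H @ P, H @ P @ H.T + R]])
    return mean, cov

def condition(mean, cov, n, y):
    """Posterior of the first n coordinates given that the rest equal y."""
    m1, m2 = mean[:n], mean[n:]
    P1, C, P2 = cov[:n, :n], cov[:n, n:], cov[n:, n:]
    gain = np.linalg.solve(P2, C.T).T  # C P2^{-1}, using symmetry of P2
    return m1 + gain @ (y - m2), P1 - gain @ C.T

# Usage: observe the first coordinate of a 2-d Gaussian without noise.
m = np.array([0.0, 0.0])
P = np.array([[1.0, 0.5], [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
mean, cov = joint_from_affine(m, P, H, np.zeros(1), np.zeros((1, 1)))
post_mean, post_cov = condition(mean, cov, 2, np.array([2.0]))
```

Using a linear solve rather than an explicit inverse is a design choice for numerical stability; the result is the same expression as in the formula.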
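Gaussian process solvers
implicitly define a Butcher tableau

[Figure: x(t) plotted against t, with ticks at t0, t0 + c1, t0 + c2, t0 + h]

$$
\begin{array}{c|cccc}
0 & 1 \\
c_1 & 1 & w_{11} \\
c_2 & 1 & w_{21} & w_{22} \\
\hline
h & 1 & b_1 & b_2 & b_3
\end{array}
$$

$$
\begin{aligned}
y_1 &= f\big(\mu_{\mid x_0}(t_0 + 0),\ t_0 + 0\big) \\
y_2 &= f\big(\mu_{\mid x_0, y_1}(t_0 + c_1),\ t_0 + c_1\big) \\
y_{s+1} &= f\big(\mu_{\mid x_0, y_i}(t_0 + c_s),\ t_0 + c_s\big) \\
\hat{x}(t_0 + h) &= \mu_{\mid x_0, y_i}(t_0 + h)
\end{aligned}
$$

$$
\mu_{\mid x_0}(t_0) := k(t_0, t_0)\, k(t_0, t_0)^{-1}\, x_0 = 1\, x_0
$$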
Posterior mean at $t_0 + c_1$, given $x_0$ and $y_1$ (a superscript $\partial$ to the right or left of $k$ denotes differentiation with respect to the second or first argument):

$$
\mu_{\mid x_0, y_1}(t_0 + c_1) := \begin{bmatrix} k(t_0 + c_1, t_0) & k^{\partial}(t_0 + c_1, t_0) \end{bmatrix}
\begin{bmatrix} k(t_0, t_0) & k^{\partial}(t_0, t_0) \\ {}^{\partial}k(t_0, t_0) & {}^{\partial}k^{\partial}(t_0, t_0) \end{bmatrix}^{-1}
\begin{bmatrix} x_0 \\ y_1 \end{bmatrix}
= w_{10} x_0 + w_{11} y_1
$$
Posterior mean at a general stage $t_0 + c_s$, where $K$ is the Gram matrix of the observations $(x_0, y_1, \dots, y_s)$:

$$
\mu_{\mid x_0, y_i}(t_0 + c_s) := \begin{bmatrix} k(t_0 + c_s, t_0) & k^{\partial}(t_0 + c_s, t_0 + c_i) \end{bmatrix} K^{-1}
\begin{bmatrix} x_0 \\ y_i \end{bmatrix}
= w_{s0} x_0 + \sum_{i=1}^{s} w_{si} y_i
$$
Posterior mean at the end of the step, $t_0 + h$:

$$
\mu_{\mid x_0, y_i}(t_0 + h) := \begin{bmatrix} k(t_0 + h, t_0) & k^{\partial}(t_0 + h, t_0 + c_i) \end{bmatrix} K^{-1}
\begin{bmatrix} x_0 \\ y_i \end{bmatrix}
= b_0 x_0 + \sum_{i=1}^{s} b_i y_i
$$
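To make the "implicit tableau" concrete, the sketch below computes the weights that a GP posterior mean places on x0 and on the derivative observations y_i. It is a hedged illustration, not the talk's construction: it assumes a squared-exponential kernel (chosen only because its derivatives have simple closed forms), and the helper names `se`, `se_d2`, `se_d1d2`, `gp_weights` and the length scale `ell` are hypothetical. The observation times of the y_i are passed explicitly as `t_obs`.

```python
import numpy as np

def se(t, u, ell):
    """Squared-exponential kernel k(t, u) (an assumed choice)."""
    return np.exp(-0.5 * (t - u) ** 2 / ell ** 2)

def se_d2(t, u, ell):
    """Derivative of the kernel in its second argument."""
    return (t - u) / ell ** 2 * se(t, u, ell)

def se_d1d2(t, u, ell):
    """Mixed derivative in both arguments."""
    return (1.0 / ell ** 2 - (t - u) ** 2 / ell ** 4) * se(t, u, ell)

def gp_weights(t_query, t0, t_obs, ell):
    """Weights of x0 and y_1..y_s in the posterior mean at t_query."""
    t_obs = np.asarray(t_obs, dtype=float)
    s = len(t_obs)
    K = np.empty((s + 1, s + 1))
    K[0, 0] = se(t0, t0, ell)
    K[0, 1:] = se_d2(t0, t_obs, ell)
    K[1:, 0] = K[0, 1:]  # the Gram matrix is symmetric
    K[1:, 1:] = se_d1d2(t_obs[:, None], t_obs[None, :], ell)
    row = np.concatenate([[se(t_query, t0, ell)], se_d2(t_query, t_obs, ell)])
    return np.linalg.solve(K, row)  # equals row @ K^{-1} because K = K^T

# Usage: one derivative observation at t0, query at t0 + c1 with c1 = 0.5.
w = gp_weights(0.5, 0.0, [0.0], ell=100.0)
```

For this kernel and a long length scale, the two weights approach 1 and c1, so the predicted state approaches the Euler extrapolation x0 + c1 y1.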
Gauss-Markov-Runge-Kutta methods
a GP solver whose mean matches RK exactly

▸ RK methods choose $(c, w, b)$ such that $\lVert \hat{x}(t_0 + h) - x(t_0 + h) \rVert = O(h^p)$
▸ polynomial form suggests an integrated Wiener (polynomial spline) process

$$
p(x(t)) = \mathcal{GP}\big(x(t);\, 0,\, k_s(t, t')\big) \quad\text{where}\quad
k_s(t, t') = \int_{\tau}^{t} \int_{\tau}^{t'} \min(\tilde{t}, \tilde{t}')\, d\tilde{t}\, d\tilde{t}'
$$

▸ $\tau \to -\infty$: improper prior $p(x(t))$, proper posterior after $s$ observations
▸ a $k$-times integrated Wiener process gives an RK solver of order $k$!
▸ inherits RK guarantees; gives a closed-form solution for the tableau (this used to require numerical search!)
▸ a Markov (state-space) model, so inference is $O(s)$ (as opposed to the usual $O(s^3)$ cost)
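As a small check of the first-order case, the sketch below evaluates the once-integrated Wiener kernel in closed form and computes the weights of x0 and y1 in the posterior mean at t0 + c. It is an illustration under stated assumptions: the Wiener process is taken to start at `tau`, so the integrand is min(t̃ − τ, t̃′ − τ), which coincides with the slide's formula for τ = 0; the helper names are hypothetical. For this single-observation case the algebra gives the weights (1, c) for any finite start time, i.e. Euler's predictor x0 + c y1.

```python
import numpy as np

def iwp(t, u, tau=0.0):
    """Once-integrated Wiener kernel for a process started at tau (assumed form)."""
    a, b = t - tau, u - tau
    lo = min(a, b)
    return lo ** 3 / 3.0 + abs(a - b) * lo ** 2 / 2.0

def iwp_d2(t, u, tau=0.0):
    """Derivative of the kernel in its second argument."""
    a, b = t - tau, u - tau
    return a * b - b ** 2 / 2.0 if a >= b else a ** 2 / 2.0

def iwp_d1d2(t, u, tau=0.0):
    """Mixed derivative in both arguments: the Wiener kernel itself."""
    return min(t - tau, u - tau)

def first_stage_weights(t0, c, tau=0.0):
    """Weights (w10, w11) of x0 and y1 in the posterior mean at t0 + c."""
    K = np.array([[iwp(t0, t0, tau), iwp_d2(t0, t0, tau)],
                  [iwp_d2(t0, t0, tau), iwp_d1d2(t0, t0, tau)]])
    row = np.array([iwp(t0 + c, t0, tau), iwp_d2(t0 + c, t0, tau)])
    return np.linalg.solve(K, row)  # K is symmetric, so this is row @ K^{-1}

# Usage: start the process at tau = 0, observe at t0 = 1, predict at t0 + 0.25.
w10, w11 = first_stage_weights(t0=1.0, c=0.25)
```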
Calibrating Uncertainty
within the parametrized class

▸ posterior mean $\mu_{\mid y} = k K^{-1} y$ invariant under $k \to \theta^2 k$
▸ posterior covariance $k_{\mid y} = k - k K^{-1} k$ scaled by $\theta^2$
▸ initial ideas for uncertainty calibration in paper (more to come)
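The scaling behaviour in the first two bullets can be checked numerically. The sketch below uses a random positive definite covariance in place of a specific kernel; the helper name `gp_posterior` and all numbers are assumptions made for illustration.

```python
import numpy as np

def gp_posterior(k_star, K, k_ss, y):
    """Posterior mean and variance at one query point."""
    alpha = np.linalg.solve(K, k_star)  # K^{-1} k_*
    return alpha @ y, k_ss - k_star @ alpha

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
J = A @ A.T + np.eye(4)  # joint covariance of 3 observations and 1 query
K, k_star, k_ss = J[:3, :3], J[3, :3], J[3, 3]
y = rng.standard_normal(3)

theta2 = 4.0  # the scale factor theta^2
mean, var = gp_posterior(k_star, K, k_ss, y)
mean_s, var_s = gp_posterior(theta2 * k_star, theta2 * K, theta2 * k_ss, y)
```

The factor θ² cancels in the mean and multiplies the variance, so θ can be tuned for calibration without changing the point estimate.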
Multi-Step Extension
[A. Nordsieck, 1962]

[Figure: three panels titled "Naïve chaining", "Smoothing", and "Probabilistic continuation"; vertical axes x and x(t) − f(t) (scale ⋅10⁻²), horizontal axis t with ticks at t0 + h, 2h, 3h, 4h]
▸ probabilistic interpretation questions RK beyond s steps
▸ ‘obvious’ solution is to continue filtering process
▸ result very similar, though not identical, to multi-step methods
Some Conceptual Open Questions
precise interpretation of posterior measure still evolving

How precise can the connection to multi-step methods be?
▸ order / stability conditions currently not fully understood
▸ flexibility is also a design criterion
▸ what about stiff problems?

What, precisely, does the posterior mean?
▸ width of Gaussian posterior should be inferred from regularity of ‘observed’ gradients. How, precisely, should this be done? (We have one particular solution)
▸ is the Gaussian family enough? How expensive is it to move beyond Gauss?
What we’ve done so far:
▸ numerical methods can be interpreted as performing statistical inference from noise-free data
▸ in some cases, e.g. Runge-Kutta, this link can be made precise
▸ the resulting solver inherits convergence guarantees, and also gains extensibility & uncertainty estimates

What we’re working on next:
▸ understand the connection to multi-step methods
▸ construct a robust probabilistic IVP solver
▸ continue finding model-based interpretations of numerical solvers.
Bibliography
P. Diaconis. Bayesian numerical analysis. Statistical decision theory and related topics, IV(1):163–175, 1988.
E. Hairer, S.P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I – Nonstiff Problems. Springer, 1987.
S. Hauberg, M. Schober, M. Liptrot, P. Hennig, and A. Feragen. A random riemannian metric for probabilistic shortest-path tractography. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2015. Springer, 2015.
P. Hennig. Probabilistic interpretation of linear solvers. SIAM J on Optimization, 25(1):210–233, 2015.
P. Hennig and M. Kiefel. Quasi-Newton Methods – a new direction. Journal of Machine Learning Research, 14:834–865, March 2013.
A. O’Hagan. Bayes–Hermite quadrature. J of Statistical Planning and Inference, 29(3):245–260, 1991.
A. O’Hagan. Some Bayesian Numerical Analysis. Bayesian Statistics, 4:345–363, 1992.
H. Poincaré. Calcul des probabilités. Gauthier-Villars, Paris, 1896.
S. Särkkä. Recursive Bayesian Inference on Stochastic Differential Equations. PhD thesis, Helsinki University of Technology, 2006.
M. Schober, D. Duvenaud, and P. Hennig. Probabilistic ODE Solvers with Runge-Kutta Means. Advances in Neural Information Processing Systems (NIPS), 2014.
M. Schober, N. Kasenburg, A. Feragen, P. Hennig, and S. Hauberg. Probabilistic shortest path tractography in DTI using Gaussian Process ODE solvers. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014. Springer, 2014.