
Neural Ordinary Differential Equations

Ricky T. Q. Chen∗, Yulia Rubanova∗, Jesse Bettencourt∗, David Duvenaud

∗Equal contribution. University of Toronto, Vector Institute

Contributions

Black-box ODE solvers as a differentiable modeling component.

• Continuous-time recurrent neural nets and continuous-depth feedforward nets.

• Adaptive computation with explicit control over the tradeoff between speed and numerical precision.

• ODE-based change of variables for automatically-invertible normalizing flows.

• Open-sourced ODE solvers with O(1)-memory backprop: https://github.com/rtqichen/torchdiffeq

ODE Solvers: How Do They Work?

• The state z(t) changes in time; its dynamics define an infinite set of trajectories, one for each initial state.

• Define a differential equation: dz/dt = f(z(t), t, θ).

• Initial-value problem: given z(t0), find z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt.

• Approximate the solution with discrete steps, e.g. Euler's method: z(t + h) = z(t) + h f(z(t), t, θ) (see the sketch after this list).

• Higher-order solvers are more accurate per step, so they can take larger steps.

• The step size h can be adapted to meet a given error tolerance.
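A minimal sketch of a fixed-step Euler solver for the initial-value problem above (the dynamics f, the step size h, and the test problem are illustrative; practical solvers use higher-order adaptive methods such as Runge-Kutta and adjust h from an estimate of the local error):

import numpy as np

def euler_solve(f, z0, t0, t1, theta, h=0.01):
    # Integrate dz/dt = f(z, t, theta) from t0 to t1 with fixed Euler steps.
    z, t = np.asarray(z0, dtype=float), t0
    while t < t1:
        step = min(h, t1 - t)            # do not overshoot the end time
        z = z + step * f(z, t, theta)    # Euler update: z(t + h) = z(t) + h f(z, t, theta)
        t += step
    return z

# Example: dz/dt = theta * z has the exact solution z(t1) = z(t0) * exp(theta * (t1 - t0)).
f = lambda z, t, theta: theta * z
print(euler_solve(f, z0=[1.0], t0=0.0, t1=1.0, theta=-0.5))  # approximates exp(-0.5) ≈ 0.607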

Continuous version of ResNets

ODE-Net replaces ResNet blocks with ODESolve(z(t0), f, t0, t1, θ), where f is a neural net with parameters θ.

z(t1) = z(t0) + ∫_{t0}^{t1} f(z(t), t, θ) dt = ODESolve(z(t0), f, t0, t1, θ)

Figure: A residual network maps Input to Output through discrete depth; an ODE-Net maps z(0) to z(1) with depth as a continuous variable.

ResNet update: h_{t+1} = h_t + f(h_t, θ_t)
ODE-Net dynamics: dh(t)/dt = f(h(t), t, θ)

ResNet:

def resnet(x, θ):
    h1 = x + NeuralNet(x, θ[0])
    h2 = h1 + NeuralNet(h1, θ[1])
    h3 = h2 + NeuralNet(h2, θ[2])
    h4 = h3 + NeuralNet(h3, θ[3])
    return h4

ODE-Net:

def f(z, t, θ):
    return NeuralNet([z, t], θ)

def ODEnet(x, θ):
    return ODESolve(f, x, 0, 1, θ)

‘Depth’ is automatically chosen by an adaptive ODE solver.
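A minimal sketch of a continuous-depth block using the released torchdiffeq library; the ODEFunc module, its hidden size, and the integration interval [0, 1] are illustrative choices (note that torchdiffeq passes the dynamics arguments as func(t, z)):

import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    # Dynamics f(z, t, θ): a small MLP that also sees the scalar time t.
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, t, z):                      # torchdiffeq calls this as func(t, z)
        t_vec = t.expand(z.shape[0], 1)           # broadcast the scalar time to the batch
        return self.net(torch.cat([z, t_vec], dim=1))

def ode_net(x, func):
    # Continuous-depth block: integrate the dynamics from t=0 to t=1.
    t = torch.tensor([0.0, 1.0])
    return odeint(func, x, t)[-1]                 # state at t=1

func = ODEFunc(64)
x = torch.randn(16, 64)
out = ode_net(x, func)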

Computing gradients for ODE solutions

O(1) memory cost when training: activations are not stored; instead, the dynamics are followed in reverse. There is no backpropagation through the internals of the ODE solver; the gradient is computed by another call to ODESolve.

Adjoint State

Define the adjoint state: a(t) = ∂L/∂z(t)

Adjoint state dynamics: da(t)/dt = −a(t)ᵀ ∂f(z(t), t, θ)/∂z

Solve this ODE backwards in time to obtain the gradient:

dL/dθ = ∫_{t0}^{t1} a(t)ᵀ ∂f(z(t), t, θ)/∂θ dt

def f_and_a([z, a, grad], t):  # augmented dynamics
    return [f, -a*df/dz, -a*df/dθ]

# integrate backwards from t1 to t0
[z0, dL/dx, dL/dθ] =
    ODESolve(f_and_a, [z(t1), dL/dz(t1), 0], t1, t0)
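In the released torchdiffeq library, this augmented backward solve is wrapped as odeint_adjoint, which is called like odeint but computes gradients through a second ODE solve instead of backpropagating through the solver's internals. A minimal sketch, reusing the illustrative ODEFunc module defined above:

import torch
from torchdiffeq import odeint_adjoint

x = torch.randn(16, 64, requires_grad=True)
t = torch.tensor([0.0, 1.0])

z1 = odeint_adjoint(func, x, t)[-1]   # forward solve; intermediate activations are not stored
loss = z1.pow(2).sum()
loss.backward()                       # gradients via the adjoint (backward-in-time) ODE solve
print(x.grad.shape, func.net[0].weight.grad.shape)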

ODE Nets for Supervised Learning

Adaptive computation: the speed vs. precision tradeoff can be adjusted by specifying the error tolerance of the ODE solution:

ODESolve(z(t0), f, t0, t1, θ, rtol, atol)

Figure: Numerical error, number of forward evaluations, and number of backward evaluations as functions of the solver's error tolerance; the number of forward evaluations (NFE forward) is also plotted over training epochs (panel d).
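With the released torchdiffeq solvers, this tolerance is exposed directly as rtol/atol arguments; a short sketch reusing the illustrative func, x, and t from the sketches above (the tolerance values are arbitrary):

from torchdiffeq import odeint

# Looser tolerances -> fewer function evaluations (faster); tighter -> more accurate.
z1_fast = odeint(func, x, t, rtol=1e-3, atol=1e-4)[-1]
z1_precise = odeint(func, x, t, rtol=1e-7, atol=1e-9)[-1]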

Performance on MNIST

              Test Error   # Params   Memory   Time
1-Layer MLP   1.60%        0.24 M     -        -
ResNet        0.41%        0.60 M     O(L)     O(L)
RK-Net        0.47%        0.22 M     O(L̃)     O(L̃)
ODE-Net       0.42%        0.22 M     O(1)     O(L̃)

(L is the number of layers; L̃ is the number of function evaluations chosen by the solver.)

Instantaneous Change of Variables

The change of variables theorem computes the exact change in probability of samples transformed through a bijective F:

z1 = F(z0) = z0 + f(z0)  ⟹  log p(z1) − log p(z0) = −log |det ∂F/∂z0|

Requires an invertible F. Cost: O(D³).

Theorem: If f is uniformly Lipschitz continuous in z and continuous in t, then:

dz/dt = f(z(t), t)  ⟹  ∂ log p(z(t))/∂t = −tr(df/dz(t))

The function f does not have to be invertible. Cost: O(D²).
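A minimal sketch of the instantaneous change of variables in PyTorch, computing tr(∂f/∂z) exactly with autograd (the dynamics network and the dimensionality are illustrative placeholders; the O(D²) cost comes from needing one vector-Jacobian product per dimension):

import torch

def trace_df_dz(f_val, z):
    # Exact trace of ∂f/∂z, accumulated one diagonal entry at a time.
    trace = 0.0
    for i in range(z.shape[1]):
        trace = trace + torch.autograd.grad(f_val[:, i].sum(), z, create_graph=True)[0][:, i]
    return trace  # shape: (batch,)

# Illustrative dynamics f(z, t): any smooth (not necessarily invertible) function of z.
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))

z = torch.randn(8, 2, requires_grad=True)
f_val = net(z)
dlogp_dt = -trace_df_dz(f_val, z)  # ∂ log p(z(t))/∂t = −tr(df/dz)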

Continuous Normalizing Flows (CNF)

Automatically-invertible normalizing flows. The planar CNF is smooth and much easier to train than the planar NF.

Planar normalizing flow (Rezende and Mohamed, 2015):

z(t + 1) = z(t) + u h(wᵀz(t) + b)
log p(z(t + 1)) = log p(z(t)) − log |1 + uᵀ ∂h/∂z|

Continuous analog of the planar flow:

dz(t)/dt = u h(wᵀz(t) + b)
∂ log p(z(t))/∂t = −uᵀ ∂h/∂z(t)
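A hedged sketch of the planar CNF dynamics above with h = tanh, so that uᵀ ∂h/∂z(t) = (uᵀw) tanh′(wᵀz + b) is available in closed form (the parameter shapes, initialization, and the joint (z, log p) state are illustrative choices, not the authors' exact implementation):

import torch
from torchdiffeq import odeint

class PlanarCNF(torch.nn.Module):
    # dz/dt = u * tanh(wᵀz + b);  d log p/dt = −(uᵀw) * tanh'(wᵀz + b)
    def __init__(self, dim=2):
        super().__init__()
        self.u = torch.nn.Parameter(torch.randn(dim))
        self.w = torch.nn.Parameter(torch.randn(dim))
        self.b = torch.nn.Parameter(torch.zeros(1))

    def forward(self, t, state):
        z, logp = state
        pre = z @ self.w + self.b                                # wᵀz + b, shape (batch,)
        dz = torch.tanh(pre).unsqueeze(1) * self.u               # (batch, dim)
        dlogp = -(self.u @ self.w) * (1 - torch.tanh(pre) ** 2)  # (batch,)
        return dz, dlogp

# Integrate samples and their log-density change jointly from t=0 to t=1.
z0, logp0 = torch.randn(128, 2), torch.zeros(128)
z_traj, logp_traj = odeint(PlanarCNF(2), (z0, logp0), torch.tensor([0.0, 1.0]))
z1, delta_logp = z_traj[-1], logp_traj[-1]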

Figure: Visualizing the transformation from noise to data on (a) Two Circles and (b) Two Moons, showing density and samples at 5%, 20%, 40%, 60%, 80%, and 100% of the transformation alongside the planar NF result and the target. Continuous normalizing flows are efficiently reversible, so we can train on a density estimation task and still be able to sample from the learned density efficiently.

Figure: Samples and data. CNFs have since been scaled to images using Hutchinson's trace estimator (Grathwohl et al., 2018).

Continuous-time Generative Model for Time Series

Handles time series with irregular observation times; no discretization of the timeline is needed.

Figure: Latent ODE generative model. An RNN encoder runs over the observations x_{t0}, ..., x_{tN} to produce the approximate posterior q(z_{t0} | x_{t0}, ..., x_{tN}); the latent trajectory z_{t1}, ..., z_{tM} is then computed with ODESolve(z_{t0}, f, θ_f, t0, ..., t_M) and decoded back to data space, giving predictions x̂(t) at the observed times t0, ..., tN and extrapolations at t_{N+1}, ..., t_M.

Figure: Fits from a recurrent neural network and a Latent ODE (with its latent space), shown against ground truth, observations, predictions, and extrapolations. The Latent ODE learns smooth latent dynamics from noisy observations.
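A compact sketch of the Latent ODE pipeline under simplifying assumptions: a plain GRU encoder, a diagonal-Gaussian posterior, a linear decoder, and illustrative layer sizes; all module names here are stand-ins rather than the authors' exact architecture. Irregular observation and prediction times are handled simply by passing them to the solver.

import torch
import torch.nn as nn
from torchdiffeq import odeint

class LatentODEFunc(nn.Module):
    # Latent dynamics dz/dt = f(z, t, θ_f), here a small time-invariant MLP.
    def __init__(self, latent_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, latent_dim))
    def forward(self, t, z):
        return self.net(z)

obs_dim, latent_dim = 2, 4
encoder = nn.GRU(obs_dim, 16, batch_first=True)   # RNN encoder over the observed series
to_posterior = nn.Linear(16, 2 * latent_dim)      # mean and log-variance of q(z_t0 | x_t0..x_tN)
decoder = nn.Linear(latent_dim, obs_dim)          # maps latent states back to data space
dynamics = LatentODEFunc(latent_dim)

x = torch.randn(8, 10, obs_dim)                   # batch of series with 10 observations
t_eval = torch.sort(torch.rand(25)).values        # irregular times: observed + extrapolation

_, h = encoder(x)                                 # final hidden state summarizes the series
mean, logvar = to_posterior(h[-1]).chunk(2, dim=-1)
z0 = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # sample z_t0 from the posterior

z_traj = odeint(dynamics, z0, t_eval)             # latent trajectory at every requested time
x_hat = decoder(z_traj)                           # predictions and extrapolations in data space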

Prior Works on ODE+DL

LeCun. "A theoretical framework for back-propagation." (1988)
Pearlmutter. "Gradient calculations for dynamic recurrent neural networks: a survey." (1993)
Haber & Ruthotto. "Stable Architectures for Deep Neural Networks." (2017)
Chang et al. "Multi-level Residual Networks from Dynamical Systems View." (2018)
