
  • Particle Filters for Data Assimilation

    Dan Crisan

    Imperial College London

Course III: Big data, data assimilation, and uncertainty quantification
    The Mathematics of Climate and the Environment

    IHP, Paris, September 9 - December 21 2019


  • Syllabus

    Lectures:

1. Data assimilation as a stochastic filtering problem. Framework. The signal and the observation process. Prior/posterior distribution. The linear/Kalman filter. (Tuesday, November 5, 10:30am-12:30pm)

    2. Solving data assimilation problems using particle filters. Mathematical and methodological considerations. The standard particle filter. Model reduction. Tempering. Jittering. Nudging. (Wednesday, November 6, 10:30am-12:30pm)

    3. Two case studies: 2D Euler and 2D two-layer quasi-geostrophic model. Algorithmic considerations: choice of initial condition, number of particles, assimilation step, reduction parameter. Forecast reliability. (Thursday, November 7, 10:30am-12:30pm)

    Material prepared in collaboration with Oana Lang, Wei Pan, Igor Shevchenko.


  • Syllabus

    Lecture I


  • List of Contents

    Lecture I

    What is Data Assimilation (DA)? DA as a Stochastic Filtering Problem.

    The Stochastic Filtering Problem

    Framework (Signal/Observation) and Notation

    Recursion Formula

    The Linear Filter

    Continuity with respect to the corresponding state space model

    Final remarks


  • What is DA?

    What is Data Assimilation ?

set of methodologies that combines past knowledge of a system in the form of a numerical model with new information about that system in the form of observations.

    designed to improve forecasting, reduce model uncertainties and adjust model parameters.

    term used mainly in the computational geoscience community

    major component of Numerical Weather Prediction

    Variational DA: combines the model and the data through the optimisation of a given criterion (minimisation of a so-called cost function).

    Sequential DA: uses a set of model trajectories/possible scenarios that are intermittently updated according to data and are used to infer the past, current or future position of a system.

    Hurricane Irma forecast: a. ECMWF, b. USA Global Forecast


  • What is stochastic filtering?

    DA as a Stochastic Filtering Problem

Stochastic Filtering: the process of using partial observations and a stochastic model to make inferences about an evolving dynamical system.


  • The Filtering Problem Framework: discrete/continuous time

    X the signal process - “hidden component”

    Y the observation process - “the data”

The filtering problem: find the conditional distribution of the signal Xt given Yt = σ(Ys, s ∈ [0, t]), i.e.,

    πt(A) = P(Xt ∈ A | Yt), t ≥ 0, A ∈ B(R^d).

    Discrete framework:

    {Xt}t≥0 Markov chain, P(Xt ∈ dxt | Xt−1 = xt−1) = ft(xt | xt−1) dxt,

    {Xt, Yt}t≥0, P(Yt ∈ dyt | Xt = xt) = gt(yt | xt) dyt

    Continuous framework:

    dXt = f (Xt)dt + σ(Xt)dVt ,

    dYt = h(Xt)dt + dWt .


  • The Filtering Problem Framework

The filtering problem: find the conditional distribution of the signal Xt given Yt = σ(Ys, s ∈ [0, t]), i.e.,

πt(A) = P(Xt ∈ A | Yt), t ≥ 0, A ∈ B(R^d).

    Discrete framework: {Xt , Yt}t≥0 Markov process

    The signal process

    • {Xt}t≥0 Markov chain, X0 ∼ π0 (dx0)

    • P (Xt ∈ dxt |Xt−1 = xt−1) = Kt (xt−1, dxt) = ft(xt |xt−1)dxt ,

    • Example: Xt = b (Xt−1) + σ (Xt−1) Bt , Bt ∼ N (0, 1) i.i.d.

    The observation process

• P(Yt ∈ dyt | X[0,t] = x[0,t], Y[0,t−1] = y[0,t−1]) = P(Yt ∈ dyt | Xt = xt) = gt(yt | xt) dyt

    • Example: Yt = h (Xt) + Vt , Vt ∼ N (0, 1) i.i.d.

where X[0,t] := (X0, ..., Xt), x[0,t] := (x0, ..., xt).

    ◦ A. Bain, D Crisan, Fundamentals of Stochastic Filtering, Springer, 2009.
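As a minimal illustration of this framework, the following sketch simulates a signal/observation pair of the form above; the concrete choices b(x) = 0.9x, σ ≡ 0.5, h(x) = x and the Gaussian initial condition are hypothetical, chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def b(x):      # hypothetical signal drift, b(x) = 0.9 x
    return 0.9 * x

def sigma(x):  # hypothetical signal noise scale, constant
    return 0.5

def h(x):      # hypothetical observation function, identity
    return x

T = 100
X = np.zeros(T + 1)
Y = np.zeros(T + 1)
X[0] = rng.normal()  # X_0 ~ pi_0, here N(0, 1)
for t in range(1, T + 1):
    X[t] = b(X[t - 1]) + sigma(X[t - 1]) * rng.normal()  # X_t = b(X_{t-1}) + sigma(X_{t-1}) B_t
    Y[t] = h(X[t]) + rng.normal()                        # Y_t = h(X_t) + V_t, V_t ~ N(0, 1)
```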


  • The Filtering Problem The signal

Let the signal X = {Xt, t ∈ N} be a stochastic process defined on the probability space (Ω, F, P) with values in R^d. Let F^X_t be the filtration generated by the process; that is,

    F^X_t := σ(Xs, s ∈ [0, t]).

    We assume that X is a Markov chain. That is, for all t ∈ N and A ∈ B(R^d),

    P(Xt ∈ A | F^X_{t−1}) = P(Xt ∈ A | Xt−1).    (1)

    The transition kernel of the Markov chain X is the function Kt(∙, ∙) defined on R^d × B(R^d) such that, for all t ∈ N and x ∈ R^d,

    Kt(x, A) = P(Xt ∈ A | Xt−1 = x).    (2)

    The transition kernel Kt is required to have the following properties.

    i. Kt(x, ∙) is a probability measure on (R^d, B(R^d)), for all t ∈ N and x ∈ R^d.

    ii. Kt(∙, A) is a B(R^d)-measurable function, for all t ∈ N and A ∈ B(R^d).


  • The Filtering Problem Notation

    Notation:

• posterior measure: the conditional distribution of the signal Xt given Yt

    πt(A) = P(Xt ∈ A | Yt), t ≥ 0, A ∈ B(R^d).

    • predictive measure: the conditional distribution of the signal Xt given Yt−1

    pt(A) = P(Xt ∈ A | Yt−1), t ≥ 0, A ∈ B(R^d).

    • prior distribution: the distribution of the signal Xt

    qt(A) := P(Xt ∈ A), t ≥ 0, A ∈ B(R^d).

    • If μ is a measure and f is a function, then μ(f) := ∫ f(x) μ(dx).

    • If f is a function and K is a kernel, then Kf(x) := ∫ f(y) K(x, dy).

    • If μ is a measure and K is a kernel, then Kμ(A) := ∫ μ(dx) K(x, A).


  • The Filtering Problem Notation

    Proposition

The prior distribution qt satisfies the formula

    qt = Kt . . . K2 K1 q0, t > 0;

    equivalently, qt(f) = q0(ft), t > 0,

    where ft = K1 K2 . . . Kt f.

    Proof: Use induction and the recurrence formula qt = Ktqt−1, t > 0.

    Definition

Let μ be a measure and ϕ be a non-negative function such that μ(ϕ) > 0. The projective product ϕ ∗ μ is the measure defined by

    ϕ ∗ μ(A) := ∫_A ϕ(x) μ(dx) / μ(ϕ).


  • The Filtering Problem Bayes’ recursion formula

    Theorem (Bayes’ recursion formula)

    The posterior distribution satisfies the following recursion formula

Prediction:   pt = Kt πt−1
    Updating:     πt = gt ∗ pt    (3)

    In other words, dπt/dpt = C_t^{−1} gt, where Ct := ∫_{R^d} gt(yt, xt) pt(dxt).

πt−1 —(Kt: model forecast / prediction)→ Kt πt−1 =: pt —(gt ∗ : non-linear assimilation / analysis / update)→ gt ∗ pt = πt
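To make the recursion concrete, here is a minimal sketch of one prediction-update step for a signal on a three-point state space, with πt−1, Kt and the likelihood gt(yt | ·) stored as arrays (all numerical values are hypothetical):

```python
import numpy as np

pi_prev = np.array([0.5, 0.3, 0.2])   # pi_{t-1} on a 3-point state space
K = np.array([[0.8, 0.1, 0.1],        # K[i, j] = P(X_t = j | X_{t-1} = i)
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
g = np.array([0.9, 0.4, 0.1])         # g[j] = g_t(y_t | x_j) for the observed y_t

p = pi_prev @ K                       # prediction: p_t = K_t pi_{t-1}
pi = g * p / np.sum(g * p)            # update: pi_t = g_t * p_t (projective product)
```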


  • The Filtering Problem Bayes’ recursion formula

    Proof.

    Step 1. Ktπt−1 = pt

pt f = E[f(Xt) | Y0:t−1]
    = E[ E[f(Xt) | F^X_{t−1} ∨ σ(W0:t−1)] | σ(Y0:t−1) ]    (tower property)
    = E[ E[f(Xt) | F^X_{t−1}] | σ(Y0:t−1) ]    (W0:t−1 is independent of X0:t)
    = E[ Kt f(Xt−1) | σ(Y0:t−1) ]    (Markov property of X)
    = πt−1(Kt f),

    for any f ∈ B(R^d), which implies that pt = Kt πt−1.


  • The Filtering Problem Bayes’ recursion formula

Step 2. gt ∗ pt = πt.
    Idea: for any A ∈ B(R^d),

    ∫_{C0:t} πt(A) P_{Y0:t}(dy0:t) = P({Xt ∈ A} ∩ {Y0:t ∈ C0:t}) = ∫_{C0:t} g_t^{yt} ∗ pt(A) P_{Y0:t}(dy0:t).    (4)

    Step 2.1. Show that P_{Y0:t}(dy0:t) = pt(g_t^{yt}) dyt P_{Y0:t−1}(dy0:t−1):

    P_{Y0:t}(C0:t) = P({Yt ∈ Ct} ∩ {Xt ∈ R^d} ∩ {Y0:t−1 ∈ C0:t−1})
    = ∫_{R^d × C0:t−1} P(Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1) P_{Xt,Y0:t−1}(dxt, dy0:t−1)
      [where P(Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1) = ∫_{Ct} g_t^{yt}(xt) dyt and P_{Xt,Y0:t−1}(dxt, dy0:t−1) = p_t^{y0:t−1}(dxt) P_{Y0:t−1}(dy0:t−1)]
    = ∫_{R^d × C0:t−1} ∫_{Ct} g_t^{yt}(xt) dyt  p_t^{y0:t−1}(dxt) P_{Y0:t−1}(dy0:t−1)
    = ∫_{C0:t} ∫_{R^d} g_t^{yt}(xt) p_t^{y0:t−1}(dxt) P_{Y0:t−1}(dy0:t−1) dyt.


  • The Filtering Problem Bayes’ recursion formula

Step 2.2. Use the definition of the projective product to write:

    ∫_{C0:t} g_t^{yt} ∗ pt(A) P_{Y0:t}(dy0:t)
    = ∫_{C0:t} [ ∫_A g_t^{yt}(xt) pt(dxt) / pt(g_t^{yt}) ] P_{Y0:t}(dy0:t)
    = ∫_{C0:t} ∫_A g_t^{yt}(xt) pt(dxt) dyt P_{Y0:t−1}(dy0:t−1)
    = ∫_{A × C0:t−1} ( ∫_{Ct} g_t^{yt}(xt) dyt ) pt(dxt) P_{Y0:t−1}(dy0:t−1)
    = ∫_{A × C0:t−1} P(Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1) P_{Xt,Y0:t−1}(dxt, dy0:t−1)
    = P({Xt ∈ A} ∩ {Y0:t ∈ C0:t}).


  • The Filtering Problem The Linear Filter

    Consider the following dynamical system in discrete time:

Xt+1 = Ft Xt + ft + Wt,   X0 = ξ,   Xt ∈ R^n,
    Yt = Ht Xt + ht + Bt,   Yt ∈ R^m,    (5)

    where

    Ft , Ht are matrices with appropriate sizes,

    ft is a sequence of vectors in Rn,

    ht is a sequence of vectors in Rm,

    ξ is a Gaussian random variable with mean x0 and covariance matrix P0.

Wt, Bt are Gaussian random variables with mean 0 and covariance matrices Qt, Rt.

    The random variables ξ, Bt , Wt are mutually independent.

Aim: compute X̂N = E[XN | YN]

    where YN = σ(Y0, Y1, . . . , YN−1).


  • The Filtering Problem The Linear Filter

Define the linear estimate

    F_S := X̄N + ∑_{t=0}^{N−1} St (Yt − Ȳt),

    where the St are matrices from L(R^m, R^n) which define the filter F. Note that

    X̄_{t+1} = Ft X̄t + ft,   t = 0, 1, . . . , N − 1,
    Ȳt = Ht X̄t + ht,   t = 0, 1, . . . , N − 1.

    The best linear filter is obtained by choosing S such that it minimizes the functional

    L_S = E[(XN − F_S)^T (XN − F_S)].

    We can write

    L_S = tr Λ_{NN} + ∑_{t=0}^{N−1} ( tr Rt S_t^T St − 2 tr Λ_{tN} St Ht ) + ∑_{t,l=0}^{N−1} tr Λ_{lt} H_t^T S_t^T Sl Hl,

    where

    Λ_{tl} := E[(Xt − X̄t)(Xl − X̄l)^T] ∈ L(R^n, R^n)

    is the correlation matrix of the process Xt.

  • The Filtering Problem The Linear Filter

    Proposition

    There exists a unique S which minimizes the functional LS.

Proof. L_S is a quadratic form with

    ∑_{t,l=0}^{N−1} tr Λ_{lt} H_t^T S_t^T Sl Hl ≥ 0

    and

    ∑_{t=0}^{N−1} tr Rt S_t^T St ≥ α ∑_{t=0}^{N−1} ∑_{h=1}^{n} ∑_{i=1}^{m} (S_{t,hi})² = α ∑_{t=0}^{N−1} tr S_t^T St = α ‖S‖².

    Therefore L_S has a unique minimizer Ŝ, defined by

    ∑_{t=0}^{N−1} ( tr( Ŝt St Rt + Rt Ŝ_t^T St ) − 2 tr Λ_{tN} St Ht ) + ∑_{t,l=0}^{N−1} tr( Λ_{lt} H_t^T Ŝ_t^T Sl Hl + H_l^T Ŝ_l^T St Ht Λ_{tl} ) = 0,   ∀S.


  • The Filtering Problem The Linear Filter

    Proposition

Prove that

    E[XN | YN] = X̂N = F_Ŝ = X̄N + ∑_{t=0}^{N−1} Ŝt (Yt − Ȳt)

    and that the posterior distribution πN is Gaussian.

    Proof. Let εN := XN − F_Ŝ. Using the definition of L_S one has

    ∑_t E[ ε_N^T St Yt + Y_t^T S_t^T εN ] = 0,   ∀ S0, S1, . . . , SN−1.

    Since S0, S1, . . . , SN−1 are arbitrary, it follows that εN and Y0, Y1, . . . , YN−1 are uncorrelated and therefore also independent. We use here the fact that εN, Y0, Y1, . . . , YN−1 are jointly Gaussian (crucial!).


  • The Filtering Problem The Linear Filter

    Theorem

The solution X̂t = E[Xt | Yt] satisfies the recurrence formula

    X̂_{N+1} = FN X̂*_N + fN
    X̂*_N = X̂N + PN H_N^T (RN + HN PN H_N^T)^{−1} (YN − HN X̂N − hN)
    P_{N+1} = QN + FN P*_N F_N^T
    P*_N = PN − PN H_N^T (HN PN H_N^T + RN)^{−1} HN PN
    X̂0 = x0.    (6)

    Proof.

    Step 1. Define the innovation process

    IN := YN − (HN X̂N + hN).

    IN is Gaussian, independent of YN, with mean 0 and covariance given by E[It I_l^T] = δ_{tl} (Rt + Ht Pt H_t^T).


  • The Filtering Problem The Linear Filter

Step 2. Using IN and the previous proposition one has

    X̂*_N = E[XN | Y0, . . . , YN−1, IN] = X̂N + KN IN,

    where KN is a gain factor which minimizes the covariance of the error and needs to be determined.

    Step 3. Find the optimal value of KN:

    3.1. The covariance is given by

    P*_N = PN + KN (HN PN H_N^T + RN) K_N^T − KN E[IN ε_N^T] − E[εN I_N^T] K_N^T,   with E[IN ε_N^T] = HN PN.


  • The Filtering Problem The Linear Filter

3.2. By completing the square one has

    P*_N = PN + KN (HN PN H_N^T + RN) K_N^T − KN HN PN − PN H_N^T K_N^T
    = PN + [KN − PN H_N^T (HN PN H_N^T + RN)^{−1}] (HN PN H_N^T + RN) [K_N^T − (HN PN H_N^T + RN)^{−1} HN PN] − PN H_N^T (HN PN H_N^T + RN)^{−1} HN PN,

    therefore the best value of KN is given by

    KN = PN H_N^T (HN PN H_N^T + RN)^{−1}.
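A minimal NumPy sketch of one step of recursion (6) follows; the matrices F, H, Q, R and the offsets f, h are assumed given as arrays, and the function name is ours:

```python
import numpy as np

def kalman_step(x_hat, P, y, F, H, f, h, Q, R):
    """One step of recursion (6): update at time N, then predict to N+1."""
    S = R + H @ P @ H.T                        # innovation covariance R_N + H_N P_N H_N^T
    K = P @ H.T @ np.linalg.inv(S)             # gain K_N = P_N H_N^T S^{-1}
    x_star = x_hat + K @ (y - H @ x_hat - h)   # updated mean X*_N
    P_star = P - K @ H @ P                     # updated covariance P*_N
    x_next = F @ x_star + f                    # predicted mean X_{N+1}
    P_next = Q + F @ P_star @ F.T              # predicted covariance P_{N+1}
    return x_next, P_next
```

Iterating kalman_step over the observations Y0, . . . , YN−1, starting from (x0, P0), reproduces the recursion (6).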


  • The Filtering Problem State Space Models

    Definition

A state space model S is a triple S = {μ0, κ, g} consisting of a probability measure μ0, a sequence of Markov kernels κ = {κt}t≥1 and a sequence of bounded potential functions g = {gt}t≥1.

    For an arbitrary state space model S we define the following operators:
    the prediction (P) operator Ψt(μ) := κt μ,
    the update (U) operator Υt(μ) := gt ∗ μ,
    the prediction-update (PU) operator Φt(μ) := (Υt ◦ Ψt)(μ),
    the composition of PU operators Φ_{t|k} := Φt ◦ Φt−1 ◦ ∙ ∙ ∙ ◦ Φk+1.

    Remark

    Let Φt be the PU operator associated to the state space model S = {π0, K, g} corresponding to the filtering problem, consisting of the probability measure π0, the signal transition kernels K = {Kt}t≥1 and the sequence of likelihood functions g = {gt(yt, ∙)}t≥1. From (3) we deduce that πt = Φt(πt−1). We can compactly represent the evolution of the filter over t − k consecutive steps, namely πt = Φ_{t|k}(πk). Note that the map Φt depends on the kernels Kt and the likelihood functions gt alone (and not on π0).


  • The Filtering Problem State Space Models

The posterior measure is determined by the state space model S (the converse is false!). We prove next that the posterior measure depends continuously on S. To do this we need to introduce a topology D on the set of state space models M. We do so by specifying the topology for each of the three component parts of S. To be specific:

    We endow the set of probability measures P(R^d) with the metrisable topology given by the total variation distance

    Dtv(α, β) := sup_A |α(A) − β(A)|,   α, β ∈ P(R^d),

    where the supremum is taken over all measurable sets.

    The sequence of Markov kernels κ^n = {κ^n_t}t≥1 converges to κ = {κt}t≥1 when

    lim_{n→∞} Dtv(κ^n_t(x, ∙), κt(x, ∙)) = 0, for every t ≥ 1 and any x ∈ X,    (7)

    and we denote lim_{n→∞} κ^n = κ.


  • The Filtering Problem State Space Models

We impose the topology of bounded convergence on the set of (non-negative) bounded potential functions. More precisely, we say that the sequence g^n = {g^n_t}t≥1 is uniformly bounded when there exists G < ∞ such that sup_{n,t≥1} ‖g^n_t‖∞ < G. Then, a uniformly bounded sequence g^n converges to g = {gt}t≥1 when

    lim_{n→∞} g^n_t(x) = gt(x), for every t ≥ 1 and any x ∈ X,    (8)

    and we write lim_{n→∞} g^n = g.

    Finally, a sequence of state space models S^n = {π^n_0, κ^n, g^n} converges to the model S = {π0, κ, g} in the topology D when lim_{n→∞} Dtv(π^n_0, π0) = 0, lim_{n→∞} κ^n = κ and lim_{n→∞} g^n = g. We denote lim_{n→∞} S^n = S.

    The topology D has the property that convergence of the sequence of models S^n, n ≥ 0, to S implies convergence of the marginal probability measures π^n_t (generated by the models S^n) towards the optimal filter πt generated by the model S. This result is made rigorous by the following theorem:


  • The Filtering Problem State Space Models

    Theorem

Let S^n = {π^n_0, κ^n, g^n}, n ≥ 0, and S = {π0, κ, g} be elements of M with corresponding PU operators Φ^n_t and Φt, respectively. If lim_{n→∞} S^n = S, then lim_{n→∞} Φ^n_{t|0}(π^n_0) = Φ_{t|0}(π0).

    Proof. Induction. The case t = 0 holds trivially, since lim_{n→∞} S^n = S implies that lim_{n→∞} Dtv(π^n_0, π0) = 0. Assume that lim_{n→∞} Dtv(β^n, β) = 0 for any t ≥ 1, where β^n = Φ^n_{t−1|0}(π^n_0) and β = Φ_{t−1|0}(π0). We apply the prediction operator Ψ^n_t(α) = κ^n_t α:

    |(f, Ψ^n_t(β^n)) − (f, Ψt(β))|
    = |(f, Ψ^n_t(β^n)) − (f, Ψ^n_t(β)) + (f, Ψ^n_t(β)) − (f, Ψt(β))|
    ≤ |(κ^n_t f, β^n − β)| + |((κ^n_t − κt) f, β)|
    ≤ ‖f‖∞ Dtv(β^n, β) + |((κ^n_t − κt) f, β)|,    (9)

    where the last inequality follows from the definition of the TV distance. The first term on the right hand side of (9) converges to zero by the induction hypothesis, while the second term converges to zero by the bounded convergence theorem.


  • The Filtering Problem State Space Models

Next, we write the PU operator Φ^n_t in terms of the P operator Ψ^n_t to obtain

    |(f, Φ^n_t(β^n)) − (f, Φt(β))| = | (f g^n_t, Ψ^n_t(β^n)) / (g^n_t, Ψ^n_t(β^n)) − (f gt, Ψt(β)) / (gt, Ψt(β)) |
    ≤ | (f g^n_t, Ψ^n_t(β^n)) / (g^n_t, Ψ^n_t(β^n)) − (f g^n_t, Ψ^n_t(β^n)) / (gt, Ψt(β)) | + | (f g^n_t, Ψ^n_t(β^n)) / (gt, Ψt(β)) − (f gt, Ψt(β)) / (gt, Ψt(β)) |
    ≤ ‖f‖∞ |(g^n_t, Ψ^n_t(β^n)) − (gt, Ψt(β))| / (gt, Ψt(β)) + |(f g^n_t, Ψ^n_t(β^n)) − (f gt, Ψt(β))| / (gt, Ψt(β)).    (10)

    However, inequality (9) implies that both terms on the right hand side of (10) converge to 0, hence the proof is complete. □


  • The Filtering Problem Final Remarks

    Final remarks:

The framework and the theoretical results of stochastic filtering provide a sound basis for developing data assimilation methodology.

    The posterior distribution of the signal satisfies a recurrence formula that cannot, in general, be explicitly solved.

    In the linear case the posterior distribution is Gaussian, with mean and covariance matrices that satisfy (6).

    The posterior distribution depends continuously on the initial distribution, the signal transition kernels and the likelihood functions.


  • The Filtering Problem Lecture II

    Lecture II


  • The Filtering Problem List of Contents

    Lecture II

    How do we define an approximation

    Particle Filters/Sequential Monte Carlo methods.

    The Standard Particle Filter

    Convergence Result

    General Remarks

    Why is the high-dimensional filtering problem hard ?

Model Reduction (High → Low Res)

    Tempering, Jittering, Nudging

    Final Remarks


  • The Filtering Problem How do we define an approximation ?

The description of a numerical approximation for the solution of the filtering problem should contain three parts:

    1. The class of approximations:

    particle approximations: (aj(t) [weight], v¹j(t), . . . , v^d_j(t) [position]), j = 1, . . . , n, with πt ≈ π^n_t = ∑_{j=1}^n aj(t) δ_{vj(t)};

    Gaussian approximations: (aj(t) [weight], v¹j(t), . . . , v^d_j(t) [mean], ω¹¹j(t), . . . , ω^{dd}_j(t) [covariance matrix]), j = 1, . . . , n, with πt ≈ π^n_t = ∑_{j=1}^n aj(t) N(vj(t), ωj(t)).

    2. The law of evolution of the approximation:

    particle approximations: π^n_t —(mutation: model)→ π̄^n_{t+δ} —(selection: {Ys}s∈[t,t+δ])→ π^n_{t+δ};

    Gaussian approximations: π^n_t —(forecast: model)→ π̄^n_{t+δ} —(assimilation: {Ys}s∈[t,t+δ])→ π^n_{t+δ}.

    3. The measure of the approximating error:

    sup_{ϕ∈Cb} E[|π^n_t(ϕ) − πt(ϕ)|],   π̂t − π̂^n_t,   ‖π^n_t − πt‖TV.


  • The Filtering Problem Quantized information = particles

The quantized information is modelled by n stochastic processes

    {pi(t), t > 0},   i = 1, . . . , n,   pi(t) ∈ R^N.

    We think of the processes pi as the trajectories of n (generalized) particles.

    Typically N > d, where d is the dimension of the state space.

    π^n_t = Λ^n_t (pi(t), t > 0, i = 1, . . . , n).

    Methodologies:

    classical particle filters
    Gaussian approximations
    wavelets
    grid methods


  • The Filtering Problem The classical/standard/bootstrap/garden-variety particle filter

π^n = {π^n(t), t ≥ 0}: the occupation measure of a system of weighted particles

    π^n(0) = ∑_{i=1}^n (1/n) δ_{x^n_i}  −→  π^n(t) = ∑_{i=1}^n ā^n_i(t) δ_{V^n_i(t)}.

    • DC, Particle Filters. A Theoretical Perspective, Sequential Monte Carlo Methods in Practice, 2001.
    • P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, 2004.


  • The Filtering Problem Framework: discrete/continuous time

1. Initialisation [t = 0].

    For i = 1, . . . , N, sample x^(i)_0 from π0,

    π^N_0 = (1/N) ∑_{i=1}^N δ_{x^(i)_0}.

    2. Iteration [t − 1 to t].
    Let x^(i)_{t−1}, i = 1, . . . , N be the positions of the particles at time t − 1,

    π^N_{t−1} = (1/N) ∑_{i=1}^N δ_{x^(i)_{t−1}}.

    Step 1.

    For i = 1, . . . , N, sample x̄^(i)_t from ft(xt | x^(i)_{t−1}) dxt,

    p^N_t = (1/N) ∑_{i=1}^N δ_{x̄^(i)_t}.


  • The Filtering Problem Framework: discrete/continuous time

Compute the (normalized) weights ā^(i)_t = gt(x̄^(i)_t) / ∑_{j=1}^N gt(x̄^(j)_t),

    π̄^N_t = ∑_{i=1}^N ā^(i)_t δ_{x̄^(i)_t} = gt ∗ p^N_t.

    Step 2.

    Replace each particle by ξ^(i)_t offspring such that ∑_{i=1}^N ξ^(i)_t = N.
    [Sample with replacement N times from {x̄^(i)_t}.]
    Denote the positions of the resulting particles by x^(i)_t, i = 1, . . . , N,

    π^N_t = (1/N) ∑_{i=1}^N δ_{x^(i)_t}.

    Further details in:

    Bain, A., DC, Fundamentals of Stochastic Filtering, Series: Stochastic Modelling and Applied Probability, Vol. 60, Springer Verlag, 2009.
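A minimal sketch of the algorithm above for a scalar model; the AR(1) transition and Gaussian likelihood are hypothetical placeholders for ft and gt, and resampling is plain multinomial sampling with replacement (SIR):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000                                    # number of particles

def transition(x):                          # sample from f_t(. | x): hypothetical AR(1) signal
    return 0.9 * x + 0.5 * rng.normal(size=x.shape)

def likelihood(x, y):                       # g_t(y | x): N(y; x, 1) observation density
    return np.exp(-0.5 * (y - x) ** 2)

def pf_step(x_prev, y):
    x_pred = transition(x_prev)             # Step 1: mutation, gives p_t^N
    w = likelihood(x_pred, y)
    w = w / w.sum()                         # normalised weights, gives the weighted measure
    idx = rng.choice(N, size=N, p=w)        # Step 2: resample with replacement
    return x_pred[idx]                      # equally weighted particles, gives pi_t^N
```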


  • The Filtering Problem Framework: discrete/continuous time

    Theorem

π^N converges to π. Moreover,

    sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|π^N_t(ϕ) − πt(ϕ)|] ≤ cT/√N,

    and √N (π^N − π) converges to a measure-valued process ū = {ūt, t ≥ 0}.


  • The Filtering Problem Framework: discrete/continuous time

    Notation:

• Error(π, T, N) = sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|π^N_t(ϕ) − πt(ϕ)|]

    • Error(p, T, N) = sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|p^N_t(ϕ) − pt(ϕ)|]

    Theorem

    For all T > 0, there exists cT such that

    Error(π, T, N) ≤ cT/√N,   Error(p, T, N) ≤ cT/√N

    if and only if Error(π, 0, N) ≤ c0/√N and, for all T > 0, there exists cT such that

    sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|p^N_t(ϕ) − π^N_{t−1}Kt(ϕ)|] ≤ cT/√N    (11)

    sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|π^N_t(ϕ) − π̄^N_t(ϕ)|] ≤ cT/√N.    (12)


  • The Filtering Problem Framework: discrete/continuous time

Proof. "⇒" Immediate from the following two inequalities:

    |p^N_t ϕ − π^N_{t−1}Kt ϕ| ≤ |p^N_t ϕ − pt ϕ| + |πt−1(Kt ϕ) − π^N_{t−1}(Kt ϕ)|,
    |π^N_t ϕ − π̄^N_t ϕ| ≤ |π^N_t ϕ − πt ϕ| + |πt ϕ − π̄^N_t ϕ|,

    where we used the fact that pt = πt−1 Kt.

    "⇐" Induction. The case t = 0 is assumed. The induction step is obtained as follows: since pt = πt−1 Kt, by the triangle inequality

    |p^N_t ϕ − pt ϕ| ≤ |p^N_t ϕ − π^N_{t−1}Kt ϕ| + |π^N_{t−1}Kt ϕ − πt−1 Kt ϕ|.

    Also,

    π̄^N_t ϕ − πt ϕ = p^N_t(ϕ gt)/p^N_t gt − pt(ϕ gt)/pt gt = − [p^N_t(ϕ gt) / (p^N_t gt × pt gt)] (p^N_t gt − pt gt) + ( p^N_t(ϕ gt)/pt gt − pt(ϕ gt)/pt gt ),

    and as |p^N_t(ϕ gt)| ≤ ‖ϕ‖∞ p^N_t gt,

    |π̄^N_t ϕ − πt ϕ| ≤ (‖ϕ‖∞ / pt gt) |p^N_t gt − pt gt| + (1/pt gt) |p^N_t(ϕ gt) − pt(ϕ gt)|.


  • The Filtering Problem Framework: discrete/continuous time

    Corollary

The standard particle filter produces an approximation π^N_t such that

    Error(π, T, N) ≤ cT/√N.

    Proof. Immediate from the fact that the standard particle filter satisfies (11)+(12). □

    A numerical example

    Remarks:

    Particle filters are recursive algorithms: the approximation for πt and Yt+1 are the only information used in order to obtain the approximation for πt+1. In other words, the information gained from Y1, ..., Yt is embedded in the current approximation.

    The generic SMC method involves sampling from the prior distribution of the signal and then using a weighted bootstrap technique (or equivalent) with weights defined by the likelihood of the most recent observation data.


  • The Filtering Problem Framework: discrete/continuous time

Step 2 can be done by means of sampling with replacement (SIR algorithm), stratified sampling, Bernoulli sampling, the Carpenter-Clifford-Fearnhead-Whitley genetic algorithm, or the Crisan-Lyons TBBA algorithm. All these methods satisfy the convergence requirement.

    If d is small to moderate, then the standard particle filter can perform very well in the time parameter n.

    Under certain conditions, the Monte Carlo error of the estimate of the filter can be uniform with respect to the time parameter.

    The function xk ↦ g(xk, yk) can convey a lot of information about the hidden state, especially so in high dimensions. If this is the case, using the prior transition kernel f(xk−1, xk) as proposal will be ineffective.

    It is then known that the standard particle filter will typically perform poorly in this context, often requiring N = O(κ^d) particles.


  • The Filtering Problem Framework: discrete/continuous time

Figure: Computational cost per time step (wallclock time per time step, in seconds, log scale) to achieve a predetermined RMSE versus model dimension, for the standard particle filter (PF) and STPF.


  • The Filtering Problem Why is the high-dimensional problem hard ?

    Consider

Π0 = N(0, 1) (mean 0 and variance 1).
    Π1 = N(1, 1) (mean 1 and variance 1).
    Πd = N(d, 1) (mean d and variance 1).
    d(Π0, Π1)TV = 2P[|X| ≤ 1/2], X ∼ N(0, 1).
    d(Π0, Πd)TV = 2P[|X| ≤ d/2], X ∼ N(0, 1).
    As d increases, the two measures get further and further apart, becoming singular w.r.t. each other.
    As d increases, it becomes increasingly harder to use standard importance sampling to construct a sample from Πd by using a proposal from Π0, weighting it using

    dΠd/dΠ0

    and (possibly) resampling from it.


  • The Filtering Problem Why is the high-dimensional problem hard ?

Consider
    Π0 = N((0, . . . , 0), Id) (mean (0, . . . , 0) and covariance matrix Id).
    Πd = N((1, . . . , 1), Id) (mean (1, . . . , 1) and covariance matrix Id).
    d(Π0, Πd)TV = 2P[|X| ≤ √d/2], X ∼ N(0, 1).
    As d increases, the two measures get further and further apart, becoming singular w.r.t. each other exponentially fast.
    It becomes increasingly harder to use standard importance sampling to construct a sample from Πd by using a proposal from Π0.
    'Moving' from Π0 to Πd is equivalent to moving from a standard normal distribution N(0, 1) to a normal distribution N(√d, 1) (the total variation distance between N(0, 1) and N(√d, 1) is the same as that between Π0 and Πd).

    Add-on techniques (a numerical illustration of the weight degeneracy follows below):

    • Tempering *                           • Jittering *
    • Model Reduction (High → Low Res) *    • Nudging *
    • Sequential DA in space                • Optimal transport prior → posterior
    • Hybrid models                         • Hamiltonian Monte Carlo
    • Informed priors                       • Localization
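The degeneracy described above is easy to reproduce numerically. The following toy sketch importance-samples Πd = N((1, . . . , 1), Id) using proposals from Π0 = N(0, Id) and prints the effective sample size of the normalised weights as d grows:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
for d in [1, 2, 5, 10, 20, 50]:
    x = rng.normal(size=(N, d))            # proposals from Pi_0 = N(0, I_d)
    log_w = x.sum(axis=1) - d / 2          # log dPi_d/dPi_0(x) = sum_i x_i - d/2
    w = np.exp(log_w - log_w.max())        # stabilise before exponentiating
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)             # effective sample size
    print(f"d = {d:3d}   ESS = {ess:8.1f}")  # ESS collapses as d grows
```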


  • The Filtering Problem Why is the high-dimensional problem hard ?

Model Reduction (High → Low Res)*

    Model reduction is a valuable methodology that can lead to substantial computational savings: for the numerical example presented in these lectures, we perform state space order reduction through a coarsening of the grid used for the numerical algorithm that approximates the dynamical system, from 2 × 513² ≈ 0.5 × 10⁶ to 2 × 65² = 8450.

    Recall the recursion formula for the conditional distribution of the signal:

    pt = πt−1 Kt,   πt = gt ∗ pt,    (13)

    where dπt/dpt = C_t^{−1} gt, with Ct := ∫_{R^d} gt(yt, xt) pt(dxt).

    Following from Lecture I, πt is a continuous function of (π0, g1, ..., gt, K1, ..., Kt). In other words, if

    lim_{n→∞} (π^n_0, g^n_1, ..., g^n_t, K^n_1, ..., K^n_t) = (π0, g1, ..., gt, K1, ..., Kt)

    and p^n_t := π^n_{t−1} K^n_t, π^n_t := g^n_t ∗ p^n_t, then lim_{n→∞} π^n_t = πt and lim_{n→∞} p^n_t = pt (again, in a suitably chosen topology).

    NB. Note that π^n_t is no longer the solution of a filtering problem, but simply the solution of the iteration (13).


  • The Filtering Problem Tempering

    Tempering

    Framework:

    {Xt}t≥0 Markov chain P (Xt ∈ dxt |Xt−1 = xt−1) = ft(xt |xt−1)dxt ,

    {Xt , Yt}t≥0 P (Yt ∈ dyt |Xt = xt) = gt(yt |xt)dyt

For k = 1 to d:

    ◦ reweight the particles using g_t^{1/d} and (possibly) resample from them

    ◦ move the particles using an MCMC step that leaves g_t^{k/d} ft π[0,t−1] invariant

    Beskos, DC, Jasra, On the stability of SMC methods in high dimensions, 2014.
    Kantas, Beskos, Jasra, Sequential Monte Carlo for inverse problems, 2014.


  • The Filtering Problem Tempering

Initialisation t = 0: For n = 1, . . . , N, sample Xn(t0) from π0.
    Iteration (ti−1, ti]: Given the ensemble {Xn(ti−1)}n=1,...,N,

    1 Evolve Xn(ti−1) using the signal equation to obtain Xn(ti).

    2 Given X := {Xn(ti)}n=1,...,N, define normalised tempered weights

    λ̄n,i(φ, X) := exp(−φ Λn,i) / ∑_m exp(−φ Λm,i),

    where the dependence on X means the Λn,i are computed using X. Define the effective sample size

    ESSi(φ, X) := ‖λ̄i(φ, X)‖_{l2}^{−2}.

    Set φ = 1.

    3 While ESSi(φ, X) < Nthreshold do:

    (a) Find 1 − φ < φ′ < 1 such that ESSi(φ′ − (1 − φ), X) ≈ Nthreshold. Resample according to λ̄n,i(φ′ − (1 − φ), X) and apply MCMC (jittering) if required (i.e. when there are duplicated particles), to obtain a new set of particles X(φ′). Set φ = 1 − φ′ and X = X(φ′).

    (b) If ESSi ≥ Nthreshold then STOP and go to the next filtering step with {(Xn(ti), λ̄n,i)}n=1,...,N.
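A sketch of the temperature search in step 3(a), here implemented by bisection on the remaining temperature increment; Lambda stands for the array of negative log-likelihoods Λn,i of the current particle cloud, and the function names and tolerance are ours:

```python
import numpy as np

def ess(phi, Lambda):
    """ESS of the tempered weights exp(-phi * Lambda_n), normalised."""
    log_w = -phi * Lambda
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def next_increment(Lambda, phi_left, n_threshold, tol=1e-6):
    """Bisect for the largest temperature increment with ESS ~= n_threshold."""
    lo, hi = 0.0, phi_left                  # remaining temperature budget
    if ess(hi, Lambda) >= n_threshold:      # full remaining step already acceptable
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(mid, Lambda) >= n_threshold:
            lo = mid
        else:
            hi = mid
    return lo
```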


  • The Filtering Problem Jittering

    Jittering

Procedure employed to reduce sample degeneracy.
    The particles are moved using a suitably chosen kernel.
    The moves are controlled so that the size of the (stochastic) perturbation remains of the same order as the particle filter error (1/√N).

    Algorithm (MCMC step after each tempering procedure)

    Given the ensemble {Xn,k(ti)}n=1,...,N corresponding to the k'th tempering step with temperature φk, and proposal step size ρ ∈ [0, 1], repeat the following steps.
    Propose

    X̃n(ti) = G( Xn(ti−1), ρ W(ti−1 : ti; ω) + √(1 − ρ²) Z(ti−1 : ti; ω) ),

    where Xn(ti) = G( Xn(ti−1), W(ti−1 : ti; ω) ) and W ⊥ Z.
    Accept X̃n(ti) with probability

    1 ∧ λ̄(φk, X̃n(ti)) / λ̄(φk, Xn(ti)),

    where λ(φ, x) = exp(−φ Λ(x)) is the unnormalised weight function.
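Viewed in the driving-noise variables, this proposal is a preconditioned Crank-Nicolson (pCN) move. Here is a sketch of one such move for a single particle; G, Lambda and the stored noise W are placeholders for the model map, the negative log-likelihood and the Brownian increments, and the initial condition is assumed baked into G:

```python
import numpy as np

rng = np.random.default_rng(3)

def jitter(x, W, phi, rho, G, Lambda):
    """One pCN jittering move for a single particle at temperature phi."""
    Z = rng.normal(size=W.shape)                   # independent noise, W ⊥ Z
    W_new = rho * W + np.sqrt(1 - rho ** 2) * Z    # pCN proposal on the noise
    x_new = G(W_new)                               # re-run the model map with proposed noise
    log_alpha = -phi * (Lambda(x_new) - Lambda(x)) # log of the weight ratio
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return x_new, W_new                        # accept
    return x, W                                    # reject
```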

  • The Filtering Problem Jittering

    Nudging

Nudging: reduce the error by 'correcting' the model solution such that the particles are kept closer to the true state ⇒ a nudging term N(α) is added to the model.

    (ti−1, ti]: Given the ensemble {Xn(ti−1)}Nn=1, we want to assimilate the observational data Yti in order to obtain a new ensemble {Xn(ti)}Nn=1 that defines πNti:

    Obtain the observation Yti.
    Evolve Xn(ti−1) through a modified kernel to obtain X̃n(ti):

    X̃n(ti) = Xn(ti−1) + f(Yti) + Wti.

    Define new weights according to the ratio between the law of X̃n(ti) and that of Xn(ti).


  • The Filtering Problem Final Remarks

Particle filters/sequential Monte Carlo methods are theoretically justified algorithms for approximating the state of dynamical systems that are partially (and noisily) observed.

    Particle filters are recursive algorithms: the approximation for πt and Yt+1 are the only information used in order to obtain the approximation for πt+1. In other words, the information gained from Y1, ..., Yt is embedded in the current approximation.

    The standard particle filter is unsuitable for solving high-dimensional problems.

    Properly calibrated and modified, particle filters can be used to solve high dimensional problems (see also the work of Peter Jan van Leeuwen, Roland Potthast, Hans Kunsch).

    Important parameters: initial condition, number of particles, number of observations, correction times, observation error, etc.


  • The Filtering Problem

    Lecture III


  • The Filtering Problem List of Contents

    Lecture III

Damped and forced incompressible 2D Euler model
    Framework: Signal/Initial Condition/Observations
    Particle filter initial condition
    Model Reduction/Noise Calibration/Uncertainty Quantification
    Parameters: No of Particles/Resampling Intervals/Observation Times

    2-layer quasi-geostrophic model

    Framework: Signal/Initial Condition/Observations
    Nudging

    Final Remarks

    Based on joint work with Colin Cotter, Darryl Holm, Wei Pan, Igor Shevchenko.

    ◦ Numerically Modelling Stochastic Lie Transport in Fluid Dynamics
    ◦ A Particle Filter for Stochastic Advection by Lie Transport (SALT): A case study for the damped and forced incompressible 2D Euler equation
    ◦ Modelling uncertainty using circulation-preserving stochastic transport noise in a 2-layer quasi-geostrophic model
    ◦ Data assimilation for a quasi-geostrophic model with circulation-preserving stochastic transport noise


  • The Filtering Problem Hardware comparison

Fluid dynamics models are used extensively to describe the evolution of the atmosphere and the oceans and play a crucial role in numerical weather prediction (NWP). Numerical weather prediction requires massive computing capabilities as it relies on Hi-Res scientific computations:

    14,000 trillion calculations per second

    2 petabytes of memory

    460,000 compute cores

    24 petabytes of storage for saving data

    Dimension of the state space O(10⁹)

    Dimension of the observation space O(10⁷)

    The Cray XC40 supercomputer (source: Met Office website)

    Instead: use particle filters built on lower-resolution stochastic models.

    2x Intel Xeon CPU @ 2.10GHz

    32 logical cores

    2x Nvidia Quadro M4000

    64GB memory

    Dimension of the state space 513 × 513 × 2

    Dimension of the observation space 9 × 9 × 2

    The Beast (source: Wei's office)


  • The Filtering Problem Hardware comparison

Damped and forced incompressible 2D Euler equation


  • The Filtering Problem A stochastic transport model

    Example: Satellite Tracks Hurricanes Madeline and Lester in the Pacific

    Laboratory Experiment: Fluid Transport Dynamics

    High versus Low Res simulation: Euler equation

    Deterministic versus Stochastic models: Stochastic Euler

Euler (vorticity form)

    ∂t ω + u ∙ ∇ω = Q − rω,   div(u) = 0

    ω vorticity field, u velocity field

    Evolution of Lagrangian fluid parcels

    dxt = ut(xt) dt   ⇒   dxt = ut(xt) dt + ∑_i ξi(xt) ◦ dWi(t)


  • The Filtering Problem A stochastic transport model

Consider a two-dimensional incompressible fluid flow u defined on the 2D torus Ω = [0, Lx] × [0, Ly], modelled by the two-dimensional Euler equations with forcing and damping. Let q = ẑ ∙ curl u denote the vorticity of u, where ẑ denotes the z-axis. For a scalar field g : Ω → R, we write ∇⊥ g = (−∂y g, ∂x g)^T. Let ψ : Ω × [0, ∞) → R denote the stream function.

    ∂t q + (u ∙ ∇) q = Q − rq
    u = ∇⊥ ψ
    Δψ = q.

    Q is the forcing term, given by Q = 0.1 sin(8πx).

    r is a positive constant - the large scale dissipation time scale.

    We consider the slip flow boundary condition ψ|∂Ω = 0.

    Evolution of Lagrangian fluid parcels:

    dxt/dt = u(xt, t).


  • The Filtering Problem A stochastic transport model

Domain is [0, 1]².

    PDE system:    ∂t ω + u ∙ ∇ω = Q − rω,   u = ∇⊥ ψ,   Δψ = ω.
    SPDE system:   dq + ū ∙ ∇q dt + ∑_i ξi ∙ ∇q ◦ dW^i_t = (Q − rq) dt,   ū = ∇⊥ ψ̃,   Δψ̃ = q.

    Q = 0.1 sin(8πx), r = 0.01. Boundary conditions ψ|∂Ω = 0 and ψ̃|∂Ω = 0.

                     PDE        SPDE
    Grid resolution  512x512    64x64
    Numerical Δt     0.0025     0.01

    Spin-up: 40 ett. ett: eddy turnover time, L/uL ≈ 2.5 time units.
    Numerical scheme: a mixed continuous and discontinuous Galerkin finite element scheme + an optimal third order strong stability preserving Runge-Kutta [Bernsen et al 2006, Gottlieb 2005].


  • The Filtering Problem initial condition

    Initial Condition

    Initial configuration for the vorticity

ωspin = sin(8πx) sin(8πy) + 0.4 cos(6πx) cos(6πy) + 0.3 cos(10πx) cos(4πy) + 0.02 sin(2πy) + 0.02 sin(2πx)    (14)

    from which we spin up the system until an energy equilibrium state seems to have been reached.
    This equilibrium state, denoted by ωinitial, is then chosen as the initial condition.


  • The Filtering Problem initial condition

Plot of the numerical PDE solution at the initial time tinitial and its coarse-grained version, done via spatial averaging and projection of the fine grid stream-function to the coarse grid.


  • The Filtering Problem initial condition

Plot of the numerical PDE solution at the final time t = tinitial + 146 large eddy turnover times (ett). The coarse-graining is done via spatial averaging and projection of the fine grid stream-function to the coarse grid.


  • The Filtering Problem Observation

Observations: u is observed on a subgrid of the signal grid (9 × 9 points)

    Yt(x) = uSPDE_t(x) + α zx,   zx ∼ N(0, 1)   (Experiment 1)
    Yt(x) = uPDE_t(x) + α zx,   zx ∼ N(0, 1)   (Experiment 2)

    α is calibrated to the standard deviation of the true solution over a coarse grid cell.


  • The Filtering Problem Particle Filter Initial Condition

    Particle Filter Initial Condition

A good choice of the initial condition is essential for the successful implementation of the filter.

    In practice it is a reflection of the level of uncertainty of the estimate of the initial position of the dynamical system.

    We use the initial condition to obtain an ensemble which contains particles that are reasonably 'close' to the truth.

    Choice for the running example:

    deformation - physically consistent with the system, Casimirs preserved. We take a nominal value ωt0 and deform it using the following 'modified' Euler equation:

    ∂t ω + βi u(τi) ∙ ∇ω = 0    (15)

    where βi ∼ N(0, ε), i = 1, . . . , Np are centered Gaussian weights with an a priori variance parameter ε, and τi ∼ U(tinitial, t0), i = 1, . . . , Np are uniform random numbers. Thus each u(τi) corresponds to a PDE solution in the time period [tinitial, t0).


  • The Filtering Problem Particle Filter Initial Condition

Alternative choices:
    q + ζ, where ζ is a Gaussian random field: doable but not physical; it only works for q because q is the least smooth of the three fields of interest (the other fields are spatially smooth). Also, this breaks the SPDE well-posedness theorem (function space regularity). Figure: (ux, uy).


  • The Filtering Problem Particle Filter Initial Condition

Directly perturb ψ, by ψ + ψ̄ where ψ̄ = (I − κΔ)^{−1} ζ (invert the elliptic operator with boundary condition ψ̄ = 0). Figure: (ux, uy).


  • The Filtering Problem Model Reduction

Model Reduction (High → Low Res)*

    We perform state space order reduction through a coarsening of the grid used for the numerical algorithm that approximates the dynamical system, from 2 × 513² ≈ 0.5 × 10⁶ to 2 × 65² = 8450. The procedure is theoretically justified due to the continuity of the posterior distribution w.r.t. the state space model (see Lecture 1).

    To account for the "missing scales" we use a "stochastic parametrization". We define a stochastic PDE on the coarser grid:

    ∂t q + (u ∙ ∇) q + ∑_{k=1}^{∞} (ξk ∙ ∇) q ◦ dB^k_t = Q − rq
    u = ∇⊥ ψ
    Δψ = q.

    ξk are given divergence-free vector fields
    ξk are computed from the true solution by using an empirical orthogonal functions (EOFs) procedure
    B^k_t are independent scalar Brownian motions


  • The Filtering Problem Methodology to calibrate the noise

The reason for this "stochastic parametrization" is grounded in solid physical considerations, see

    D.D. Holm, Variational principles for stochastic fluids, Proc. Roy. Soc. A, 2015.

    dx^f_t = u^f_t(x^f_t) dt
    dx^c_t = u^c_t(x^c_t) dt + ∑_i ξi(x^c_t) ◦ dWi(t)

    For each m = 0, 1, . . . , M − 1:

    1 Solve dx^f_ij(t)/dt = u^f_t(x^f_ij(t)) with initial condition x^f_ij(mΔT) = xij.

    2 Compute u^c_t by low-pass filtering u^f_t along the trajectory.

    3 Compute x^c_ij(t) by solving dx^c_ij(t)/dt = u^c_t(x^f_ij(t)) with the same initial condition.

    4 Compute the difference Δx^m_ij = x^f_ij((m + 1)ΔT) − x^c_ij((m + 1)ΔT), which measures the error between the fine and coarse trajectory.


  • The Filtering Problem Methodology to calibrate the noise

Having obtained Δx^m_ij, we would like to extract the basis for the noise. This amounts to a Gaussian model of the form

    Δx^m_ij / √δt = Δ̃x_ij + ∑_{k=1}^{N} ξ^k_ij ΔW^k_m,

    where the ΔW^k_m are i.i.d. normal random variables with mean zero and variance 1.

    We estimate ξ by using empirical orthogonal functions (EOFs). EOFs can be thought of as principal components that correspond to the spatial correlations of a field (see Hannachi (2004); Hannachi, Jolliffe, and Stephenson (2007)). EOFs are the eigenvectors of the velocity-velocity spatial covariance tensor.

    We write the data time series Δx^m_ij, m = 0, . . . , M − 1 as a matrix F̃ whose entries are two-dimensional vectors, and whose rows (row index m) correspond to serialised Δx^m_ij. Let F := detrend(F̃), where the detrend function removes the column mean from each entry. We then estimate the spatial covariance tensor by computing R := (1/(M−1)) F^T F.

    We take the EOFs to be the eigenvectors of R, ranked in descending order according to the eigenvalues.
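A sketch of this EOF extraction; F_tilde stands for the data matrix built from the serialised Δx^m_ij (one row per time index m), and the function name is ours:

```python
import numpy as np

def eofs(F_tilde, n_modes):
    """EOFs of the displacement data: eigenvectors of the sample covariance."""
    F = F_tilde - F_tilde.mean(axis=0)   # detrend: remove column means
    M = F.shape[0]
    R = F.T @ F / (M - 1)                # spatial covariance tensor R
    eigval, eigvec = np.linalg.eigh(R)   # eigendecomposition of symmetric R
    order = np.argsort(eigval)[::-1]     # rank by descending eigenvalue
    return eigval[order][:n_modes], eigvec[:, order][:, :n_modes]
```

For large grids one would typically compute the leading modes via an SVD of F instead of forming R explicitly.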


  • The Filtering Problem Methodology to calibrate the noise

Number of BM (EOFs - empirical orthogonal functions)

    decide on a case by case basis

    too many will slow down the algorithm

    On the left: number of EOFs capturing 90% of the variance vs 50% (no change).
    On the right: normalised spectrum of the Δx covariance operator, showing the number of BM required to capture 50%, 70% and 90% of the total variance.


  • The Filtering Problem Methodology to calibrate the noise

The aim of the calibration is to capture the statistical properties of the sub-grid fluctuations, rather than the trajectory of the flow.

    Validation of the stochastic parameterisation in terms of uncertainty quantification for the SPDE.

    Performance of the DA algorithm relies on the correct modelling of the unresolved scales.


  • The Filtering Problem Methodology to calibrate the noise

Figure: Model reduction UQ pictures for systems 256x256, 128x128 and 64x64. Panels: ux, uy, psi, q.


  • The Filtering Problem Methodology to calibrate the noise

Ensemble distance from the "truth" (velocity field):

    d({q̂i, i = 1, . . . , Np}, ω, t) := min_{i∈{1,...,Np}} ‖ω(t) − q̂i(t)‖_{L²(D)} / ‖ω(t)‖_{L²(D)}


  • The Filtering Problem Parameters

    Parameters


  • The Filtering Problem Parameters

Number of Particles

    decide on a case by case basis
    too few will not give a reasonable solution
    too many will slow down the algorithm

    Picture: number of particles 225 (good) vs 500 (no change); 225 (good) vs 25 (less good). 25 seems OK, but we want as many particles as computationally feasible to tune the algorithm.

    (a) psi 225 vs 500   (b) psi 225 vs 25


  • The Filtering Problem SIR fails

    Classical Particle Filter fails !

    Histogram of weights

Figure: example log-likelihood histogram, period 1 ett, 100 particles


  • The Filtering Problem Resampling Intervals

    Resampling Intervals

small resampling intervals lead to an unreasonable increase in the computational effort

    large resampling intervals make the algorithm fail

    the ESS can be used as criterion for choosing the resampling time

    adapted resampling time can be used

    ESS evolution in time/observation noise


  • The Filtering Problem Results

    DA Solution for DA periods: 1 ETT and 0.2 ETT

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs spde, period 0.2 ett


  • The Filtering Problem Results

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs pde, period 0.2 ett

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs pde, period 1 ett


  • The Filtering Problem Results

    Number of tempering steps/Average MCMC steps

    3. Add-on steps: Model Reduction, Tempering, Jittering

π^N_{t−1} —(no nudging)→ p^N_t —(adaptive tempering + jittering)→ π^N_t

    Truth versus Conditional Mean: Euler equation with forcing and damping

    Truth versus Absolute Difference: Euler equation with forcing and damping


  • The Filtering Problem Results

    2-layer quasi-geostrophic model


  • The Filtering Problem Framework

Case study: two-layer quasi-geostrophic model for a β-plane channel flow with O(10⁶) degrees of freedom. The model is reduced by following the stochastic variational approach for geophysical fluid dynamics introduced in Holm [2015] as a framework for deriving stochastic parametrisations for unresolved scales. The computations are done only for O(10⁴) degrees of freedom.

π^N_{t−1} —(nudging)→ p^N_t —(adaptive tempering + jittering)→ π^N_t

    Example: 2-layer quasi-geostrophic model

The two-layer deterministic QG equations for the potential vorticity (PV) q:

    ∂q1/∂t + u1 ∙ ∇q1 = ν Δ²ψ1 − β ∂ψ1/∂x,
    ∂q2/∂t + u2 ∙ ∇q2 = ν Δ²ψ2 − μ Δψ2 − β ∂ψ2/∂x,    (16)

    where ψ is the stream function, β is the planetary vorticity gradient, μ is the bottom friction parameter, ν is the lateral eddy viscosity, and u = (u, v) is the velocity vector. The computational domain Ω = [0, Lx] × [0, Ly] × [0, H] is a horizontally periodic flat-bottom channel of depth H = H1 + H2, given by two stacked isopycnal fluid layers of depth H1 and H2.


  • The Filtering Problem Framework

Forcing in (16) is introduced via a vertically sheared, baroclinically unstable background flow

    ψi → −Ui y + ψi,   i = 1, 2,    (17)

    where the parameters Ui are background-flow zonal velocities. The PV anomaly and stream function are related through two elliptic equations:

    q1 = Δψ1 + s1 (ψ2 − ψ1),    (18a)
    q2 = Δψ2 + s2 (ψ1 − ψ2),    (18b)

    with stratification parameters s1, s2. The system (16)-(18) is augmented by the integral mass conservation constraint

    ∂t ∫∫_Ω (ψ1 − ψ2) dy dx = 0,    (19)

    by the periodic horizontal boundary conditions ψ|Γ2 = ψ|Γ4, ψ = (ψ1, ψ2), and by the no-slip boundary conditions u|Γ1 = u|Γ3 = 0 set at the northern and southern boundaries of the domain.


  • The Filtering Problem Framework

The stochastic version of the QG equations (16) is given by:

    dq1 + ( u1 dt + ∑_{k=1}^{K} ξ^k_1 ◦ dW^k_t ) ∙ ∇q1 = ( ν Δ²ψ1 − β ∂ψ1/∂x ) dt,
    dq2 + ( u2 dt + ∑_{k=1}^{K} ξ^k_2 ◦ dW^k_t ) ∙ ∇q2 = ( ν Δ²ψ2 − μ Δψ2 − β ∂ψ2/∂x ) dt.    (20)

    The stochastic terms are the only difference from the deterministic QG model (16); all other equations are the same as in the deterministic case.

    Stochastic solutions are computed on two different signal grids Gs = {129 × 65, 257 × 129} in order to highlight the effect of the model reduction on the results.
    The observation process Y consists of the velocity observed at two different data grids Gd = {4 × 4, 8 × 4}.
    The size of the ensemble is taken to be N = 100 and the number of Brownian motions (independent sources of stochasticity) is taken to be K = 32. This is enough to reasonably quantify the uncertainty of the model: we showed that the spread of the ensemble will not increase substantially by taking more particles and/or sources of noise (BMs).


  • The Filtering Problem Framework

The observation data Yt is an M-dimensional process that consists of noisy measurements of the velocity field u taken at the points belonging to the data grid Gd:

    Yt := Psd(Zt) + η,

    where Psd : Gs → Gd is a projection operator from the signal grid Gs to the data grid Gd, and η ∼ N(0, Iσ) is a normally distributed random vector with mean vector 0 = (0, . . . , 0) and diagonal covariance matrix Iσ = diag(σ1², . . . , σM²).

    Rather than choosing an arbitrary σ = (σ1, . . . , σM) for the standard deviation of the noise, we use the standard deviation of the velocity field computed over the coarse grid cell of the signal grid.

    We introduce the likelihood-weight function

    W(X, Y) = exp( −(1/2) ∑_{i=1}^{M} ‖ (Psd(Xi) − Yi) / σi ‖²₂ ),    (21)

    with M being the number of grid points (weather stations).


  • The Filtering Problem Framework

In order to measure the variability of the weights (21) of the particles we use the effective sample size:

    ESS(w̄) = ( ∑_{i=1}^{N} (w̄i)² )^{−1},   w̄ := w ( ∑_{i=1}^{N} wi )^{−1},    (22)

    which is close to the ensemble size N if the particles have weights that are close to each other, and decays to one as the ensemble degenerates (i.e. there are fewer and fewer particles with large weights and the rest have small weights).

    One should resample the weighted ensemble if the ESS drops below a given threshold N*:

    ESS < N*.

    We chose N* = 80 to be our threshold.
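A sketch of (21)-(22) and the resampling trigger; here Psd has already been applied, so X_proj[n] holds the projected state of particle n, sigma is the per-station noise level, and the function names are ours:

```python
import numpy as np

def weights(X_proj, y, sigma):
    """Normalised likelihood weights (21) for projected ensemble states X_proj[n, i]."""
    log_w = -0.5 * np.sum(((X_proj - y) / sigma) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    return w / w.sum()

def ess(w_bar):
    """Effective sample size (22) of normalised weights."""
    return 1.0 / np.sum(w_bar ** 2)

# resampling trigger with the threshold N* = 80:
# if ess(w_bar) < 80: resample the weighted ensemble
```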


  • The Filtering Problem Nudging

    Nudging

The idea of nudging is to correct the solution of SPDE (20) so as to keep the particles closer to the true state. To do so, we add a 'nudging term' to SPDE (20):

    dqi(λ) + ( ui(λ) dt + ∑_{k=1}^{K} ξ^k_i ◦ dW^k_t + ∑_{k=1}^{K} ξ^k_i λk dt ) ∙ ∇qi(λ) = Fi dt,   i = 1, 2.    (23)

    q depends on the parameter λ. The trajectories of the particles will be solutions of this perturbed SPDE (23). To account for the perturbation, the particles will have new weights, given according to Girsanov's theorem by

    W(q(λ), Y, λ) = exp( −[ (1/2) ∑_{i=1}^{M} ‖ (Psd(q_{tj+1}(λ)) − Y_{tj+1}) / σi ‖²₂ + ∑_{k=1}^{K} ∫_{tj}^{tj+1} ( λk² dt/2 − λk dWk ) ] ).    (24)

    These weights measure the likelihood of the position of the particles given the observation, and the last term accounts for the change of probability distribution from q to q(λ). We wish to choose λ so as to maximize these likelihoods.


  • The Filtering Problem Nudging

In other words, we look to solve the equivalent minimization problem

    min_{λk, k∈[1..K]} [ (1/2) ∑_{i=1}^{M} ‖ (Psd(q_{tj+1}(λ)) − Y_{tj+1}) / σi ‖²₂ + ∑_{k=1}^{K} ∫_{tj}^{tj+1} ( λk² dt/2 − λk dWk ) ]    (25)

    together with (23). In general this is a challenging nonlinear optimisation problem, especially if one allows the λk's to vary in time.
    To simplify the problem, we perturb only the corrector stage of the final timestep before tj+1. Then the (discrete version of the) minimization problem (25) becomes

    min_{λk, k∈[1..K]} [ (1/2) ∑_{i=1}^{M} ‖ (Psd(q_{tj+1}(λ)) − Y_{tj+1}) / σi ‖²₂ + ∑_{k=1}^{K} ( λk² δt/2 − λk ΔWk ) ],    (26)

    where δt is the time step. Let us re-write

    q_{tj+1}(λ) = A(q_{tj+1/2}) + ∑_{k=1}^{K} Bk(q̃_{tj+1}) (ΔWk + λk δt),

    where q_{tj+1/2} and q̃_{tj+1} are computed in the prediction and the extrapolation steps, respectively.


  • The Filtering Problem Nudging

We can then re-write the minimisation problem (26) as

    min_{λk, k∈[1..K]} V(q(λ), Y, λ),    (27)

    where V(q(λ), Y, λ) = Q + Q1(λ) + Q2(λ, ΔW1, ..., ΔWK).

    This is a quadratic minimization problem with the optimal value of λ depending (linearly) on the increments ΔW1, ..., ΔWK. This optimal choice is not allowed, as the parameter λ can only be a function of the approximations q̃_{tj+1}, q_{tj+1/2} and Y_{tj+1} (since it needs to be adapted to the forward filtration of the set of Brownian motions {Wk}). To ensure that this constraint is satisfied, we minimise the conditional expectation of V(q(λ), Y, λ) given q̃_{tj+1}, q_{tj+1/2} and Y_{tj+1}, that is,

    min_{λk, k∈[1..K]} E[ V(q(λ), Y, λ) | q̃_{tj+1}, q_{tj+1/2}, Y_{tj+1} ].

    This functional is quadratic in λ, and hence the optimization can be done by solving a linear system. This nudging methodology remains asymptotically consistent.
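Schematically, once the conditional expectation has been reduced to a quadratic form (1/2) λᵀAλ − bᵀλ + c with A symmetric positive definite (A and b stand in for the coefficients assembled from q̃_{tj+1}, q_{tj+1/2} and Y_{tj+1}; this is a generic sketch, not the specific assembly used in the slides), the optimal λ is obtained from a single linear solve:

```python
import numpy as np

def optimal_lambda(A, b):
    """Minimise the quadratic (1/2) lam^T A lam - b^T lam.

    A is the (symmetric, positive definite) Hessian of the conditional
    expectation in lambda; b collects the linear terms. The minimiser
    solves the linear system A lam = b.
    """
    return np.linalg.solve(A, b)
```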


  • The Filtering Problem Nudging

Figure: up1 and vp1 versus t [days]. Signal grid Gs = 257 × 129.


  • The Filtering Problem Final remarks

Incorporating stochastic transport into the fluid dynamics equations allows quantification of model uncertainty.

    The aim of the noise calibration is to capture the statistical properties of the sub-grid fluctuations, rather than the trajectory of the flow.

    Validation of the stochastic parameterisation is done in terms of uncertainty quantification for the SPDE.

    Performance of the DA algorithm relies on the correct modelling of the unresolved scales. Data assimilation is performed using particle filters.

    Particle filters are theoretically justified algorithms for approximating the state of dynamical systems that are partially (and noisily) observed. Properly calibrated and modified, particle filters can be used to solve high dimensional data assimilation problems. One can use this methodology to assess forecast reliability.

    Additional work is needed to analyse realistic models that incorporate: boundary conditions, gravity, rotation, buoyancy, bathymetry, etc.

