
  • Particle Filters for Data Assimilation

    Dan Crisan

    Imperial College London

Course III: Big data, data assimilation, and uncertainty quantification
    The Mathematics of Climate and the Environment

    IHP, Paris, September 9 - December 21 2019


  • Syllabus

    Lectures:

1. Data assimilation as a stochastic filtering problem. Framework. The signal and the observation process. Prior/posterior distribution. The linear/Kalman filter. (Tuesday, November 5, 10:30am-12:30pm)

    2. Solving data assimilation problems using particle filters. Mathematical and methodological considerations. The standard particle filter. Model reduction. Tempering. Jittering. Nudging. (Wednesday, November 6, 10:30am-12:30pm)

    3. Two case studies: 2D Euler and 2D two-layer quasi-geostrophic model. Algorithmic considerations: choice of initial condition, number of particles, assimilation step, reduction parameter. Forecast reliability. (Thursday, November 7, 10:30am-12:30pm)

    Material prepared in collaboration with Oana Lang, Wei Pan, Igor Shevchenko.


  • Syllabus

    Lecture I


  • List of Contents

    Lecture I

    What is Data Assimilation (DA)? DA as a Stochastic Filtering Problem.

    The Stochastic Filtering Problem

    Framework (Signal/Observation) and Notation

    Recursion Formula

    The Linear Filter

    Continuity with respect to the corresponding state space model

    Final remarks


  • What is DA?

    What is Data Assimilation ?

set of methodologies that combines past knowledge of a system in the form of a numerical model with new information about that system in the form of observations.

    designed to improve forecasting, reduce model uncertainties and adjust model parameters.

    term used mainly in the computational geoscience community

    major component of Numerical Weather Prediction

    Variational DA: combines the model and the data through the optimisation of a given criterion (minimisation of a so-called cost function).

    Sequential DA: uses a set of model trajectories/possible scenarios that are intermittently updated according to data and are used to infer the past, current or future position of a system.

    Hurricane Irma forecast: a. ECMWF, b. USA Global Forecast


  • What is stochastic filtering?

    DA as a Stochastic Filtering Problem

Stochastic Filtering: the process of using partial observations and a stochastic model to make inferences about an evolving dynamical system.


  • The Filtering Problem Framework: discrete/continuous time

    X the signal process - “hidden component”

    Y the observation process - “the data”

The filtering problem: find the conditional distribution of the signal Xt given Yt = σ(Ys, s ∈ [0, t]), i.e.,

    πt(A) = P(Xt ∈ A | Yt), t ≥ 0, A ∈ B(R^d).

    Discrete framework:

    {Xt}t≥0 Markov chain, P(Xt ∈ dxt | Xt−1 = xt−1) = ft(xt | xt−1) dxt,

    {Xt, Yt}t≥0, P(Yt ∈ dyt | Xt = xt) = gt(yt | xt) dyt

    Continuous framework:

    dXt = f (Xt)dt + σ(Xt)dVt ,

    dYt = h(Xt)dt + dWt .


  • The Filtering Problem Framework

The filtering problem: find the conditional distribution of the signal Xt given Yt = σ(Ys, s ∈ [0, t]), i.e.,

πt(A) = P(Xt ∈ A | Yt), t ≥ 0, A ∈ B(R^d).

    Discrete framework: {Xt , Yt}t≥0 Markov process

    The signal process

    • {Xt}t≥0 Markov chain, X0 ∼ π0 (dx0)

    • P (Xt ∈ dxt |Xt−1 = xt−1) = Kt (xt−1, dxt) = ft(xt |xt−1)dxt ,

    • Example: Xt = b (Xt−1) + σ (Xt−1) Bt , Bt ∼ N (0, 1) i.i.d.

    The observation process

• P(Yt ∈ dyt | X[0,t] = x[0,t], Y[0,t−1] = y[0,t−1]) = P(Yt ∈ dyt | Xt = xt) = gt(yt | xt) dyt

    • Example: Yt = h (Xt) + Vt , Vt ∼ N (0, 1) i.i.d.

where X[0,t] := (X0, ..., Xt), x[0,t] := (x0, ..., xt).

    ◦ A. Bain, D Crisan, Fundamentals of Stochastic Filtering, Springer, 2009.
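As a minimal illustration of this framework, the following sketch simulates a signal/observation pair of the form above; the concrete choices b(x) = 0.9x, σ ≡ 0.5, h(x) = x and the Gaussian initial condition are hypothetical, chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def b(x):      # hypothetical signal drift, b(x) = 0.9 x
    return 0.9 * x

def sigma(x):  # hypothetical signal noise scale, constant
    return 0.5

def h(x):      # hypothetical observation function, identity
    return x

T = 100
X = np.zeros(T + 1)
Y = np.zeros(T + 1)
X[0] = rng.normal()  # X_0 ~ pi_0, here N(0, 1)
for t in range(1, T + 1):
    X[t] = b(X[t - 1]) + sigma(X[t - 1]) * rng.normal()  # X_t = b(X_{t-1}) + sigma(X_{t-1}) B_t
    Y[t] = h(X[t]) + rng.normal()                        # Y_t = h(X_t) + V_t, V_t ~ N(0, 1)
```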


  • The Filtering Problem The signal

Let the signal X = {Xt, t ∈ N} be a stochastic process defined on the probability space (Ω, F, P) with values in R^d. Let F^X_t be the filtration generated by the process; that is,

    F^X_t := σ(Xs, s ∈ [0, t]).

    We assume that X is a Markov chain. That is, for all t ∈ N and A ∈ B(R^d),

    P(Xt ∈ A | F^X_{t−1}) = P(Xt ∈ A | Xt−1).    (1)

    The transition kernel of the Markov chain X is the function Kt(∙, ∙) defined on R^d × B(R^d) such that, for all t ∈ N and x ∈ R^d,

    Kt(x, A) = P(Xt ∈ A | Xt−1 = x).    (2)

    The transition kernel Kt is required to have the following properties.

    i. Kt(x, ∙) is a probability measure on (R^d, B(R^d)), for all t ∈ N and x ∈ R^d.

    ii. Kt(∙, A) is a B(R^d)-measurable function, for all t ∈ N and A ∈ B(R^d).


  • The Filtering Problem Notation

    Notation:

• posterior measure: the conditional distribution of the signal Xt given Yt

    πt(A) = P(Xt ∈ A | Yt), t ≥ 0, A ∈ B(R^d).

    • predictive measure: the conditional distribution of the signal Xt given Yt−1

    pt(A) = P(Xt ∈ A | Yt−1), t ≥ 0, A ∈ B(R^d).

    • prior distribution: the distribution of the signal Xt

    qt(A) := P(Xt ∈ A), t ≥ 0, A ∈ B(R^d).

    • If μ is a measure and f is a function, then μ(f) := ∫ f(x) μ(dx).

    • If f is a function and K is a kernel, then Kf(x) := ∫ f(y) K(x, dy).

    • If μ is a measure and K is a kernel, then Kμ(A) := ∫ μ(dx) K(x, A).


  • The Filtering Problem Notation

    Proposition

The prior distribution qt satisfies the formula

    qt = Kt . . . K2 K1 q0, t > 0;

    equivalently, qt(f) = q0(ft), t > 0,

    where ft = K1 K2 . . . Kt f.

    Proof: Use induction and the recurrence formula qt = Ktqt−1, t > 0.

    Definition

Let μ be a measure and ϕ be a non-negative function such that μ(ϕ) > 0. The projective product ϕ ∗ μ is the measure defined by

    ϕ ∗ μ(A) := ∫_A ϕ(x) μ(dx) / μ(ϕ).


  • The Filtering Problem Bayes’ recursion formula

    Theorem (Bayes’ recursion formula)

    The posterior distribution satisfies the following recursion formula

Prediction:   pt = Kt πt−1
    Updating:     πt = gt ∗ pt    (3)

    In other words, dπt/dpt = C_t^{−1} gt, where Ct := ∫_{R^d} gt(yt, xt) pt(dxt).

πt−1 —(Kt: model forecast / prediction)→ Kt πt−1 =: pt —(gt ∗ : non-linear assimilation / analysis / update)→ gt ∗ pt = πt
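To make the recursion concrete, here is a minimal sketch of one prediction-update step for a signal on a three-point state space, with πt−1, Kt and the likelihood gt(yt | ·) stored as arrays (all numerical values are hypothetical):

```python
import numpy as np

pi_prev = np.array([0.5, 0.3, 0.2])   # pi_{t-1} on a 3-point state space
K = np.array([[0.8, 0.1, 0.1],        # K[i, j] = P(X_t = j | X_{t-1} = i)
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
g = np.array([0.9, 0.4, 0.1])         # g[j] = g_t(y_t | x_j) for the observed y_t

p = pi_prev @ K                       # prediction: p_t = K_t pi_{t-1}
pi = g * p / np.sum(g * p)            # update: pi_t = g_t * p_t (projective product)
```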


  • The Filtering Problem Bayes’ recursion formula

    Proof.

    Step 1. Ktπt−1 = pt

pt f = E[f(Xt) | Y0:t−1]
    = E[ E[f(Xt) | F^X_{t−1} ∨ σ(W0:t−1)] | σ(Y0:t−1) ]    (tower property)
    = E[ E[f(Xt) | F^X_{t−1}] | σ(Y0:t−1) ]    (W0:t−1 is independent of X0:t)
    = E[ Kt f(Xt−1) | σ(Y0:t−1) ]    (Markov property of X)
    = πt−1(Kt f),

    for any f ∈ B(R^d), which implies that pt = Kt πt−1.


  • The Filtering Problem Bayes’ recursion formula

Step 2. gt ∗ pt = πt.
    Idea: for any A ∈ B(R^d),

    ∫_{C0:t} πt(A) P_{Y0:t}(dy0:t) = P({Xt ∈ A} ∩ {Y0:t ∈ C0:t}) = ∫_{C0:t} g_t^{yt} ∗ pt(A) P_{Y0:t}(dy0:t).    (4)

    Step 2.1. Show that P_{Y0:t}(dy0:t) = pt(g_t^{yt}) dyt P_{Y0:t−1}(dy0:t−1):

    P_{Y0:t}(C0:t) = P({Yt ∈ Ct} ∩ {Xt ∈ R^d} ∩ {Y0:t−1 ∈ C0:t−1})
    = ∫_{R^d × C0:t−1} P(Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1) P_{Xt,Y0:t−1}(dxt, dy0:t−1)
      [where P(Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1) = ∫_{Ct} g_t^{yt}(xt) dyt and P_{Xt,Y0:t−1}(dxt, dy0:t−1) = p_t^{y0:t−1}(dxt) P_{Y0:t−1}(dy0:t−1)]
    = ∫_{R^d × C0:t−1} ∫_{Ct} g_t^{yt}(xt) dyt  p_t^{y0:t−1}(dxt) P_{Y0:t−1}(dy0:t−1)
    = ∫_{C0:t} ∫_{R^d} g_t^{yt}(xt) p_t^{y0:t−1}(dxt) P_{Y0:t−1}(dy0:t−1) dyt.


  • The Filtering Problem Bayes’ recursion formula

Step 2.2. Use the definition of the projective product to write:

    ∫_{C0:t} g_t^{yt} ∗ pt(A) P_{Y0:t}(dy0:t)
    = ∫_{C0:t} [ ∫_A g_t^{yt}(xt) pt(dxt) / pt(g_t^{yt}) ] P_{Y0:t}(dy0:t)
    = ∫_{C0:t} ∫_A g_t^{yt}(xt) pt(dxt) dyt P_{Y0:t−1}(dy0:t−1)
    = ∫_{A × C0:t−1} ( ∫_{Ct} g_t^{yt}(xt) dyt ) pt(dxt) P_{Y0:t−1}(dy0:t−1)
    = ∫_{A × C0:t−1} P(Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1) P_{Xt,Y0:t−1}(dxt, dy0:t−1)
    = P({Xt ∈ A} ∩ {Y0:t ∈ C0:t}).


  • The Filtering Problem The Linear Filter

    Consider the following dynamical system in discrete time:

Xt+1 = Ft Xt + ft + Wt,   X0 = ξ,   Xt ∈ R^n,
    Yt = Ht Xt + ht + Bt,   Yt ∈ R^m,    (5)

    where

    Ft , Ht are matrices with appropriate sizes,

    ft is a sequence of vectors in Rn,

    ht is a sequence of vectors in Rm,

    ξ is a Gaussian random variable with mean x0 and covariance matrix P0.

Wt, Bt are Gaussian random variables with mean 0 and covariance matrices Qt, Rt.

    The random variables ξ, Bt , Wt are mutually independent.

Aim: compute X̂N = E[XN | YN]

    where YN = σ(Y0, Y1, . . . , YN−1).


  • The Filtering Problem The Linear Filter

Define the linear estimate

    F_S := X̄N + ∑_{t=0}^{N−1} St (Yt − Ȳt),

    where the St are matrices from L(R^m, R^n) which define the filter F. Note that

    X̄_{t+1} = Ft X̄t + ft,   t = 0, 1, . . . , N − 1,
    Ȳt = Ht X̄t + ht,   t = 0, 1, . . . , N − 1.

    The best linear filter is obtained by choosing S such that it minimizes the functional

    L_S = E[(XN − F_S)^T (XN − F_S)].

    We can write

    L_S = tr Λ_{NN} + ∑_{t=0}^{N−1} ( tr Rt S_t^T St − 2 tr Λ_{tN} St Ht ) + ∑_{t,l=0}^{N−1} tr Λ_{lt} H_t^T S_t^T Sl Hl,

    where

    Λ_{tl} := E[(Xt − X̄t)(Xl − X̄l)^T] ∈ L(R^n, R^n)

    is the correlation matrix of the process Xt.

  • The Filtering Problem The Linear Filter

    Proposition

    There exists a unique S which minimizes the functional LS.

Proof. L_S is a quadratic form with

    ∑_{t,l=0}^{N−1} tr Λ_{lt} H_t^T S_t^T Sl Hl ≥ 0

    and

    ∑_{t=0}^{N−1} tr Rt S_t^T St ≥ α ∑_{t=0}^{N−1} ∑_{h=1}^{n} ∑_{i=1}^{m} (S_{t,hi})² = α ∑_{t=0}^{N−1} tr S_t^T St = α ‖S‖².

    Therefore L_S has a unique minimizer Ŝ, defined by

    ∑_{t=0}^{N−1} ( tr( Ŝt St Rt + Rt Ŝ_t^T St ) − 2 tr Λ_{tN} St Ht ) + ∑_{t,l=0}^{N−1} tr( Λ_{lt} H_t^T Ŝ_t^T Sl Hl + H_l^T Ŝ_l^T St Ht Λ_{tl} ) = 0,   ∀S.


  • The Filtering Problem The Linear Filter

    Proposition

Prove that

    E[XN | YN] = X̂N = F_Ŝ = X̄N + ∑_{t=0}^{N−1} Ŝt (Yt − Ȳt)

    and that the posterior distribution πN is Gaussian.

    Proof. Let εN := XN − F_Ŝ. Using the definition of L_S one has

    ∑_t E[ ε_N^T St Yt + Y_t^T S_t^T εN ] = 0,   ∀ S0, S1, . . . , SN−1.

    Since S0, S1, . . . , SN−1 are arbitrary, it follows that εN and Y0, Y1, . . . , YN−1 are uncorrelated and therefore also independent. We use here the fact that εN, Y0, Y1, . . . , YN−1 are jointly Gaussian (crucial!).


  • The Filtering Problem The Linear Filter

    Theorem

The solution X̂t = E[Xt | Yt] satisfies the recurrence formula

    X̂_{N+1} = FN X̂*_N + fN
    X̂*_N = X̂N + PN H_N^T (RN + HN PN H_N^T)^{−1} (YN − HN X̂N − hN)
    P_{N+1} = QN + FN P*_N F_N^T
    P*_N = PN − PN H_N^T (HN PN H_N^T + RN)^{−1} HN PN
    X̂0 = x0.    (6)

    Proof.

    Step 1. Define the innovation process

    IN := YN − (HN X̂N + hN).

    IN is Gaussian, independent of YN, with mean 0 and covariance given by E[It I_l^T] = δ_{tl} (Rt + Ht Pt H_t^T).


  • The Filtering Problem The Linear Filter

Step 2. Using IN and the previous proposition one has

    X̂*_N = E[XN | Y0, . . . , YN−1, IN] = X̂N + KN IN,

    where KN is a gain factor which minimizes the covariance of the error and needs to be determined.

    Step 3. Find the optimal value of KN:

    3.1. The covariance is given by

    P*_N = PN + KN (HN PN H_N^T + RN) K_N^T − KN E[IN ε_N^T] − E[εN I_N^T] K_N^T,   with E[IN ε_N^T] = HN PN.


  • The Filtering Problem The Linear Filter

3.2. By completing the square one has

    P*_N = PN + KN (HN PN H_N^T + RN) K_N^T − KN HN PN − PN H_N^T K_N^T
    = PN + [KN − PN H_N^T (HN PN H_N^T + RN)^{−1}] (HN PN H_N^T + RN) [K_N^T − (HN PN H_N^T + RN)^{−1} HN PN] − PN H_N^T (HN PN H_N^T + RN)^{−1} HN PN,

    therefore the best value of KN is given by

    KN = PN H_N^T (HN PN H_N^T + RN)^{−1}.
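A minimal NumPy sketch of one step of recursion (6) follows; the matrices F, H, Q, R and the offsets f, h are assumed given as arrays, and the function name is ours:

```python
import numpy as np

def kalman_step(x_hat, P, y, F, H, f, h, Q, R):
    """One step of recursion (6): update at time N, then predict to N+1."""
    S = R + H @ P @ H.T                        # innovation covariance R_N + H_N P_N H_N^T
    K = P @ H.T @ np.linalg.inv(S)             # gain K_N = P_N H_N^T S^{-1}
    x_star = x_hat + K @ (y - H @ x_hat - h)   # updated mean X*_N
    P_star = P - K @ H @ P                     # updated covariance P*_N
    x_next = F @ x_star + f                    # predicted mean X_{N+1}
    P_next = Q + F @ P_star @ F.T              # predicted covariance P_{N+1}
    return x_next, P_next
```

Iterating kalman_step over the observations Y0, . . . , YN−1, starting from (x0, P0), reproduces the recursion (6).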


  • The Filtering Problem State Space Models

    Definition

A state space model S is a triple S = {μ0, κ, g} consisting of a probability measure μ0, a sequence of Markov kernels κ = {κt}t≥1 and a sequence of bounded potential functions g = {gt}t≥1.

    For an arbitrary state space model S we define the following operators:
    the prediction (P) operator Ψt(μ) := κt μ,
    the update (U) operator Υt(μ) := gt ∗ μ,
    the prediction-update (PU) operator Φt(μ) := (Υt ◦ Ψt)(μ),
    the composition of PU operators Φ_{t|k} := Φt ◦ Φt−1 ◦ ∙ ∙ ∙ ◦ Φk+1.

    Remark

    Let Φt be the PU operator associated to the state space model S = {π0, K, g} corresponding to the filtering problem, consisting of the probability measure π0, the signal transition kernels K = {Kt}t≥1 and the sequence of likelihood functions g = {gt(yt, ∙)}t≥1. From (3) we deduce that πt = Φt(πt−1). We can compactly represent the evolution of the filter over t − k consecutive steps, namely πt = Φ_{t|k}(πk). Note that the map Φt depends on the kernels Kt and the likelihood functions gt alone (and not on π0).


  • The Filtering Problem State Space Models

The posterior measure is determined by the state space model S (the converse is false!). We prove next that the posterior measure depends continuously on S. To do this we need to introduce a topology D on the set of state space models M. We do so by specifying the topology for each of the three component parts of S. To be specific:

    We endow the set of probability measures P(R^d) with the metrisable topology given by the total variation distance

    Dtv(α, β) := sup_A |α(A) − β(A)|,   α, β ∈ P(R^d),

    where the supremum is taken over all measurable sets.

    The sequence of Markov kernels κ^n = {κ^n_t}t≥1 converges to κ = {κt}t≥1 when

    lim_{n→∞} Dtv(κ^n_t(x, ∙), κt(x, ∙)) = 0, for every t ≥ 1 and any x ∈ X,    (7)

    and we denote lim_{n→∞} κ^n = κ.


  • The Filtering Problem State Space Models

We impose the topology of bounded convergence on the set of (non-negative) bounded potential functions. More precisely, we say that the sequence g^n = {g^n_t}t≥1 is uniformly bounded when there exists G < ∞ such that sup_{n,t≥1} ‖g^n_t‖∞ < G. Then, a uniformly bounded sequence g^n converges to g = {gt}t≥1 when

    lim_{n→∞} g^n_t(x) = gt(x), for every t ≥ 1 and any x ∈ X,    (8)

    and we write lim_{n→∞} g^n = g.

    Finally, a sequence of state space models S^n = {π^n_0, κ^n, g^n} converges to the model S = {π0, κ, g} in the topology D when lim_{n→∞} Dtv(π^n_0, π0) = 0, lim_{n→∞} κ^n = κ and lim_{n→∞} g^n = g. We denote lim_{n→∞} S^n = S.

    The topology D has the property that convergence of the sequence of models S^n, n ≥ 0, to S implies convergence of the marginal probability measures π^n_t (generated by the models S^n) towards the optimal filter πt generated by the model S. This result is made rigorous by the following theorem:


  • The Filtering Problem State Space Models

    Theorem

Let S^n = {π^n_0, κ^n, g^n}, n ≥ 0, and S = {π0, κ, g} be elements of M with corresponding PU operators Φ^n_t and Φt, respectively. If lim_{n→∞} S^n = S, then lim_{n→∞} Φ^n_{t|0}(π^n_0) = Φ_{t|0}(π0).

    Proof. Induction. The case t = 0 holds trivially, since lim_{n→∞} S^n = S implies that lim_{n→∞} Dtv(π^n_0, π0) = 0. Assume that lim_{n→∞} Dtv(β^n, β) = 0 for any t ≥ 1, where β^n = Φ^n_{t−1|0}(π^n_0) and β = Φ_{t−1|0}(π0). We apply the prediction operator Ψ^n_t(α) = κ^n_t α:

    |(f, Ψ^n_t(β^n)) − (f, Ψt(β))|
    = |(f, Ψ^n_t(β^n)) − (f, Ψ^n_t(β)) + (f, Ψ^n_t(β)) − (f, Ψt(β))|
    ≤ |(κ^n_t f, β^n − β)| + |((κ^n_t − κt) f, β)|
    ≤ ‖f‖∞ Dtv(β^n, β) + |((κ^n_t − κt) f, β)|,    (9)

    where the last inequality follows from the definition of the TV distance. The first term on the right hand side of (9) converges to zero by the induction hypothesis, while the second term converges to zero by the bounded convergence theorem.


  • The Filtering Problem State Space Models

Next, we write the PU operator Φ^n_t in terms of the P operator Ψ^n_t to obtain

    |(f, Φ^n_t(β^n)) − (f, Φt(β))| = | (f g^n_t, Ψ^n_t(β^n)) / (g^n_t, Ψ^n_t(β^n)) − (f gt, Ψt(β)) / (gt, Ψt(β)) |
    ≤ | (f g^n_t, Ψ^n_t(β^n)) / (g^n_t, Ψ^n_t(β^n)) − (f g^n_t, Ψ^n_t(β^n)) / (gt, Ψt(β)) | + | (f g^n_t, Ψ^n_t(β^n)) / (gt, Ψt(β)) − (f gt, Ψt(β)) / (gt, Ψt(β)) |
    ≤ ‖f‖∞ |(g^n_t, Ψ^n_t(β^n)) − (gt, Ψt(β))| / (gt, Ψt(β)) + |(f g^n_t, Ψ^n_t(β^n)) − (f gt, Ψt(β))| / (gt, Ψt(β)).    (10)

    However, inequality (9) implies that both terms on the right hand side of (10) converge to 0, hence the proof is complete. □


  • The Filtering Problem Final Remarks

    Final remarks:

The framework and the theoretical results of stochastic filtering provide a sound basis for developing data assimilation methodology.

    The posterior distribution of the signal satisfies a recurrence formula that cannot, in general, be explicitly solved.

    In the linear case the posterior distribution is Gaussian, with mean and covariance matrices that satisfy (6).

    The posterior distribution depends continuously on the initial distribution, the signal transition kernels and the likelihood functions.


  • The Filtering Problem Lecture II

    Lecture II


  • The Filtering Problem List of Contents

    Lecture II

    How do we define an approximation

    Particle Filters/Sequential Monte Carlo methods.

    The Standard Particle Filter

    Convergence Result

    General Remarks

    Why is the high-dimensional filtering problem hard ?

Model Reduction (High → Low Res)

    Tempering, Jittering, Nudging

    Final Remarks


  • The Filtering Problem How do we define an approximation ?

The description of a numerical approximation for the solution of the filtering problem should contain three parts:

    1. The class of approximations:

    particle approximations: (aj(t) [weight], v¹j(t), . . . , v^d_j(t) [position]), j = 1, . . . , n, with πt ≈ π^n_t = ∑_{j=1}^n aj(t) δ_{vj(t)};

    Gaussian approximations: (aj(t) [weight], v¹j(t), . . . , v^d_j(t) [mean], ω¹¹j(t), . . . , ω^{dd}_j(t) [covariance matrix]), j = 1, . . . , n, with πt ≈ π^n_t = ∑_{j=1}^n aj(t) N(vj(t), ωj(t)).

    2. The law of evolution of the approximation:

    particle approximations: π^n_t —(mutation: model)→ π̄^n_{t+δ} —(selection: {Ys}s∈[t,t+δ])→ π^n_{t+δ};

    Gaussian approximations: π^n_t —(forecast: model)→ π̄^n_{t+δ} —(assimilation: {Ys}s∈[t,t+δ])→ π^n_{t+δ}.

    3. The measure of the approximating error:

    sup_{ϕ∈Cb} E[|π^n_t(ϕ) − πt(ϕ)|],   π̂t − π̂^n_t,   ‖π^n_t − πt‖TV.


  • The Filtering Problem Quantized information = particles

The quantized information is modelled by n stochastic processes

    {pi(t), t > 0},   i = 1, . . . , n,   pi(t) ∈ R^N.

    We think of the processes pi as the trajectories of n (generalized) particles.

    Typically N > d, where d is the dimension of the state space.

    π^n_t = Λ^n_t (pi(t), t > 0, i = 1, . . . , n).

    Methodologies:

    classical particle filters
    Gaussian approximations
    wavelets
    grid methods


  • The Filtering Problem The classical/standard/bootstrap/garden-variety particle filter

π^n = {π^n(t), t ≥ 0}: the occupation measure of a system of weighted particles

    π^n(0) = ∑_{i=1}^n (1/n) δ_{x^n_i}  −→  π^n(t) = ∑_{i=1}^n ā^n_i(t) δ_{V^n_i(t)}.

    • DC, Particle Filters. A Theoretical Perspective, Sequential Monte Carlo Methods in Practice, 2001.
    • P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, 2004.


  • The Filtering Problem Framework: discrete/continuous time

1. Initialisation [t = 0].

    For i = 1, . . . , N, sample x^(i)_0 from π0,

    π^N_0 = (1/N) ∑_{i=1}^N δ_{x^(i)_0}.

    2. Iteration [t − 1 to t].
    Let x^(i)_{t−1}, i = 1, . . . , N be the positions of the particles at time t − 1,

    π^N_{t−1} = (1/N) ∑_{i=1}^N δ_{x^(i)_{t−1}}.

    Step 1.

    For i = 1, . . . , N, sample x̄^(i)_t from ft(xt | x^(i)_{t−1}) dxt,

    p^N_t = (1/N) ∑_{i=1}^N δ_{x̄^(i)_t}.


  • The Filtering Problem Framework: discrete/continuous time

Compute the (normalized) weights ā^(i)_t = gt(x̄^(i)_t) / ∑_{j=1}^N gt(x̄^(j)_t),

    π̄^N_t = ∑_{i=1}^N ā^(i)_t δ_{x̄^(i)_t} = gt ∗ p^N_t.

    Step 2.

    Replace each particle by ξ^(i)_t offspring such that ∑_{i=1}^N ξ^(i)_t = N.
    [Sample with replacement N times from {x̄^(i)_t}.]
    Denote the positions of the resulting particles by x^(i)_t, i = 1, . . . , N,

    π^N_t = (1/N) ∑_{i=1}^N δ_{x^(i)_t}.

    Further details in:

    Bain, A., DC, Fundamentals of Stochastic Filtering, Series: Stochastic Modelling and Applied Probability, Vol. 60, Springer Verlag, 2009.
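A minimal sketch of the algorithm above for a scalar model; the AR(1) transition and Gaussian likelihood are hypothetical placeholders for ft and gt, and resampling is plain multinomial sampling with replacement (SIR):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000                                    # number of particles

def transition(x):                          # sample from f_t(. | x): hypothetical AR(1) signal
    return 0.9 * x + 0.5 * rng.normal(size=x.shape)

def likelihood(x, y):                       # g_t(y | x): N(y; x, 1) observation density
    return np.exp(-0.5 * (y - x) ** 2)

def pf_step(x_prev, y):
    x_pred = transition(x_prev)             # Step 1: mutation, gives p_t^N
    w = likelihood(x_pred, y)
    w = w / w.sum()                         # normalised weights, gives the weighted measure
    idx = rng.choice(N, size=N, p=w)        # Step 2: resample with replacement
    return x_pred[idx]                      # equally weighted particles, gives pi_t^N
```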


  • The Filtering Problem Framework: discrete/continuous time

    Theorem

π^N converges to π. Moreover,

    sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|π^N_t(ϕ) − πt(ϕ)|] ≤ cT/√N,

    and √N (π^N − π) converges to a measure-valued process ū = {ūt, t ≥ 0}.


  • The Filtering Problem Framework: discrete/continuous time

    Notation:

• Error(π, T, N) = sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|π^N_t(ϕ) − πt(ϕ)|]

    • Error(p, T, N) = sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|p^N_t(ϕ) − pt(ϕ)|]

    Theorem

    For all T > 0, there exists cT such that

    Error(π, T, N) ≤ cT/√N,   Error(p, T, N) ≤ cT/√N

    if and only if Error(π, 0, N) ≤ c0/√N and, for all T > 0, there exists cT such that

    sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|p^N_t(ϕ) − π^N_{t−1}Kt(ϕ)|] ≤ cT/√N    (11)

    sup_{t∈[0,T]} sup_{‖ϕ‖∞≤1} E^Y[|π^N_t(ϕ) − π̄^N_t(ϕ)|] ≤ cT/√N.    (12)


  • The Filtering Problem Framework: discrete/continuous time

Proof. "⇒" Immediate from the following two inequalities:

    |p^N_t ϕ − π^N_{t−1}Kt ϕ| ≤ |p^N_t ϕ − pt ϕ| + |πt−1(Kt ϕ) − π^N_{t−1}(Kt ϕ)|,
    |π^N_t ϕ − π̄^N_t ϕ| ≤ |π^N_t ϕ − πt ϕ| + |πt ϕ − π̄^N_t ϕ|,

    where we used the fact that pt = πt−1 Kt.

    "⇐" Induction. The case t = 0 is assumed. The induction step is obtained as follows: since pt = πt−1 Kt, by the triangle inequality

    |p^N_t ϕ − pt ϕ| ≤ |p^N_t ϕ − π^N_{t−1}Kt ϕ| + |π^N_{t−1}Kt ϕ − πt−1 Kt ϕ|.

    Also,

    π̄^N_t ϕ − πt ϕ = p^N_t(ϕ gt)/p^N_t gt − pt(ϕ gt)/pt gt = − [p^N_t(ϕ gt) / (p^N_t gt × pt gt)] (p^N_t gt − pt gt) + ( p^N_t(ϕ gt)/pt gt − pt(ϕ gt)/pt gt ),

    and as |p^N_t(ϕ gt)| ≤ ‖ϕ‖∞ p^N_t gt,

    |π̄^N_t ϕ − πt ϕ| ≤ (‖ϕ‖∞ / pt gt) |p^N_t gt − pt gt| + (1/pt gt) |p^N_t(ϕ gt) − pt(ϕ gt)|.


  • The Filtering Problem Framework: discrete/continuous time

    Corollary

The standard particle filter produces an approximation π^N_t such that

    Error(π, T, N) ≤ cT/√N.

    Proof. Immediate from the fact that the standard particle filter satisfies (11)+(12). □

    A numerical example

    Remarks:

    Particle filters are recursive algorithms: the approximation for πt and Yt+1 are the only information used in order to obtain the approximation for πt+1. In other words, the information gained from Y1, ..., Yt is embedded in the current approximation.

    The generic SMC method involves sampling from the prior distribution of the signal and then using a weighted bootstrap technique (or equivalent) with weights defined by the likelihood of the most recent observation data.


  • The Filtering Problem Framework: discrete/continuous time

Step 2 can be done by means of sampling with replacement (SIR algorithm), stratified sampling, Bernoulli sampling, the Carpenter-Clifford-Fearnhead-Whitley genetic algorithm, or the Crisan-Lyons TBBA algorithm. All these methods satisfy the convergence requirement.

    If d is small to moderate, then the standard particle filter can perform very well in the time parameter n.

    Under certain conditions, the Monte Carlo error of the estimate of the filter can be uniform with respect to the time parameter.

    The function xk ↦ g(xk, yk) can convey a lot of information about the hidden state, especially so in high dimensions. If this is the case, using the prior transition kernel f(xk−1, xk) as proposal will be ineffective.

    It is then known that the standard particle filter will typically perform poorly in this context, often requiring N = O(κ^d) particles.


  • The Filtering Problem Framework: discrete/continuous time

Figure: Computational cost per time step (wallclock time per time step, in seconds, log scale) to achieve a predetermined RMSE versus model dimension, for the standard particle filter (PF) and STPF.


  • The Filtering Problem Why is the high-dimensional problem hard ?

    Consider

Π0 = N(0, 1) (mean 0 and variance 1).
    Π1 = N(1, 1) (mean 1 and variance 1).
    Πd = N(d, 1) (mean d and variance 1).
    d(Π0, Π1)TV = 2P[|X| ≤ 1/2], X ∼ N(0, 1).
    d(Π0, Πd)TV = 2P[|X| ≤ d/2], X ∼ N(0, 1).
    As d increases, the two measures get further and further apart, becoming singular w.r.t. each other.
    As d increases, it becomes increasingly harder to use standard importance sampling to construct a sample from Πd by using a proposal from Π0, weighting it using

    dΠd/dΠ0

    and (possibly) resampling from it.


  • The Filtering Problem Why is the high-dimensional problem hard ?

Consider
    Π0 = N((0, . . . , 0), Id) (mean (0, . . . , 0) and covariance matrix Id).
    Πd = N((1, . . . , 1), Id) (mean (1, . . . , 1) and covariance matrix Id).
    d(Π0, Πd)TV = 2P[|X| ≤ √d/2], X ∼ N(0, 1).
    As d increases, the two measures get further and further apart, becoming singular w.r.t. each other exponentially fast.
    It becomes increasingly harder to use standard importance sampling to construct a sample from Πd by using a proposal from Π0.
    'Moving' from Π0 to Πd is equivalent to moving from a standard normal distribution N(0, 1) to a normal distribution N(√d, 1) (the total variation distance between N(0, 1) and N(√d, 1) is the same as that between Π0 and Πd).

    Add-on techniques (a numerical illustration of the weight degeneracy follows below):

    • Tempering *                           • Jittering *
    • Model Reduction (High → Low Res) *    • Nudging *
    • Sequential DA in space                • Optimal transport prior → posterior
    • Hybrid models                         • Hamiltonian Monte Carlo
    • Informed priors                       • Localization
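The degeneracy described above is easy to reproduce numerically. The following toy sketch importance-samples Πd = N((1, . . . , 1), Id) using proposals from Π0 = N(0, Id) and prints the effective sample size of the normalised weights as d grows:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
for d in [1, 2, 5, 10, 20, 50]:
    x = rng.normal(size=(N, d))            # proposals from Pi_0 = N(0, I_d)
    log_w = x.sum(axis=1) - d / 2          # log dPi_d/dPi_0(x) = sum_i x_i - d/2
    w = np.exp(log_w - log_w.max())        # stabilise before exponentiating
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)             # effective sample size
    print(f"d = {d:3d}   ESS = {ess:8.1f}")  # ESS collapses as d grows
```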


  • The Filtering Problem Why is the high-dimensional problem hard ?

Model Reduction (High → Low Res)*

    Model reduction is a valuable methodology that can lead to substantial computational savings: for the numerical example presented in these lectures, we perform state space order reduction through a coarsening of the grid used for the numerical algorithm that approximates the dynamical system, from 2 × 513² ≈ 0.5 × 10⁶ to 2 × 65² = 8450.

    Recall the recursion formula for the conditional distribution of the signal:

    pt = πt−1 Kt,   πt = gt ∗ pt,    (13)

    where dπt/dpt = C_t^{−1} gt, with Ct := ∫_{R^d} gt(yt, xt) pt(dxt).

    Following from Lecture I, πt is a continuous function of (π0, g1, ..., gt, K1, ..., Kt). In other words, if

    lim_{n→∞} (π^n_0, g^n_1, ..., g^n_t, K^n_1, ..., K^n_t) = (π0, g1, ..., gt, K1, ..., Kt)

    and p^n_t := π^n_{t−1} K^n_t, π^n_t := g^n_t ∗ p^n_t, then lim_{n→∞} π^n_t = πt and lim_{n→∞} p^n_t = pt (again, in a suitably chosen topology).

    NB. Note that π^n_t is no longer the solution of a filtering problem, but simply the solution of the iteration (13).


  • The Filtering Problem Tempering

    Tempering

    Framework:

    {Xt}t≥0 Markov chain P (Xt ∈ dxt |Xt−1 = xt−1) = ft(xt |xt−1)dxt ,

    {Xt , Yt}t≥0 P (Yt ∈ dyt |Xt = xt) = gt(yt |xt)dyt

For k = 1 to d:

    ◦ reweight the particles using g_t^{1/d} and (possibly) resample from them

    ◦ move the particles using an MCMC step that leaves g_t^{k/d} ft π[0,t−1] invariant

    Beskos, DC, Jasra, On the stability of SMC methods in high dimensions, 2014.
    Kantas, Beskos, Jasra, Sequential Monte Carlo for inverse problems, 2014.


  • The Filtering Problem Tempering

Initialisation t = 0: For n = 1, . . . , N, sample Xn(t0) from π0.
    Iteration (ti−1, ti]: Given the ensemble {Xn(ti−1)}n=1,...,N,

    1 Evolve Xn(ti−1) using the signal equation to obtain Xn(ti).

    2 Given X := {Xn(ti)}n=1,...,N, define normalised tempered weights

    λ̄n,i(φ, X) := exp(−φ Λn,i) / ∑_m exp(−φ Λm,i),

    where the dependence on X means the Λn,i are computed using X. Define the effective sample size

    ESSi(φ, X) := ‖λ̄i(φ, X)‖_{l2}^{−2}.

    Set φ = 1.

    3 While ESSi(φ, X) < Nthreshold do:

    (a) Find 1 − φ < φ′ < 1 such that ESSi(φ′ − (1 − φ), X) ≈ Nthreshold. Resample according to λ̄n,i(φ′ − (1 − φ), X) and apply MCMC (jittering) if required (i.e. when there are duplicated particles), to obtain a new set of particles X(φ′). Set φ = 1 − φ′ and X = X(φ′).

    (b) If ESSi ≥ Nthreshold then STOP and go to the next filtering step with {(Xn(ti), λ̄n,i)}n=1,...,N.
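A sketch of the temperature search in step 3(a), here implemented by bisection on the remaining temperature increment; Lambda stands for the array of negative log-likelihoods Λn,i of the current particle cloud, and the function names and tolerance are ours:

```python
import numpy as np

def ess(phi, Lambda):
    """ESS of the tempered weights exp(-phi * Lambda_n), normalised."""
    log_w = -phi * Lambda
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def next_increment(Lambda, phi_left, n_threshold, tol=1e-6):
    """Bisect for the largest temperature increment with ESS ~= n_threshold."""
    lo, hi = 0.0, phi_left                  # remaining temperature budget
    if ess(hi, Lambda) >= n_threshold:      # full remaining step already acceptable
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(mid, Lambda) >= n_threshold:
            lo = mid
        else:
            hi = mid
    return lo
```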


  • The Filtering Problem Jittering

    Jittering

Procedure employed to reduce sample degeneracy.
    The particles are moved using a suitably chosen kernel.
    The moves are controlled so that the size of the (stochastic) perturbation remains of the same order as the particle filter error (1/√N).

    Algorithm (MCMC step after each tempering procedure)

    Given the ensemble {Xn,k(ti)}n=1,...,N corresponding to the k'th tempering step with temperature φk, and proposal step size ρ ∈ [0, 1], repeat the following steps.
    Propose

    X̃n(ti) = G( Xn(ti−1), ρ W(ti−1 : ti; ω) + √(1 − ρ²) Z(ti−1 : ti; ω) ),

    where Xn(ti) = G( Xn(ti−1), W(ti−1 : ti; ω) ) and W ⊥ Z.
    Accept X̃n(ti) with probability

    1 ∧ λ̄(φk, X̃n(ti)) / λ̄(φk, Xn(ti)),

    where λ(φ, x) = exp(−φ Λ(x)) is the unnormalised weight function.
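Viewed in the driving-noise variables, this proposal is a preconditioned Crank-Nicolson (pCN) move. Here is a sketch of one such move for a single particle; G, Lambda and the stored noise W are placeholders for the model map, the negative log-likelihood and the Brownian increments, and the initial condition is assumed baked into G:

```python
import numpy as np

rng = np.random.default_rng(3)

def jitter(x, W, phi, rho, G, Lambda):
    """One pCN jittering move for a single particle at temperature phi."""
    Z = rng.normal(size=W.shape)                   # independent noise, W ⊥ Z
    W_new = rho * W + np.sqrt(1 - rho ** 2) * Z    # pCN proposal on the noise
    x_new = G(W_new)                               # re-run the model map with proposed noise
    log_alpha = -phi * (Lambda(x_new) - Lambda(x)) # log of the weight ratio
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return x_new, W_new                        # accept
    return x, W                                    # reject
```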

  • The Filtering Problem Jittering

    Nudging

Nudging: reduce the error by 'correcting' the model solution such that the particles are kept closer to the true state ⇒ a nudging term N(α) is added to the model.

    (ti−1, ti]: Given the ensemble {Xn(ti−1)}Nn=1, we want to assimilate the observational data Yti in order to obtain a new ensemble {Xn(ti)}Nn=1 that defines πNti:

    Obtain the observation Yti.
    Evolve Xn(ti−1) through a modified kernel to obtain X̃n(ti):

    X̃n(ti) = Xn(ti−1) + f(Yti) + Wti.

    Define new weights according to the ratio between the law of X̃n(ti) and that of Xn(ti).


  • The Filtering Problem Final Remarks

Particle filters/sequential Monte Carlo methods are theoretically justified algorithms for approximating the state of dynamical systems that are partially (and noisily) observed.

    Particle filters are recursive algorithms: the approximation for πt and Yt+1 are the only information used in order to obtain the approximation for πt+1. In other words, the information gained from Y1, ..., Yt is embedded in the current approximation.

    The standard particle filter is unsuitable for solving high-dimensional problems.

    Properly calibrated and modified, particle filters can be used to solve high dimensional problems (see also the work of Peter Jan van Leeuwen, Roland Potthast, Hans Kunsch).

    Important parameters: initial condition, number of particles, number of observations, correction times, observation error, etc.


  • The Filtering Problem

    Lecture III


  • The Filtering Problem List of Contents

    Lecture III

Damped and forced incompressible 2D Euler model
    Framework: Signal/Initial Condition/Observations
    Particle filter initial condition
    Model Reduction/Noise Calibration/Uncertainty Quantification
    Parameters: No of Particles/Resampling Intervals/Observation Times

    2-layer quasi-geostrophic model

    Framework: Signal/Initial Condition/Observations
    Nudging

    Final Remarks

    Based on joint work with Colin Cotter, Darryl Holm, Wei Pan, Igor Shevchenko.

    ◦ Numerically Modelling Stochastic Lie Transport in Fluid Dynamics
    ◦ A Particle Filter for Stochastic Advection by Lie Transport (SALT): A case study for the damped and forced incompressible 2D Euler equation
    ◦ Modelling uncertainty using circulation-preserving stochastic transport noise in a 2-layer quasi-geostrophic model
    ◦ Data assimilation for a quasi-geostrophic model with circulation-preserving stochastic transport noise


  • The Filtering Problem Hardware comparison

Fluid dynamics models are used extensively to describe the evolution of the atmosphere and the oceans and play a crucial role in numerical weather prediction (NWP). Numerical weather prediction requires massive computing capabilities as it relies on Hi-Res scientific computations:

    14,000 trillion calculations per second

    2 petabytes of memory

    460,000 compute cores

    24 petabytes of storage for saving data

    Dimension of the state space O(10⁹)

    Dimension of the observation space O(10⁷)

    The Cray XC40 supercomputer (source: Met Office website)

    Instead: use particle filters built on lower-resolution stochastic models.

    2x Intel Xeon CPU @ 2.10GHz

    32 logical cores

    2x Nvidia Quadro M4000

    64GB memory

    Dimension of the state space 513 × 513 × 2

    Dimension of the observation space 9 × 9 × 2

    The Beast (source: Wei's office)


  • The Filtering Problem Hardware comparison

Damped and forced incompressible 2D Euler equation


  • The Filtering Problem A stochastic transport model

    Example: Satellite Tracks Hurricanes Madeline and Lester in the Pacific

    Laboratory Experiment: Fluid Transport Dynamics

    High versus Low Res simulation: Euler equation

    Deterministic versus Stochastic models: Stochastic Euler

Euler (vorticity form)

    ∂t ω + u ∙ ∇ω = Q − rω,   div(u) = 0

    ω vorticity field, u velocity field

    Evolution of Lagrangian fluid parcels

    dxt = ut(xt) dt   ⇒   dxt = ut(xt) dt + ∑_i ξi(xt) ◦ dWi(t)


  • The Filtering Problem A stochastic transport model

Consider a two-dimensional incompressible fluid flow u defined on the 2D torus Ω = [0, Lx] × [0, Ly], modelled by the two-dimensional Euler equations with forcing and damping. Let q = ẑ ∙ curl u denote the vorticity of u, where ẑ denotes the z-axis. For a scalar field g : Ω → R, we write ∇⊥ g = (−∂y g, ∂x g)^T. Let ψ : Ω × [0, ∞) → R denote the stream function.

    ∂t q + (u ∙ ∇) q = Q − rq
    u = ∇⊥ ψ
    Δψ = q.

    Q is the forcing term, given by Q = 0.1 sin(8πx).

    r is a positive constant - the large scale dissipation time scale.

    We consider the slip flow boundary condition ψ|∂Ω = 0.

    Evolution of Lagrangian fluid parcels:

    dxt/dt = u(xt, t).


  • The Filtering Problem A stochastic transport model

Domain is [0, 1]².

    PDE system:    ∂t ω + u ∙ ∇ω = Q − rω,   u = ∇⊥ ψ,   Δψ = ω.
    SPDE system:   dq + ū ∙ ∇q dt + ∑_i ξi ∙ ∇q ◦ dW^i_t = (Q − rq) dt,   ū = ∇⊥ ψ̃,   Δψ̃ = q.

    Q = 0.1 sin(8πx), r = 0.01. Boundary conditions ψ|∂Ω = 0 and ψ̃|∂Ω = 0.

                     PDE        SPDE
    Grid resolution  512x512    64x64
    Numerical Δt     0.0025     0.01

    Spin-up: 40 ett. ett: eddy turnover time, L/uL ≈ 2.5 time units.
    Numerical scheme: a mixed continuous and discontinuous Galerkin finite element scheme + an optimal third order strong stability preserving Runge-Kutta [Bernsen et al 2006, Gottlieb 2005].


  • The Filtering Problem initial condition

    Initial Condition

    Initial configuration for the vorticity

ωspin = sin(8πx) sin(8πy) + 0.4 cos(6πx) cos(6πy) + 0.3 cos(10πx) cos(4πy) + 0.02 sin(2πy) + 0.02 sin(2πx)    (14)

    from which we spin up the system until an energy equilibrium state seems to have been reached.
    This equilibrium state, denoted by ωinitial, is then chosen as the initial condition.


  • The Filtering Problem initial condition

Plot of the numerical PDE solution at the initial time tinitial and its coarse-grained version, done via spatial averaging and projection of the fine grid stream-function to the coarse grid.


  • The Filtering Problem initial condition

Plot of the numerical PDE solution at the final time t = tinitial + 146 large eddy turnover times (ett). The coarse-graining is done via spatial averaging and projection of the fine grid stream-function to the coarse grid.


  • The Filtering Problem Observation

Observations: u is observed on a subgrid of the signal grid (9 × 9 points)

    Yt(x) = uSPDE_t(x) + α zx,   zx ∼ N(0, 1)   (Experiment 1)
    Yt(x) = uPDE_t(x) + α zx,   zx ∼ N(0, 1)   (Experiment 2)

    α is calibrated to the standard deviation of the true solution over a coarse grid cell.


  • The Filtering Problem Particle Filter Initial Condition

    Particle Filter Initial Condition

A good choice of the initial condition is essential for the successful implementation of the filter.

    In practice it is a reflection of the level of uncertainty of the estimate of the initial position of the dynamical system.

    We use the initial condition to obtain an ensemble which contains particles that are reasonably 'close' to the truth.

    Choice for the running example:

    deformation - physically consistent with the system, Casimirs preserved. We take a nominal value ωt0 and deform it using the following 'modified' Euler equation:

    ∂t ω + βi u(τi) ∙ ∇ω = 0    (15)

    where βi ∼ N(0, ε), i = 1, . . . , Np are centered Gaussian weights with an a priori variance parameter ε, and τi ∼ U(tinitial, t0), i = 1, . . . , Np are uniform random numbers. Thus each u(τi) corresponds to a PDE solution in the time period [tinitial, t0).


  • The Filtering Problem Particle Filter Initial Condition

Alternative choices:
    q + ζ, where ζ is a Gaussian random field: doable but not physical; it only works for q because q is the least smooth of the three fields of interest (the other fields are spatially smooth). Also, this breaks the SPDE well-posedness theorem (function space regularity). Figure: (ux, uy).


  • The Filtering Problem Particle Filter Initial Condition

Directly perturb ψ, by ψ + ψ̄ where ψ̄ = (I − κΔ)^{−1} ζ (invert the elliptic operator with boundary condition ψ̄ = 0). Figure: (ux, uy).


  • The Filtering Problem Model Reduction

Model Reduction (High → Low Res)*

    We perform state space order reduction through a coarsening of the grid used for the numerical algorithm that approximates the dynamical system, from 2 × 513² ≈ 0.5 × 10⁶ to 2 × 65² = 8450. The procedure is theoretically justified due to the continuity of the posterior distribution w.r.t. the state space model (see Lecture 1).

    To account for the "missing scales" we use a "stochastic parametrization". We define a stochastic PDE on the coarser grid:

    ∂t q + (u ∙ ∇) q + ∑_{k=1}^{∞} (ξk ∙ ∇) q ◦ dB^k_t = Q − rq
    u = ∇⊥ ψ
    Δψ = q.

    ξk are given divergence-free vector fields
    ξk are computed from the true solution by using an empirical orthogonal functions (EOFs) procedure
    B^k_t are independent scalar Brownian motions


  • The Filtering Problem Methodology to calibrate the noise

The reason for this "stochastic parametrization" is grounded in solid physical considerations, see

    D.D. Holm, Variational principles for stochastic fluids, Proc. Roy. Soc. A, 2015.

    dx^f_t = u^f_t(x^f_t) dt
    dx^c_t = u^c_t(x^c_t) dt + ∑_i ξi(x^c_t) ◦ dWi(t)

    For each m = 0, 1, . . . , M − 1:

    1 Solve dx^f_ij(t)/dt = u^f_t(x^f_ij(t)) with initial condition x^f_ij(mΔT) = xij.

    2 Compute u^c_t by low-pass filtering u^f_t along the trajectory.

    3 Compute x^c_ij(t) by solving dx^c_ij(t)/dt = u^c_t(x^f_ij(t)) with the same initial condition.

    4 Compute the difference Δx^m_ij = x^f_ij((m + 1)ΔT) − x^c_ij((m + 1)ΔT), which measures the error between the fine and coarse trajectory.


  • The Filtering Problem Methodology to calibrate the noise

Having obtained Δx^m_ij, we would like to extract the basis for the noise. This amounts to a Gaussian model of the form

    Δx^m_ij / √δt = Δ̃x_ij + ∑_{k=1}^{N} ξ^k_ij ΔW^k_m,

    where the ΔW^k_m are i.i.d. normal random variables with mean zero and variance 1.

    We estimate ξ by using empirical orthogonal functions (EOFs). EOFs can be thought of as principal components that correspond to the spatial correlations of a field (see Hannachi (2004); Hannachi, Jolliffe, and Stephenson (2007)). EOFs are the eigenvectors of the velocity-velocity spatial covariance tensor.

    We write the data time series Δx^m_ij, m = 0, . . . , M − 1 as a matrix F̃ whose entries are two-dimensional vectors, and whose rows (row index m) correspond to serialised Δx^m_ij. Let F := detrend(F̃), where the detrend function removes the column mean from each entry. We then estimate the spatial covariance tensor by computing R := (1/(M−1)) F^T F.

    We take the EOFs to be the eigenvectors of R, ranked in descending order according to the eigenvalues.
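A sketch of this EOF extraction; F_tilde stands for the data matrix built from the serialised Δx^m_ij (one row per time index m), and the function name is ours:

```python
import numpy as np

def eofs(F_tilde, n_modes):
    """EOFs of the displacement data: eigenvectors of the sample covariance."""
    F = F_tilde - F_tilde.mean(axis=0)   # detrend: remove column means
    M = F.shape[0]
    R = F.T @ F / (M - 1)                # spatial covariance tensor R
    eigval, eigvec = np.linalg.eigh(R)   # eigendecomposition of symmetric R
    order = np.argsort(eigval)[::-1]     # rank by descending eigenvalue
    return eigval[order][:n_modes], eigvec[:, order][:, :n_modes]
```

For large grids one would typically compute the leading modes via an SVD of F instead of forming R explicitly.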


  • The Filtering Problem Methodology to calibrate the noise

Number of BM (EOFs - empirical orthogonal functions)

    decide on a case by case basis

    too many will slow down the algorithm

    On the left: number of EOFs capturing 90% of the variance vs 50% (no change).
    On the right: normalised spectrum of the Δx covariance operator, showing the number of BM required to capture 50%, 70% and 90% of the total variance.


  • The Filtering Problem Methodology to calibrate the noise

The aim of the calibration is to capture the statistical properties of the sub-grid fluctuations, rather than the trajectory of the flow.

    Validation of the stochastic parameterisation in terms of uncertainty quantification for the SPDE.

    Performance of the DA algorithm relies on the correct modelling of the unresolved scales.


  • The Filtering Problem Methodology to calibrate the noise

Figure: Model reduction UQ pictures for systems 256x256, 128x128 and 64x64. Panels: ux, uy, psi, q.


  • The Filtering Problem Methodology to calibrate the noise

Ensemble distance from the "truth" (velocity field):

    d({q̂i, i = 1, . . . , Np}, ω, t) := min_{i∈{1,...,Np}} ‖ω(t) − q̂i(t)‖_{L²(D)} / ‖ω(t)‖_{L²(D)}


  • The Filtering Problem Parameters

    Parameters


  • The Filtering Problem Parameters

Number of Particles

    decide on a case by case basis
    too few will not give a reasonable solution
    too many will slow down the algorithm

    Picture: number of particles 225 (good) vs 500 (no change); 225 (good) vs 25 (less good). 25 seems OK, but we want as many particles as computationally feasible to tune the algorithm.

    (a) psi 225 vs 500   (b) psi 225 vs 25


  • The Filtering Problem SIR fails

    Classical Particle Filter fails !

    Histogram of weights

Figure: example log-likelihood histogram, period 1 ett, 100 particles


  • The Filtering Problem Resampling Intervals

    Resampling Intervals

small resampling intervals lead to an unreasonable increase in the computational effort

    large resampling intervals make the algorithm fail

    the ESS can be used as criterion for choosing the resampling time

    adapted resampling time can be used

    ESS evolution in time/observation noise


  • The Filtering Problem Results

    DA Solution for DA periods: 1 ETT and 0.2 ETT

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs spde, period 0.2 ett


  • The Filtering Problem Results

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs pde, period 0.2 ett

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs pde, period 1 ett


  • The Filtering Problem Results

    Number of tempering steps/Average MCMC steps

    3. Add-on steps: Model Reduction, Tempering, Jittering

π^N_{t−1} —(no nudging)→ p^N_t —(adaptive tempering + jittering)→ π^N_t

    Truth versus Conditional Mean: Euler equation with forcing and damping

    Truth versus Absolute Difference: Euler equation with forcing and damping


  • The Filtering Problem Results

    2-layer quasi-geostrophic model


  • The Filtering Problem Framework

Case study: two-layer quasi-geostrophic model for a β-plane channel flow with O(10⁶) degrees of freedom. The model is reduced by following the stochastic variational approach for geophysical fluid dynamics introduced in Holm [2015] as a framework for deriving stochastic parametrisations for unresolved scales. The computations are done only for O(10⁴) degrees of freedom.

π^N_{t−1} —(nudging)→ p^N_t —(adaptive tempering + jittering)→ π^N_t

    Example: 2-layer quasi-geostrophic model

The two-layer deterministic QG equations for the potential vorticity (PV) q:

    ∂q1/∂t + u1 ∙ ∇q1 = ν Δ²ψ1 − β ∂ψ1/∂x,
    ∂q2/∂t + u2 ∙ ∇q2 = ν Δ²ψ2 − μ Δψ2 − β ∂ψ2/∂x,    (16)

    where ψ is the stream function, β is the planetary vorticity gradient, μ is the bottom friction parameter, ν is the lateral eddy viscosity, and u = (u, v) is the velocity vector. The computational domain Ω = [0, Lx] × [0, Ly] × [0, H] is a horizontally periodic flat-bottom channel of depth H = H1 + H2, given by two stacked isopycnal fluid layers of depth H1 and H2.


  • The Filtering Problem Framework

Forcing in (16) is introduced via a vertically sheared, baroclinically unstable background flow

    ψi → −Ui y + ψi,   i = 1, 2,    (17)

    where the parameters Ui are background-flow zonal velocities. The PV anomaly and stream function are related through two elliptic equations:

    q1 = Δψ1 + s1 (ψ2 − ψ1),    (18a)
    q2 = Δψ2 + s2 (ψ1 − ψ2),    (18b)

    with stratification parameters s1, s2. The system (16)-(18) is augmented by the integral mass conservation constraint

    ∂t ∫∫_Ω (ψ1 − ψ2) dy dx = 0,    (19)

    by the periodic horizontal boundary conditions ψ|Γ2 = ψ|Γ4, ψ = (ψ1, ψ2), and by the no-slip boundary conditions u|Γ1 = u|Γ3 = 0 set at the northern and southern boundaries of the domain.


  • The Filtering Problem Framework

The stochastic version of the QG equations (16) is given by:

    dq1 + ( u1 dt + ∑_{k=1}^{K} ξ^k_1 ◦ dW^k_t ) ∙ ∇q1 = ( ν Δ²ψ1 − β ∂ψ1/∂x ) dt,
    dq2 + ( u2 dt + ∑_{k=1}^{K} ξ^k_2 ◦ dW^k_t ) ∙ ∇q2 = ( ν Δ²ψ2 − μ Δψ2 − β ∂ψ2/∂x ) dt.    (20)

    The stochastic terms are the only difference from the deterministic QG model (16); all other equations are the same as in the deterministic case.

    Stochastic solutions are computed on two different signal grids Gs = {129 × 65, 257 × 129} in order to highlight the effect of the model reduction on the results.
    The observation process Y consists of the velocity observed at two different data grids Gd = {4 × 4, 8 × 4}.
    The size of the ensemble is taken to be N = 100 and the number of Brownian motions (independent sources of stochasticity) is taken to be K = 32. This is enough to reasonably quantify the uncertainty of the model: we showed that the spread of the ensemble will not increase substantially by taking more particles and/or sources of noise (BMs).


  • The Filtering Problem Framework

The observation data Yt is an M-dimensional process that consists of noisy measurements of the velocity field u taken at the points belonging to the data grid Gd:

    Yt := Psd(Zt) + η,

    where Psd : Gs → Gd is a projection operator from the signal grid Gs to the data grid Gd, and η ∼ N(0, Iσ) is a normally distributed random vector with mean vector 0 = (0, . . . , 0) and diagonal covariance matrix Iσ = diag(σ1², . . . , σM²).

    Rather than choosing an arbitrary σ = (σ1, . . . , σM) for the standard deviation of the noise, we use the standard deviation of the velocity field computed over the coarse grid cell of the signal grid.

    We introduce the likelihood-weight function

    W(X, Y) = exp( −(1/2) ∑_{i=1}^{M} ‖ (Psd(Xi) − Yi) / σi ‖²₂ ),    (21)

    with M being the number of grid points (weather stations).


  • The Filtering Problem Framework

In order to measure the variability of the weights (21) of the particles we use the effective sample size:

    ESS(w̄) = ( ∑_{i=1}^{N} (w̄i)² )^{−1},   w̄ := w ( ∑_{i=1}^{N} wi )^{−1},    (22)

    which is close to the ensemble size N if the particles have weights that are close to each other, and decays to one as the ensemble degenerates (i.e. there are fewer and fewer particles with large weights and the rest have small weights).

    One should resample the weighted ensemble if the ESS drops below a given threshold N*:

    ESS < N*.

    We chose N* = 80 to be our threshold.
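A sketch of (21)-(22) and the resampling trigger; here Psd has already been applied, so X_proj[n] holds the projected state of particle n, sigma is the per-station noise level, and the function names are ours:

```python
import numpy as np

def weights(X_proj, y, sigma):
    """Normalised likelihood weights (21) for projected ensemble states X_proj[n, i]."""
    log_w = -0.5 * np.sum(((X_proj - y) / sigma) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    return w / w.sum()

def ess(w_bar):
    """Effective sample size (22) of normalised weights."""
    return 1.0 / np.sum(w_bar ** 2)

# resampling trigger with the threshold N* = 80:
# if ess(w_bar) < 80: resample the weighted ensemble
```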


  • The Filtering Problem Nudging

    Nudging

The idea of nudging is to correct the solution of SPDE (20) so as to keep the particles closer to the true state. To do so, we add a 'nudging term' to SPDE (20):

    dqi(λ) + ( ui(λ) dt + ∑_{k=1}^{K} ξ^k_i ◦ dW^k_t + ∑_{k=1}^{K} ξ^k_i λk dt ) ∙ ∇qi(λ) = Fi dt,   i = 1, 2.    (23)

    q depends on the parameter λ. The trajectories of the particles will be solutions of this perturbed SPDE (23). To account for the perturbation, the particles will have new weights, given according to Girsanov's theorem by

    W(q(λ), Y, λ) = exp( −[ (1/2) ∑_{i=1}^{M} ‖ (Psd(q_{tj+1}(λ)) − Y_{tj+1}) / σi ‖²₂ + ∑_{k=1}^{K} ∫_{tj}^{tj+1} ( λk² dt/2 − λk dWk ) ] ).    (24)

    These weights measure the likelihood of the position of the particles given the observation, and the last term accounts for the change of probability distribution from q to q(λ). We wish to choose λ so as to maximize these likelihoods.


  • The Filtering Problem Nudging

In other words, we look to solve the equivalent minimization problem

    min_{λk, k∈[1..K]} [ (1/2) ∑_{i=1}^{M} ‖ (Psd(q_{tj+1}(λ)) − Y_{tj+1}) / σi ‖²₂ + ∑_{k=1}^{K} ∫_{tj}^{tj+1} ( λk² dt/2 − λk dWk ) ]    (25)

    together with (23). In general this is a challenging nonlinear optimisation problem, especially if one allows the λk's to vary in time.
    To simplify the problem, we perturb only the corrector stage of the final timestep before tj+1. Then the (discrete version of the) minimization problem (25) becomes

    min_{λk, k∈[1..K]} [ (1/2) ∑_{i=1}^{M} ‖ (Psd(q_{tj+1}(λ)) − Y_{tj+1}) / σi ‖²₂ + ∑_{k=1}^{K} ( λk² δt/2 − λk ΔWk ) ],    (26)

    where δt is the time step. Let us re-write

    q_{tj+1}(λ) = A(q_{tj+1/2}) + ∑_{k=1}^{K} Bk(q̃_{tj+1}) (ΔWk + λk δt),

    where q_{tj+1/2} and q̃_{tj+1} are computed in the prediction and the extrapolation steps, respectively.


  • The Filtering Problem Nudging

We can then re-write the minimisation problem (26) as

    min_{λk, k∈[1..K]} V(q(λ), Y, λ),    (27)

    where V(q(λ), Y, λ) = Q + Q1(λ) + Q2(λ, ΔW1, ..., ΔWK).

    This is a quadratic minimization problem with the optimal value of λ depending (linearly) on the increments ΔW1, ..., ΔWK. This optimal choice is not allowed, as the parameter λ can only be a function of the approximations q̃_{tj+1}, q_{tj+1/2} and Y_{tj+1} (since it needs to be adapted to the forward filtration of the set of Brownian motions {Wk}). To ensure that this constraint is satisfied, we minimise the conditional expectation of V(q(λ), Y, λ) given q̃_{tj+1}, q_{tj+1/2} and Y_{tj+1}, that is,

    min_{λk, k∈[1..K]} E[ V(q(λ), Y, λ) | q̃_{tj+1}, q_{tj+1/2}, Y_{tj+1} ].

    This functional is quadratic in λ, and hence the optimization can be done by solving a linear system. This nudging methodology remains asymptotically consistent.
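Schematically, once the conditional expectation has been reduced to a quadratic form (1/2) λᵀAλ − bᵀλ + c with A symmetric positive definite (A and b stand in for the coefficients assembled from q̃_{tj+1}, q_{tj+1/2} and Y_{tj+1}; this is a generic sketch, not the specific assembly used in the slides), the optimal λ is obtained from a single linear solve:

```python
import numpy as np

def optimal_lambda(A, b):
    """Minimise the quadratic (1/2) lam^T A lam - b^T lam.

    A is the (symmetric, positive definite) Hessian of the conditional
    expectation in lambda; b collects the linear terms. The minimiser
    solves the linear system A lam = b.
    """
    return np.linalg.solve(A, b)
```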


  • The Filtering Problem Nudging

Figure: up1 and vp1 versus t [days]. Signal grid Gs = 257 × 129.


  • The Filtering Problem Final remarks

Incorporating stochastic transport into the fluid dynamics equations allows quantification of model uncertainty.

    The aim of the noise calibration is to capture the statistical properties of the sub-grid fluctuations, rather than the trajectory of the flow.

    Validation of the stochastic parameterisation is done in terms of uncertainty quantification for the SPDE.

    Performance of the DA algorithm relies on the correct modelling of the unresolved scales. Data assimilation is performed using particle filters.

    Particle filters are theoretically justified algorithms for approximating the state of dynamical systems that are partially (and noisily) observed. Properly calibrated and modified, particle filters can be used to solve high dimensional data assimilation problems. One can use this methodology to assess forecast reliability.

    Additional work is needed to analyse realistic models that incorporate: boundary conditions, gravity, rotation, buoyancy, bathymetry, etc.

