Variational data assimilation: Background and methods
Lecturer: Ross Bannister, thanks: Amos Lawless
NCEO, Dept. of Meteorology, Univ. of Reading
7–10 March 2018, Univ. of Reading
Bayes' Theorem
$$p(\mathbf{x}|\mathbf{y}) = \frac{p(\mathbf{x}) \times p(\mathbf{y}|\mathbf{x})}{p(\mathbf{y})}$$
posterior distribution = (prior distribution × likelihood) / normalizing constant
Prior distribution: PDF of the state before observations are considered (e.g. PDF of model forecast).
Likelihood: PDF of observations given that the state is x.
Posterior: PDF of the state after the observations have been considered.
The Gaussian assumption
PDFs are often described by Gaussians (normal distributions).
Gaussian PDFs are described by a mean and covariance only.
[Figure: plot with axes epsilon1 and epsilon2, each spanning -10 to 20.]
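$$\boldsymbol{\varepsilon} = \mathbf{x} - \mathbf{x}^b, \qquad \mathbf{x} \sim N(\mathbf{x}^b, \mathbf{B})$$
$$P(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det(\mathbf{B})}} \exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x}-\mathbf{x}^b)\right]$$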
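As a minimal sketch of the formula above, the following NumPy snippet evaluates the Gaussian PDF for a two-element state. The mean `xb` and covariance `B` are made-up illustrative values, and `gaussian_pdf` is a hypothetical helper name, not something from the lecture.

```python
import numpy as np

def gaussian_pdf(x, xb, B):
    """Evaluate the multivariate Gaussian PDF N(xb, B) at the point x."""
    n = xb.size
    eps = x - xb                                    # error: epsilon = x - xb
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(B))
    # Solve B z = eps instead of forming the inverse of B explicitly.
    quad = eps @ np.linalg.solve(B, eps)
    return np.exp(-0.5 * quad) / norm

xb = np.array([5.0, 5.0])                           # illustrative mean (assumption)
B = np.array([[9.0, 3.0], [3.0, 4.0]])              # illustrative covariance (assumption)

p_at_mean = gaussian_pdf(xb, xb, B)
p_off_mean = gaussian_pdf(xb + np.array([3.0, 0.0]), xb, B)
```

The PDF peaks at the mean, so `p_at_mean` is larger than `p_off_mean`; only `xb` and `B` are needed to evaluate it anywhere.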
Meaning of x and y
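$\mathbf{x}^a$: analysis; $\mathbf{x}^b$: background state; $\delta\mathbf{x}$: increment (perturbation).
$\mathbf{y}$: observations; $\mathbf{y}^m = \mathcal{H}(\mathbf{x})$: model observations.
$\mathcal{H}(\mathbf{x})$ is the observation operator / forward model.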
Sometimes x and y are for only one time (3DVar).
x-vectors have n elements; y-vectors have p elements.
Back to the Gaussian assumption
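Prior: mean $\mathbf{x}^b$, covariance $\mathbf{B}$
$$P(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det(\mathbf{B})}} \exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x}-\mathbf{x}^b)\right]$$
Likelihood: mean $\mathcal{H}(\mathbf{x})$, covariance $\mathbf{R}$
$$P(\mathbf{y}|\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^p \det(\mathbf{R})}} \exp\left[-\frac{1}{2}(\mathbf{y}-\mathcal{H}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y}-\mathcal{H}(\mathbf{x}))\right]$$
Posterior
$$p(\mathbf{x}|\mathbf{y}) = \frac{p(\mathbf{x}) \times p(\mathbf{y}|\mathbf{x})}{p(\mathbf{y})} \propto \exp\left\{-\frac{1}{2}\left[(\mathbf{x}-\mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x}-\mathbf{x}^b) + (\mathbf{y}-\mathcal{H}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y}-\mathcal{H}(\mathbf{x}))\right]\right\}$$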
Variational DA: the idea
In Var., we seek a solution that maximizes the posterior probability p(x|y) (maximum-a-posteriori).
This is the most likely state given the observations (and the background), called the analysis, $\mathbf{x}^a$.
Maximizing p(x|y) is equivalent to minimizing $-\ln p(\mathbf{x}|\mathbf{y}) \equiv J(\mathbf{x})$ (a least-squares problem).
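$$p(\mathbf{x}|\mathbf{y}) = C \exp\left\{-\frac{1}{2}\left[(\mathbf{x}-\mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x}-\mathbf{x}^b) + (\mathbf{y}-\mathcal{H}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y}-\mathcal{H}(\mathbf{x}))\right]\right\}$$
$$\begin{aligned}
J(\mathbf{x}) &= -\ln C + \frac{1}{2}(\mathbf{x}-\mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x}-\mathbf{x}^b) + \frac{1}{2}(\mathbf{y}-\mathcal{H}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y}-\mathcal{H}(\mathbf{x})) \\
&= \text{constant (ignored)} + J_b(\mathbf{x}) + J_o(\mathbf{x})
\end{aligned}$$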
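To make the cost function concrete, here is a small sketch for the special case of a linear observation operator (a matrix `H`). All numerical values are invented for illustration. The closed-form minimizer used below, $\mathbf{x}^a = \mathbf{x}^b + \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1}(\mathbf{y} - \mathbf{H}\mathbf{x}^b)$, is the standard result for the linear-Gaussian case; it is not stated on the slides and is brought in here only to check that the gradient of J vanishes at the analysis.

```python
import numpy as np

def cost(x, xb, B, y, H, R):
    """J(x) = Jb(x) + Jo(x) for a linear observation operator H (a matrix)."""
    dxb = x - xb
    dy = y - H @ x
    Jb = 0.5 * dxb @ np.linalg.solve(B, dxb)
    Jo = 0.5 * dy @ np.linalg.solve(R, dy)
    return Jb + Jo

def grad(x, xb, B, y, H, R):
    """Gradient of J: B^{-1}(x - xb) - H^T R^{-1}(y - H x)."""
    return np.linalg.solve(B, x - xb) - H.T @ np.linalg.solve(R, y - H @ x)

# Illustrative problem with n = 3 state elements and p = 2 observations.
xb = np.array([1.0, 2.0, 3.0])
B = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])
R = 0.1 * np.eye(2)
y = np.array([1.5, 2.0])

# Closed-form analysis for the linear-Gaussian case (standard result).
innovation = y - H @ xb
xa = xb + B @ H.T @ np.linalg.solve(H @ B @ H.T + R, innovation)

grad_norm_at_xa = np.linalg.norm(grad(xa, xb, B, y, H, R))
J_xb = cost(xb, xb, B, y, H, R)
J_xa = cost(xa, xb, B, y, H, R)
```

In operational systems n is far too large for this direct solve, which is one reason the cost function is minimized iteratively instead.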
Four-dimensional Var (4DVar)
Aim
To find the 'best' estimate of the true state of the system (analysis), consistent with the observations, the background, and the system dynamics.
Towards a 4DVar cost function
Consider the observation operator in this case:
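$$\mathcal{H}(\mathbf{x}) = \mathcal{H}\begin{pmatrix} \mathbf{x}_0 \\ \vdots \\ \mathbf{x}_N \end{pmatrix} = \begin{pmatrix} \mathcal{H}_0(\mathbf{x}_0) \\ \vdots \\ \mathcal{H}_N(\mathbf{x}_N) \end{pmatrix}$$
So $J_o$ is (assuming that $\mathbf{R}$ is block diagonal):
$$\begin{aligned}
J_o &= \frac{1}{2}(\mathbf{y}-\mathcal{H}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y}-\mathcal{H}(\mathbf{x})) \\
&= \frac{1}{2}\begin{pmatrix} \mathbf{y}_0-\mathcal{H}_0(\mathbf{x}_0) \\ \vdots \\ \mathbf{y}_N-\mathcal{H}_N(\mathbf{x}_N) \end{pmatrix}^T \begin{pmatrix} \mathbf{R}_0 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \mathbf{R}_N \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{y}_0-\mathcal{H}_0(\mathbf{x}_0) \\ \vdots \\ \mathbf{y}_N-\mathcal{H}_N(\mathbf{x}_N) \end{pmatrix} \\
&= \frac{1}{2}\sum_{i=0}^{N} (\mathbf{y}_i-\mathcal{H}_i(\mathbf{x}_i))^T \mathbf{R}_i^{-1} (\mathbf{y}_i-\mathcal{H}_i(\mathbf{x}_i))
\end{aligned}$$
where $\mathbf{x}_{i+1} = \mathcal{M}_i(\mathbf{x}_i)$.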
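The 4DVar cost function ('full 4DVar')
Shorthand: let $(\mathbf{a})^T \mathbf{A}^{-1} (\mathbf{a}) \equiv (\mathbf{a})^T \mathbf{A}^{-1} (\bullet)$.
$$\begin{aligned}
J(\mathbf{x}) &= \frac{1}{2}(\mathbf{x}_0-\mathbf{x}_0^b)^T \mathbf{B}_0^{-1} (\bullet) + \frac{1}{2}(\mathbf{y}-\mathcal{H}(\mathbf{x}))^T \mathbf{R}^{-1} (\bullet) \\
&= \frac{1}{2}(\mathbf{x}_0-\mathbf{x}_0^b)^T \mathbf{B}_0^{-1} (\bullet) + \frac{1}{2}\sum_{i=0}^{N} (\mathbf{y}_i-\mathcal{H}_i(\mathbf{x}_i))^T \mathbf{R}_i^{-1} (\bullet)
\end{aligned}$$
subject to $\mathbf{x}_{i+1} = \mathcal{M}_i(\mathbf{x}_i)$.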
$\mathbf{x}_0^b$: a-priori (background) state at $t_0$.
$\mathbf{y}_i$: observations at $t_i$.
$\mathcal{H}_i(\mathbf{x}_i)$: observation operator at $t_i$.
$\mathbf{B}_0$: background error covariance matrix at $t_0$.
$\mathbf{R}_i$: observation error covariance matrix at $t_i$.
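The strong-constraint cost function can be sketched in a few lines. The two-variable model `model_step` (one Euler step of a weakly nonlinear oscillator) and the operator `obs_op` (observe the first component) are hypothetical stand-ins chosen for brevity; they are not from the lecture. Observations are generated from a known 'truth' run without noise, so the cost at the true initial state reduces to the background term alone.

```python
import numpy as np

def model_step(x, dt=0.1):
    """Hypothetical nonlinear model M_i: one Euler step of a cubic oscillator."""
    return x + dt * np.array([x[1], -x[0] - 0.1 * x[0] ** 3])

def obs_op(x):
    """Hypothetical observation operator H_i: observe the first component only."""
    return x[:1]

def cost_4dvar(x0, xb0, B0, ys, Rs):
    """J(x0) = Jb + sum_i Jo_i, with x_{i+1} = M_i(x_i) imposed exactly."""
    dxb = x0 - xb0
    J = 0.5 * dxb @ np.linalg.solve(B0, dxb)
    x = x0
    for y, R in zip(ys, Rs):
        d = y - obs_op(x)
        J += 0.5 * d @ np.linalg.solve(R, d)
        x = model_step(x)          # strong constraint (the final step is unused)
    return J

# Twin experiment: noise-free observations at t_0, ..., t_N from a truth run.
N = 10
x_true0 = np.array([1.0, 0.0])
ys, x = [], x_true0
for _ in range(N + 1):
    ys.append(obs_op(x))
    x = model_step(x)
Rs = [0.01 * np.eye(1)] * (N + 1)
B0 = np.eye(2)
xb0 = np.array([1.2, -0.1])

J_truth = cost_4dvar(x_true0, xb0, B0, ys, Rs)       # only Jb contributes
J_background = cost_4dvar(xb0, xb0, B0, ys, Rs)      # only Jo contributes
```

Note that the only free variable is the initial state `x0`; every later state is determined by the model.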
How to minimize this cost function?
Minimize J(x) iteratively
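Use the gradient of J at each iteration (with step length $\alpha > 0$):
$$\mathbf{x}_0^{k+1} = \mathbf{x}_0^{k} - \alpha \nabla J(\mathbf{x}_0^{k})$$
The gradient of the cost function:
$$\nabla J(\mathbf{x}_0) = \begin{pmatrix} \partial J/\partial (\mathbf{x}_0)_1 \\ \vdots \\ \partial J/\partial (\mathbf{x}_0)_n \end{pmatrix}$$
$-\nabla J$ points in the direction of steepest descent.
Methods: steepest descent (inefficient), conjugate gradient (more efficient), ...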
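A minimal steepest-descent sketch for a small quadratic cost is given below, with the update written with an explicit minus sign, $\mathbf{x}^{k+1} = \mathbf{x}^k - \alpha\nabla J(\mathbf{x}^k)$, and a fixed step length $\alpha > 0$. The matrices are invented for illustration, and the choice $\alpha = 1/\lambda_{\max}$ of the Hessian is an assumption made here to guarantee convergence; practical schemes use a line search or conjugate gradients.

```python
import numpy as np

# Illustrative quadratic problem (all values are assumptions).
B_inv = np.array([[2.0, 0.0], [0.0, 1.0]])   # inverse background covariance
H = np.array([[1.0, 1.0]])                   # linear observation operator
R_inv = np.array([[4.0]])                    # inverse observation covariance
xb = np.array([0.0, 0.0])
y = np.array([2.0])

def grad_J(x):
    """Gradient of J = Jb + Jo for linear H."""
    return B_inv @ (x - xb) - H.T @ R_inv @ (y - H @ x)

hessian = B_inv + H.T @ R_inv @ H            # constant because J is quadratic
alpha = 1.0 / np.linalg.eigvalsh(hessian).max()

x = xb.copy()
n_iter = 0
for k in range(500):
    g = grad_J(x)
    if np.linalg.norm(g) < 1e-10:
        break
    x = x - alpha * g                        # step along -grad J
    n_iter += 1

# Direct solution of grad J = 0, for comparison only.
x_exact = np.linalg.solve(hessian, B_inv @ xb + H.T @ R_inv @ y)
```

Even for this two-variable problem the fixed-step iteration needs many more than two steps, which illustrates why steepest descent is described as inefficient.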
The gradient of the cost function (wrt x(t0))
Either:
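1. Differentiate $J(\mathbf{x}_0)$ w.r.t. $\mathbf{x}_0$ with $\mathbf{x}_i = \mathcal{M}_{i-1}(\mathcal{M}_{i-2}(\cdots\mathcal{M}_0(\mathbf{x}_0)))$.
2. Differentiate $J(\mathbf{x}) = J(\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_N)$ w.r.t. $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_N$ subject to the constraint $\mathbf{x}_{i+1} - \mathcal{M}_i(\mathbf{x}_i) = 0$, using the Lagrangian
$$L(\mathbf{x}, \boldsymbol{\lambda}) = J(\mathbf{x}) + \sum_{i=0}^{N-1} \boldsymbol{\lambda}_{i+1}^T \left(\mathbf{x}_{i+1} - \mathcal{M}_i(\mathbf{x}_i)\right).$$
Each approach leads to the adjoint method:
An efficient means of computing the gradient.
Uses the adjoints of the linearized $\mathcal{M}_i$ and $\mathcal{H}_i$: $\mathbf{M}_i^T$ and $\mathbf{H}_i^T$.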
The adjoint method
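Equivalent gradient formulae:
1. Direct form:
$$\nabla J \equiv \nabla J(\mathbf{x}_0) = \mathbf{B}_0^{-1}(\mathbf{x}_0 - \mathbf{x}_0^b) - \sum_{i=0}^{N} \mathbf{M}_0^T \cdots \mathbf{M}_{i-1}^T \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathbf{y}_i - \mathcal{H}_i(\mathbf{x}_i))$$
2. Backward (adjoint) recursion:
$$\begin{aligned}
\boldsymbol{\lambda}_{N+1} &= 0 \\
\boldsymbol{\lambda}_i &= \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathbf{y}_i - \mathcal{H}_i(\mathbf{x}_i)) + \mathbf{M}_i^T \boldsymbol{\lambda}_{i+1} \\
\boldsymbol{\lambda}_0 &= -\nabla J_o \\
\therefore\ \nabla J &= \nabla J_b + \nabla J_o = \mathbf{B}_0^{-1}(\mathbf{x}_0 - \mathbf{x}_0^b) - \boldsymbol{\lambda}_0
\end{aligned}$$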
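The backward recursion can be checked numerically. The sketch below uses a hypothetical two-variable model with a hand-coded Jacobian (`model_jacobian`, the tangent-linear model) and a linear observation operator; none of these come from the lecture. With the recursion written as $\boldsymbol{\lambda}_i = \mathbf{H}_i^T\mathbf{R}_i^{-1}(\mathbf{y}_i - \mathcal{H}_i(\mathbf{x}_i)) + \mathbf{M}_i^T\boldsymbol{\lambda}_{i+1}$, the quantity $\boldsymbol{\lambda}_0$ equals minus the gradient of $J_o$, so the code returns $\mathbf{B}_0^{-1}(\mathbf{x}_0 - \mathbf{x}_0^b) - \boldsymbol{\lambda}_0$ and compares it with central finite differences.

```python
import numpy as np

DT = 0.1
H = np.array([[1.0, 0.0]])                 # linear observation operator (assumption)

def model_step(x):
    """Hypothetical nonlinear model M_i."""
    return x + DT * np.array([x[1], -x[0] - 0.1 * x[0] ** 3])

def model_jacobian(x):
    """Tangent-linear model: Jacobian of model_step evaluated at x."""
    return np.eye(2) + DT * np.array([[0.0, 1.0],
                                      [-1.0 - 0.3 * x[0] ** 2, 0.0]])

def cost(x0, xb0, B0_inv, ys, R_inv):
    dxb = x0 - xb0
    J = 0.5 * dxb @ B0_inv @ dxb
    x = x0
    for y in ys:
        d = y - H @ x
        J += 0.5 * d @ R_inv @ d
        x = model_step(x)
    return J

def gradient_adjoint(x0, xb0, B0_inv, ys, R_inv):
    # Forward sweep: store the trajectory x_0, ..., x_N.
    traj = [x0]
    for _ in range(len(ys) - 1):
        traj.append(model_step(traj[-1]))
    # Backward sweep, starting from lambda_{N+1} = 0.
    lam = np.zeros_like(x0)
    for x, y in zip(reversed(traj), reversed(ys)):
        lam = H.T @ R_inv @ (y - H @ x) + model_jacobian(x).T @ lam
    # lam now holds lambda_0 = -grad(Jo).
    return B0_inv @ (x0 - xb0) - lam

def gradient_fd(x0, xb0, B0_inv, ys, R_inv, h=1e-6):
    """Central finite-difference gradient, used only as a check."""
    g = np.zeros_like(x0)
    for j in range(x0.size):
        e = np.zeros_like(x0)
        e[j] = h
        g[j] = (cost(x0 + e, xb0, B0_inv, ys, R_inv)
                - cost(x0 - e, xb0, B0_inv, ys, R_inv)) / (2.0 * h)
    return g

# Noise-free observations from a truth run (twin experiment).
ys, x = [], np.array([1.0, 0.0])
for _ in range(11):
    ys.append(H @ x)
    x = model_step(x)
xb0 = np.array([1.2, -0.1])
B0_inv, R_inv = np.eye(2), np.array([[100.0]])

g_adj = gradient_adjoint(xb0, xb0, B0_inv, ys, R_inv)
g_fd = gradient_fd(xb0, xb0, B0_inv, ys, R_inv)
```

The adjoint gradient costs one forward and one backward sweep regardless of n, whereas the finite-difference check needs 2n cost evaluations, which is the efficiency argument behind the adjoint method.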
Simplifications and complications
The full 4DVar method is expensive and difficult to solve.
Model $\mathcal{M}_i$ is non-linear.
Observation operators, $\mathcal{H}_i$, can be non-linear.
Linear $\mathcal{H}$ → quadratic cost function, easy(er) to minimize: $J_o \sim \frac{1}{2}(y - ax)^2/\sigma_o^2$.
Non-linear $\mathcal{H}$ → non-quadratic cost function, hard to minimize: $J_o \sim \frac{1}{2}(y - f(x))^2/\sigma_o^2$.
Later we will recognise that models are 'wrong'!
Look for simplifications:
Incremental 4DVar (linearized 4DVar)
3D-FGAT
3DVar
Complications:
Weak constraint (imperfect model)
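The difference between the linear and non-linear cases can be illustrated with the scalar $J_o$ forms quoted on this slide. In the sketch below, $a = 2$ and $f(x) = x^2$ are arbitrary choices made for illustration, and `count_local_minima` is a hypothetical helper that counts strict local minima on a grid.

```python
import numpy as np

def count_local_minima(J):
    """Count strict interior local minima of a cost sampled on a 1-D grid."""
    interior = J[1:-1]
    return int(np.sum((interior < J[:-2]) & (interior < J[2:])))

x = np.linspace(-3.0, 3.0, 601)
y, sigma_o = 1.0, 0.5

J_linear = 0.5 * (y - 2.0 * x) ** 2 / sigma_o ** 2       # H(x) = a x with a = 2
J_nonlinear = 0.5 * (y - x ** 2) ** 2 / sigma_o ** 2     # H(x) = f(x) = x^2

n_min_linear = count_local_minima(J_linear)
n_min_nonlinear = count_local_minima(J_nonlinear)
```

The linear case has a single minimum, while the quadratic observation operator produces two (at x = ±1), so a gradient-based minimizer could converge to either depending on its starting point.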
Incremental 4DVar 1
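Definitions (reference trajectory $\mathbf{x}^R_{i(k)}$ at outer iteration $k$):
$$\mathbf{x}^R_{i+1(k)} = \mathcal{M}_i\left(\mathbf{x}^R_{i(k)}\right), \qquad \mathbf{x}_i = \mathbf{x}^R_{i(k)} + \delta\mathbf{x}_i, \qquad \mathbf{x}_0^b = \mathbf{x}^R_{0(k)} + \delta\mathbf{x}_0^b$$
$$\mathbf{x}_{i+1} = \mathcal{M}_i(\mathbf{x}_i) \quad\Rightarrow\quad \delta\mathbf{x}_{i+1} \approx \mathbf{M}_{i(k)}\,\delta\mathbf{x}_i$$
$$\mathcal{H}_i(\mathbf{x}_i) \approx \mathcal{H}_i\left(\mathbf{x}^R_{i(k)}\right) + \mathbf{H}_{i(k)}\,\delta\mathbf{x}_i$$
$$\delta\mathbf{x}_i \approx \mathbf{M}_{i-1(k)}\,\mathbf{M}_{i-2(k)} \cdots \mathbf{M}_{0(k)}\,\delta\mathbf{x}_0$$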
Incremental 4DVar 2
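$$J^{(k)}(\delta\mathbf{x}_0) = \frac{1}{2}\left(\delta\mathbf{x}_0 - \delta\mathbf{x}_0^b\right)^T \mathbf{B}_0^{-1} (\bullet) + \frac{1}{2}\sum_{i=0}^{N} \left(\mathbf{y}_i - \mathcal{H}_i(\mathbf{x}^R_{i(k)}) - \mathbf{H}_{i(k)}\,\delta\mathbf{x}_i\right)^T \mathbf{R}_i^{-1} (\bullet)$$
'Inner loop': iterations to find $\delta\mathbf{x}_0$ (as in the adjoint method).
'Outer loop' ($k$): iterate $\mathbf{x}^R_{0(k+1)} = \mathbf{x}^R_{0(k)} + \delta\mathbf{x}_0$.
Inner loop is exactly quadratic (and so has a unique minimum).
Inner loop can be simplified (lower resolution, simplified physics).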
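The outer/inner loop structure can be sketched as follows. The model, its Jacobian, and the observation operator are hypothetical toy choices. Because the state has only two elements, the quadratic inner problem is solved here directly through its normal equations rather than by an iterative adjoint-based minimization; that substitution is an assumption made to keep the example short, and an operational inner loop would iterate.

```python
import numpy as np

DT = 0.1
H = np.array([[1.0, 0.0]])                 # linear observation operator (assumption)

def model_step(x):
    """Hypothetical nonlinear model M_i."""
    return x + DT * np.array([x[1], -x[0] - 0.1 * x[0] ** 3])

def model_jacobian(x):
    """Linearized model M_{i(k)} evaluated on the reference trajectory."""
    return np.eye(2) + DT * np.array([[0.0, 1.0],
                                      [-1.0 - 0.3 * x[0] ** 2, 0.0]])

def full_cost(x0, xb0, B0_inv, ys, R_inv):
    dxb = x0 - xb0
    J = 0.5 * dxb @ B0_inv @ dxb
    x = x0
    for y in ys:
        d = y - H @ x
        J += 0.5 * d @ R_inv @ d
        x = model_step(x)
    return J

def outer_iteration(xr0, xb0, B0_inv, ys, R_inv):
    """Linearize about the reference trajectory and minimize the inner cost."""
    A = B0_inv.copy()                      # Hessian of the quadratic inner cost
    rhs = B0_inv @ (xb0 - xr0)             # B0^{-1} dxb0, with dxb0 = xb0 - xr0
    xr, P = xr0, np.eye(xr0.size)          # P = M_{i-1(k)} ... M_{0(k)}
    for y in ys:
        d = y - H @ xr                     # innovation y_i - H_i(x^R_i)
        G = H @ P                          # maps dx0 to observation space at t_i
        A += G.T @ R_inv @ G
        rhs += G.T @ R_inv @ d
        P = model_jacobian(xr) @ P         # extend the tangent-linear product
        xr = model_step(xr)                # advance the reference trajectory
    return np.linalg.solve(A, rhs)         # increment dx0 minimizing the inner cost

# Noise-free observations from a truth run (twin experiment).
ys, x = [], np.array([1.0, 0.0])
for _ in range(11):
    ys.append(H @ x)
    x = model_step(x)
xb0 = np.array([1.2, -0.1])
B0_inv, R_inv = np.eye(2), np.array([[100.0]])

xr0 = xb0.copy()
increments = []
for k in range(10):                        # outer loop
    dx0 = outer_iteration(xr0, xb0, B0_inv, ys, R_inv)
    xr0 = xr0 + dx0
    increments.append(np.linalg.norm(dx0))

J_initial = full_cost(xb0, xb0, B0_inv, ys, R_inv)
J_final = full_cost(xr0, xb0, B0_inv, ys, R_inv)
```

The increments shrink from one outer loop to the next in this mildly nonlinear example; a vanishing increment means the gradient of the full nonlinear cost is zero at the reference state.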
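Simplification 1: 3D-FGAT
Three-dimensional variational data assimilation in which the first guess (i.e. $\mathbf{x}^R_{i(k)}$) is computed at the appropriate time.
The simplification is that $\mathbf{M}_{i(k)} \to \mathbf{I}$, i.e. $\delta\mathbf{x}_i = \mathbf{M}_{i-1(k)} \cdots \mathbf{M}_{0(k)}\,\delta\mathbf{x}_0 \to \delta\mathbf{x}_0$.
$$J^{\text{3DFGAT}}_{(k)}(\delta\mathbf{x}_0) = \frac{1}{2}\left(\delta\mathbf{x}_0 - \delta\mathbf{x}_0^b\right)^T \mathbf{B}_0^{-1} (\bullet) + \frac{1}{2}\sum_{i=0}^{N} \left(\mathbf{y}_i - \mathcal{H}_i(\mathbf{x}^R_{i(k)}) - \mathbf{H}_{i(k)}\,\delta\mathbf{x}_0\right)^T \mathbf{R}_i^{-1} (\bullet)$$
Note that $\delta\mathbf{x}_0$ replaces $\delta\mathbf{x}_i$ in the observation term.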
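Simplification 2: 3DVar
This has no time dependence within the assimilation window.
Not used (these days "3D-Var" really means 3D-FGAT).
$$J^{\text{3DVar}}_{(k)}(\delta\mathbf{x}_0) = \frac{1}{2}\left(\delta\mathbf{x}_0 - \delta\mathbf{x}_0^b\right)^T \mathbf{B}_0^{-1} (\bullet) + \frac{1}{2}\sum_{i=0}^{N} \left(\mathbf{y}_i - \mathcal{H}_i(\mathbf{x}^R_{0(k)}) - \mathbf{H}_{i(k)}\,\delta\mathbf{x}_0\right)^T \mathbf{R}_i^{-1} (\bullet)$$
Note that the reference state $\mathbf{x}^R_{0(k)}$ replaces $\mathbf{x}^R_{i(k)}$ in the observation term.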
Properties of 4DVar
Observations are treated at the correct time.
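Use of dynamics means that more information can be obtained from observations.
Covariance $\mathbf{B}_0$ is implicitly evolved: $\mathbf{B}_i = \left(\mathbf{M}_{i-1(k)} \cdots \mathbf{M}_{0(k)}\right) \mathbf{B}_0 \left(\mathbf{M}_{i-1(k)} \cdots \mathbf{M}_{0(k)}\right)^T$.
In practice, development of linear and adjoint models is complex.
$\mathcal{M}_i$, $\mathcal{H}_i$, $\mathbf{M}_i$, $\mathbf{H}_i$, $\mathbf{M}_i^T$, and $\mathbf{H}_i^T$ are subroutines, and so 'matrices' are usually not in explicit matrix form.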
But note
Standard 4DVar assumes the model is perfect.
This can lead to sub-optimalities.
Weak-constraint 4DVar relaxes this assumption.
Weak constraint 4DVar
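Modify evolution equation:
$$\mathbf{x}_{i+1} = \mathcal{M}_i(\mathbf{x}_i) + \boldsymbol{\eta}_i, \qquad \text{where } \boldsymbol{\eta}_i \sim N(0, \mathbf{Q}_i)$$
'State formulation' of WC4DVar
$$J^{wc}(\mathbf{x}_0, \ldots, \mathbf{x}_N) = J_b + J_o + \frac{1}{2}\sum_{i=0}^{N-1} \left(\mathbf{x}_{i+1} - \mathcal{M}_i(\mathbf{x}_i)\right)^T \mathbf{Q}_i^{-1} (\bullet)$$
'Error formulation' of WC4DVar
$$J^{wc}(\mathbf{x}_0, \boldsymbol{\eta}_0, \ldots, \boldsymbol{\eta}_{N-1}) = J_b + J_o + \frac{1}{2}\sum_{i=0}^{N-1} \boldsymbol{\eta}_i^T \mathbf{Q}_i^{-1} \boldsymbol{\eta}_i$$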
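The two formulations can be compared numerically. In the sketch below an arbitrary sequence of states is converted to model errors $\boldsymbol{\eta}_i = \mathbf{x}_{i+1} - \mathcal{M}_i(\mathbf{x}_i)$, and both cost functions are evaluated. The toy model, the single time-independent `Q_inv`, and the random test data are all assumptions made for illustration.

```python
import numpy as np

DT = 0.1
H = np.array([[1.0, 0.0]])                 # linear observation operator (assumption)

def model_step(x):
    """Hypothetical nonlinear model M_i."""
    return x + DT * np.array([x[1], -x[0] - 0.1 * x[0] ** 3])

def jb_plus_jo(xs, xb0, B0_inv, ys, R_inv):
    """Background and observation terms evaluated on a given trajectory."""
    dxb = xs[0] - xb0
    J = 0.5 * dxb @ B0_inv @ dxb
    for x, y in zip(xs, ys):
        d = y - H @ x
        J += 0.5 * d @ R_inv @ d
    return J

def cost_state(xs, xb0, B0_inv, ys, R_inv, Q_inv):
    """State formulation: control variables are x_0, ..., x_N."""
    J = jb_plus_jo(xs, xb0, B0_inv, ys, R_inv)
    for x_now, x_next in zip(xs[:-1], xs[1:]):
        r = x_next - model_step(x_now)
        J += 0.5 * r @ Q_inv @ r
    return J

def cost_error(x0, etas, xb0, B0_inv, ys, R_inv, Q_inv):
    """Error formulation: control variables are x_0, eta_0, ..., eta_{N-1}."""
    xs = [x0]
    for eta in etas:
        xs.append(model_step(xs[-1]) + eta)
    J = jb_plus_jo(xs, xb0, B0_inv, ys, R_inv)
    for eta in etas:
        J += 0.5 * eta @ Q_inv @ eta
    return J

rng = np.random.default_rng(0)
N, n = 5, 2
xs = [rng.normal(size=n) for _ in range(N + 1)]      # arbitrary states x_0..x_N
ys = [rng.normal(size=1) for _ in range(N + 1)]      # arbitrary observations
xb0 = np.zeros(n)
B0_inv, R_inv, Q_inv = np.eye(n), np.array([[10.0]]), 4.0 * np.eye(n)

etas = [xs[i + 1] - model_step(xs[i]) for i in range(N)]
J_state = cost_state(xs, xb0, B0_inv, ys, R_inv, Q_inv)
J_error = cost_error(xs[0], etas, xb0, B0_inv, ys, R_inv, Q_inv)

size_state_control = sum(x.size for x in xs)                      # n (N + 1)
size_error_control = xs[0].size + sum(e.size for e in etas)       # n + n N
```

Both evaluations give the same cost up to rounding, and both control vectors hold n(N+1) numbers, compared with n for strong-constraint 4DVar.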
Implementation of weak constraint 4DVar
Vector to be determined ('control vector') increases from n elements in 4DVar to n + nN = n(N+1) elements in WC4DVar.
The model error covariance matrices, $\mathbf{Q}_i$, need to be estimated. How?
The 'state' formulation (determine $\mathbf{x}_0, \ldots, \mathbf{x}_N$) and the 'error' formulation (determine $\mathbf{x}_0, \boldsymbol{\eta}_0, \ldots, \boldsymbol{\eta}_{N-1}$) are mathematically equivalent, but can behave differently in practice.
There is an incremental form of WC4DVar.
Summary of 4DVar
The variational method forms the basis of many operational weather and ocean forecasting systems, including at ECMWF, the Met Office, Météo-France, etc.
It allows complicated observation operators to be used (e.g. for assimilation of satellite data).
It has been very successful.
Incremental (quasi-linear) versions are usually implemented.
It requires specification of $\mathbf{B}_0$, the background error covariance matrix, and $\mathbf{R}_i$, the observation error covariance matrix.
4DVar requires the development of linear and adjoint models: not a simple task!
Weak constraint formulations require the additional specification of $\mathbf{Q}_i$.
Some challenges ahead
Methods assume that error cov. matrices are correctly known.
Representing $\mathbf{B}_0$.
Better models of $\mathbf{B}_0$.
Flow dependency (e.g. Ensemble-Var or hybrid methods).
Representing $\mathbf{R}_i$.
Allowing for observation error covariances.
Representing $\mathbf{Q}_i$.
Numerical conditioning of the problem.
Application to more complicated systems (e.g. high-resolution models, coupled atmosphere-ocean DA, chemical DA).
Variational bias correction.
Moist processes, inc. clouds.
Effective use on massively parallel computer architectures.
Selected References
Original application of 4DVar: Talagrand O, Courtier P, Variational assimilation of meteorological observations with the adjoint vorticity equation I: Theory, Q. J. R. Meteorol. Soc. 113, 1311–1328 (1987).
Excellent tutorial on Var: Schlatter TW, Variational assimilation of meteorological observations in the lower atmosphere: A tutorial on how it works, J. Atmos. Sol. Terr. Phys. 62, 1057–1070 (2000).
Incremental 4DVar: Courtier P, Thepaut J-N, Hollingsworth A, A strategy for operational implementation of 4D-Var, using an incremental approach, Q. J. R. Meteorol. Soc. 120, 1367–1387 (1994).
High-resolution application of 4DVar: Park SK, Zupanski D, Four-dimensional variational data assimilation for mesoscale and storm scale applications, Meteorol. Atmos. Phys. 82, 173–208 (2003).
Met Office 4DVar: Rawlins F, Ballard SP, Bovis KJ, Clayton AM, Li D, Inverarity GW, Lorenc AC, Payne TJ, The Met Office global four-dimensional variational data assimilation scheme, Q. J. R. Meteorol. Soc. 133, 347–362 (2007).
Weak constraint 4DVar: Tremolet Y, Model-error estimation in 4D-Var, Q. J. R. Meteorol. Soc. 133, 1267–1280 (2007).
Inner and outer loops: Lawless, Gratton & Nichols, QJRMS, 2005; Gratton, Lawless & Nichols, SIAM J. on Optimization (2007).
More detailed survey of variational methods than can be done in this lecture (plus ensemble-variational, hybrid methods): Bannister R.N., A review of operational methods of variational and ensemble-variational data assimilation, Q. J. R. Meteorol. Soc. 143, 607–633 (2017).