Correlation between system and observation errors in data assimilation
Tyrus Berry and Timothy Sauer∗
George Mason University, Fairfax, VA 22030, USA
∗Corresponding author address: Timothy Sauer, George Mason University, Fairfax, VA 22030, USA
E-mail: [email protected]
Generated using v4.3.2 of the AMS LaTeX template
ABSTRACT
Accurate knowledge of two types of noise, system and observational, is
an important aspect of Bayesian filtering methodology. Traditionally, this
knowledge is reflected in individual covariance matrices for the two noise
contributions, while correlations between the system and observational noises
are ignored. We contend that in practical problems, it is unlikely that system
and observational errors are uncorrelated, in particular for geophysically-motivated
examples where errors are dominated by model and observation
truncations. Moreover, it is shown that accounting for the cross-correlations
in the filtering algorithm, for example in a correlated ensemble Kalman filter,
can result in significant improvements in filter accuracy for data from typical
dynamical systems. In particular, we discuss the extreme case where the two
types of errors are maximally correlated relative to the individual covariances.
1. Introduction
Consider a discrete-time nonlinear dynamical system with state variable $x_i \in \mathbb{R}^N$ and observations $y_i \in \mathbb{R}^M$ given by
$$x_{i+1} = f(x_i, \omega_i) \tag{1}$$
$$y_{i+1} = h(x_{i+1}, \nu_{i+1}) \tag{2}$$
where $\omega_i$ is called the system or dynamical noise (or the stochastic forcing) and $\nu_{i+1}$ is called the observation noise. In practice these noise terms are needed to account for model mismatch, truncation errors due to differing resolutions, and stochastic terms such as instrument errors. There has been considerable recent interest in the implications of these different sources of error, for example in Satterfield et al. (2017); Hodyss and Nichols (2015); Van Leeuwen (2015); Janjic et al. (2017).
On the other hand, most filtering algorithms are designed based on a separation of dynamical and observation noise. The nonlinear filtering literature typically considers noise $\omega \sim \mathcal{N}(0,Q)$ and $\nu \sim \mathcal{N}(0,R)$, allowing correlation within type, but tends to dismiss correlation between $\omega$ and $\nu$. In this article, we argue the importance of modeling correlations between system and observation noise. Specifically, we will consider Kalman filters and ensemble Kalman filters (EnKF) that are derived based on the assumption that
$$\begin{pmatrix} \omega_i \\ \nu_{i+1} \end{pmatrix} \sim \mathcal{N}(0, C), \quad \text{where } C = \begin{pmatrix} Q & S \\ S^\top & R \end{pmatrix} \tag{3}$$
and $C$ is assumed to be symmetric and positive semi-definite. The $N \times M$ matrix $S$ contains the cross-covariances between the variables $\omega_i$ and $\nu_{i+1}$.
In order for $C$ to be positive semi-definite, both $Q$ and $R$ must be positive semi-definite. The classical viewpoint corresponds to simply extracting $Q$ and $R$ and ignoring $S$. However, it is reasonable to expect that in many physical systems, the truncations causing the noise in the state of the system would also affect the sensor or observation system. The goal of this paper is to establish that (1) correlations between system and observation errors are likely to be common in applied data assimilation problems due to truncation of infinite dimensional solutions and (2) incorporating these correlations can dramatically improve filter results.
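The block structure in (3) can be illustrated with a minimal sketch. The helper names `joint_covariance` and `is_psd` and all numerical values below are hypothetical, chosen only for illustration. The example assembles $C$ from $Q$, $S$, and $R$ and shows that positive semi-definite $Q$ and $R$ are necessary but not sufficient: a cross-covariance $S$ that is too large relative to $Q$ and $R$ makes $C$ indefinite.

```python
import numpy as np

def joint_covariance(Q, S, R):
    """Assemble C = [[Q, S], [S^T, R]] from its blocks, as in equation (3)."""
    return np.block([[Q, S], [S.T, R]])

def is_psd(C, tol=1e-10):
    """True if the symmetric matrix C has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(C).min() >= -tol)

# Hypothetical example with N = 2 state variables and M = 1 observation.
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
R = np.array([[0.4]])
S_ok = np.array([[0.3], [0.1]])        # moderate cross-covariance
S_too_big = np.array([[1.0], [0.0]])   # exceeds what Q and R allow

C_ok = joint_covariance(Q, S_ok, R)
C_bad = joint_covariance(Q, S_too_big, R)
print(is_psd(C_ok), is_psd(C_bad))
```

A quick way to see why the second case fails is the Schur complement $R - S^\top Q^{-1} S$, which must itself be positive semi-definite when $Q$ is invertible.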
If we consider the true state to be a function of space and time evolving in an infinite-dimensional function space, then the truncated true state and the observation are essentially two different finite-dimensional projections of this infinite-dimensional space (Dee 1995; Janjic and Cohn 2006; Oke and Sakov 2008). In Section 2 we will show that the errors between the exact projections and the finite-dimensional approximations are correlated for generic observations. Error correlations arising from model truncation have been previously observed (Mitchell and Daley 1997; Hamill and Whitaker 2005; Liu and Rabier 2002). In particular, models are often composed of discrete dynamics occurring at points of a two- or three-dimensional grid. Remote observations by satellite or radiosonde can be viewed as integrations over a region including several grid points. Here, we provide a general framework to explain correlations between these quantities. We also consider other sources of error such as model mismatch and instrument error, and we show that significant correlations persist except in the case where instrument error dominates (since this error will be modeled as white noise that is uncorrelated with the state).
In Section 3, a correlated version of the EnKF is developed that takes the correlations in (3) into account, and recovers the Kalman equations for linear systems. In Section 4, the correlated unscented Kalman filter (CUKF, an unscented version of the EnKF) is applied to examples from Section 2. Using an appropriate $S$ substantially improves output accuracy, while a filter that ignores the cross-correlation matrix $S$ can lead to dramatically suboptimal results.
In Section 5 we investigate the effect of correlations in greater detail. First, we show that in the linear case when $M = N$, for any $Q$ and $R$ there exists a “maximal” $S$ for which perfect recovery of the state variables is possible. Second, we demonstrate examples of perfect recovery in nonlinear systems with such maximal $S$. In these examples, if one ignores a maximal $S$ (setting $S = 0$ in the filter while the true $S$ is maximal), variables that would have been perfectly recovered by using the true $S$ will instead be estimated with variance on the order of the entries in $R$ (see Figure 6 for an example).
2. Correlation between system and observation errors
To understand how correlations arise in applied data assimilation, we must first leave behind the idealized scenario described in equations (1) and (2). Following Dee (1995), we describe the true evolution and observation processes by replacing the discrete solution $x_i \in \mathbb{R}^N$ with an infinite-dimensional solution $x(z,t)$ which has some regularity (continuity or differentiability) in the spatial variable $z$ and temporal variable $t$. We should note that the following analysis is very similar to Satterfield et al. (2017), except that we consider an infinite dimensional solution $x$ instead of a high resolution solution (which is denoted xtH in Satterfield et al. (2017)).
The discrete time solution and the observations can be viewed as projections of this continuous solution. The desired finite-dimensional discrete time solution
$$x_i = \Pi_i(x)$$
is a projection of the continuous solution. Meanwhile, the finite-dimensional discrete time observations
$$y^o_i = \mathcal{H}_i(x) + \varepsilon^I_i$$
are given by another projection of the solution $\mathcal{H}_i$, plus an instrument noise term $\varepsilon^I_i$. Define the system error as the local truncation error (LTE) of the discrete solver $f$, namely
$$w_i \equiv x_{i+1} - f(x_i) = \Pi_{i+1}(x) - f(\Pi_i(x)),$$
and define the observation error by
$$\varepsilon^o_{i+1} \equiv y^o_{i+1} - \tilde{h}(x_{i+1}) = \mathcal{H}_{i+1}(x) - \tilde{h}(\Pi_{i+1}(x)) + \varepsilon^I_{i+1}$$
where $\mathcal{H}_i$ is the true observation projection and $\tilde{h}$ is the approximate discrete observation function. Letting $h$ be the optimal discrete observation function, we can further decompose the observation error in terms of the representation error
$$\varepsilon^R_{i+1} \equiv \mathcal{H}_{i+1}(x) - h(\Pi_{i+1}(x))$$
and the observation model error
$$\varepsilon^H_{i+1} \equiv h(\Pi_{i+1}(x)) - \tilde{h}(\Pi_{i+1}(x))$$
so that the total observation error becomes
$$\varepsilon^o_{i+1} = \varepsilon^R_{i+1} + \varepsilon^H_{i+1} + \varepsilon^I_{i+1}.$$
The above definitions are very similar to those found in equations (1-7) in Satterfield et al. (2017) except that we have replaced their high resolution observation function HL and the truncation smoother Ssc with infinite dimensional projections $\mathcal{H}$ and $\Pi$ respectively.
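For simplicity we assume that $w_i$ and $\varepsilon^o_{i+1}$ are both mean zero and define the error covariances to be
$$Q = \lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{T} w_i w_i^\top \quad \text{and} \quad R = \lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{T} \varepsilon^o_i (\varepsilon^o_i)^\top.$$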
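We briefly note that, while it may be reasonable to assume that the representation and observation model errors are uncorrelated with the instrument error $\varepsilon^I$, their cross-covariance is
$$\lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{T} \varepsilon^R_i (\varepsilon^H_i)^\top = \lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{T} \big(\mathcal{H}_i(x) - h(\Pi_i(x))\big)\big(h(\Pi_i(x)) - \tilde{h}(\Pi_i(x))\big)^\top$$
which seems unlikely to be zero. However, our main concern here is the cross-covariance between system and observation error, which is defined as
\begin{align*}
S &= \lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{T} w_i (\varepsilon^o_{i+1})^\top \\
&= \lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{T} \Big[ \big(\Pi_{i+1}(x) - f(\Pi_i(x))\big)\big(\mathcal{H}_{i+1}(x) - h(\Pi_{i+1}(x))\big)^\top \\
&\qquad\qquad + \big(\Pi_{i+1}(x) - f(\Pi_i(x))\big)\big(h(\Pi_{i+1}(x)) - \tilde{h}(\Pi_{i+1}(x))\big)^\top \Big]
\end{align*}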
(assuming uncorrelated instrument errors). In this general setting it is already puzzling why one would assume that $S = 0$. Indeed, the only time when we would expect the correlation matrix $S$ to be insignificant is when the observation errors are strongly dominated by instrument errors that are uncorrelated with the system error, which is shown numerically in Fig. 5(c). By averaging over the full time series, we are defining global covariance matrices which are fixed in time. More generally, one could also consider time varying covariance matrices which are either localized in time or in state space. However, this would only change the indices of the averages above and in most situations one should expect nonzero $S$ matrices. In the next section we will show explicit examples of substantial correlation between system and observation errors in many practical situations.
a. Evaluation and Averaging Projections
We will assume that the finite dimensional dynamics $f$ and observation function $h$ are consistent, meaning that the errors $w_i$ and $\varepsilon^R_{i+1}$ go to zero in the limit of small discretization parameters $\Delta z$ in space and $\Delta t$ in time. As an example, consider the case when the evolution of the full solution $x$ is governed by a PDE
$$\frac{\partial}{\partial t} x = F(x),$$
and consider the projection of the state onto a grid $\vec{z} = \{z_j\}$ at time $t_i$, namely
$$\Pi_i(x) = x(\vec{z}, t_i).$$
If we assume that the solution $x$ has $n+1$ continuous derivatives in space and $m+1$ in time, we can use a solver that is order $n$ in space and order $m$ in time to obtain the system error
$$w_i = x(\vec{z}, t_{i+1}) - f(x(\vec{z}, t_i)) = a_i \Delta t^m + b_i \Delta z^n + \text{h.o.t.}$$
where for simplicity we assume a uniform spatial grid $\Delta z$ in each dimension. The coefficients $a_i$ and $b_i$ depend on the derivatives of $x$ within $\Delta z$ and $\Delta t$ of $(\vec{z}, t_i)$.
Now consider the associated observation operator $\mathcal{H}$. Rather than sampling at an instantaneous time, most observation modes have an associated time constant, and an average value over an interval $[t_i - \delta, t_i + \delta]$ with some weight function $\Psi$ is returned. Similarly, the observation may involve multiple spatial grid points, as in the case of satellite observations involving radiative transfer that explicitly integrate over the entire vertical grid. Even for very local observations, the true observing system may be located between grid points, thereby involving interpolation between grid points. Thus we assume the true observation has the form
$$\mathcal{H}_{i+1}(x) = \int_{|s - t_{i+1}| < \delta} \int_{\|\vec{w} - \vec{z}\| < \epsilon} x(\vec{w}, s)\, \Psi(\vec{w}, s)\, d\vec{w}\, ds$$
meaning that a consistent observation function $h$ should be a quadrature rule for approximating this integral. Assuming that the discrete observation function has order $q \le n$ convergence in space and order $r \le m$ in time, the observation error is
$$\varepsilon^R_{i+1} = \mathcal{H}_{i+1}(x) - h(x(\vec{z}, t_{i+1})) = c_{i+1} \Delta t^r + d_{i+1} \Delta z^q + \text{h.o.t.}$$
where the coefficients depend on the derivatives of $x$ within $\Delta z$ and $\Delta t$ of $(\vec{z}, t_{i+1})$.
The situation described above is common in applications, namely, where the discrete solution is given by (or equivalent to) evaluation on a grid and the true observation operator is a local weighted average of the full solution. In this case, we find the cross-covariance of the system and observation errors to be (excluding higher order terms)
$$S = \lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{T} (a_i \Delta t^m + b_i \Delta z^n)(c_{i+1} \Delta t^r + d_{i+1} \Delta z^q)^\top$$
and in the limit of small $\Delta z$ and $\Delta t$, the derivatives in $a, b, c, d$ will all be evaluated at points less than $(\Delta z, \Delta t)$ apart. Therefore, up to higher order terms, $a, b, c, d$ can be re-written in terms of derivatives evaluated at the same point $(\vec{z}, t_i)$. Notice in particular that when $m = r$, the coefficients $a$ and $c$ are the same up to a scalar, and similarly when $n = q$ the coefficients $b$ and $d$ are linear combinations of the same order derivatives. While it is possible for these terms to combine so as to exactly cancel when averaged over time, the correlation will often be nonzero.
As a special case, consider the situation when both the system and observation errors are dominated by the same single variable (either time or one of the spatial variables). In this case, the leading order terms would differ only by a constant, so that up to higher order terms the system error and observation error would be multiples of one another. This not only implies that $S \neq 0$ but also, as we will show in Section 5, that the system and observation errors are maximally correlated up to higher order corrections, so that $S$ is as large as possible relative to the individual variances. For example, in the case of satellite observations of radiative transfer, the true observation integrates over the entire vertical component of the atmosphere, whereas the integral may be very localized in time and the horizontal variables. This would suggest a relatively large error in terms of vertical $\Delta z$ (depending on the number of vertical grid points and the order of the quadrature rule used to estimate the radiative transfer) even if the model was perfectly specified. If the vertical model error also dominated the model error then we would expect a high correlation of the model and observation errors.
Similar to the above analysis, we can also consider the observation model error to be a difference of quadrature rules given by the optimal observation function $h$ and an approximate function $\tilde{h}$. With these assumptions we find
$$\varepsilon^H_{i+1} = h(x(\vec{z}, t_{i+1})) - \tilde{h}(x(\vec{z}, t_{i+1})) = \sum_j (\alpha_j - \tilde{\alpha}_j)\, x(z_j, t_{i+1})$$
where $\alpha$ and $\tilde{\alpha}$ are the quadrature weights for $h$ and $\tilde{h}$ respectively. Since both observation functions are assumed to be local, we again obtain an error in terms of $\Delta z$ and $\Delta t$ according to the order of agreement between $h$ and $\tilde{h}$. In practice, either the observation model error or the representation error could be the dominant term, but in either case we find nontrivial correlation with the system errors. In what follows, we focus on examples where representation error dominates, since this term will be the easiest to estimate in practice. The importance of treating correlated errors will hold in either case. We now turn to some concrete examples.
b. Time-averaged observations of an ODE
First consider the case when the true solution $x$ is discrete in space, meaning that $x = x(t) \in \mathbb{R}^N$ is a vector evolving continuously in time. Assume that the true evolution of $x$ is governed by an ODE
$$x' = F(x)$$
and the projection $x_i = \Pi_i(x) = x(t_i)$ is evaluation at a discrete time grid $\{t_i\}$. If the discrete evolution operator $f$ is Euler’s method, the system error is
\begin{align*}
w_i &= x_{i+1} - f(x_i) \\
&= x_{i+1} - (x_i + \Delta t\, F(x_i)) \\
&= \tfrac{1}{2} (\Delta t)^2 x''(t_i) + O((\Delta t)^3) \\
&= \tfrac{1}{2} (\Delta t)^2 x''(t_{i+1}) + O((\Delta t)^3) \tag{4}
\end{align*}
where we have used $x''(t_{i+1}) = x''(t_i) + O(\Delta t)$.
We assume the true observation is an unweighted average over a short interval $[t_{i+1} - \delta, t_{i+1} + \delta]$, namely
$$\mathcal{H}_{i+1}(x) = \frac{1}{2\delta} \int_{t_{i+1}-\delta}^{t_{i+1}+\delta} x(s)\, ds.$$
Consider the discrete observation function $h$ to be a consistent quadrature rule using the grid points falling within the interval $[t_{i+1} - \delta, t_{i+1} + \delta]$. In particular when $\delta \le \Delta t$ we find $h(x_{i+1}) = x_{i+1}$ and the representation error is
$$\varepsilon^R_{i+1} = \mathcal{H}_{i+1}(x) - h(x_{i+1}) = \frac{1}{2\delta} \int_{t_{i+1}-\delta}^{t_{i+1}+\delta} x(s)\, ds - x_{i+1}.$$
Expanding $x(s)$ in a Taylor series centered at $t_{i+1}$ and cancelling odd terms we find
\begin{align*}
\varepsilon^R_{i+1} &= \frac{1}{2\delta} \int_{t_{i+1}-\delta}^{t_{i+1}+\delta} \left[ x_{i+1} + \frac{(s - t_{i+1})^2}{2} x''(t_{i+1}) + O((s - t_{i+1})^4) \right] ds - x_{i+1} \\
&= \frac{\delta^2}{6} x''(t_{i+1}) + O(\delta^4). \tag{5}
\end{align*}
Comparing equations (4) and (5) shows that up to leading order, $\varepsilon^R_{i+1}$ and $w_i$ are directly proportional, meaning that if the leading order terms are nonzero, they are correlated.
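The proportionality in (4) and (5) can be checked numerically with a minimal sketch. The test problem is an assumption made here for convenience and is not one of the paper's examples: the linear oscillator $x_1' = x_2$, $x_2' = -x_1$, whose exact solution $(\sin t, \cos t)$ satisfies $x'' = -x$ and whose window average is available in closed form. The values $\Delta t = 0.1$ and $\delta = 0.05$ are likewise illustrative.

```python
import numpy as np

dt, delta = 0.1, 0.05          # time step and half-width of the averaging window
t = dt * np.arange(2000)       # discrete time grid t_i

def x(t):
    """Exact solution of x' = F(x) for the oscillator F(x1, x2) = (x2, -x1)."""
    return np.stack([np.sin(t), np.cos(t)])

def F(x):
    return np.stack([x[1], -x[0]])

# System error: local truncation error of forward Euler, w_i = x_{i+1} - f(x_i).
w = x(t[1:]) - (x(t[:-1]) + dt * F(x(t[:-1])))

# Representation error: exact window average minus point evaluation at t_{i+1}.
# For sin and cos the average over [t - delta, t + delta] is x(t) * sin(delta) / delta.
eps = x(t[1:]) * (np.sin(delta) / delta - 1.0)

# Leading-order predictions from equations (4) and (5), using x'' = -x.
w_pred = 0.5 * dt**2 * (-x(t[1:]))
eps_pred = delta**2 / 6.0 * (-x(t[1:]))

# Sample correlation between the first components of the two error series.
corr = np.corrcoef(w[0], eps[0])[0, 1]
print(corr)
```

In this sketch the correlation is close to one but not exactly one, because the $O((\Delta t)^3)$ term in (4) introduces a small phase shift between the two error series.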
If a backward Euler method were used instead of a forward Euler method, the leading term of the system error would be $-\frac{(\Delta t)^2}{2} x''(t_{i+1})$, which is negatively correlated with $\varepsilon^R_{i+1} = (\delta^2/6)\, x''(t_{i+1})$. Moreover, if the observations arose from an asymmetric average (e.g., over $[t_{i+1} - \delta, t_{i+1}]$) then the observation error would be in terms of the first derivative of $x$ instead of the second derivative; however, these different derivatives are still likely to be correlated since
$$x'' = \frac{d}{dt} x' = \frac{d}{dt} F(x) = DF(x)\, x'.$$
If higher order methods were used, then the errors would involve higher order derivatives, which are still highly likely to be correlated.
c. Estimation of the full covariance matrix
The correlations described above will be illustrated in two simple examples. To show the effects most clearly, we assume a perfect model. We begin with a simple ODE solver.
Consider using forward Euler on the Lorenz-63 system (Lorenz 1963)
\begin{align*}
\dot{x}_1 &= \sigma (x_2 - x_1) \\
\dot{x}_2 &= x_1 (\rho - x_3) - x_2 \tag{6} \\
\dot{x}_3 &= x_1 x_2 - \beta x_3
\end{align*}
where $\sigma = 10$, $\rho = 28$, $\beta = 8/3$. To demonstrate the correlation derived above, we first used a higher-order integrator (Runge-Kutta 4th order with time step 0.005) to produce a finely sampled ground truth signal. To define the truncated model, we used a forward Euler method with $\Delta t = 0.1$. We used a 21-point composite trapezoid rule to approximate the integrated observation with $\delta = 0.05$. In Fig. 1(a,b) we show the correlation between the system error, in this case the local truncation error (LTE) of the Euler solver, and the observation error, as a function of time.
In Fig. 1(c) we show the estimated covariance matrix $C$, which reveals the strong correlations ($S \neq 0$) between the system and observation errors. The covariance matrix $C$ is estimated by concatenating the system and observation errors at each step into a 6-dimensional vector, and then computing the empirical covariance matrix of these vectors (averaged over $T = 12000$ discrete time steps). The estimated $C$ matrix will be used in Section 4 in a nonlinear filter, and this method can be used to estimate the $C$ matrix for general problems as long as one can afford a long offline run using a very fine discretization. Fig. 1(d-f) shows the same phenomenon for a more accurate solver, Runge-Kutta 4th-order with time step $\Delta t = 0.05$. Although the system errors are much smaller, the correlations with observation errors are still evident.
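The estimation procedure just described can be sketched as follows for the forward Euler case. This is an illustrative reconstruction under stated assumptions rather than the authors' code: the initial condition, the length of the discarded transient, and the shortened run of $T = 1000$ coarse steps (the text uses 12000) are choices made here, and `np.cov` subtracts the sample mean whereas the definitions of $Q$, $R$, and $S$ in Section 2 assume mean-zero errors.

```python
import numpy as np

def lorenz(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def rk4_step(x, h):
    k1 = lorenz(x)
    k2 = lorenz(x + 0.5 * h * k1)
    k3 = lorenz(x + 0.5 * h * k2)
    k4 = lorenz(x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

h_fine, dt, delta = 0.005, 0.1, 0.05
stride = round(dt / h_fine)       # 20 fine steps per coarse step
half = round(delta / h_fine)      # 10 fine steps per side -> 21-point rule
T = 1000                          # coarse steps (shortened; the text uses 12000)

# Fine "truth" trajectory, after discarding a transient (assumed length).
x = np.array([1.0, 1.0, 1.0])
for _ in range(2000):
    x = rk4_step(x, h_fine)
n_fine = T * stride + half + 1
truth = np.empty((n_fine, 3))
truth[0] = x
for k in range(1, n_fine):
    truth[k] = rk4_step(truth[k - 1], h_fine)

# Composite trapezoid weights for the unweighted average over [t - delta, t + delta].
wts = np.full(2 * half + 1, h_fine)
wts[[0, -1]] *= 0.5
wts /= 2 * delta

errs = np.empty((T, 6))
for i in range(T):
    k = (i + 1) * stride
    xi, xnext = truth[i * stride], truth[k]
    w = xnext - (xi + dt * lorenz(xi))                    # Euler local truncation error
    eps = wts @ truth[k - half:k + half + 1] - xnext      # averaged observation minus point value
    errs[i] = np.concatenate([w, eps])

C = np.cov(errs, rowvar=False)    # 6x6 empirical covariance with blocks [[Q, S], [S^T, R]]
Q, S, R = C[:3, :3], C[:3, 3:], C[3:, 3:]
print(np.round(S, 4))
```

Because $\delta \le \Delta t$, the truncated observation function in this sketch is direct observation of the state, so the observation error is the window average minus the point value.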
The difference between the positive correlations in Fig. 1(b,c) and negative correlations in Fig. 1(e,f) will have a noticeable effect in filter accuracy, as shown in Section 4a. In Section 5, we will establish a theory explaining this disparity in the linear case.
As a second example, consider spatiotemporal dynamics $x(z,t)$ given by the Kuramoto-Sivashinsky PDE (Kuramoto and Tsuzuki 1976; Sivashinsky 1977)
$$x_t = -x_{zzzz} - x_{zz} - x\, x_z \tag{7}$$
defined on a periodic domain with length $L = 100$. For simplicity, we will use an explicit method applying RK4 in time and second order finite difference formulas for the spatial derivatives. While an implicit method would be stable for much larger values of $\Delta t$, we will later see that the filter will be able to stably recover the signal from noisy observations even for large $\Delta t$ (the filter uses the observations to stabilize what would otherwise be an unstable numerical scheme). In order to obtain a high resolution ‘ground truth’ signal we use a grid with 512 equally spaced spatial grid steps and a time step of $10^{-4}$.
To simulate a PDE integrator in practice, we truncate the model, applying the same RK4 solver with a reduced number of grid steps and a larger $\Delta t$. Let $P \le 512$ be the number of spatial grid steps on $[0, L]$ and let $\Delta z$ be the spatial step size, so that $P \Delta z = L$. We define an observation function which integrates in space as
$$\mathcal{H}_i(x)_j = \frac{1}{2\delta} \int_{z_j - \delta}^{z_j + \delta} x(w, t_i)\, dw \tag{8}$$
where $\delta$ defines the spatial region over which the observations are averaged. So for example, when $\delta = L/128$, we will first estimate the true observation $\mathcal{H}_i$ using a composite trapezoid rule with $2\delta/\Delta z + 1 = 9$ grid points from the full 512 grid point solution. If we consider a truncated model with $P = 128$ grid points, then the observation function will have to be approximated using only $2\delta/\Delta z + 1 = 3$ grid points. If we consider a truncated model with $P = 64$ grid points, we will approximate the observation functions using only a single grid point, in other words our coarse model for the observation function will be direct observation at each grid point. This is because the integration range $\delta$ has become smaller than our truncated $\Delta z$.
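The grid point counts quoted above can be reproduced with a short sketch. The function name `n_quadrature_points` is hypothetical, and the formula $2\lfloor \delta/\Delta z \rfloor + 1$ is a restatement introduced here: it agrees with $2\delta/\Delta z + 1$ whenever $\delta$ is a multiple of $\Delta z$ and reduces to a single point once $\delta < \Delta z$.

```python
import math

L = 100.0            # domain length used in the text
delta = L / 128      # half-width of the spatial averaging window

def n_quadrature_points(P, delta=delta, L=L):
    """Count grid points z_k of a P-point grid with |z_k - z_j| <= delta."""
    dz = L / P
    per_side = math.floor(delta / dz + 1e-9)   # small tolerance guards against round-off
    return 2 * per_side + 1

counts = {P: n_quadrature_points(P) for P in (512, 128, 64)}
print(counts)   # {512: 9, 128: 3, 64: 1}
```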
In Fig. 2 we compare the full resolution and truncated solutions for 64 grid points and $\Delta t = 0.1$. The observation errors (top right) are tightly correlated to the system errors, consisting of truncation errors from the 1-step integrator (bottom right). Both errors are also correlated with the underlying truth solution.
It is helpful to view the empirical full covariance matrix $C$ of the system plus observation errors that can be estimated from the data in Fig. 2. In Fig. 3(left) we show the matrix $C$ where the sub-matrices $Q$, $S$, $S^\top$, and $R$ have been spatially averaged (using the symmetry of (7) on the periodic domain), which reveals the strong correlation between the system and observation errors. In Fig. 3(right), we plot the sorted eigenvalues of the empirically estimated $C$ matrix (black, solid curve), which result purely from the correlation of the truncation error in the model and the representation error in the observation.
Next, we consider the case of a large observation model error and show that the observed correlations are still present in this case. While leaving the truncated observation function unchanged, we changed the true observation function to compute a weighted spatial average of the four nearest grid points to each observed point. Explicitly, prior to applying the composite trapezoid rule, we first average each location with its 4 nearest neighbors with weights
$$\bar{x}(z_j) = \frac{x(z_j) + x(z_{j+1})/2 + x(z_{j-1})/2 + x(z_{j+2})/4 + x(z_{j-2})/4}{2.5}$$
which maintains the local structure but also implies that the quadrature rule used for the truncated state is no longer consistent with this new observation. In Fig. 3(right), we plot the eigenvalues of the empirically estimated $C$ matrix (red, dashed curve) for this new observation, which contains both observation model error and representation error. While the correlation is slightly further from maximal, the same strong decay of eigenvalues and almost singular behavior is present as in the case of representation error alone.
In Fig. 3 we should emphasize the presence of many small eigenvalues, indicating that the $C$ matrix is close to singular. To show that these small eigenvalues come from the special structure of the correlated errors, we artificially increased the diagonal of the $S$ sub-matrices by 50%, and the resulting eigenvalues are shown as the blue dashed curve in Fig. 3. In other words, by increasing the correlation beyond the true correlations we destroy the special ‘almost rank deficient’ nature of this type of correlated error. In Section 5, we focus on this phenomenon, which indicates that the system and observation errors are very close to being maximally correlated. We will show that the strong correlation between the system and observation errors has significant consequences for the ability to estimate the true state from the observations.
3. Filtering in the presence of correlations
In this section we review versions of the Kalman filter for linear and nonlinear dynamics which include full correlations of system and observation errors. We begin with the linear formulas, and then discuss the unscented version of the ensemble Kalman filter for nonlinear models.
a. The Kalman filter for correlated system and observation noise
We begin by reviewing the Kalman update equations for a linear system with correlated noise (see for example Simon (2006)). Assume model and observation equations
$$x_i = f(x_{i-1}) + \Gamma \omega_{i-1} \tag{9}$$
$$y_i = h(x_i) + J \nu_i \tag{10}$$
where $F = f$ and $H = h$ represent the system's dynamics and linear observable, respectively, and $\Gamma$ and $J$ are fixed matrices. Assume (3) represents the noise covariances.
Given the posterior estimate of the state $x^a_{i-1}$ and covariance $P^a_{i-1}$ at step $i-1$, the Kalman update for a linear system is (Simon 2006; Belanger 1974)
\begin{align*}
x^b_i &= F x^a_{i-1} \\
y^b_i &= H x^b_i \\
P^b_i &= F P^a_{i-1} F^\top + \Gamma Q \Gamma^\top \tag{11} \\
K_i &= (P^b_i H^\top + \Gamma S J^\top)(H P^b_i H^\top + H \Gamma S J^\top + J S^\top \Gamma^\top H^\top + J R J^\top)^{-1} \tag{12} \\
x^a_i &= x^b_i + K_i (y_i - y^b_i) \\
P^a_i &= P^b_i - K_i (H P^b_i + J S^\top \Gamma^\top) \tag{13}
\end{align*}
where $x^b_i$ represents the forecast of the state given only the observations up to time $i-1$ and $P^b_i$ represents the covariance of the forecast. Similarly, $y^b_i$ represents the forecast of the $i$-th observation given only the observations up to time $i-1$, and the difference between the observed variables and the forecast mapped into observation space,
$$\epsilon_i \equiv y_i - y^b_i$$
is called the innovation. These innovations are often used to estimate the system and observation error as in Belanger (1974); Mehra (1970, 1972); Berry and Sauer (2013). The Kalman gain matrix $K_i$ optimally combines the forecast $x^b_i$ with the innovation to form the posterior estimate $x^a_i$, which is the maximum likelihood and minimum variance estimator of the true state $x_i$. The filter also produces the covariance matrix $P^a_i$ of the estimator for use in the next filter step.
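Equations (11) to (13) translate directly into a few lines of linear algebra. The following is a minimal sketch, not a reference implementation: the function name `correlated_kf_step` and all numerical values are hypothetical, and the gain is computed with a linear solve rather than an explicit inverse.

```python
import numpy as np

def correlated_kf_step(xa, Pa, y, F, H, Gamma, J, Q, S, R):
    """One Kalman step with cross-covariance S between system and observation noise."""
    xb = F @ xa                                    # state forecast
    yb = H @ xb                                    # observation forecast
    Pb = F @ Pa @ F.T + Gamma @ Q @ Gamma.T        # equation (11)
    cross = Gamma @ S @ J.T
    Pxy = Pb @ H.T + cross
    Py = H @ Pb @ H.T + H @ cross + cross.T @ H.T + J @ R @ J.T
    K = np.linalg.solve(Py, Pxy.T).T               # equation (12): K = Pxy Py^{-1}, Py symmetric
    xa_new = xb + K @ (y - yb)
    Pa_new = Pb - K @ (H @ Pb + cross.T)           # equation (13)
    return xa_new, Pa_new, K

# Hypothetical system with N = 2 state variables and M = 1 observation.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Gamma, J = np.eye(2), np.eye(1)
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
R = np.array([[0.4]])
S = np.array([[0.3], [0.1]])
xa, Pa, y = np.zeros(2), np.eye(2), np.array([0.5])

xa1, Pa1, K1 = correlated_kf_step(xa, Pa, y, F, H, Gamma, J, Q, S, R)
# Setting S = 0 recovers the classical Kalman gain.
_, _, K0 = correlated_kf_step(xa, Pa, y, F, H, Gamma, J, Q, np.zeros_like(S), R)
print(K1.ravel(), K0.ravel())
```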
b. The correlated unscented Kalman filter (CUKF)
We now generalize the correlated system and observation noise filtering approach to nonlinear systems and we will show that for linear systems we recover exactly the equations above.
In order to apply the unscented Kalman filter we need to generate an unscented ensemble with the correct correlations. Since the noise realization is independent of the current state, we consider the concatenated state and noise vector
$$\begin{pmatrix} x_{i-1} \\ \omega_{i-1} \\ \nu_i \end{pmatrix} \sim \mathcal{N}\left(\tilde{x}^a_{i-1}, \tilde{P}^a_{i-1}\right) \equiv \mathcal{N}\left( \begin{pmatrix} x^a_{i-1} \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} P^a_{i-1} & 0 & 0 \\ 0 & Q & S \\ 0 & S^\top & R \end{pmatrix} \right).$$
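Notice that the concatenated state is $2N+M$ dimensional, and the joint covariance matrix is $(2N+M) \times (2N+M)$. We then form the unscented ensemble, which is represented in a $(2N+M) \times (4N+2M+1)$ matrix
$$\begin{pmatrix} X^a_{i-1} \\ W_{i-1} \\ V_i \end{pmatrix} \equiv \left( \tilde{x}^a_{i-1},\; \tilde{x}^a_{i-1} + \sqrt{\alpha \tilde{P}^a_{i-1}},\; \tilde{x}^a_{i-1} - \sqrt{\alpha \tilde{P}^a_{i-1}} \right)$$
where $X^a_{i-1}$ contains the first $N$ rows of the ensemble, $W_{i-1}$ contains the next $N$ rows, and $V_i$ the final $M$ rows. We also define the associated ensemble weights
$$w_i = \begin{cases} 1 - \dfrac{N}{\alpha} & i = 1 \\[1ex] \dfrac{1}{2\alpha} & i \neq 1 \end{cases}$$
for $i = 2, \dots, 2N+1$. In the weight formula $N$ stands for the dimension of the vector being sampled, which is $2N+M$ for the concatenated state above, so that the weights sum to one. The scalar $\alpha$ defines the scaling of the ensemble, which is often chosen to be $\alpha = N$ or $\alpha = N \pm 1$, although Julier and Uhlmann (2004) suggest $\alpha = 3$ (in the limit as $\alpha \to 0$ the UKF approaches the extended Kalman filter (EKF)). Notice that if the matrix $C$ is constant, the square root of $C$ can be computed offline and then $\sqrt{\tilde{P}^a_{i-1}}$ can be formed at each step as the block diagonal matrix with blocks $\sqrt{P^a_{i-1}}$ and $\sqrt{C}$.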
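The ensemble construction can be sketched as follows. The names `sym_sqrt` and `unscented_ensemble` are hypothetical, the symmetric eigendecomposition square root is one choice among several valid matrix square roots, and the center weight uses the dimension $D = 2N + M$ of the concatenated vector so that the weights sum to one. The final lines confirm that the weighted ensemble reproduces the joint mean and block diagonal covariance.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def unscented_ensemble(xa, Pa, C, alpha):
    """Columns are the 2D + 1 ensemble members of the concatenated (x, omega, nu) vector."""
    N = xa.size
    D = N + C.shape[0]                      # D = 2N + M
    mean = np.concatenate([xa, np.zeros(C.shape[0])])
    root = np.zeros((D, D))
    root[:N, :N] = sym_sqrt(Pa)
    root[N:, N:] = sym_sqrt(C)              # can be computed once offline if C is constant
    root *= np.sqrt(alpha)
    E = np.column_stack([mean, mean[:, None] + root, mean[:, None] - root])
    weights = np.full(2 * D + 1, 1.0 / (2.0 * alpha))
    weights[0] = 1.0 - D / alpha
    return E, weights

# Hypothetical example with N = 2 and M = 1.
xa = np.array([1.0, -2.0])
Pa = np.array([[0.5, 0.1], [0.1, 0.3]])
C = np.array([[1.0, 0.2, 0.3],
              [0.2, 0.5, 0.1],
              [0.3, 0.1, 0.4]])             # blocks [[Q, S], [S^T, R]]
E, weights = unscented_ensemble(xa, Pa, C, alpha=3.0)

mean_rec = E @ weights
dev = E - mean_rec[:, None]
cov_rec = (dev * weights) @ dev.T           # weighted sample covariance
print(E.shape, weights.sum())
```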
Now that we have generated an unscented ensemble with the correct correlations we can pass this ensemble through the nonlinear transformations defining
\begin{align*}
X^b_i &= f(X^a_{i-1}, W_{i-1}) \\
Y^b_i &= h(X^b_i, V_i)
\end{align*}
where the nonlinear functions $f$ and $h$ are applied to each column of the ensemble matrices to form the forecast ensemble matrices $X^b_i$ and $Y^b_i$, which are $N \times (4N+2M+1)$ and $M \times (4N+2M+1)$ respectively. Now we can compute the forecast statistics
\begin{align*}
x^b_i &= X^b_i \vec{w} \\
y^b_i &= Y^b_i \vec{w} \\
P^x_i &= \sum_{j=2}^{2N+1} w_j \left((X^b_i)_{\cdot,j} - x^b_i\right)\left((X^b_i)_{\cdot,j} - x^b_i\right)^\top \\
P^y_i &= \sum_{j=2}^{2N+1} w_j \left((Y^b_i)_{\cdot,j} - y^b_i\right)\left((Y^b_i)_{\cdot,j} - y^b_i\right)^\top \\
P^{xy}_i &= \sum_{j=2}^{2N+1} w_j \left((X^b_i)_{\cdot,j} - x^b_i\right)\left((Y^b_i)_{\cdot,j} - y^b_i\right)^\top
\end{align*}
where $(X^b_i)_{\cdot,j}$ is the $j$-th column of $X^b_i$ (the $j$-th ensemble member).
We can now define the unscented version of the Kalman update for correlated noise,
\begin{align*}
K_i &= P^{xy}_i (P^y_i)^{-1} \\
x^a_i &= x^b_i + K_i (y_i - y^b_i) \\
P^a_i &= P^b_i - K_i (P^{xy}_i)^\top \tag{14}
\end{align*}
where $P^b_i = P^x_i$ is the forecast covariance. Finally, as an implementation detail, the last equation should be computed as
$$P^a_i = P^b_i - K_i P^y_i K_i^\top$$
in order to maintain numerical symmetry.
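A complete CUKF step can be sketched in the same spirit. This is an illustrative implementation under stated assumptions, not the authors' code: the names `sym_sqrt` and `cukf_step` are hypothetical, the ensemble dimension is $D = 2N + M$, and the covariance sums skip the center member as in the formulas above. The example applies the step to a hypothetical linear system with additive noise ($\Gamma$ and $J$ equal to identity matrices) and compares the result with equations (11) to (13).

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric PSD matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def cukf_step(xa, Pa, y, f, h, C, alpha):
    """One CUKF step; f(x, w) and h(x, v) act on single state and noise vectors."""
    N = xa.size
    D = N + C.shape[0]
    mean = np.concatenate([xa, np.zeros(C.shape[0])])
    root = np.zeros((D, D))
    root[:N, :N] = sym_sqrt(Pa)
    root[N:, N:] = sym_sqrt(C)
    root *= np.sqrt(alpha)
    E = np.column_stack([mean, mean[:, None] + root, mean[:, None] - root])
    wts = np.full(2 * D + 1, 1.0 / (2.0 * alpha))
    wts[0] = 1.0 - D / alpha

    # Push each ensemble member through the dynamics and the observation function.
    Xb = np.column_stack([f(e[:N], e[N:2 * N]) for e in E.T])
    Yb = np.column_stack([h(xb_j, e[2 * N:]) for xb_j, e in zip(Xb.T, E.T)])
    xb, yb = Xb @ wts, Yb @ wts

    # Forecast covariances from members j >= 2 (the center member is skipped).
    dX, dY, wc = Xb[:, 1:] - xb[:, None], Yb[:, 1:] - yb[:, None], wts[1:]
    Px, Py, Pxy = (dX * wc) @ dX.T, (dY * wc) @ dY.T, (dX * wc) @ dY.T

    K = np.linalg.solve(Py, Pxy.T).T          # K = Pxy Py^{-1}
    return xb + K @ (y - yb), Px - K @ Py @ K.T

# Hypothetical linear test problem with additive noise.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
R = np.array([[0.4]])
S = np.array([[0.3], [0.1]])
C = np.block([[Q, S], [S.T, R]])
xa, Pa, y = np.array([1.0, -2.0]), np.eye(2), np.array([0.5])

x_ukf, P_ukf = cukf_step(xa, Pa, y, lambda x, w: F @ x + w, lambda x, v: H @ x + v, C, alpha=3.0)

# Reference: linear Kalman update with correlated noise, equations (11)-(13).
Pb = F @ Pa @ F.T + Q
Pxy = Pb @ H.T + S
Py = H @ Pb @ H.T + H @ S + S.T @ H.T + R
K = Pxy @ np.linalg.inv(Py)
x_kf, P_kf = F @ xa + K @ (y - H @ F @ xa), Pb - K @ Pxy.T
print(np.abs(x_ukf - x_kf).max(), np.abs(P_ukf - P_kf).max())
```

For this linear system the two updates agree to rounding error, which is the equivalence the text attributes to Appendix A1.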
In Appendix A1 we show the equivalence of the CUKF and KF for linear problems with correlated noise, which shows that the CUKF is a natural generalization to nonlinear problems with correlated errors. We note that the generalization of the CUKF approach to an ensemble square root Kalman filter (EnSQKF) is a straightforward extension of the same Kalman update formulas. Integrating correlated noise into other Kalman filters such as the ensemble Kalman filter (EnKF) and ensemble transform Kalman filter (ETKF) can also be achieved using the Kalman update for correlated noise. For large problems the covariance matrix would need to be localized in order to be a practical method; for example, the localized ensemble transform Kalman filter (LETKF) can be adapted to use the unscented ensembles used here (Berry and Sauer 2013). A significant remaining task is generalizing the ensemble adjustment Kalman filter (EAKF) for additive system noise and correlated system and observation noise. Serial filters such as the EAKF cannot currently be applied even for additive system noise which is not correlated to the observation noise, and instead these filters typically use inflation to try to account for system error. Generalizing the serial filtering approach to allow these more general types of inflation is a significant and important task and is beyond the scope of this article.
4. Filtering systems with truncation errors
In this section we apply the CUKF to truncated observations of the Lorenz-63 and Kuramoto-Sivashinsky systems as described in Section 2. The dynamics and observations considered in this section have no added noise, so that the system errors arise only from truncation of the numerical solvers, and the observation errors arise only from local integration. The CUKF will use the empirically estimated $C$ matrices described in Section 2, and we compare to the filter results with the covariance matrix $C$ modified by setting the $S$ block equal to the zero matrix, which we denote as UKF.
a. Example: Lorenz equations
First we consider the Lorenz-63 system (6) with the observation described in section 2b. Using the same data generated in that example, we applied the CUKF and UKF. The estimates produced by these filters are shown in Fig. 4(a) (for the same time interval shown in Fig. 1). In Fig. 4(b) we show the errors between each filter’s estimates and the truth, compared to the observation errors over the same time interval. The CUKF, which uses the full $C$ matrix, obtains significantly superior estimates of the true state. Averaged over 6000 filter steps (after removing the initial filter transient) the root mean squared error (RMSE) of standard UKF estimates is 0.29 whereas that of the CUKF estimates is 0.16. Compared to the RMSE of the raw observations, which is 0.35, the UKF reduced the error by 17% while the CUKF reduced the error by 54%.
We then repeated this experiment using the RK4 integrator instead of forward Euler with the same truncated time step of $\Delta t = 0.05$ and the results are shown in Fig. 4(c-d). The RMSE of the UKF estimates with this integrator is 0.18 and the RMSE of the CUKF estimates is 0.16. Recall that the local truncation errors of RK4 were negatively correlated with the observation errors, resulting in relatively small differences between the UKF and CUKF for this example. The difference between positively and negatively correlated errors will be studied below in Section 5. Notice that the CUKF with forward Euler obtains better estimates than the UKF using the far superior RK4 integrator. This shows that by using the correlations we can obtain better results with a much faster integrator. It also emphasizes the importance of the sign of the correlations, so that if possible one should select an integrator which yields errors that are positively correlated with observation errors (in the global average), possibly even if this requires using a lower order method.
b. Example: Kuramoto-Sivashinsky
Next we consider filtering the observations of the Kuramoto-Sivashinsky model (7) introduced in Section 2c. Using a ground truth integrated with 512 spatial grid points and $\Delta t = 10^{-4}$, we consider truncated models with 64, 128, and 256 grid points and ∆t = 0.05 (results were similar for ∆t = 0.1 and ∆t = 0.025). In Fig. 5(a,b) we compare the RMSE of the UKF estimates, CUKF estimates, and the observations for two different spatial integration widths δ = L/512 and δ = L/128. When δ = L/512 the integral (8) defining the true observation is estimated using the composite trapezoid rule on 3 grid points of the full 512 grid point solution, and the RMSE of the observation errors is 0.01, which is 0.3% of the signal variance. In Fig. 5(a) we see that for 64 grid points the UKF does not reduce the error much relative to the observation error, whereas the CUKF obtains a much better estimate. In fact, the CUKF error with 64 grid points is comparable to the UKF error with 256 grid points. When δ = L/128 the integral (8) is estimated using the composite trapezoid rule on 9 grid points of the full 512 grid point solution, and the RMSE of the observation errors is 0.11, which is 3.5% of the signal variance. As shown in Fig. 5(b), the CUKF still outperforms the UKF; however, the difference at 64 grid points is less significant since δ < ∆z, meaning that each integral is completely contained between grid points.
The results of both filters are robust for large ∆t until the numerical solver becomes extremely unstable (which occurred for ∆t = 0.1 with 256 grid points, since more grid points generally require smaller ∆t to stabilize the solver). However, we should note that the numerical solver is unstable even for 64 grid points with ∆t = 0.1, and the filter is stabilizing the solver using the observations. We also examined the robustness of these results in the presence of additive Gaussian observation noise with covariance R in Fig. 5(c). Notice that for small R the random noise is small and the errors from truncation dominate, meaning the correlations in S are significant. As the uncorrelated noise R is increased it eventually dominates the correlated part of the observation errors, so that UKF and CUKF have similar performance.
Finally, in Fig. 5(d) we show the effect of inflation in the UKF by adding a constant multiple of the identity to either Q or R, and in each case the best performance is found when using no inflation. We also tried inflating the filter background covariance matrix by multiplying by a constant greater than one, and this also had very little effect, as shown in Fig. 5(d). These results indicate that inflation cannot account for the correlated error. Since the Q and R used in the UKF were determined empirically to be optimal in this example, the only way to improve the performance is to account for correlations using the CUKF.
5. Maximally correlated random variables and perfect recoverability
In the previous section, the importance of using the full correlation matrix C was demonstrated for system and observation errors that arise naturally from the truncation and averaging that are common in geophysical modeling and filtering. In this section, we investigate the effects of cross-correlation in a more systematic way. In particular, we identify the extreme case of maximally correlated random variables.
a. Maximum correlation
We begin by defining maximally correlated random variables.
Definition 5.1 (Maximally correlated random variables) Let $X \in \mathbb{R}^N$ and $Y \in \mathbb{R}^M$ be random variables with covariances $Q = E[(X - E[X])(X - E[X])^\top]$ and $R = E[(Y - E[Y])(Y - E[Y])^\top]$ respectively, and let $S = E[(X - E[X])(Y - E[Y])^\top]$ be the cross-covariance. We say that $X, Y$ are maximally correlated if the Schur complement of $R$ in $C$, namely $Q - SR^{-1}S^\top$, has minimal trace among all $N \times M$ matrices $S$. In other words, $\operatorname{trace}(Q - SR^{-1}S^\top) = \min_{S'}\{\operatorname{trace}(Q - S'R^{-1}S'^\top)\}$.
While it is not immediately obvious from the definition, Lemma A2.1 in Appendix A2 shows that the roles of X and Y are symmetric, so that S also minimizes $\operatorname{trace}(R - S^\top Q^{-1}S)$. The idea of maximally correlated random variables is that by choosing an appropriate S the $(N+M)\times(N+M)$ matrix C becomes rank deficient with rank N. Notice that we can make a linear change of the X variables
$$
\begin{pmatrix} \tilde X \\ \tilde Y \end{pmatrix}
=
\begin{pmatrix} I_{N\times N} & -SR^{-1} \\ 0 & I_{M\times M} \end{pmatrix}
\begin{pmatrix} X \\ Y \end{pmatrix}
$$
so that $\tilde Y = Y$ is unchanged but the covariance matrix of $\tilde X, \tilde Y$ is
$$
\tilde C \equiv E\left[ \begin{pmatrix} \tilde X \\ \tilde Y \end{pmatrix} \begin{pmatrix} \tilde X \\ \tilde Y \end{pmatrix}^{\top} \right]
=
\begin{pmatrix} Q - SR^{-1}S^\top & 0 \\ 0 & R \end{pmatrix}
\tag{15}
$$
and the new state variables $\tilde X$ have covariance matrix $Q - SR^{-1}S^\top$ with minimal trace. In other words, the variables $\tilde X$ have minimal variance among all possible choices of S. According to the rank additivity formula of Guttman (1946), the rank of $\tilde C$ is equal to the sum of the rank of R and the rank of its Schur complement $Q - SR^{-1}S^\top$, meaning that $\operatorname{rank}(\tilde C) = \operatorname{rank}(C)$. Thus, by reducing the rank of the Schur complement we are actually choosing S which minimizes the rank of C. Intuitively speaking, this choice of S minimizes the dimensionality of the joint noise process.
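The block-diagonalization and the rank additivity formula can be checked numerically. The following is a minimal sketch, assuming a hypothetical joint covariance built as $GG^\top$ with a random $(N+M)\times N$ factor $G$ (so that the joint noise has only N independent directions); the sizes, the seed, and the rank tolerance are illustrative choices, not values from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 2

# Hypothetical joint covariance C = G G^T of rank N (assumption for illustration).
G = rng.standard_normal((N + M, N))
C = G @ G.T
Q, S, R = C[:N, :N], C[:N, N:], C[N:, N:]

# Linear change of variables: X_tilde = X - S R^{-1} Y, Y_tilde = Y.
T = np.block([
    [np.eye(N), -S @ np.linalg.inv(R)],
    [np.zeros((M, N)), np.eye(M)],
])
C_tilde = T @ C @ T.T
schur = Q - S @ np.linalg.inv(R) @ S.T

# Ranks computed with an explicit tolerance to absorb round-off.
tol = 1e-9
rank_C = np.linalg.matrix_rank(C, tol=tol)
rank_R = np.linalg.matrix_rank(R, tol=tol)
rank_schur = np.linalg.matrix_rank(schur, tol=tol)
print(rank_C, rank_R, rank_schur)
```

In this sketch the off-diagonal blocks of `C_tilde` vanish up to round-off, its upper-left block equals the Schur complement, and the rank of C equals the rank of R plus the rank of the Schur complement.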
A simple example of maximal correlation is to consider the 2×2 case where Q = q, R = r, and S = s are scalars. By setting $s = \pm\sqrt{q}\sqrt{r}$, we find the Schur complement to be $q - s r^{-1} s = 0$. Moreover, with this choice of s the eigenvalues of C are $\{0, q + r\}$, so that rank(C) = 1 is minimal over all possible choices of s. In general, when N = M we can set $S = \sqrt{Q}\sqrt{R}^{\top}$, where $\sqrt{Q}\sqrt{Q}^{\top} = Q$ and $\sqrt{R}\sqrt{R}^{\top} = R$ are matrix square roots (recall that matrix square roots are unique up to a choice of orthogonal matrix), and we find the Schur complement to be $Q - SR^{-1}S^\top = 0_{N\times N}$. When N ≠ M the formula for S is similar and is given in a lemma which is stated and proved in Appendix A2.
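Both the scalar case and the N = M construction are easy to reproduce numerically. The sketch below assumes Cholesky factors as one admissible choice of matrix square root and uses randomly generated, hypothetical Q and R; any other factors with $\sqrt{Q}\sqrt{Q}^\top = Q$ and $\sqrt{R}\sqrt{R}^\top = R$ would serve equally well.

```python
import numpy as np

# Scalar case: s = sqrt(q) sqrt(r) makes the 2x2 matrix C singular.
q, r = 1.0, 4.0
s = np.sqrt(q) * np.sqrt(r)
scalar_eigs = np.linalg.eigvalsh(np.array([[q, s], [s, r]]))  # ascending order

# Matrix case with N = M, using hypothetical symmetric positive definite Q and R.
rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
Q = A @ A.T + N * np.eye(N)
R = B @ B.T + N * np.eye(N)

# Cholesky factors are one choice of square root (L L^T = Q).
sqrtQ = np.linalg.cholesky(Q)
sqrtR = np.linalg.cholesky(R)
S = sqrtQ @ sqrtR.T

schur = Q - S @ np.linalg.solve(R, S.T)
C = np.block([[Q, S], [S.T, R]])
rank_C = np.linalg.matrix_rank(C, tol=1e-9)
print(scalar_eigs, np.abs(schur).max(), rank_C)
```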
It follows from Lemma A2.1 that given random variables $X \in \mathbb{R}^N$ and $Y \in \mathbb{R}^M$ with covariance matrices Q, R respectively, there always exists an N×M matrix S such that the total covariance matrix C in (3) makes X and Y maximally correlated, and rank(C) = N. This finding is striking, in the sense that if X and Y represent the system and observation errors of a dynamical system, respectively, and if they are maximally correlated, then the underlying noise/error process is actually only N dimensional, despite appearing N + M dimensional. Since the observation errors are linear combinations of the system errors, up to an orthogonal transformation we can think of the process as effectively having no observation noise. We will make this rigorous below by showing that in the case of maximal correlations, the observation errors can be completely eliminated by filtering and the true state can be perfectly recovered.
Now consider the case when Q and R are the system and observation covariances, respectively. From the previous lemma we can see that the easiest way to obtain maximally correlated processes is when the observation errors are linear combinations of the system errors (since this implies that C has rank N). So, returning to the discussion in Section 2a, we can now see that when the leading order terms in the system and observation errors only differ by a constant multiple they will be maximally correlated up to higher order terms. More generally, whenever the (N+M)×(N+M) matrix C has rank N, the system and observation errors are maximally correlated. In particular, the small eigenvalues in Fig. 3 indicate that the system and observation errors are close to maximally correlated.
b. Perfect recoverability in maximally correlated linear systems
Consider the linear system of the form (1,2) where
$$
\begin{aligned}
x_{i+1} &= f(x_i,\omega_i) = Fx_i + \Gamma\omega_i \\
y_{i+1} &= h(x_{i+1},\nu_{i+1}) = Hx_{i+1} + J\nu_{i+1},
\end{aligned}
$$
and assume the noise is generated by
$$
\begin{pmatrix} \omega_i \\ \nu_{i+1} \end{pmatrix} \sim \mathcal{N}(0, C), \quad \text{where } C = \begin{pmatrix} Q & S \\ S^\top & R \end{pmatrix}
$$
as in (3). In this section we will show that when the covariance matrices Q and R are maximally correlated, meaning that S is chosen as in Lemma A2.1, the state variables become perfectly recoverable, meaning that the limiting variance of the Kalman filter estimates of those variables is zero. Of course, in real applications we do not get to choose S. Our purpose here is to demonstrate the maximal effect that S can have on the ability to estimate random variables. As a consequence, if the true S were maximal and one instead used a suboptimal filter with S = 0, the relative loss of accuracy would be 'infinite' (since perfect reconstruction was possible with the true S). Although the results in this section only apply to linear filtering problems, in Section 5c we will show similar empirical results for nonlinear filtering problems.
We show the effect that the maximal correlation S has on the stationary posterior covariance P of a Kalman filter. Without loss of generality, in this section we will assume $\Gamma = I_{N\times N}$ and $J = I_{M\times M}$, since we may replace C by $\tilde C = BCB^\top$ where B is block diagonal with blocks Γ and J. Substituting equations (11) and (12) into equation (13) and setting $P^a_i = P^a_{i-1} = P$, we find the discrete-time algebraic Riccati equation (DARE)
$$
\begin{aligned}
P = {} & FPF^\top + Q \\
& - (FPF^\top H^\top + QH^\top + S)(HFPF^\top H^\top + HQH^\top + HS + S^\top H^\top + R)^{-1}(HFPF^\top + HQ + S^\top).
\end{aligned}
\tag{16}
$$
If (16) has a solution that is stabilizing, meaning that all the eigenvalues of $(I - KH)F$ are inside the unit circle (where $K = K_\infty$ is defined by (12) using the solution P), then this solution is unique and is the limiting covariance matrix of the Kalman filter, as shown in Ran and Vreugdenhil (1988). We can now state the following result, whose proof can be found in Appendix A3.
Theorem 5.2 Assume that all the eigenvalues of the matrix
$$
(I - HS(HS + R)^{-1})F
$$
lie inside the unit circle. Then the limiting covariance matrix P of a Kalman filtering problem with maximally correlated noise processes is zero when M ≥ N. In other words, all state variables are perfectly recoverable. When M < N, if the general stability condition on $(I - KH)F$ is met, the limiting covariance matrix P is zero when projected onto the top M eigenvectors of Q.
Notice that the Kalman filter has an asymmetry between Q and R, which is not present in the definition of maximal correlation, due to their differing roles in the dynamics. The consequence of this asymmetry is seen in the stability condition. For simplicity, consider the case N = M = 1. When HS is positive, the stability condition is met for all |F| < 1, and even for some |F| > 1, since $|1 - HS(HS + R)^{-1}| < 1$. Conversely, when HS is negative, we find $|1 - HS(HS + R)^{-1}| > 1$ and stability of F is no longer sufficient.
To demonstrate this result, we applied the numerical DARE solver implemented by MATLAB to a linear system with $F = \lambda I_{N\times N}$, where λ will be varied to demonstrate the effect of stability, and $H = I_{N\times N}$, $Q = I_{N\times N}$, $R = 4I_{N\times N}$. In order to show how the filter estimates improve as S approaches the maximal choice, we let $S = (2 - \delta_j)I_{N\times N}$ for $\delta_j = 2^{-j}$ and $j = 0, 1, \ldots, 18$. In this case, the correlation in S is positive and we find that the stabilization criterion is
$$
\operatorname{eig}((I - HS(HS + R)^{-1})F) = (1 - 2/(2 + 4))\lambda < 1
$$
if and only if λ < 1.5. In Fig. 6(a) we plot the mean of the diagonal elements of the numerical solution P to (16) against the trace of the Schur complement, $\operatorname{trace}(Q - SR^{-1}S^\top) = N\delta_j - N\delta_j^2/4$, for all values of j. The different curves correspond to different values of λ chosen near 1.5. Notice that for λ < 1.5, as the noise approaches maximal correlation, the filter estimates approach the true state up to the limits of numerical precision. When λ > 1.5, P = 0 is still a solution of the DARE but is no longer stabilizing, and so the filter converges to a covariance matrix that has variances greater than zero.
To show the effect of negative correlation, we next consider the case $S = -(2 - \delta_j)I_{N\times N}$ for $\delta_j = 2^{-j}$ and $j = 0, 1, \ldots, 18$. In this case, the correlation in S is negative and we find that the stabilization criterion is
$$
\operatorname{eig}((I - HS(HS + R)^{-1})F) = (1 + 2/(-2 + 4))\lambda < 1
$$
if and only if λ < 0.5. Notice that in this case the dynamics are required to be stable (λ < 1) in order to stabilize the P = 0 solution, whereas with positive correlations the dynamics could be unstable (λ > 1). In Fig. 6(b) we plot the mean of the diagonal elements of the numerical solution P to (16) against the trace of the Schur complement, $\operatorname{trace}(Q - SR^{-1}S^\top) = N\delta_j - N\delta_j^2/4$, for all values of j. The different curves correspond to different values of λ chosen near 0.5. Notice that for λ < 0.5, as the noise approaches maximal correlation, the filter estimates approach the true state up to the limits of numerical precision. When λ > 0.5, P = 0 is still a solution of the DARE but is no longer stabilizing, and so the filter converges to a covariance matrix with variances greater than zero.
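The two stability thresholds can be illustrated without a dedicated DARE solver by iterating the right-hand side of (16) to a fixed point. This is a minimal sketch under stated assumptions: it substitutes plain fixed-point iteration for the MATLAB solver used in the experiments, takes exactly maximal correlation ($S = \pm 2I$), and uses hypothetical helper names (`riccati_step`, `limiting_covariance`) together with an arbitrary initial covariance, state dimension, and iteration count.

```python
import numpy as np

def riccati_step(P, F, H, Q, R, S):
    """One application of the right-hand side of the DARE (16), with Gamma = I and J = I."""
    Pb = F @ P @ F.T + Q                        # forecast covariance
    Pxy = Pb @ H.T + S                          # state/observation cross-covariance
    Py = H @ Pb @ H.T + H @ S + S.T @ H.T + R   # innovation covariance
    return Pb - Pxy @ np.linalg.solve(Py, Pxy.T)

def limiting_covariance(lam, s, n=2, iters=2000):
    """Iterate (16) for F = lam*I, H = I, Q = I, R = 4I, S = s*I, starting from P = I."""
    I = np.eye(n)
    P = I.copy()
    for _ in range(iters):
        P = riccati_step(P, lam * I, I, I, 4.0 * I, s * I)
    return P

# Positive maximal correlation (S = 2I): threshold at lambda = 1.5.
P_pos_below = limiting_covariance(1.2, 2.0)
P_pos_above = limiting_covariance(1.8, 2.0)
# Negative maximal correlation (S = -2I): threshold at lambda = 0.5.
P_neg_below = limiting_covariance(0.4, -2.0)
P_neg_above = limiting_covariance(0.8, -2.0)

for name, P in [("S=+2I, lam=1.2", P_pos_below), ("S=+2I, lam=1.8", P_pos_above),
                ("S=-2I, lam=0.4", P_neg_below), ("S=-2I, lam=0.8", P_neg_above)]:
    print(name, np.diag(P))
```

For these identity-multiple matrices with S = 2I, each diagonal entry of the recursion reduces to the scalar map $P \mapsto 4\lambda^2 P/(\lambda^2 P + 9)$, whose only nonnegative fixed point is zero when λ < 1.5 and which has the additional fixed point $4 - 9/\lambda^2$ when λ > 1.5. With S = −2I the map is $P \mapsto 4\lambda^2 P/(\lambda^2 P + 1)$, with threshold λ = 0.5.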
Finally, we note that a standard form for the DARE used in numerical solvers, such as MATLAB, is
$$
0 = A^\top PA - E^\top PE - (A^\top PB + \hat S)(B^\top PB + \hat R)^{-1}(B^\top PA + \hat S^\top) + \hat Q
$$
and (16) can be put in this form by setting $E = I$, $A = F^\top$, $B = F^\top H^\top$, $\hat Q = Q$, $\hat S = QH^\top + S$, and $\hat R = HQH^\top + HS + S^\top H^\top + R$.
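The substitution into the standard form can be sanity-checked by evaluating both residuals at an arbitrary symmetric matrix P: if the mapping is right, the two expressions agree identically, not only at a solution. The sketch below assumes randomly generated, hypothetical F, H, and C; the names `Q_hat`, `S_hat`, and `R_hat` denote the solver-side matrices of the standard form.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 3, 2

# Hypothetical system matrices and a positive definite joint covariance C.
F = rng.standard_normal((N, N))
H = rng.standard_normal((M, N))
G = rng.standard_normal((N + M, N + M))
C = G @ G.T
Q, S, R = C[:N, :N], C[:N, N:], C[N:, N:]

# Arbitrary symmetric test matrix (not a solution of the DARE).
X = rng.standard_normal((N, N))
P = X @ X.T

# Residual of (16): right-hand side minus P.
Pb = F @ P @ F.T + Q
Pxy = Pb @ H.T + S
Py = H @ Pb @ H.T + H @ S + S.T @ H.T + R
res_16 = Pb - Pxy @ np.linalg.solve(Py, Pxy.T) - P

# Residual of the standard form with the substitutions given in the text.
A, B, E = F.T, F.T @ H.T, np.eye(N)
Q_hat = Q
S_hat = Q @ H.T + S
R_hat = H @ Q @ H.T + H @ S + S.T @ H.T + R
res_std = (A.T @ P @ A - E.T @ P @ E
           - (A.T @ P @ B + S_hat)
           @ np.linalg.solve(B.T @ P @ B + R_hat, B.T @ P @ A + S_hat.T)
           + Q_hat)

print(np.abs(res_16 - res_std).max())
```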
c. Examples of UKF and perfect recovery in nonlinear systems
In this section we will apply the UKF to synthetic data sets generated with nonlinear dynamics where the system and observation errors are Gaussian distributed pseudorandom numbers. A surprising result is that despite the nonlinearity, we still obtain perfect recovery up to numerical precision for maximally correlated errors. Moreover, in analogy to the linear case, perfect recovery is not possible when the instabilities in the nonlinear dynamics become sufficiently strong.
We first consider the Lorenz-63 system introduced above (6). We consider the discrete time dynamics $x_{i+1} = f(x_i)$ to be given by applying the RK4 solver with ∆t = 0.01 to the chaotic vector field in (6) and the direct observation function $h(x_{i+1}) = x_{i+1}$. We artificially add substantial system and observation noise of covariance $Q = I_{3\times 3}$ and $R = 4I_{3\times 3}$, respectively. According to the remarks preceding Lemma A2.1, the system and observation noise are maximally correlated when $S = 2I_{3\times 3}$, which implies the Schur complement of Q and R is the zero matrix. To test the recovery of the deterministic variables x, y, z, we set $S = (2 - \delta_j)I_{3\times 3}$ for $\delta_j = 2^{-j}$ and $j = 0, 1, \ldots, 18$, implying that the trace of the Schur complement is proportional to $\delta_j$ to leading order. A time series of Lorenz-63 was produced with noise specified from Q, R and S, and the UKF algorithm of Section 3a was applied. Results for RMSE of the recovered variables x, y, z are shown in Fig. 7(a). In the limit as δ → 0 and S approaches maximal correlation, we find perfect recovery of the true state using the CUKF algorithm with the true covariance matrix C, as foreshadowed by the linear case. We repeated this experiment for the alternative parameter value σ = 350 in (6), which yields a globally attracting periodic orbit, and obtained very similar results, also shown in Fig. 7(a).
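Generating the correlated noise requires sampling from a joint Gaussian whose covariance becomes singular in the maximally correlated limit, so a Cholesky factorization of C can fail there. One way to handle this, sketched below under stated assumptions, is to factor C through its eigendecomposition. The helper names, the eigenvalue clipping threshold, and the sample size are illustrative choices and not the procedure used for the experiments.

```python
import numpy as np

def joint_noise_covariance(delta, n=3):
    """C = [[Q, S], [S^T, R]] with Q = I, R = 4I, S = (2 - delta) I."""
    I = np.eye(n)
    S = (2.0 - delta) * I
    return np.block([[I, S], [S.T, 4.0 * I]])

def sample_joint_noise(C, size, rng):
    """Draw zero-mean Gaussian samples with covariance C, allowing C to be singular."""
    lam, V = np.linalg.eigh(C)
    # Clip eigenvalues that are zero up to round-off (threshold is an assumption).
    lam = np.where(lam > 1e-12 * lam.max(), lam, 0.0)
    factor = V * np.sqrt(lam)           # factor @ factor.T reproduces C
    return rng.standard_normal((size, C.shape[0])) @ factor.T

rng = np.random.default_rng(3)
noise = sample_joint_noise(joint_noise_covariance(0.0), 1000, rng)
omega, nu = noise[:, :3], noise[:, 3:]  # system noise, observation noise

# At maximal correlation the observation noise is a linear function of the system noise.
print(np.abs(nu - 2.0 * omega).max())
```

With δ = 0 the sampled observation noise equals twice the system noise up to round-off, which is the sense in which the joint noise process is only N dimensional.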
Since perfect recovery in the linear case depended on the degree of stability of the dynamics, we next investigate the effect of the Lyapunov exponents of a chaotic dynamical system on the ability to obtain perfect recovery. We consider the chaotic Lorenz-96 system, a 40-dimensional ODE given by
$$
\dot x_i = x_{i-1}x_{i+1} - x_{i-1}x_{i-2} - x_i + F \tag{17}
$$
where F determines the size and number of the positive Lyapunov exponents of the chaotic dynamics (Lorenz 1996). Using $Q = I_{40\times 40}$, $R = 4I_{40\times 40}$, the maximal correlation occurs when $S = 2I_{40\times 40}$. We set $S = (2 - \delta_j)I_{40\times 40}$ for $\delta_j = 2^{-j}$ and $j = 0, 1, \ldots, 18$ and generated time series with noise from Q, R and S as above. The CUKF algorithm was applied to recover the 40-dimensional state. In Fig. 7(b) we show that for F = 6 we obtain perfect recovery in the case of maximal correlation between system and observation noise. However, as F increases, the system becomes more strongly chaotic and the perfect recovery breaks down. Notice that as in the linear case, the failure of perfect recovery occurs very sharply between F = 7 and F = 9. This suggests that in analogy to the linear result, some form of stability condition is likely necessary for perfect recovery.
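For readers who want to reproduce the dynamics, the right-hand side of (17) is compact to write down. The following is a minimal sketch, assuming the usual cyclic indexing of the 40 variables and an RK4 step; the time step `dt = 0.01`, the initial perturbation, and the function names are assumptions, since the text does not state the integrator settings used for this example.

```python
import numpy as np

def lorenz96_rhs(x, forcing):
    """Right-hand side of (17) with cyclic indices: x[i-1]*(x[i+1] - x[i-2]) - x[i] + F."""
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + forcing

def rk4_step(x, forcing, dt):
    """One classical fourth-order Runge-Kutta step of the Lorenz-96 vector field."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_rhs(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Hypothetical usage: 40 variables, forcing F = 6, small perturbation of the equilibrium x = F.
forcing, dt = 6.0, 0.01
x = forcing * np.ones(40)
x[0] += 0.01
for _ in range(100):
    x = rk4_step(x, forcing, dt)
print(x[:5])
```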
6. Discussion
Approximating a dynamical system on a grid is pervasive in geophysical data assimilation applications. For dynamical processes, time is usually handled in discrete fashion. We have shown that correlation between system and observation errors should be expected when the system errors derive from local truncation errors from differential equation solvers, both in discrete time and on a spatial grid, and when observational error is dominated by either observation model error or representation error.
In Section 3, we introduced an approach to the ensemble Kalman filter that accounts for the correlations between system and observation errors. In particular, we showed that for spatiotemporal problems, extending the covariance matrix to allow cross-correlations can reduce filtering error as much as a significant increase in grid resolution. Of course, obtaining more precise estimates of the truth with much coarser discretization allows faster runtimes and/or larger ensembles to be used.
Correlations are most significant when other independent sources of observation and system error are small compared to the truncation error. Of course, other sources of error, such as model error, may influence both the state and the observations, leading to further significant correlations, but for simplicity we focus on correlations arising in the perfect model scenario. It is reasonable to expect that in many physical systems, the noise affecting the state of the system would also affect the sensor or observation system.
The generalization of the CUKF to an ensemble square root Kalman filter (EnSQKF) is a straightforward extension. However, it remains to extend the ensemble adjustment Kalman filter (EAKF) for additive system noise to correlations between system and observation noise. An EAKF formulation is critical for situations when the ensemble size is necessarily much smaller than either the state or observation dimensions (N and M respectively). This situation is common when the covariance matrices which are explicitly used in the UKF approach above do not fit in memory. A significant challenge in this formulation is that we cannot appropriately inflate the ensemble, since we assume the full correlation matrix C is of maximum rank, and any inflation of the small ensemble would only match the inflation in the subspace spanned by the ensemble. A promising alternative is to follow the approach of Whitaker and Hamill (2002) and design an alternative gain matrix $\tilde K_i$ such that the analysis ensemble $X^a_i$ has the same covariance as applying the Kalman gain K to an appropriately inflated ensemble.
In this article, we have not dealt with the question of real-time estimation of the full covariance matrix C. The importance of correctly specifying the Q and R matrices was first demonstrated for the Kalman filter in Mehra (1970, 1972) and for nonlinear filters in Berry and Sauer (2013). We consider sequential methods for estimation of the full covariance matrix in parallel with filtering to be a fruitful area of future research.
Acknowledgments. We thank three reviewers whose helpful suggestions led to a much improved paper. This research was partially supported by National Science Foundation grant DMS1723175.
APPENDIX
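A1. Equivalence of CUKF and KF for linear problems with correlated errors
In order to justify our definition of the CUKF in Section 3b, we will show that for linear systems the update (14) is equivalent to the Kalman filter equations given in Section 3a. We can define the covariance of the forecast by expanding the innovation as
$$
\begin{aligned}
\epsilon_i \equiv y_i - y^b_i &= H(x_i - x^b_i) + J\nu_i \\
&= HF(x_{i-1} - x^a_{i-1}) + H\Gamma\omega_{i-1} + J\nu_i
\end{aligned}
\tag{A1}
$$
and writing $H\Gamma\omega_{i-1} + J\nu_i = [H\Gamma,\ J]\begin{pmatrix}\omega_{i-1} \\ \nu_i\end{pmatrix}$ we find
$$
\begin{aligned}
P^y_i \equiv E[\epsilon_i\epsilon_i^\top] &= HFP^a_{i-1}F^\top H^\top + [H\Gamma,\ J]\,C\,[H\Gamma,\ J]^\top \\
&= HFP^a_{i-1}F^\top H^\top + H\Gamma Q\Gamma^\top H^\top + H\Gamma SJ^\top + JS^\top\Gamma^\top H^\top + JRJ^\top
\end{aligned}
$$
where we recall that $E\left[\begin{pmatrix}\omega_{i-1} \\ \nu_i\end{pmatrix}\begin{pmatrix}\omega_{i-1} \\ \nu_i\end{pmatrix}^{\top}\right] = C$ and $E[(x_{i-1} - x^a_{i-1})(x_{i-1} - x^a_{i-1})^\top] = P^a_{i-1}$. Notice that
$$
HP^b_iH^\top = HFP^a_{i-1}F^\top H^\top + H\Gamma Q\Gamma^\top H^\top
$$
so that
$$
P^y_i = HP^b_iH^\top + H\Gamma SJ^\top + JS^\top\Gamma^\top H^\top + JRJ^\top
$$
which implies that we can rewrite the Kalman gain equation as
$$
K_i = (P^b_iH^\top + \Gamma SJ^\top)(P^y_i)^{-1}.
$$
Similarly, we can define the cross-correlation between the state and observation as
$$
\begin{aligned}
P^{xy}_i &\equiv E[(x_i - x^b_i)(y_i - y^b_i)^\top] \\
&= E[(F(x_{i-1} - x^a_{i-1}) + \Gamma\omega_{i-1})(HF(x_{i-1} - x^a_{i-1}) + H\Gamma\omega_{i-1} + J\nu_i)^\top] \\
&= FP^a_{i-1}F^\top H^\top + \Gamma Q\Gamma^\top H^\top + \Gamma SJ^\top \\
&= P^b_iH^\top + \Gamma SJ^\top
\end{aligned}
$$
and finally we can write the Kalman gain as
$$
K_i = P^{xy}_i(P^y_i)^{-1},
$$
which agrees with the definition used in (14) for our version of the unscented Kalman filter.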
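The gain derived in this appendix can be written as a short function. The sketch below is an illustration of the linear formulas only, assuming a hypothetical helper name `correlated_kalman_gain` and randomly generated test matrices; it also confirms that setting S = 0 with Γ and J equal to identity matrices gives the familiar uncorrelated gain $P^bH^\top(HP^bH^\top + R)^{-1}$.

```python
import numpy as np

def correlated_kalman_gain(Pa_prev, F, H, Gamma, J, Q, R, S):
    """Kalman gain K = P^{xy} (P^y)^{-1} for correlated system/observation noise."""
    Pb = F @ Pa_prev @ F.T + Gamma @ Q @ Gamma.T     # forecast covariance P^b
    Pxy = Pb @ H.T + Gamma @ S @ J.T                 # cross-covariance P^{xy}
    Py = (H @ Pb @ H.T + H @ Gamma @ S @ J.T
          + J @ S.T @ Gamma.T @ H.T + J @ R @ J.T)   # innovation covariance P^y
    return Pxy @ np.linalg.inv(Py)

# Hypothetical test problem with N = 3 state variables and M = 2 observations.
rng = np.random.default_rng(4)
N, M = 3, 2
F = rng.standard_normal((N, N))
H = rng.standard_normal((M, N))
G = rng.standard_normal((N + M, N + M))
C = G @ G.T
Q, S, R = C[:N, :N], C[:N, N:], C[N:, N:]
Pa_prev = np.eye(N)

K_corr = correlated_kalman_gain(Pa_prev, F, H, np.eye(N), np.eye(M), Q, R, S)
K_zero = correlated_kalman_gain(Pa_prev, F, H, np.eye(N), np.eye(M), Q, R, np.zeros((N, M)))

# Standard (uncorrelated) gain for comparison.
Pb = F @ Pa_prev @ F.T + Q
K_standard = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
print(np.abs(K_zero - K_standard).max(), np.abs(K_corr - K_standard).max())
```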
A2. Maximal correlation when N ≠ M
In Section 5 we showed how to define the maximal correlation matrix C when N = M. In the following lemma we derive the formula when N ≠ M.
Lemma A2.1 Let Q, R be symmetric positive-definite matrices with eigendecompositions $Q = U\Lambda U^\top$ and $R = VLV^\top$. Denote diagonal entries by $\Lambda_{1,1} \ge \Lambda_{2,2} \ge \cdots \ge \Lambda_{N,N}$ and $L_{1,1} \ge L_{2,2} \ge \cdots \ge L_{M,M}$.
1. If N ≥ M, let $\tilde U$ be the first M columns of U and $\tilde\Lambda$ be the first M×M block of Λ, and let $\tilde V = V$ and $\tilde L = L$.
2. If N ≤ M, let $\tilde U = U$ and $\tilde\Lambda = \Lambda$, and let $\tilde V$ be any N columns of V and $\tilde L$ the corresponding N×N block of L.
Then $\operatorname{trace}(Q - SR^{-1}S^\top)$ is minimized over all N×M matrices S by
$$
S = \tilde U\tilde\Lambda^{1/2}G\tilde L^{1/2}\tilde V^\top
$$
for any orthogonal matrix G. Moreover, for any maximal S we have $\operatorname{rank}(Q - SR^{-1}S^\top) = \max\{N - M, 0\}$ and $\operatorname{trace}(Q - SR^{-1}S^\top) = \sum_{i=M+1}^{N}\Lambda_{i,i}$.
Proof. Notice that
$$
\begin{aligned}
SR^{-1}S^\top &= \tilde U\tilde\Lambda^{1/2}G\tilde L^{1/2}\tilde V^\top VL^{-1}V^\top\tilde V\tilde L^{1/2}G^\top\tilde\Lambda^{1/2}\tilde U^\top \\
&= \tilde U\tilde\Lambda\tilde U^\top
\end{aligned}
\tag{A2}
$$
so that $Q - SR^{-1}S^\top = U\hat\Lambda U^\top$, where $\hat\Lambda$ replaces the first M diagonal entries of Λ with zeros if N > M and $\hat\Lambda = 0$ if N ≤ M.
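The construction in the lemma translates directly into a few lines of linear algebra. The sketch below is one possible reading of it, assuming G defaults to the identity and that, for N ≤ M, the leading N eigenvectors of R are the columns selected (the lemma permits any N columns); the function names and the random test matrices are hypothetical.

```python
import numpy as np

def maximal_cross_covariance(Q, R, G=None):
    """Cross-covariance S of Lemma A2.1 for symmetric positive definite Q (NxN), R (MxM)."""
    k = min(Q.shape[0], R.shape[0])
    lam, U = np.linalg.eigh(Q)
    ell, V = np.linalg.eigh(R)
    # eigh returns ascending eigenvalues; reverse to the descending order of the lemma.
    lam, U = lam[::-1], U[:, ::-1]
    ell, V = ell[::-1], V[:, ::-1]
    if G is None:
        G = np.eye(k)                   # any k x k orthogonal matrix is admissible
    return U[:, :k] @ np.diag(np.sqrt(lam[:k])) @ G @ np.diag(np.sqrt(ell[:k])) @ V[:, :k].T

def random_spd(n, rng):
    """Hypothetical helper: a random symmetric positive definite n x n matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(5)

# Case N > M: the Schur complement keeps the N - M smallest eigenvalues of Q.
Q1, R1 = random_spd(4, rng), random_spd(2, rng)
S1 = maximal_cross_covariance(Q1, R1)
schur1 = Q1 - S1 @ np.linalg.solve(R1, S1.T)

# Case N < M: the Schur complement vanishes.
Q2, R2 = random_spd(2, rng), random_spd(4, rng)
S2 = maximal_cross_covariance(Q2, R2)
schur2 = Q2 - S2 @ np.linalg.solve(R2, S2.T)

print(np.trace(schur1), np.linalg.matrix_rank(schur1, tol=1e-9), np.abs(schur2).max())
```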
A3. Proof of Theorem 5.2
Proof. It suffices to show that P = 0 is a solution to the DARE. Let $R^{1/2}$ be the unique symmetric square root of R and let $R^{-1/2}$ be its inverse. Setting $A = HSR^{-1/2} + R^{1/2}$, notice that
$$
\begin{aligned}
I &= A^\top(AA^\top)^{-1}A \\
&= A^\top(HSR^{-1}S^\top H^\top + HS + S^\top H^\top + R)^{-1}A
\end{aligned}
$$
and multiplying on the left by $SR^{-1/2}$ and on the right by $R^{-1/2}S^\top$ we have
$$
SR^{-1}S^\top = (SR^{-1}S^\top H^\top + S)(HSR^{-1}S^\top H^\top + HS + S^\top H^\top + R)^{-1}(HSR^{-1}S^\top + S^\top).
$$
Finally, since S is maximal with M ≥ N we have $Q = SR^{-1}S^\top$ by Lemma A2.1, which implies that
$$
Q = (QH^\top + S)(HQH^\top + HS + S^\top H^\top + R)^{-1}(HQ + S^\top).
$$
Since the previous equation is exactly (16) with P = 0, this shows that P = 0 is a solution to the DARE. Moreover, P = 0 implies that the limiting Kalman gain is
$$
\begin{aligned}
K &= (QH^\top + S)(HQH^\top + HS + S^\top H^\top + R)^{-1} \\
&= SR^{-1/2}A^\top(AA^\top)^{-1} \\
&= SR^{-1/2}A^{-1} \\
&= SR^{-1/2}(HSR^{-1/2} + R^{1/2})^{-1} \\
&= S(HS + R)^{-1}
\end{aligned}
$$
where A is invertible since it is the square root of an invertible matrix. Thus the stabilizing condition is that the matrix
$$
(I - HK)F = (I - HS(HS + R)^{-1})F
$$
has eigenvalues inside the unit circle. Since P = 0 solves the DARE and is stabilizing, it is the limiting covariance matrix of the Kalman filtering problem.
In the case when M < N, recall that $Q = U\Lambda U^\top$ is the eigendecomposition from Lemma A2.1 and $\tilde U$ contains the first M columns of U, so that $Q = \tilde U\tilde\Lambda\tilde U^\top + \hat Q$. Since S is maximal we have that $Q - \hat Q = SR^{-1}S^\top$, and multiplying both sides by $\tilde U^\top$ on the left and $\tilde U$ on the right we find $\tilde\Lambda = \tilde U^\top SR^{-1}S^\top\tilde U = \tilde SR^{-1}\tilde S^\top$ where $\tilde S = \tilde U^\top S$. Similarly, setting $\tilde P = \tilde U^\top P\tilde U$, $\tilde F = \tilde U^\top F\tilde U$, and $\tilde H = H\tilde U$ we can rewrite the DARE (again multiplying both sides by $\tilde U^\top$ on the left and $\tilde U$ on the right). The result is precisely the DARE from (16) with $P, F, H, S, Q$ replaced by $\tilde P, \tilde F, \tilde H, \tilde S, \tilde\Lambda$ respectively. Since $\tilde\Lambda = \tilde SR^{-1}\tilde S^\top$, we have reduced to the case above, so $\tilde P = 0$ is a solution of this DARE. In other words, P satisfying $\tilde U^\top P\tilde U = 0$ is a solution of the DARE, and projecting the DARE onto the eigenvectors of U orthogonal to $\tilde U$ would yield another DARE which would need to be satisfied with a non-zero solution. Moreover, the resulting Kalman gain and stability condition become non-trivial in this case, but if the stability condition for the DARE is met, then we find a limiting covariance matrix which is zero when projected onto the top M eigenvectors of Q.
References
Belanger, P. R., 1974: Estimation of noise covariance matrices for a linear time-varying stochastic process. Automatica, 10 (3), 267–275.
Berry, T., and T. Sauer, 2013: Adaptive ensemble Kalman filtering of non-linear systems. Tellus A: Dynamic Meteorology and Oceanography, 65 (1), 20331, doi:10.3402/tellusa.v65i0.20331.
Dee, D. P., 1995: On-line estimation of error covariance parameters for atmospheric data assimilation. Monthly Weather Review, 123 (4), 1128–1145, doi:10.1175/1520-0493(1995)123<1128:OLEOEC>2.0.CO;2.
Guttman, L., 1946: Enlargement methods for computing the inverse matrix. Ann. Math. Stat., 17, 336–343.
Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. Monthly Weather Review, 133 (11), 3132–3147.
Hodyss, D., and N. Nichols, 2015: The error of representation: basic understanding. Tellus A: Dynamic Meteorology and Oceanography, 67 (1), 24822, doi:10.3402/tellusa.v67.24822.
Janjic, T., and S. E. Cohn, 2006: Treatment of observation error due to unresolved scales in atmospheric data assimilation. Monthly Weather Review, 134 (10), 2900–2915.
Janjic, T., and Coauthors, 2017: On the representation error in data assimilation. Quarterly Journal of the Royal Meteorological Society, doi:10.1002/qj.3130, URL http://centaur.reading.ac.uk/71548/.
Julier, S. J., and J. K. Uhlmann, 2004: Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92 (3), 401–422.
Kuramoto, Y., and T. Tsuzuki, 1976: Persistent propagation of concentration waves in dissipative media far from thermal equilibrium. Progress of Theoretical Physics, 55 (2), 356–369.
Liu, Z.-Q., and F. Rabier, 2002: The interaction between model resolution, observation resolution and observation density in data assimilation: A one-dimensional study. Quarterly Journal of the Royal Meteorological Society, 128 (582), 1367–1386.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20 (2), 130–141.
Lorenz, E. N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Vol. 1.
Mehra, R., 1970: On the identification of variances and adaptive Kalman filtering. IEEE Transactions on Automatic Control, 15 (2), 175–184.
Mehra, R., 1972: Approaches to adaptive filtering. IEEE Transactions on Automatic Control, 17 (5), 693–698.
Mitchell, H. L., and R. Daley, 1997: Discretization error and signal/error correlation in atmospheric data assimilation. Tellus A, 49 (1), 32–53, doi:10.1034/j.1600-0870.1997.00003.x.
Oke, P. R., and P. Sakov, 2008: Representation error of oceanic observations for data assimilation. Journal of Atmospheric and Oceanic Technology, 25 (6), 1004–1017.
Ran, A., and R. Vreugdenhil, 1988: Existence and comparison theorems for algebraic Riccati equations for continuous- and discrete-time systems. Linear Algebra and its Applications, 99, 63–83, doi:10.1016/0024-3795(88)90125-5, URL http://www.sciencedirect.com/science/article/pii/0024379588901255.
Satterfield, E., D. Hodyss, D. D. Kuhl, and C. H. Bishop, 2017: Investigating the use of ensemble variance to predict observation error of representation. Monthly Weather Review, 145 (2), 653–667, doi:10.1175/MWR-D-16-0299.1.
Simon, D., 2006: Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley-Interscience.
Sivashinsky, G., 1977: Nonlinear analysis of hydrodynamic instability in laminar flames I. Derivation of basic equations. Acta Astronautica, 4 (11-12), 1177–1206.
Van Leeuwen, P. J., 2015: Representation errors and retrievals in linear and nonlinear data assimilation. Quarterly Journal of the Royal Meteorological Society, 141 (690), 1612–1623.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Monthly Weather Review, 130 (7), 1913–1924.
LIST OF FIGURES
Fig. 1. Demonstrating correlated noise in the truncated L63 system. (a) Comparing the true x-coordinate of L63 (grey) to a one-step forecast using the forward Euler method (blue, solid curve) with ∆t = 0.05 and the integrated observation (red, dashed curve). (b) Comparing the system error (defined as LTE) to the observation error; note the correlation. (c) Empirical covariance matrices, C, with red lines dividing the $Q$, $S$, $S^\top$, and $R$ blocks. (d-f) Same as (a-c) except using the RK4 integrator with the same coarse time step ∆t = 0.05. Color ranges in (c) and (f) are selected to emphasize the S matrix and may saturate for Q and R. Notice positive correlations in (b,c) and negative correlations in (e,f).
Fig. 2. Top row: (left) Ground truth 512 grid point solution, (middle left) the same solution decimated to 64 grid points, (middle right) the observation, which integrates the leftmost solution over 9 grid points before truncating, (right) observation error, which is the difference between the middle two solutions. Bottom row: (left and middle left) Same as top row, (middle right) 1-step integrator output from the truncated model, using 64 grid points and ∆t = 0.1, (right) system error, the difference between the middle two solutions.
Fig. 3. For the Kuramoto-Sivashinsky model truncated onto 64 grid points with ∆t = 0.1 we show (left) the spatially averaged C matrix; note that the cross-covariance between dynamical truncation errors and observation representation errors has a larger magnitude than the observation errors, and (right) the eigenvalues of C (black, solid curve) decay quickly. The presence of eigenvalues which are very close to zero indicates that the matrix C is close to maximally correlated, as we will show in Sec. 5. We also show the eigenvalues for a correlation matrix computed in the presence of both observational model error and representation error (red, dashed curve). Finally, we show the eigenvalues after the diagonal of the S matrix is increased by 50% (blue, dotted curve).
Fig. 4. (a) Comparison of the true solution, x (gray), and its discrete time samples $x_i$ (black circles) and integrated observations $y_i$ (green circles) with the UKF estimates (blue, solid) and CUKF estimates (red, dotted) over the same time interval shown in Fig. 1. (b) Errors computed by subtracting the true discretized signal from the observation (green circles), the UKF estimates (blue, solid), and the CUKF estimate (red, dotted). (c-d) Same as (a-b) but using the RK4 integrator with the same ∆t = 0.05.
Fig. 5. (a-b) Comparison of filter results using the UKF without correlations to filtering with correlations (CUKF) on the Kuramoto-Sivashinsky model truncated in space to 64, 128, and 256 grid points for observations integrated over (a) δ = L/512, (b) δ = L/128. (c) For 64 grid points and ∆t = 0.1, we show the robustness of the results after adding various levels of Gaussian instrument noise with variance R to the observations. (d) For the case $R = 10^{-2}$ we test the UKF with inflation by adding the identity matrix times a constant to Q (blue, solid) or R (red, dashed). We also show the effect of inflating the filter background covariance (black, dotted), where the x-axis indicates inflation percentage. In each case inflation degraded the filter performance.
Fig. 6. Mean squared error of filter estimates for linear models $F = \lambda I_{N\times N}$ with (a) positive correlations and (b) negative correlations. Black curve is based on the filter using S = 0; all other curves use the true S. Notice that we obtain perfect recovery to the limit of numerical precision when the stability criterion is satisfied, namely λ < 3/2 for positive correlation (a) and λ < 1/2 for negative correlation (b).
Fig. 7. Mean squared error of filter estimates with positively correlated noise for (a) L63 in periodic and chaotic parameter regimes and (b) L96 dynamical systems for various values of the forcing parameter. Black curve is based on the filter using S = 0 (UKF); all other curves use the true S (CUKF).
[Figure 1 image (embedded page 4, running header: Monthly Weather Review). Panels (a) and (d): x against Time, over Time 17 to 20. Panels (b) and (e): Error against Time. Panels (c) and (f): 6 × 6 covariance matrices, with color scales from -1 to 1 in (c) and from -2e-3 to 2e-3 in (f).]
c. Estimation of the full covariance matrix
The correlations described above are illustrated in two simple examples. To show the effects most clearly, we assume a perfect model. We begin with a simple ODE solver.
Consider using forward Euler on the Lorenz-63 system (Lorenz 1963)
    \dot{x}_1 = \sigma (x_2 - x_1)
    \dot{x}_2 = x_1 (\rho - x_3) - x_2        (6)
    \dot{x}_3 = x_1 x_2 - \beta x_3
where σ = 10, ρ = 28, β = 8/3. To demonstrate the correlation derived above, we first used a higher-order integrator (Runge-Kutta 4th order with time step 0.005) to produce a finely sampled ground truth signal. To define the truncated model, we used a forward Euler method with ∆t = 0.1. We used a 21-point composite trapezoid rule to approximate the integrated observation with δ = 0.05.
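As a rough illustration of how the system error in this example can be computed, the following Python sketch takes one coarse step with forward Euler or RK4 and subtracts it from a finely integrated reference. The function names, the initial condition, the spin-up length, and the choice of ten RK4 substeps for the reference are assumptions made for illustration; the step ∆t = 0.05 is the value quoted in the caption of Fig. 1.

```python
import math

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system (6)."""
    x1, x2, x3 = x
    return [sigma * (x2 - x1), x1 * (rho - x3) - x2, x1 * x2 - beta * x3]

def euler_step(f, x, dt):
    """One forward Euler step (the truncated model)."""
    return [xi + dt * ki for xi, ki in zip(x, f(x))]

def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = f([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def reference_step(f, x, dt, substeps=10):
    """Advance by dt using RK4 with a much finer step (stand-in for the truth)."""
    h = dt / substeps
    for _ in range(substeps):
        x = rk4_step(f, x, h)
    return x

def local_truncation_error(step, f, x, dt):
    """System error: reference state minus the one-step coarse forecast."""
    truth = reference_step(f, x, dt)
    forecast = step(f, x, dt)
    return [t - c for t, c in zip(truth, forecast)]

def mean_lte_norm(step, dt=0.05, n_steps=200, spinup=200):
    """Average Euclidean norm of the one-step error along a reference trajectory."""
    x = [1.0, 1.0, 1.0]  # arbitrary starting point (assumption)
    for _ in range(spinup):
        x = reference_step(lorenz63, x, dt)
    total = 0.0
    for _ in range(n_steps):
        err = local_truncation_error(step, lorenz63, x, dt)
        total += math.sqrt(sum(e * e for e in err))
        x = reference_step(lorenz63, x, dt)
    return total / n_steps

if __name__ == "__main__":
    for name, step in (("Euler", euler_step), ("RK4", rk4_step)):
        print(name, mean_lte_norm(step))
```

The sign convention (truth minus forecast) matches the additive-noise reading of the model equation, in which the system noise is whatever must be added to the model forecast to reach the true state.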
In Fig. 1(a,b) we show the correlation between the system error, in this case the local truncation error (LTE) of the Euler solver, and the observation error, as a function of time.
In Fig. 1(c) we show the estimated covariance matrix C, which reveals the strong correlations (S ≠ 0) between the system and observation errors. The covariance matrix C is estimated by concatenating the system and observation errors at each step into a 6-dimensional vector, and then computing the empirical covariance matrix of these vectors (averaged over T = 12000 discrete time steps). The estimated C matrix will be used in Section 4 in a nonlinear filter, and this method can be used to estimate the C matrix for general problems as long as one can afford a long offline run using a very fine discretization. Fig. 1(d-f) shows the same phenomenon for a more accurate solver, Runge-Kutta 4th-order with time step ∆t = 0.05. Although the system errors are much smaller, the correlations with observation errors are still evident.
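The estimation procedure just described (concatenate the two error vectors at each step, then take an empirical covariance) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sample mean is subtracted, and the normalization is T − 1, neither of which is specified in the text.

```python
def empirical_joint_covariance(system_errors, observation_errors):
    """Estimate C = [[Q, S], [S^T, R]] from paired error samples.

    system_errors and observation_errors are equal-length sequences of
    vectors (lists of floats), one pair per discrete time step.
    """
    joint = [list(w) + list(v) for w, v in zip(system_errors, observation_errors)]
    T = len(joint)
    dim = len(joint[0])
    mean = [sum(row[k] for row in joint) / T for k in range(dim)]
    return [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in joint) / (T - 1)
             for b in range(dim)]
            for a in range(dim)]

def split_blocks(C, n):
    """Split the joint covariance into Q (n x n), S (n x m) and R (m x m)."""
    Q = [row[:n] for row in C[:n]]
    S = [row[n:] for row in C[:n]]
    R = [row[n:] for row in C[n:]]
    return Q, S, R

if __name__ == "__main__":
    # Toy data in which the observation error is exactly twice the system error.
    w = [[1.0], [-1.0], [2.0], [-2.0]]
    v = [[2.0], [-2.0], [4.0], [-4.0]]
    Q, S, R = split_blocks(empirical_joint_covariance(w, v), 1)
    print(Q, S, R)
```

In the toy data the two errors are proportional, so the estimated blocks satisfy S² = QR, which is the scalar form of maximal correlation.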
The difference between the positive correlations in Fig. 1(b,c) and negative correlations in Fig. 1(e,f) will have a noticeable effect in filter accuracy, as shown in Section a. In Section 5, we will establish a theory explaining this disparity in the linear case.
As a second example, consider spatiotemporal dynamics x(z, t) given by the Kuramoto-Sivashinsky PDE (Kuramoto and Tsuzuki 1976; Sivashinsky 1977)
    x_t = -x_{zzzz} - x_{zz} - x x_z        (7)
defined on a periodic domain with length L = 100. For simplicity, we will use an explicit method applying RK4 in time and second order finite difference formulas for the spatial derivatives. While an implicit method would
FIG. 1. Demonstrating correlated noise in the truncated L63 system. (a) Comparing the true x-coordinate
of L63 (grey) to a one-step forecast using the forward Euler method (blue, solid curve) with ∆t = 0.05 and the
integrated observation (red, dashed curve). (b) Comparing the system error (defined as LTE) to the observation
error, note the correlation. (c) Empirical covariance matrices, C with red lines dividing the Q, S, S⊤, and R blocks.
(d-f) same as (a-c) except using the RK4 integrator with the same coarse time step ∆t = 0.05. Color ranges in
(c) and (f) are selected to emphasize the S matrix and may saturate for Q and R. Notice positive correlations in
(b,c) and negative correlations in (e,f).
FIG. 2. Top row: (left) Ground truth 512 grid point solution (middle left) the same solution decimated to
64 grid points (middle right) the observation, which integrates the leftmost solution over 9 gridpoints before
truncating (right) Observation error, which is the difference between the middle two solutions. Bottom row:
(left and middle left) Same as top row, (middle right) 1-step integrator output from the truncated model, using
64 grid points and ∆t = 0.1 (right) System error, difference between the middle two solutions.
[Figure 3 image. Left: 128 × 128 matrix with axis ticks at 1, 64, 128 and a color scale from -0.03 to 0.05. Right: eigenvalue against index n (1 to 128) on a logarithmic axis from 10^-10 to 10^0; legend entries: Representation Error, Obs. Model Error + Representation Error, Increased Correlation.]
2.1. The Kalman filter for correlated system and observation noise
We begin by reviewing the Kalman update equations for a linear system with correlated noise (see for example [16]). Assume model and observation equations
    x_i = f(x_{i-1}) + \Gamma \omega_{i-1}        (9)
    y_i = h(x_i) + J \nu_i        (10)
where F = f and H = h represent the system dynamics and linear observable, respectively, and Γ and J are fixed matrices. Assume (3) represents the noise covariances.
Given the posterior estimate of the state x^a_{i-1} and covariance P^a_{i-1} at step i − 1, the Kalman update for a linear system is
    x^b_i = F x^a_{i-1}
    y^b_i = H x^b_i
    P^b_i = F P^a_{i-1} F^\top + \Gamma Q \Gamma^\top        (11)
    K_i = (P^b_i H^\top + \Gamma S J^\top)(H P^b_i H^\top + H \Gamma S J^\top + J S^\top \Gamma^\top H^\top + J R J^\top)^{-1}        (12)
    x^a_i = x^b_i + K_i (y_i - y^b_i)
    P^a_i = P^b_i - K_i (H P^b_i + J S^\top \Gamma^\top)        (13)
where x^b_i represents the forecast of the state given only the observations up to time i − 1 and P^b_i represents the covariance of the forecast. Similarly, y^b_i represents the forecast of the i-th observation given only the observations up to time i − 1, and the difference between the true observation and the forecast,
    \epsilon_i \equiv y_i - y^b_i
is called the innovation. The Kalman gain matrix K_i optimally combines the forecast x^b_i with the innovation to form the posterior estimate x^a_i, which is the maximum likelihood and minimum variance estimator of the true state x_i. The filter also produces the covariance matrix P^a_i of the estimator for use in the next filter step.
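To make the role of the cross-covariance S in (11)-(13) concrete, here is a minimal scalar sketch of one forecast and analysis cycle in Python. It assumes N = M = 1 so that every matrix is a number and transposes disappear; the function name and argument names are hypothetical and not taken from the paper.

```python
def correlated_kalman_step(x_a, P_a, y, F, H, Q, R, S, Gamma=1.0, J=1.0):
    """One scalar Kalman cycle with correlated system/observation noise.

    Follows (11)-(13) with all quantities scalar. Returns the posterior
    mean, the posterior variance, and the gain.
    """
    # Forecast step
    x_b = F * x_a
    y_b = H * x_b
    P_b = F * P_a * F + Gamma * Q * Gamma

    # Gain: the S terms enter both the cross term and the innovation variance
    cross = P_b * H + Gamma * S * J
    innovation_var = H * P_b * H + 2.0 * H * Gamma * S * J + J * R * J
    K = cross / innovation_var

    # Analysis step
    x_a_new = x_b + K * (y - y_b)
    P_a_new = P_b - K * (H * P_b + J * S * Gamma)
    return x_a_new, P_a_new, K

if __name__ == "__main__":
    # Uncorrelated noise (S = 0) reduces to the standard Kalman update.
    print(correlated_kalman_step(0.0, 0.0, 2.0, F=1.0, H=1.0, Q=1.0, R=1.0, S=0.0))
    # Scalar maximal correlation (S^2 = Q R): the posterior variance collapses.
    print(correlated_kalman_step(0.0, 0.0, 2.0, F=1.0, H=1.0, Q=1.0, R=4.0, S=2.0))
```

With Q = 1, R = 4, S = 2 the observation noise is exactly twice the system noise, so starting from a perfectly known state the observation determines the new state and the posterior variance returned by the sketch is zero up to rounding.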
2.2. The correlated unscented Kalman filter (CUKF)
We now generalize the correlated system and observation noise filtering approach to nonlinear systems, and we will show that for linear systems we recover exactly the equations above.
FIG. 3. For the Kuramoto-Sivashinsky model truncated onto 64 grid points with ∆t = 0.1 we show (left) the
spatially averaged C matrix, note that the cross-covariance between dynamical truncation errors and observation
representation errors has a larger magnitude than the observation errors and (right) the eigenvalues of C (black,
solid curve) decay quickly. The presence of eigenvalues which are very close to zero indicates that the matrix
C is close to maximally correlated as we will show in Sec. 5. We also show the eigenvalues for a correlation
matrix computed in the presence of both observational model error and representation error (red, dashed curve).
Finally, we show the eigenvalues after the diagonal of the S matrix is increased by 50% (blue, dotted curve).
[Figure 4 image. Panels (a) and (c): x against Time, over Time 17 to 20. Panels (b) and (d): Error against Time over the same interval.]
3.1. Example: Lorenz equations
First we consider the Lorenz-63 system (6) with the observation described in section 1.2. Using the same data generated in that example, we applied the CUKF and UKF. The estimates produced by these filters are shown in Fig. 4(a) (for the same time interval shown in Fig. 1). In Fig. 4(b) we show the errors between each filter's estimates and the truth, compared to the observation errors over the same time interval. Notice that the CUKF, which uses the full C matrix, obtains significantly superior estimates of the true state. Averaged over 6000 filter steps (after removing the initial filter transient) the root mean squared error (RMSE) of the standard UKF estimates is 0.29 whereas that of the CUKF estimates is 0.16. Compared to the RMSE of the raw observations, which is 0.35, the UKF reduced the error by 17% while the CUKF reduced the error by 54%.
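The quoted percentages follow directly from the three RMSE values. A short Python check, with a hypothetical helper name, makes the arithmetic explicit:

```python
def error_reduction(filter_rmse, observation_rmse):
    """Fractional reduction of RMSE relative to the raw observations."""
    return 1.0 - filter_rmse / observation_rmse

if __name__ == "__main__":
    observation_rmse = 0.35
    for name, rmse in (("UKF", 0.29), ("CUKF", 0.16)):
        print(name, round(100.0 * error_reduction(rmse, observation_rmse)))
```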
We then repeated this experiment using the RK4 integrator instead of forward Euler with the same truncated time step of ∆t = 0.05 and the results are shown in Fig. 4(c-d). The RMSE of the UKF estimates with this integrator is 0.18 and the RMSE of the CUKF estimates is 0.16. Recall that the local truncation errors of RK4 were negatively correlated with the observation errors, resulting in relatively small differences between the UKF and CUKF for this example. The difference between positively and negatively correlated errors will be studied below in Section 4. Notice that the CUKF with forward Euler obtains better estimates
FIG. 4. (a) Comparison of the true solution, x (gray) and its discrete time samples xi (black circles) and
integrated observations yi (green circles) with the UKF estimates (blue, solid) and CUKF estimates (red, dotted)
over the same time interval shown in Fig. 1. (b) Errors computed by subtracting the true discretized signal from
the observation (green circles), the UKF estimates (blue, solid), and the CUKF estimate (red, dotted). (c-d)
Same as (a-b) but using the RK4 integrator with the same ∆t = 0.05.
[Figure 5 image. Panels (a) and (b): RMSE against Number of Grid Points (64, 128, 256); legend: Observation Error, UKF (S=0), CUKF. Panel (c): RMSE against R^{1/2}, logarithmic axes from 10^-2 to 10^0; same legend. Panel (d): RMSE against Additive Inflation, logarithmic axes from 10^-2 to 10^0; legend: Observation Error, UKF+Inflated Q, UKF+Inflated R, UKF+Inflated Background (%).]
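so that \tilde{Y} = Y is unchanged but the covariance matrix of \tilde{X}, \tilde{Y} is
    \tilde{C} \equiv E\left[ \begin{pmatrix} \tilde{X} \\ \tilde{Y} \end{pmatrix} \begin{pmatrix} \tilde{X} \\ \tilde{Y} \end{pmatrix}^\top \right]
      = \begin{pmatrix} I_{N\times N} & -SR^{-1} \\ 0 & I_{M\times M} \end{pmatrix} \begin{pmatrix} Q & S \\ S^\top & R \end{pmatrix} \begin{pmatrix} I_{N\times N} & 0 \\ -R^{-1}S^\top & I_{M\times M} \end{pmatrix}
      = \begin{pmatrix} Q - SR^{-1}S^\top & 0 \\ 0 & R \end{pmatrix}        (16)
and the new state variables \tilde{X} have covariance matrix Q − SR^{-1}S^⊤ with minimal trace. In other words, the variables \tilde{X} have minimal variance among all possible choices of S. According to a standard result, the rank of \tilde{C} is equal to the sum of the rank of R and the rank of its Schur complement Q − SR^{-1}S^⊤, meaning that rank(C) = rank(\tilde{C}). Thus, by reducing the rank of the Schur complement we are actually choosing S which minimizes the rank of C. Intuitively speaking, this choice of S minimizes the dimensionality of the joint noise process.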
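The block diagonalization in (16) is easy to check numerically when Q, R and S are scalars. The Python sketch below is only an illustration under that scalar assumption; the function names are hypothetical.

```python
import math

def schur_complement(q, r, s):
    """Scalar Schur complement q - s r^{-1} s of the joint covariance."""
    return q - s * s / r

def transformed_covariance(q, r, s):
    """Apply the congruence in (16) to C = [[q, s], [s, r]].

    With T = [[1, a], [0, 1]] and a = -s / r, this returns T C T^T.
    """
    a = -s / r
    top_left = q + 2.0 * a * s + a * a * r   # equals q - s^2 / r
    off_diag = s + a * r                     # equals 0
    return [[top_left, off_diag], [off_diag, r]]

def eigenvalues_2x2(q, r, s):
    """Eigenvalues of the symmetric matrix [[q, s], [s, r]], in ascending order."""
    mean = 0.5 * (q + r)
    radius = math.sqrt(0.25 * (q - r) ** 2 + s * s)
    return mean - radius, mean + radius

if __name__ == "__main__":
    q, r = 1.0, 4.0
    for s in (0.0, 1.0, math.sqrt(q * r)):
        print(s, schur_complement(q, r, s), transformed_covariance(q, r, s),
              eigenvalues_2x2(q, r, s))
```

As s grows toward √(qr), the top-left entry of the transformed matrix (the variance of the transformed state noise) shrinks to zero and the smaller eigenvalue of C vanishes, so the joint noise becomes effectively one-dimensional.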
A simple example of maximal correlation is to consider the 2 × 2 case where Q = q, R = r, and S = s are scalars. By setting s = ±√q √r, we find the Schur complement to be q − s r^{-1} s = 0. Moreover, with this choice of s the eigenvalues of C are {0, q + r}, so that rank(C) = 1 is minimal over all possible choices of s. In general, when N = M we can set S = √Q √R^⊤ where √Q √Q^⊤ = Q and √R √R^⊤ = R are matrix
FIG. 5. (a-b) Comparison of filter results using the UKF without correlations to filtering with correlations
(CUKF) on the Kuramoto-Sivashinsky model truncated in space to 64, 128, and 256 grid points for observations
integrated over (a) δ = L/512, (b) δ = L/128. (c) For 64 grid points and ∆t = 0.1, we show the robustness of the
results after adding various levels of Gaussian instrument noise with variance R to the observations. (d) For the
case R = 10^{-2} we test the UKF with inflation by adding the identity matrix times a constant to Q (blue, solid) or
R (red, dashed). We also show the effect of inflating the filter background covariance (black, dotted) where the
x-axis indicates inflation percentage. In each case inflation degraded the filter performance.
j = 0, 1, ..., 18. In this case, the correlation in S is negative and we find that the stabilization criterion is
    eig((I - HS(HS + R)^{-1})F) = (1 + 2/(-2 + 4)) \lambda < 1
if and only if λ < 0.5. Notice that in this case the dynamics are required to be stable (λ < 1) in order to stabilize the P = 0 solution, whereas with positive correlations the dynamics could be unstable (λ > 1). In Fig. 6(b) we plot the mean of the diagonal elements of the numerical solution P to (18) against the trace of the Schur complement trace(Q − SR^{-1}S^⊤) = 4Nδ_j − Nδ_j^2 for all values of j. The different curves correspond to different values of λ chosen near 0.5. Notice that for λ < 0.5, as the noise approaches maximal correlation, the filter estimates approach the true state up to the limits of numerical precision. When λ > 0.5, P = 0 is still a solution of the DARE but is no longer stabilizing, and so the filter converges to a covariance matrix with variances greater than zero.
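The thresholds λ < 3/2 and λ < 1/2 can be reproduced with a scalar covariance recursion. The Python sketch below iterates the forecast and analysis covariance updates with F = λ and H = 1; the values Q = 1, R = 4 and S = ±2 are assumptions chosen to match the numbers appearing in the criterion above and in the nonlinear examples, and the function names are hypothetical.

```python
def riccati_limit(lam, Q=1.0, R=4.0, S=2.0, P0=1.0, steps=2000):
    """Iterate the scalar analysis covariance for F = lam, H = 1.

    Uses the correlated-noise gain K = (P_b + S) / (P_b + 2 S + R) and
    returns the covariance after the given number of filter steps.
    """
    P = P0
    for _ in range(steps):
        P_b = lam * P * lam + Q               # forecast covariance
        K = (P_b + S) / (P_b + 2.0 * S + R)   # gain with cross-covariance S
        P = P_b - K * (P_b + S)               # analysis covariance
    return P

def closed_loop_factor(lam, R=4.0, S=2.0):
    """Scalar form of the criterion: (1 - S / (S + R)) * lam, with H = 1."""
    return (1.0 - S / (S + R)) * lam

if __name__ == "__main__":
    for S, lams in ((2.0, (1.4, 2.0)), (-2.0, (0.4, 1.0))):
        for lam in lams:
            print(S, lam, closed_loop_factor(lam, S=S), riccati_limit(lam, S=S))
```

Under these assumed values P = 0 is always a fixed point of the recursion, but the iteration only settles there when the closed-loop factor is below one; otherwise it converges to a strictly positive variance.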
[Figure 6 image. Panels (a) and (b): Mean Squared Error against Trace of Schur Complement, both axes logarithmic from 10^-16 to 10^0. Legend in (a): S = 0 with λ = 0.5, then λ = 2, 1.5 ± 1e-3, 1.5 ± 1e-6, 1.5 ± 1e-9, 1.5 ± 1e-12, and 0.5. Legend in (b): S = 0 with λ = 0.4, then λ = 1.5, 0.5 ± 1e-3, 0.5 ± 1e-6, 0.5 ± 1e-9, 0.5 ± 1e-12, and 0.4.]
Finally, we note that a standard form for the DARE used in numerical solvers, such as MATLAB, is
    0 = A^\top P A - E^\top P E - (A^\top P B + \hat{S})(B^\top P B + \hat{R})^{-1}(B^\top P A + \hat{S}^\top) + \hat{Q}
and (18) can be put in this form by setting E = I, A = F^⊤, B = F^⊤H^⊤, \hat{Q} = Q, \hat{S} = QH^⊤ + S, and \hat{R} = HQH^⊤ + HS + S^⊤H^⊤ + R.
c. Examples of UKF and perfect recovery in nonlinear systems
In this section we will apply the UKF to synthetic data sets generated with nonlinear dynamics where the system and observation errors are Gaussian distributed pseudorandom numbers. A surprising result is that despite the nonlinearity, we still obtain perfect recovery up to numerical precision for maximally correlated errors. Moreover, in analogy to the linear case, perfect recovery is not possible when the instabilities in the nonlinear dynamics become sufficiently strong.
[Figure 7 image. Panels (a) and (b): Mean Squared Error against Trace of Schur Complement, both axes logarithmic from 10^-16 to 10^0. Legend in (a): CUKF, Periodic L63; UKF, Periodic L63; CUKF, Chaotic L63; UKF, Chaotic L63. Legend in (b): F = 6, 7, 8, 8.25, 8.5, 9, 12, 30, and UKF.]
We first consider the Lorenz-63 system introduced above (6). We consider the discrete time dynamics x_{i+1} = f(x_i) to be given by applying the RK4 solver with ∆t = 0.01 to the chaotic vector field in (6) and the direct observation function h(x_{i+1}) = x_{i+1}. We artificially add substantial system and observation noise of covariance Q = I_{3×3} and R = 4I_{3×3}, respectively. According to the remarks preceding Lemma 5.2, the system and observation noise are maximally correlated when S = 2I_{3×3}, which implies the Schur complement of Q and R is the zero matrix. To test the recovery of the deterministic variables x, y, z, we set S = (2 − δ_j)I_{3×3} for δ_j = 2^{-j} and j = 0, 1, ..., 18, implying that δ_j is the trace of the Schur complement. A time series of Lorenz-63 was produced with noise specified from Q, R and S and the UKF algorithm of Section a was applied. Results for RMSE of the recovered variables x, y, z are shown in Fig. 7(a). In the limit as δ → 0 and S approaches maximal correlation, we find perfect recovery of the true state using the CUKF algorithm with the true covariance matrix C, as foreshadowed by the linear case. We repeated this experiment for the alternative parameter value σ = 350 in (6), which yields a globally attracting periodic orbit, and obtained very similar results, also shown in Fig. 7(a).
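One way to generate noise with the prescribed Q, R and S is to draw each coordinate pair from two independent standard normals. The Python sketch below is an assumption about how such noise could be produced, not a description of the authors' code; because Q, R and S are multiples of the identity here, each of the three coordinates can be drawn independently with scalar q = 1, r = 4, s = 2 − δ. The function names are hypothetical.

```python
import math
import random

def correlated_noise_pair(q, r, s, rng):
    """Draw (omega, nu) with Var(omega) = q, Var(nu) = r, Cov(omega, nu) = s."""
    schur = q - s * s / r                 # variance of omega not explained by nu
    if schur < 0.0:
        raise ValueError("need s^2 <= q * r for a valid covariance")
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    nu = math.sqrt(r) * z1
    omega = (s / math.sqrt(r)) * z1 + math.sqrt(schur) * z2
    return omega, nu

def noise_schedule(q=1.0, r=4.0, s_max=2.0, n_levels=19):
    """List of (delta, s, schur complement) for delta = 2^-j, j = 0..n_levels-1."""
    levels = []
    for j in range(n_levels):
        delta = 2.0 ** (-j)
        s = s_max - delta
        levels.append((delta, s, q - s * s / r))
    return levels

if __name__ == "__main__":
    rng = random.Random(0)
    for delta, s, schur in noise_schedule()[:4]:
        print(delta, s, schur, correlated_noise_pair(1.0, 4.0, s, rng))
```

For q = 1 and r = 4 the per-coordinate Schur complement works out to δ − δ²/4, which is close to δ when δ is small, and at δ = 0 the observation noise is exactly twice the system noise.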
Since perfect recovery in the linear case depended on the degree of stability of the dynamics, we next investigate the effect of the Lyapunov exponents of a chaotic dynamical system on the ability to obtain perfect recovery. We consider the chaotic Lorenz-96 system, a 40-dimensional ODE given by
    \dot{x}_i = x_{i-1} x_{i+1} - x_{i-1} x_{i-2} - x_i + F        (18)
where F determines the size and number of the positive Lyapunov exponents of the chaotic dynamics (Lorenz 1996). Using Q = I_{40×40}, R = 4I_{40×40}, the maximal correlation occurs when S = 2I_{40×40}. We set S = (2 − δ_j)I_{40×40}
FIG. 6. Mean squared error of filter estimates for linear models F = λ IN×N with (a) positive correlations and
(b) negative correlations. Black curve is based on the filter using S = 0, all other curves use the true S. Notice
that we obtain perfect recovery to the limit of numerical precision when the stability criterion is satisfied, namely
λ < 3/2 for positive correlation (a) and λ < 1/2 for negative correlation (b).
FIG. 7. Mean squared error of filter estimates with positively correlated noise for (a) L63 in periodic and
chaotic parameter regimes and (b) L96 dynamical systems for various values of the forcing parameter. Black
curve is based on the filter using S = 0 (UKF), all other curves use the true S (CUKF).