Empirical Localization of Observation Impact in Ensemble Filters

Jeff Anderson, IMAGe/DAReS

Thanks to Lili Lei, Tim Hoar, Kevin Raeder, Nancy Collins, Glen Romine, Chris Snyder, Doug Nychka

5th EnKF Workshop, 23 May 2012

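Definition of Localization

For an observation y and state variable x, increments for N ensemble samples of x are:

$\Delta x_n = \alpha \hat{\beta} \Delta y_n, \quad n = 1, \ldots, N$

where $\hat{\beta}$ is a sample regression coefficient and $\alpha$ is a localization.

Traditionally $0 \le \alpha \le 1$, but here there is no upper bound.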
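A minimal Python sketch of the localized regression described on this slide. The function name, argument names, and the use of the unbiased sample covariance are assumptions for illustration, not code from the talk.

```python
import numpy as np

def state_increments(x_ens, y_ens, dy, alpha):
    """Regress observation increments onto a state variable, then localize.

    x_ens, y_ens: prior ensemble samples of x and y (length N).
    dy: observation-space increment for each ensemble member (length N).
    alpha: localization factor (assumed here to be a scalar).
    """
    x_ens = np.asarray(x_ens, dtype=float)
    y_ens = np.asarray(y_ens, dtype=float)
    # Sample regression coefficient: cov(x, y) / var(y).
    cov_xy = np.cov(x_ens, y_ens, ddof=1)[0, 1]
    var_y = np.var(y_ens, ddof=1)
    beta_hat = cov_xy / var_y
    # Localized increment for each ensemble member of x.
    return alpha * beta_hat * np.asarray(dy, dtype=float)
```

With alpha = 1 this is the unlocalized regression; alpha = 0 removes the impact of y on x entirely.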


Empirical Localization

Have output from an OSSE.
Know prior ensemble and truth for each state variable.
Can get truth and prior ensemble for any potential observations.



Estimate localization for a set of observations and a subset of state variables,
e.g. state variables at various horizontal distances from observations.



Example: how to localize impact of temperature observations (4 shown) on a U state variable that is between 600 and 800 km distant.



Given observational error variance, can compute expected ensemble mean increment for state.
Plot this vs. the prior error of the state (truth minus prior ensemble mean).



Do this for all state variables in subset.
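One way the expected ensemble mean increment could be computed is sketched below. It assumes a scalar Kalman-type update of the observed variable and unbiased observation noise, so that averaging over the noise replaces the observed value by the true value of y. The function and argument names are hypothetical.

```python
import numpy as np

def expected_mean_increment(x_ens, y_ens, y_truth, obs_err_var):
    """Expected ensemble mean increment of x from one observation of y.

    Assumption: the observation-space mean update has gain
    var_y / (var_y + obs_err_var), and the expectation over observation
    noise is taken by using the true value of y in place of the observation.
    """
    x_ens = np.asarray(x_ens, dtype=float)
    y_ens = np.asarray(y_ens, dtype=float)
    var_y = y_ens.var(ddof=1)
    # Expected increment of the ensemble mean of y.
    dy_mean = var_y / (var_y + obs_err_var) * (y_truth - y_ens.mean())
    # Regress onto the state variable (no localization applied yet).
    beta_hat = np.cov(x_ens, y_ens, ddof=1)[0, 1] / var_y
    return beta_hat * dy_mean
```

Pairing each such increment with truth minus prior ensemble mean for x gives one point on the scatter plot described above.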


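Empirical Localization

Find a least squares fit.

Slope is the localization $\alpha$.

Least squares minimizes: $\sum \left[ \alpha \Delta\bar{x} - (x^t - \bar{x}) \right]^2$

Same as minimizing $\sum \left[ (\bar{x} + \alpha \Delta\bar{x}) - x^t \right]^2$, where $\bar{x} + \alpha \Delta\bar{x}$ is the posterior mean.

(Here $\bar{x}$ is the prior ensemble mean, $x^t$ the truth, and $\Delta\bar{x}$ the expected ensemble mean increment.)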
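For reference, the least squares slope has a closed form. The notation is an assumption chosen for this note: k indexes the (y, x) pairs in the subset, $\bar{x}_k$ is the prior ensemble mean, $x^t_k$ the truth, and $\Delta\bar{x}_k$ the expected ensemble mean increment. The fit is assumed to be a line through the origin.

```latex
\begin{align}
J(\alpha) &= \sum_k \left[ (\bar{x}_k + \alpha \Delta\bar{x}_k) - x^t_k \right]^2 \\
\frac{dJ}{d\alpha} &= 2 \sum_k \Delta\bar{x}_k \left[ \alpha \Delta\bar{x}_k - (x^t_k - \bar{x}_k) \right] = 0 \\
\alpha &= \frac{\sum_k \Delta\bar{x}_k \, (x^t_k - \bar{x}_k)}{\sum_k \Delta\bar{x}_k^2}
\end{align}
```

Nothing in this expression constrains the slope to lie between 0 and 1, so it can be negative or exceed 1.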



Define set of all pairs (y, x) of potential observations and state variable instances in an OSSE.

(A state variable instance is defined by type, location and time).

Choose subsets of this set.



Find the localization α that minimizes the RMS difference between the posterior ensemble mean for x and the true value over this subset.

This can be computed from the output of the OSSE.

Can then use this localization in a new OSSE for all (y, x) in the subset.

Call the values of localization for all subsets an Empirical Localization Function (ELF).
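The steps above can be sketched as a short function that returns one localization value per subset. It assumes the OSSE output has already been reduced to three arrays with one entry per (y, x) pair; the array names and the integer subset labels are hypothetical.

```python
import numpy as np

def empirical_localization(increments, errors, subset_ids, n_subsets):
    """Least squares localization (slope through the origin) for each subset.

    increments: expected ensemble mean increment of x from y, one per pair.
    errors: truth minus prior ensemble mean of x, one per pair.
    subset_ids: non-negative integer subset label for each pair.
    Subsets with no usable pairs are returned as NaN.
    """
    increments = np.asarray(increments, dtype=float)
    errors = np.asarray(errors, dtype=float)
    subset_ids = np.asarray(subset_ids, dtype=int)
    # Accumulate sum(increment * error) and sum(increment**2) per subset.
    num = np.bincount(subset_ids, weights=increments * errors, minlength=n_subsets)
    den = np.bincount(subset_ids, weights=increments ** 2, minlength=n_subsets)
    elf = np.full(n_subsets, np.nan)
    ok = den > 0
    elf[ok] = num[ok] / den[ok]
    return elf
```

For example, `empirical_localization([1, 2, 1, 1], [0.5, 1.0, 2.0, 2.0], [0, 0, 1, 1], 3)` gives 0.5 for subset 0, 2.0 for subset 1, and NaN for the empty subset 2.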


Lorenz-96 40-Variable Examples

Assume all observations are located at a model grid point.

(Easier but not necessary).

Define 40 subsets of (y, x) pairs: x is 20 to the left, 19 to the left, …, 1 to the left, colocated, 1 to the right, …, 19 to the right of y.
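A small helper showing how a (y, x) pair could be assigned to one of the 40 subsets on the cyclic grid. Treating "left" as a negative offset and the function name are assumptions made for this sketch.

```python
def subset_offset(x_index, y_index, n_grid=40):
    """Signed grid offset of x relative to y on a cyclic grid.

    Returns a value in [-n_grid/2, n_grid/2 - 1]: negative means x is to
    the left of y, 0 means colocated, positive means x is to the right.
    """
    half = n_grid // 2
    return (x_index - y_index + half) % n_grid - half
```

Adding 20 to the offset gives a subset label from 0 to 39.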


Computing ELFs

Start with a climatological ensemble.

Do set of 6000-step OSSEs (only use last 5000 steps).

First has no localization.

Compute ELF from each.

Use ELF for next OSSE.

[Figure labels: No Localization, ELF1, ELF2, ELF3, ELF4, ELF5]


Evaluation Experiments

Start with a climatological ensemble.

Do 110,000 step assimilation, discard first 10,000 steps.

Adaptive inflation with 0.1 inflation standard deviation.

Many fixed Gaspari-Cohn localizations tested for each case.

Also five ELFs (or should it be ELVEs?).
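For comparison with the ELFs, the Gaspari-Cohn function used as the baseline can be sketched as below. This is the standard fifth-order piecewise rational form; the function name is hypothetical, and distance and half-width are assumed to be in the same units. The localization reaches zero at twice the half-width.

```python
import numpy as np

def gaspari_cohn(dist, half_width):
    """Gaspari-Cohn localization for the given distances.

    Returns an array with value 1 at zero distance, decreasing to 0 at
    distance 2 * half_width, and 0 beyond that.
    """
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / half_width
    loc = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn = r[near]
    loc[near] = (-0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3
                 - (5.0 / 3.0) * rn**2 + 1.0)
    rf = r[far]
    loc[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3
                + (5.0 / 3.0) * rf**2 - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return loc
```

Unlike an ELF, this function is always between 0 and 1, symmetric, and controlled by a single tunable half-width.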


Case 1: Frequent low-quality obs.

Identity observations.

Error variance 16.

Assimilate every standard model timestep.



N=20 Gaspari-Cohn (GC) function with smallest time mean prior RMSE.



N=20 first ELF is negative for many distances, but minimum localization is 0 when this ELF is used.



Subsequent N=20 ELFs are less negative, smoother, closer to best GC.



N=20, best GC has half-width 0.2, time mean RMSE of ~1.03.
ELFs give RMSE nearly as small as this.



Similar results for smaller ensemble, N=10. Note larger RMSE, narrower best GC half-width.



Similar results for larger ensemble, N=40. Note smaller RMSE, wider best GC half-width.



N=40 ELFs have smaller time mean RMSE than best GC.



ELFs are nearly symmetric so can ignore negative distances.



ELF for smaller ensemble is more compact.



ELF for larger ensemble less compact, consistent with GC results.



ELFs for even bigger ensembles are broader, but noisier at large distances.


Case 2: Infrequent high-quality obs.


Error variance 1.

Assimilate every 12th standard model timestep.



For N=10, all ELF cases have smaller RMSE than best GC.



For N=20, first ELF is worse than best GC; all others better.
Best GC gets wider as ensemble size grows.



For N=40, all ELFs have smaller RMSE.



N=10 ELF is non-Gaussian. Has local minimum localization for distance 1.



N=40 ELF is broader; also has local minimum for distance 1.
Need a non-Gaussian ELF to possibly do better than GC.


Case 3: Integral observations.

Each observation is the average of a grid point plus its nearest 8 neighbors on both sides; total of 17 points.

(Something like a radiance observation.)

Error variance 1.


Very low information content: assimilate 8 of these observations for each grid point.
Total of 320 observations per assimilation time.
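A sketch of the integral observation as a forward operator on the cyclic Lorenz-96 grid. The function name and arguments are assumptions for illustration; observation noise (error variance 1) would be added separately.

```python
import numpy as np

def integral_obs(state, center, n_side=8):
    """Average of the grid point at `center` and its n_side nearest
    neighbors on each side, with cyclic wraparound (17 points by default)."""
    state = np.asarray(state, dtype=float)
    offsets = np.arange(-n_side, n_side + 1)
    idx = (center + offsets) % state.size
    return state[idx].mean()
```

Because each observation averages 17 of the 40 state variables, neighboring observations share most of their footprint.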



ELFs are not very Gaussian. No values close to 1, two peaks at distance ±7.

Best GC is much larger near the observation location.



RMSE is a more complicated function of the GC half-width in this case.



ELFs all have significantly smaller time mean RMSE than best GC.


Case 4: Frequent low-quality obs., imperfect model


Error variance 16.


Truth has forcing F=8 (chaotic).

Ensemble has forcing F=5 (not chaotic).
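The model error in this case comes only from the forcing term. A minimal sketch of the Lorenz-96 model with forcing as a parameter is given below; the fourth-order Runge-Kutta scheme and the time step of 0.05 are commonly used choices and are assumptions here, as are the function names.

```python
import numpy as np

def l96_tendency(x, forcing):
    """Lorenz-96 tendency with cyclic indices:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def l96_step(x, forcing, dt=0.05):
    """Advance the state one time step with fourth-order Runge-Kutta."""
    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = l96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = l96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

In an imperfect-model experiment of this kind, the truth run would call `l96_step(x, 8.0)` while each ensemble member is advanced with `l96_step(x, 5.0)`.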



These are the localizations for the Case 1 perfect model.



Best GC is more compact for imperfect model case.
Fifth ELF also more compact, but not as close to imperfect GC.


How long an OSSE does this take?

For large localization get good results with O(100) OSSE steps.
Errors grow much more quickly for small localizations.


Conclusions

Can get estimates of good localization for any subset of observations and state variables from an OSSE.

If good localizations are non-Gaussian, can do better than Gaspari-Cohn.

When Gaussian, can still be cheaper than tuning half-widths.

Can this be applied to real geophysical models?

How much could real applications be improved? Unclear…

Can localization functions be separable in large models?
Loc(time diff) * Loc(horizontal dist.) * Loc(vertical dist.) * Loc(obs_type, state_type)???


Related Activities: Lili Lei Poster

Testing ELFs in global climate model (CAM),

and in WRF regional nested configuration.

Some results look very similar to earlier sampling error correction methods.

Next step, using ELFs in iterated OSSEs.


Empirical Localization Without Knowing Truth

Find the localization α that minimizes the RMSE between the posterior ensemble mean for x and the observed value of x over the subset of (y, x) pairs.

This can be computed from the output of an assimilation.

Can then use this localization in a new assimilation for all (y, x) in the subset.

BUT, can only compute for pairs of OBSERVED quantities.

Can act as a way to calibrate OSSE results?


Case 1 without knowing truth


Error variance 16.


All state variables are observed, so no problem there.


Case 1 without knowing truth

Using real obs is much noisier for small localization values.
Similar to using truth for larger localization values.
Could be used to calibrate results from an OSSE for real assimilation use.
