What is Data Assimilation?
Data Assimilation: Data assimilation seeks to characterize the true state of an environmental system by combining information from measurements, models, and other sources.
Typical measurements for hydrologic/earth science applications:
• Ground-based hydrologic and geological measurements (stream flow, soil moisture, soil properties, canopy properties, etc.)
• Ground-based meteorological measurements (precipitation, air temperature, humidity, wind speed, etc.)
• Remotely-sensed measurements (usually electromagnetic) which are sensitive to hydrologically relevant variables (e.g. water vapor, soil moisture, etc.)
Mathematical models used for data assimilation:
• Models of the physical system of interest.
• Models of the measurement process.
• Probabilistic descriptions of uncertain model inputs and measurement errors.
A description based on combined information should be better than one obtained from either measurements or model alone.
Key Features of Environmental Data Assimilation Problems
• State estimation -- The system is described in terms of state variables, which are characterized from available information.
• Multiple data sources -- Estimates are often derived from different types of measurements (ground-based, remote sensing, etc.) taken at different times and resolutions.
• Wide range of scales -- State variables may fluctuate over a wide range of time and space scales, and different scales may interact (e.g. small-scale variability can have large-scale consequences).
• Spatially distributed dynamic systems -- Systems are often modeled with partial differential equations, usually nonlinear.
• Uncertainty -- The models used in data assimilation applications are inevitably imperfect approximations to reality, model inputs may be uncertain, and measurement errors may be important. All of these sources of uncertainty need to be considered in the data assimilation process.
• Large problem size -- The equations used to describe the system of interest are usually discretized over time and space. Since the discretization must capture a wide range of scales, the resulting number of degrees of freedom (unknowns) can be very large.
State-Space Framework for Data Assimilation
State-space concepts provide a convenient way to formulate data assimilation problems. The key idea is to describe the system of interest in terms of the following variables:
• Input variables -- variables which account for forcing from outside the system or system properties which do not depend on the system state.
• State variables -- dependent variables of differential equations used to describe the physical system of interest, also called prognostic variables.
• Output variables -- variables that are observed, depend on state and input variables, also called diagnostic variables.
Classification of variables depends on system boundaries:
[Figure: two system-boundary diagrams showing the land, the atmosphere, precipitation (Precip.), and evapotranspiration (ET)]
• System includes coupled land and atmosphere -- precipitation and evapotranspiration are state variables.
• System includes only land -- precipitation and evapotranspiration are input variables.
Components of a Typical Hydrologic Data Assimilation Problem
The data assimilation algorithm uses specified information about input fluctuations and measurement errors to combine model predictions and measurements. Resulting estimates are extensive in time and space and make best use of available information.
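State Eq: $y(t) = A[y(\tau), \alpha, u(\tau), t]$, $0 \le \tau < t$; $y(0) = y_0(\alpha)$
Measurement Eq: $z_i = M[y(t_i), \omega_i]$; $i = 1, \ldots, m$
Here $\omega_i$ denotes the random measurement error.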
[Figure: block diagram. The time-varying input u(t) (e.g. precip) and the time-invariant input α (e.g. sat. hydr. cond.), each made up of a specified mean plus random fluctuations, drive the hydrologic system, whose true state is y(t) (e.g. soil moist.). The measurement system converts the true output zi (e.g. radiobrightness) into a measured output that includes random error. The data assimilation algorithm combines the measured output with the means and covariances of the true inputs and output measurement errors to produce estimated states and outputs.]
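The roles of the state equation, the measurement equation, and the two kinds of input can be illustrated with a toy problem. The following is a minimal sketch, assuming a hypothetical one-layer "bucket" soil moisture model; the function names, the linear drainage law, the linear measurement model, and all numerical values are illustrative assumptions and are not taken from the slides.

```python
import random


def propagate_state(y, u, alpha, dt=1.0):
    """Hypothetical state equation A: one explicit time step of a bucket model.

    y     -- state (soil saturation, dimensionless, between 0 and 1)
    u     -- time-varying input (precipitation, in saturation units per step)
    alpha -- time-invariant input (drainage rate coefficient per step)
    """
    y_next = y + dt * (u - alpha * y)
    return min(max(y_next, 0.0), 1.0)  # keep saturation within physical bounds


def measure(y, noise_std, rng):
    """Hypothetical measurement equation M: the output falls linearly as the
    soil gets wetter, and an additive random measurement error is included."""
    true_output = 280.0 - 60.0 * y
    return true_output + rng.gauss(0.0, noise_std)


rng = random.Random(0)
y = 0.3                                  # initial state y(0)
alpha = 0.1                              # time-invariant input
precip = [0.0, 0.2, 0.0, 0.0, 0.05]      # time-varying input u(t)

states = []
measurements = []
for u in precip:
    y = propagate_state(y, u, alpha)
    states.append(y)
    measurements.append(measure(y, noise_std=2.0, rng=rng))
```

In this sketch `states` plays the role of the true state y(t) and `measurements` the role of the measured outputs; a data assimilation algorithm would only see the latter.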
Types of Measurement Errors
When models are discretized over time/space there are two sources of output measurement error:
• Instrument errors (the measurement device does not perfectly record the variable it is meant to measure).
• Scale-related errors (the variable measured by the device is not at the same time/space scale as the corresponding model variable).
When measurement error statistics are specified, both error sources should be considered.
[Figure: large-scale trend described by model, true value, and measurements (*), with the instrument error and the scale-related error marked]
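One way to account for both error sources when specifying measurement error statistics is sketched below. It assumes, purely for illustration, that the instrument and scale-related errors are independent with zero mean, so that their variances add; the slides do not state this assumption, and the function name and numbers are hypothetical.

```python
import math


def total_error_std(instrument_std, scale_std):
    """Standard deviation of the total output measurement error, assuming the
    instrument and scale-related errors are independent with zero mean."""
    return math.sqrt(instrument_std ** 2 + scale_std ** 2)


# Scale-related error: the device samples one point inside a model cell,
# while the model variable represents the cell average.
fine_scale = [0.20, 0.26, 0.31, 0.23]   # hypothetical true values inside one cell
model_scale_value = sum(fine_scale) / len(fine_scale)
scale_error = fine_scale[2] - model_scale_value  # device sees the third location

combined_std = total_error_std(3.0, 4.0)
```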
Types of Data Assimilation Problems - Temporal Aspects
Zi = [z1, z2, …, zi] = set of all measurements through time ti
• Smoothing: characterize system over time interval t ≤ ti. Use for reanalysis of historic data.
• Filtering/forecasting: characterize system over time interval t ≥ ti. Use for real-time forecasting.
• Interpolation: no time-dependence, characterize system only at time t = ti. Use for interpolation of spatial data (e.g. kriging).
Types of Data Assimilation Problems - Spatial Aspects
Downscaling: Characterize system at scales smaller than output measurement resolution.
States (y1 … y4), Measurement (z1):
$z_1 = y_1 + y_2 + y_3 + y_4 + \omega_1$
Upscaling: Characterize system at scales larger than output measurement resolution.
State (y1), Measurements (z1 ... z4):
$z_1 = y_1 + \omega_1$
$z_2 = y_1 + \omega_2$
$z_3 = y_1 + \omega_3$
$z_4 = y_1 + \omega_4$
Here $\omega_j$ denotes the random measurement error of measurement $z_j$.
Downscaling and upscaling are handled automatically if the measurement equation is defined appropriately.
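The point that downscaling and upscaling differ only in the measurement equation can be shown with a small sketch. It assumes the two cases are read as z1 = y1 + y2 + y3 + y4 + error (downscaling) and zj = y1 + error for j = 1, ..., 4 (upscaling), with the error terms left out; the matrix name `H` and the helper function are hypothetical.

```python
def apply_measurement_operator(H, y):
    """Return the noise-free measurement vector z = H y (H is a list of rows)."""
    return [sum(h * v for h, v in zip(row, y)) for row in H]


# Downscaling: one coarse measurement is the sum of four fine-scale states.
H_down = [[1.0, 1.0, 1.0, 1.0]]
z_down = apply_measurement_operator(H_down, [0.1, 0.2, 0.3, 0.4])

# Upscaling: four fine-scale measurements all observe a single coarse state.
H_up = [[1.0], [1.0], [1.0], [1.0]]
z_up = apply_measurement_operator(H_up, [0.25])
```

The same estimation algorithm can be used in both cases; only the shape and entries of `H` change.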
Characterizing Uncertain Systems
What is a “good characterization” of the system states and inputs, given the vector Zi = [z1, ..., zi] of all measurements taken through ti?
The posterior probability densities p(y | Zi) and p(u | Zi) are the ideal estimates since they contain everything we know about the state y or input u given Zi.
[Figure: posterior density p[y(t) | Zi] plotted against y(t), with its mode, mean, and standard deviation marked]
[Figure: the measurements Zi convert the prior densities p(u) and p(y) into the conditional densities p(u | Zi) and p(y | Zi); states and inputs are related by y = A(u)]
In practice, we must settle for partial information about this density:
• Variational DA: Derive mode of p[y(t) | Zi] by solving batch least-squares problem.
• Sequential DA: Derive recursive approximation of conditional mean (and covariance?) of p[y(t) | Zi].
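The mode, mean, and standard deviation that summarize a posterior density can be computed directly when the state is a scalar. The following is a minimal sketch, assuming a hypothetical Gaussian prior and a Gaussian measurement likelihood evaluated on a grid of candidate saturations; the grid, the helper names, and all numbers are illustrative assumptions.

```python
import math


def gaussian(x, mean, std):
    """Gaussian probability density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def posterior_summary(grid, prior, likelihood):
    """Return (mode, mean, std) of the discrete posterior, which is
    proportional to prior times likelihood at each grid point."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    post = [w / total for w in weights]
    mean = sum(y * p for y, p in zip(grid, post))
    var = sum((y - mean) ** 2 * p for y, p in zip(grid, post))
    mode = grid[max(range(len(grid)), key=lambda k: post[k])]
    return mode, mean, math.sqrt(var)


grid = [k / 200.0 for k in range(201)]                  # candidate saturations 0..1
prior = [gaussian(y, 0.30, 0.10) for y in grid]         # hypothetical prior p(y)
likelihood = [gaussian(0.40, y, 0.05) for y in grid]    # hypothetical p(z | y), z = 0.40
mode, mean, std = posterior_summary(grid, prior, likelihood)
```

Because both densities are Gaussian here, the mode and mean coincide; for skewed posteriors they generally differ, which is why variational (mode) and sequential (mean) estimates need not agree.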
The Variational/Batch Approach
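Most variational methods use the mode $\hat{u}_{mode}$ of $p_{u|Z}(u \mid Z_i)$ as an estimate of the uncertain input vector. The state estimate is obtained by substituting $\hat{u}_{mode}$ for the uncertain inputs in the state equation:
$\hat{y}(t) = A[\hat{y}(\tau), \hat{u}_{mode}, t]$
If $\omega$ and $u$ are multivariate normal, $\hat{u}_{mode}$ is the value of $\hat{u}$ that minimizes the following generalized least-squares error measure:
$J_B = \frac{1}{2}[Z_i - \hat{Z}_i(\hat{u})]^T C_{\omega}^{-1} [Z_i - \hat{Z}_i(\hat{u})] + \frac{1}{2}[\hat{u} - \bar{u}]^T C_{uu}^{-1} [\hat{u} - \bar{u}]$
Here $\hat{Z}_i(\hat{u})$ is the model prediction of the measurements, $\bar{u}$ and $C_{uu}$ are the prior mean and covariance of $u$, and $C_{\omega}$ is the covariance of the measurement error $\omega$. Terms that do not depend on $u$ are omitted.
[Figure: iterative search path in the (u1, u2) plane]
$\hat{u}_{mode}$ is found with an iterative search. Search convergence is improved by the presence of the second (regularization) term in $J_B$.
The state equation is often incorporated as a constraint, using adjoint methods.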
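The iterative search for the mode can be sketched for a scalar input. The example below assumes a hypothetical linear forward model in which measurement i is predicted as g_i times u, scalar error variances `r` (measurement) and `c` (prior), and plain gradient descent standing in for the adjoint-based searches used in practice; none of these choices come from the slides.

```python
def cost(u, z, g, r, u_bar, c):
    """Generalized least-squares measure: measurement misfit plus regularization."""
    misfit = sum((zi - gi * u) ** 2 for zi, gi in zip(z, g)) / r
    regularization = (u - u_bar) ** 2 / c
    return 0.5 * misfit + 0.5 * regularization


def gradient(u, z, g, r, u_bar, c):
    """Derivative of the cost with respect to u."""
    return -sum(gi * (zi - gi * u) for zi, gi in zip(z, g)) / r + (u - u_bar) / c


def find_mode(z, g, r, u_bar, c, step=0.01, iterations=2000):
    """Fixed-step gradient descent started from the prior mean."""
    u = u_bar
    for _ in range(iterations):
        u -= step * gradient(u, z, g, r, u_bar, c)
    return u


z = [1.9, 4.2, 5.8]          # hypothetical measurements
g = [1.0, 2.0, 3.0]          # hypothetical sensitivities of each measurement to u
r, u_bar, c = 0.25, 1.0, 1.0
u_mode = find_mode(z, g, r, u_bar, c)
```

The regularization term adds 1/c to the curvature of the cost, which keeps the problem well conditioned even when the measurements alone constrain u poorly.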
The Sequential Approach
Sequential methods are designed to propagate and update the conditional pdf in a series of discrete steps:
Zi = [Zi-1, zi] is the set of all measurements through ti. The algorithm is initialized with the unconditional (prior) PDF at t0 and then alternates propagation and update steps:
• Initialization at t0: p[y0]
• Propagation 0 to 1: p[y1]
• Update 1 (Meas. 1, Z1 = [z1]): p[y1 | Z1]
• Propagation 1 to 2: p[y2 | Z1]
• Update 2 (Meas. 2, Z2 = [Z1, z2]): p[y2 | Z2]
• ...
• Update i (Meas. i, Zi = [Zi-1, zi]): p[yi | Zi-1] becomes p[yi | Zi]
• Propagation i to i+1: p[yi+1 | Zi]
In practice various approximations must be introduced.
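When the state can take only a few discrete values, the propagation and update steps can be carried out exactly, which makes the recursion easy to see. The following is a minimal sketch, assuming a hypothetical three-state system (dry, medium, wet) with a made-up transition matrix and made-up measurement likelihoods.

```python
def propagate(pdf, transition):
    """Propagation step: p(y_{i+1} | Z_i) from p(y_i | Z_i).

    transition[j][k] is the probability of moving from state j to state k.
    """
    n = len(pdf)
    return [sum(transition[j][k] * pdf[j] for j in range(n)) for k in range(n)]


def update(pdf, likelihood):
    """Update step: p(y_i | Z_i) from p(y_i | Z_{i-1}) using Bayes' rule."""
    weights = [p * l for p, l in zip(pdf, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]


# States are ordered dry, medium, wet.
transition = [[0.8, 0.2, 0.0],
              [0.3, 0.5, 0.2],
              [0.0, 0.4, 0.6]]
# likelihoods[i][k] = p(z_i | state k) for two successive measurements.
likelihoods = [[0.1, 0.3, 0.6], [0.2, 0.6, 0.2]]

pdf = [1 / 3, 1 / 3, 1 / 3]   # unconditional (prior) PDF at t0
history = []
for likelihood in likelihoods:
    pdf = propagate(pdf, transition)   # propagation i-1 to i
    pdf = update(pdf, likelihood)      # update i
    history.append(pdf)
```

For continuous, high-dimensional states this exact recursion is not feasible, which is why approximations are needed.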
Some Common Sequential Data Assimilation Methods
A common approximation is to assume that the conditional PDF is multivariate Gaussian. The update for the conditional mean has the form:
$\hat{y}(t_i \mid Z_i) = \hat{y}(t_i \mid Z_{i-1}) + K\,[z_i - M(\hat{y}(t_i \mid Z_{i-1}))]$
The first term on the right is the propagated estimate and the second term is the update. K weights measurements vs. model predictions.
Some common approximations:
• Direct insertion: Update forced to equal measurements where available, interpolated from meas. elsewhere.
• Nudging: K = empirically selected constant.
• Optimal interpolation: K derived from assumed (static) covariance.
• Extended Kalman filter: K derived from covariances propagated with a linearized model; input fluctuations and measurement errors must be additive.
• Ensemble Kalman filter: K derived from an ensemble of random replicates propagated with a nonlinear model; form of input fluctuations and measurement errors is unrestricted.
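The way the gain K distinguishes the simpler methods can be seen in a scalar sketch. It assumes a single state that is measured directly (so the measurement operator is the identity), in which case direct insertion corresponds to K = 1, nudging to a hand-picked constant, and optimal interpolation to a gain built from assumed static variances. The function names and numbers are hypothetical.

```python
def update_mean(y_prior, z, gain, measurement_operator=lambda y: y):
    """Conditional-mean update: propagated estimate plus gain times the
    difference between the measurement and the predicted measurement."""
    return y_prior + gain * (z - measurement_operator(y_prior))


def optimal_gain(prior_var, meas_var):
    """Scalar gain for a directly observed state with static variances."""
    return prior_var / (prior_var + meas_var)


y_prior, z = 0.30, 0.40   # propagated estimate and new measurement

direct_insertion = update_mean(y_prior, z, gain=1.0)   # estimate equals measurement
nudged = update_mean(y_prior, z, gain=0.2)             # empirically chosen constant
interpolated = update_mean(y_prior, z, gain=optimal_gain(0.01, 0.0025))
```

The extended and ensemble Kalman filters use the same update form but recompute the gain at every measurement time from propagated covariances.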
Example -- Microwave Measurement of Soil Moisture
L-band (1.4 GHz) microwave emissivity is sensitive to soil saturation in upper 5 cm. Brightness temperature decreases for wetter soils.
Objective is to map soil moisture in real time by combining microwave meas. and other data with model predictions (data assimilation).
[Figure: microwave emissivity [-] (0.5 to 1) versus saturation [-] (0 to 1) for sand, silt, and clay]
Case Study Area
Aircraft microwave measurements
SGP97 Experiment - Soil Moisture Campaign
Problem Specifications – SGP97 Ensemble Kalman Filter Example
• Hydrologic model: 1D (vertical) NOAH Land Surface Model (NOAA NCEP, Chen et al., 1996) applied at each estimation pixel
• Radiative Transfer Model: Jackson et al., 1999 model applied at each pixel
• Uncertain model inputs included in ensemble filter:
    Time-varying inputs:
        Precipitation (temporally uncorrelated)
    Time-invariant inputs:
        Porosity (upper bound on moisture content)
        Wilting point (lower bound on moisture content)
        Saturated hydraulic conductivity
        Minimum stomatal resistance
    Random fluctuations are multiplicative and lognormal (mean = 1.0)
• Filter assumes that random fluctuations and measurement errors for different pixels are uncorrelated
• Random measurement errors included in ensemble filter:
    Additive radiobrightness measurement noise
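The two kinds of randomness listed above (multiplicative lognormal input fluctuations with mean 1.0 and additive measurement noise) can be combined in a single-pixel ensemble Kalman filter step. The following is a minimal sketch only: the one-line "hydrologic model", the linear brightness model, and every number are hypothetical stand-ins for the NOAH and radiative transfer models, and the perturbed-measurement form of the update is one common variant chosen here for illustration.

```python
import random


def lognormal_multiplier(rng, sigma):
    """Multiplicative fluctuation with mean 1.0: exp(N(-sigma^2 / 2, sigma^2))."""
    return rng.lognormvariate(-0.5 * sigma ** 2, sigma)


def covariance(a, b):
    """Sample covariance of two equally long lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def enkf_update(states, predicted_meas, z, meas_std, rng):
    """Update every replicate with a gain computed from ensemble covariances.
    Each replicate sees the measurement plus its own additive noise draw."""
    gain = covariance(states, predicted_meas) / (
        covariance(predicted_meas, predicted_meas) + meas_std ** 2)
    return [y + gain * (z + rng.gauss(0.0, meas_std) - m)
            for y, m in zip(states, predicted_meas)]


rng = random.Random(42)
n_replicates = 500
nominal_precip = 0.10

# Propagation: each replicate is driven by its own perturbed precipitation.
states = []
for _ in range(n_replicates):
    precip = nominal_precip * lognormal_multiplier(rng, sigma=0.5)
    states.append(min(1.0, 0.25 + precip))   # toy model: wetter after rain

# Toy measurement model: brightness temperature falls as saturation rises.
predicted = [280.0 - 60.0 * y for y in states]
updated = enkf_update(states, predicted, z=255.0, meas_std=2.0, rng=rng)
```

The spread of `updated` around its mean gives the information on estimation accuracy that the simpler methods do not provide.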
Relevant Time and Space Scales
Vertical Section: Soil layers differ in thickness (5 cm and 10 cm layers are labeled). Note the large horizontal-to-vertical scale disparity.
Plan View: Estimation pixels (large) are 4.0 km on a side; microwave pixels (small) are 0.8 km by 0.8 km.
Typical precipitation events:
[Figure: precipitation rate (mm/s, 0 to 0.025) versus day of year (170 to 195, where 170 = 6/19/97); * = ESTAR observation]
For problems of continental scale we have ~10^5 est. pixels, 10^5 meas., 10^6 states.
Some Typical Spatially Variable Model Inputs – SGP97 Example
[Figure: maps of spatially variable model inputs over the estimation region. RTM inputs: clay fraction (0 to 0.1) and sand fraction (0 to 0.8). NOAH inputs: NOAH soil class (0 to 8) and NOAH vegetation class (0 to 12). Meteorological stations and El Reno are marked; scale bar 50 km.]
Estimation region ~ 50 by 200 km (12 by 50 pixels, 4 km on a side)
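The pixel counts follow from simple arithmetic, sketched below. It assumes that the 4.0 km and 0.8 km labels on the scale diagram are the side lengths of the estimation and microwave pixels respectively; the variable names are hypothetical.

```python
region_km = (50, 200)          # approximate estimation region extent
estimation_pixel_km = 4.0      # side length of an estimation pixel
microwave_pixel_km = 0.8       # assumed side length of a microwave pixel

# Whole estimation pixels that fit along each side of the region.
pixels = tuple(int(extent // estimation_pixel_km) for extent in region_km)
n_estimation_pixels = pixels[0] * pixels[1]

# Microwave pixels falling inside one estimation pixel.
microwave_per_estimation = round(estimation_pixel_km / microwave_pixel_km) ** 2
```

Under these assumptions each estimation pixel is covered by 25 microwave pixels, so every update averages over many fine-scale measurements.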
Brightness Temperatures at a Typical Pixel – SGP97 Example
[Figure: Brightness temperature and precipitation time series at El Reno. Brightness temperature (deg. K, 200 to 290) versus day of year (170 to 195), showing the conditional mean, unconditional mean, individual replicates, and brightness measurements, with precipitation plotted alongside.]
Moisture Contents at a Typical Pixel – SGP97 Example
[Figure: Moisture content and precipitation time series at El Reno. Moisture content (saturation, 0.1 to 0.55) versus day of year (170 to 195), showing the conditional mean, unconditional mean, individual replicates, brightness measurement times, and the local spatial average of gravimetric measurements, with precipitation plotted alongside.]
Comparison of Some Data Assimilation Options
Direct insertion, nudging, optimal interpolation
• Easy to implement (+)
• Updates do not account for system dynamics or input and measurement statistics (-)
• No information on estimation accuracy (-)
• Computationally efficient (+)
Ensemble Kalman filter
• Well-suited for real time applications, not optimal for smoothing problems (+/-)
• Provides information on estimation accuracy (+)
• Very flexible, modular, able to accommodate wide range of model error descriptions (+)
• No need for adjoint model or for linearizations or other approximations during propagation step (+)
• Approach is robust and easy to use (+)
• Update assumes states are jointly normal (-)
• Can be computationally demanding (-)
Extended Kalman filter
• Can be adapted for real time or smoothing problems (+)
• Provides info. on estimation accuracy (+)
• Computationally demanding, limited capability to deal with model errors (-)
• Linearization approximation may be poor, tends to be unstable (-)
Variational methods
• Well-suited for smoothing problems, less convenient for real-time applications (+/-)
• Does not provide information on estimation accuracy (-)
• Difficult to accommodate time-dependent model errors, not robust (-)
• Most efficient forms require derivation of an adjoint model (-)