What is Data Assimilation?

Data Assimilation: Data assimilation seeks to characterize the true state of an environmental system by combining information from measurements, models, and other sources.

Typical measurements for hydrologic/earth science applications:

• Ground-based hydrologic and geological measurements (stream flow, soil moisture, soil properties, canopy properties, etc.)

• Ground-based meteorological measurements (precipitation, air temperature, humidity, wind speed, etc.)

• Remotely-sensed measurements (usually electromagnetic) which are sensitive to hydrologically relevant variables (e.g. water vapor, soil moisture, etc.)

Mathematical models used for data assimilation:

• Models of the physical system of interest.

• Models of the measurement process.

• Probabilistic descriptions of uncertain model inputs and measurement errors.

A description based on combined information should be better than one obtained from either measurements or model alone.

Key Features of Environmental Data Assimilation Problems

• State estimation -- The system is described in terms of state variables, which are characterized from available information.

• Multiple data sources -- Estimates are often derived from different types of measurements (ground-based, remote sensing, etc.) taken at different times and resolutions.

• State variables may fluctuate over a wide range of time and space scales -- Different scales may interact (e.g. small-scale variability can have large-scale consequences).

• Spatially distributed dynamic systems -- Systems are often modeled with partial differential equations, usually nonlinear.

• Uncertainty -- The models used in data assimilation applications are inevitably imperfect approximations to reality, model inputs may be uncertain, and measurement errors may be important. All of these sources of uncertainty need to be considered in the data assimilation process.

• The equations used to describe the system of interest are usually discretized over time and space -- Since the discretization must capture a wide range of scales, the resulting number of degrees of freedom (unknowns) can be very large.

State-Space Framework for Data Assimilation

State-space concepts provide a convenient way to formulate data assimilation problems. The key idea is to describe the system of interest in terms of the following variables:

• Input variables -- variables which account for forcing from outside the system or system properties which do not depend on the system state.

• State variables -- dependent variables of differential equations used to describe the physical system of interest, also called prognostic variables.

• Output variables -- variables that are observed, depend on state and input variables, also called diagnostic variables.

Classification of variables depends on system boundaries:

[Figure: two land-atmosphere diagrams showing the precipitation (Precip.) and evapotranspiration (ET) fluxes between land and atmosphere.]

• System includes coupled land and atmosphere -- precipitation and evapotranspiration are state variables.

• System includes only land -- precipitation and evapotranspiration are input variables.

Components of a Typical Hydrologic Data Assimilation Problem

The data assimilation algorithm uses specified information about input fluctuations and measurement errors to combine model predictions and measurements. Resulting estimates are extensive in time and space and make best use of available information.

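State Eq:

$$ y(t) = A[y(\tau), \alpha, u(\tau), t], \quad \tau < t; \qquad y(0) = y_0(\alpha) $$

Measurement Eq:

$$ z_i = M[y(t_i), u, \omega_i]; \quad i = 1, \ldots, m $$

Here y(t) is the state, \alpha is the time-invariant input, u(t) is the time-varying input, z_i is the measured output at time t_i, and \omega_i is the random measurement error.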

[Figure: block diagram. The true time-varying input u(t) (e.g. precip) and the true time-invariant input (e.g. sat. hydr. cond.) each consist of a specified mean plus random fluctuations. They drive the hydrologic system, which produces the true state y(t) (e.g. soil moist.). The measurement system converts the true state into the output zi (e.g. radiobrightness), which is measured with random error. The data assimilation algorithm combines the measured outputs with the means and covariances of the true inputs and output measurement errors to produce estimated states and outputs.]
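As a rough illustration of how a state equation and a measurement equation fit together, the sketch below simulates a hypothetical single-store soil moisture model. The model form (linear drainage), the names `step_state`, `measure`, `alpha`, and `precip`, and all numerical values are assumptions made for illustration only; they are not the hydrologic model used in the SGP97 example.

```python
import random

def step_state(y, alpha, u, dt=1.0):
    """Hypothetical state equation: linear drainage plus precipitation forcing."""
    return y + dt * (u - alpha * y)

def measure(y, noise_std, rng):
    """Hypothetical measurement equation: output = state + random error."""
    return y + rng.gauss(0.0, noise_std)

rng = random.Random(0)
alpha = 0.1                               # time-invariant input (assumed drainage rate)
precip = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0]   # time-varying input u(t) (assumed values)
y = 5.0                                   # initial state y(0) (assumed)

states, outputs = [], []
for u in precip:
    y = step_state(y, alpha, u)           # propagate the true state
    states.append(y)
    outputs.append(measure(y, noise_std=0.2, rng=rng))  # noisy observed output
```

In a real application the data assimilation algorithm would see only `outputs` and the input statistics, never `states`.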

Types of Measurement Errors

When models are discretized over time/space there are two sources of output measurement error:

• Instrument errors (measurement device does not perfectly record the variable it is meant to measure).

• Scale-related errors (variable measured by device is not at the same time/space scale as the corresponding model variable).

When measurement error statistics are specified, both error sources should be considered.

[Figure: measurements (*) plotted against the large-scale trend described by the model. The scale-related error separates the true value from the large-scale trend; the instrument error separates the measurement from the true value.]
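The following sketch shows one way the two error sources might be combined when specifying measurement error statistics. It assumes, purely for illustration, that instrument and scale-related errors are independent, zero-mean, and Gaussian, so that their variances add; the standard deviations and the pixel mean are made-up values.

```python
import random
import statistics

rng = random.Random(1)
instrument_std = 0.02   # assumed instrument error standard deviation
subgrid_std = 0.05      # assumed spread of point values about the model-scale value
pixel_mean = 0.30       # model-scale (large-scale trend) value, assumed

errors = []
for _ in range(20000):
    point_true = pixel_mean + rng.gauss(0.0, subgrid_std)    # true value at measurement scale
    measured = point_true + rng.gauss(0.0, instrument_std)   # what the device records
    errors.append(measured - pixel_mean)                     # error relative to model variable

total_std = statistics.pstdev(errors)
expected_std = (instrument_std ** 2 + subgrid_std ** 2) ** 0.5
```

Under these assumptions `total_std` should be close to `expected_std`, which is larger than the instrument error alone.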

Types of Data Assimilation Problems - Temporal Aspects

Zi = [z1, z2, …, zi] = set of all measurements through time ti

• Smoothing: characterize system over time interval t ≤ ti. Use for reanalysis of historic data.

• Filtering/forecasting: characterize system over time interval t ≥ ti. Use for real-time forecasting.

• Interpolation: no time-dependence, characterize system only at time t = ti. Use for interpolation of spatial data (e.g. kriging).

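Types of Data Assimilation Problems - Spatial Aspects

Downscaling: Characterize system at scales smaller than output measurement resolution. One measurement (z1) depends on several states (y1 … y4):

$$ z_1 = y_1 + y_2 + y_3 + y_4 + \omega_1 $$

Upscaling: Characterize system at scales larger than output measurement resolution. Several measurements (z1 ... z4) depend on one state (y1):

$$ z_1 = y_1 + \omega_1, \quad z_2 = y_1 + \omega_2, \quad z_3 = y_1 + \omega_3, \quad z_4 = y_1 + \omega_4 $$

Here \omega_1, ..., \omega_4 are random measurement errors.

Downscaling and upscaling are handled automatically if the measurement equation is defined appropriately.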
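To make the last point concrete, here is a minimal sketch in which the same update routine handles both cases simply by changing the measurement coefficients `h`. It assumes a linear measurement with Gaussian errors (a standard Kalman-type update), that the coarse measurement is the sum of four fine-scale states, and made-up prior means and variances; the function name `scalar_measurement_update` is hypothetical.

```python
def scalar_measurement_update(y_prior, P, h, z, r):
    """Update state vector y_prior (covariance P) with one measurement z = h.y + error (variance r)."""
    n = len(y_prior)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]   # P h
    s = sum(h[i] * Ph[i] for i in range(n)) + r                      # h' P h + r
    gain = [Ph[i] / s for i in range(n)]
    innovation = z - sum(h[i] * y_prior[i] for i in range(n))
    return [y_prior[i] + gain[i] * innovation for i in range(n)]

# Downscaling: one coarse measurement constrains four fine-scale states.
y_prior = [0.20, 0.25, 0.30, 0.25]
P = [[0.01 if i == j else 0.0 for j in range(4)] for i in range(4)]
P[2][2] = 0.04   # third state assumed more uncertain, so it absorbs more of the correction
y_down = scalar_measurement_update(y_prior, P, h=[1, 1, 1, 1], z=1.2, r=0.01)

# Upscaling: four fine-scale measurements of one coarse state, assimilated one at a time.
y_up, p_up = [0.25], [[0.01]]
for z in [0.30, 0.28, 0.33, 0.29]:
    y_up = scalar_measurement_update(y_up, p_up, h=[1], z=z, r=0.01)
    p_up = [[p_up[0][0] * 0.01 / (p_up[0][0] + 0.01)]]   # posterior variance P r / (P + r)
```

In the downscaling case the correction is distributed among the four states in proportion to their prior uncertainty; in the upscaling case the four measurements are effectively averaged together with the prior.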

Characterizing Uncertain Systems

What is a “good characterization” of the system states and inputs, given the vector Zi = [z1, ..., zi] of all measurements taken through ti?

The posterior probability densities p(y| Zi) and p(u| Zi) are the ideal estimates since they contain everything we know about the state y or input u given Zi.

[Figure: conditional density p[y(t)| Zi] plotted against y(t), with its mode, mean, and standard deviation marked.]

In practice, we must settle for partial information about this density.

[Figure: the prior densities p(u) and p(y), related through the model y = A(u), become the conditional densities p(u | Zi) and p(y | Zi) once the measurements Zi are taken into account.]

• Variational DA: Derive mode of p[y(t)| Zi] by solving batch least-squares problem.

• Sequential DA: Derive recursive approximation of conditional mean (and covariance?) of p[y(t)| Zi]
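The difference between the mode and the mean of a conditional density can be seen in a small numerical sketch. It assumes a scalar state on [0, 1], a skewed prior density proportional to y(1 - y)^3, and a single measurement with additive Gaussian error; all of these choices are illustrative assumptions, and the conditional density is obtained by brute-force application of Bayes' rule on a grid.

```python
import math

grid = [k / 1000 for k in range(1001)]   # discretized values of the scalar state y

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def summarize(prob):
    """Return (mean, mode, standard deviation) of a discrete density on the grid."""
    mean = sum(y * p for y, p in zip(grid, prob))
    var = sum((y - mean) ** 2 * p for y, p in zip(grid, prob))
    mode = grid[max(range(len(grid)), key=lambda k: prob[k])]
    return mean, mode, math.sqrt(var)

prior = normalize([y * (1 - y) ** 3 for y in grid])           # assumed skewed prior p(y)
z, meas_std = 0.40, 0.10                                      # assumed measurement and error std
likelihood = [math.exp(-0.5 * ((z - y) / meas_std) ** 2) for y in grid]
posterior = normalize([p * l for p, l in zip(prior, likelihood)])   # p(y | z) by Bayes' rule

prior_mean, prior_mode, prior_std = summarize(prior)
post_mean, post_mode, post_std = summarize(posterior)
```

Because the prior is skewed, its mode and mean differ, which is one reason variational (mode) and sequential (mean) estimates need not coincide for non-Gaussian problems.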

The Variational/Batch Approach

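Most variational methods use the mode of p(u| Zi) as an estimate of the uncertain input vector u. The state estimate is obtained by substituting this mode, \hat{u}_{mode}, into the state equation:

$$ \hat{y}(t) = A[\hat{y}(\tau), \hat{u}_{mode}, t] $$

If \omega and u are multivariate normal, \hat{u}_{mode} is the value of \hat{u} that minimizes the following generalized least-squares error measure:

$$ J_B = \frac{1}{2} [Z_i - \hat{Z}_i(\hat{u})]^T C_{\omega\omega}^{-1} [Z_i - \hat{Z}_i(\hat{u})] + \frac{1}{2} [\hat{u} - \bar{u}]^T C_{uu}^{-1} [\hat{u} - \bar{u}] $$

Here \hat{Z}_i(\hat{u}) is the vector of measurements predicted from \hat{u}, \bar{u} is the prior mean of u, and C_{\omega\omega} and C_{uu} are the covariances of the measurement errors and of the inputs. Terms that do not depend on u are omitted.

\hat{u}_{mode} is found with an iterative search. Search convergence is improved by the presence of the second (regularization) term in JB.

[Figure: location of the mode in the (u1, u2) plane.]

The state equation is often incorporated as a constraint, using adjoint methods.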
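A scalar sketch of the batch least-squares search is given below. It is not an adjoint implementation: it assumes a hypothetical forward model in which the predicted measurements are exp(-u t) at a few times, uses an analytic derivative, and searches with a Gauss-Newton iteration plus simple step halving. The synthetic data, the variances, and the names `predict`, `cost`, and `gauss_newton` are all assumptions for illustration.

```python
import math

times = [1.0, 2.0, 3.0, 4.0]
u_true = 0.5                                  # used only to generate synthetic data
Z = [math.exp(-u_true * t) for t in times]    # noise-free synthetic measurements
r = 1e-4                                      # measurement error variance (assumed)
u_bar, c_u = 0.3, 0.04                        # prior mean and variance of u (assumed)

def predict(u):
    """Hypothetical forward model: predicted measurements for input u."""
    return [math.exp(-u * t) for t in times]

def cost(u):
    """Least-squares measure: data misfit term plus regularization (prior) term."""
    data = sum((z - f) ** 2 for z, f in zip(Z, predict(u))) / r
    return 0.5 * data + 0.5 * (u - u_bar) ** 2 / c_u

def gauss_newton(u, iterations=30):
    for _ in range(iterations):
        f = predict(u)
        df = [-t * fk for t, fk in zip(times, f)]              # derivative of each prediction w.r.t. u
        grad = -sum(d * (z - fk) for d, z, fk in zip(df, Z, f)) / r + (u - u_bar) / c_u
        hess = sum(d * d for d in df) / r + 1.0 / c_u          # Gauss-Newton approximation
        step = -grad / hess
        while cost(u + step) > cost(u) and abs(step) > 1e-12:  # halve the step if cost increases
            step *= 0.5
        u += step
    return u

u_mode = gauss_newton(u_bar)   # start the search at the prior mean
```

The `1.0 / c_u` contribution from the regularization term keeps `hess` strictly positive, which is the sense in which that term helps the search behave well.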

The Sequential Approach

Sequential methods are designed to propagate and update the conditional pdf in a series of discrete steps:

• The algorithm is initialized with the unconditional (prior) PDF at t0: p[y0]

• Propagation 0 to 1: p[y0] → p[y1]

• Update 1 with measurement z1 at t1, Z1 = [z1]: p[y1] → p[y1| Z1]

• Propagation 1 to 2: p[y1| Z1] → p[y2| Z1]

• Update 2 with measurement z2 at t2, Z2 = [Z1, z2]: p[y2| Z1] → p[y2| Z2]

• ...

• Update i with measurement zi at ti, Zi = [Zi-1, zi]: p[yi| Zi-1] → p[yi| Zi]

• Propagation i to i+1: p[yi| Zi] → p[yi+1| Zi]

In practice various approximations must be introduced.
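For the special case of a scalar linear model with Gaussian errors, the conditional PDF stays Gaussian and the propagate/update cycle can be carried out exactly by tracking only a mean and a variance. The sketch below assumes a hypothetical model y[i+1] = a*y[i] + noise and a measurement z = y + error; the coefficients, variances, and measurement values are made up for illustration.

```python
def propagate(mean, var, a=0.9, q=0.02):
    """Propagation step for an assumed linear model y[i+1] = a*y[i] + noise (variance q)."""
    return a * mean, a * a * var + q

def update(mean, var, z, r=0.05):
    """Update step for an assumed measurement z = y + error (variance r)."""
    gain = var / (var + r)
    return mean + gain * (z - mean), (1.0 - gain) * var

mean, var = 1.0, 0.5            # unconditional (prior) mean and variance at t0
history = []
for z in [0.7, 0.9, 0.6]:       # measurements z1, z2, z3 (made-up values)
    mean, var = propagate(mean, var)      # gives the density of y_i given Z_{i-1}
    propagated_var = var
    mean, var = update(mean, var, z)      # gives the density of y_i given Z_i
    history.append((mean, var, propagated_var))
```

Each update reduces the variance relative to the propagated value, reflecting the information added by the new measurement.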

Some Common Sequential Data Assimilation Methods

A common approximation is to assume that the conditional PDF is multivariate Gaussian. The update for conditional mean has the form:

$$ \hat{y}(t | Z_i) = \hat{y}(t | Z_{i-1}) + K [z_i - M(y_i)] $$

Here \hat{y}(t | Z_{i-1}) is the propagated estimate, K [z_i - M(y_i)] is the update, and the gain K weights measurements vs. model predictions.

Some common approximations:

• Direct insertion: Update forced to equal measurements where available, interpolated from measurements elsewhere.

• Nudging: K = empirically selected constant.

• Optimal interpolation: K derived from assumed (static) covariance.

• Extended Kalman filter: K derived from covariances propagated with a linearized model; input fluctuations and measurement errors must be additive.

• Ensemble Kalman filter: K derived from an ensemble of random replicates propagated with a nonlinear model; form of input fluctuations and measurement errors is unrestricted.
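The methods listed above differ mainly in how the gain K is chosen. The scalar sketch below applies the same update with four different gains. It assumes a directly observed state (measurement operator M(y) = y), made-up values for the propagated estimate, measurement, and variances, and, for the ensemble case, the perturbed-measurement form of the ensemble Kalman filter update, which is one common variant rather than the only one.

```python
import random
import statistics

rng = random.Random(42)
z, r = 0.35, 0.02 ** 2          # measurement and its error variance (assumed)
y_prop = 0.25                   # propagated estimate (assumed)

def update(y, K, z_used):
    return y + K * (z_used - y)  # measurement operator M(y) = y in this sketch

y_direct = update(y_prop, 1.0, z)                      # direct insertion: estimate equals measurement
y_nudged = update(y_prop, 0.3, z)                      # nudging: empirically selected constant
P_static = 0.04 ** 2                                   # assumed static state variance
y_oi = update(y_prop, P_static / (P_static + r), z)    # optimal interpolation

# Ensemble Kalman filter: K from the sample variance of the propagated replicates.
ensemble = [rng.gauss(y_prop, 0.04) for _ in range(500)]
P_ens = statistics.variance(ensemble)
K_ens = P_ens / (P_ens + r)
updated = [update(y, K_ens, z + rng.gauss(0.0, 0.02)) for y in ensemble]
y_enkf = statistics.mean(updated)
```

With these numbers the optimal interpolation and ensemble estimates land close to each other because the ensemble was drawn with the same variance as `P_static`; in a real filter the ensemble variance would evolve with the model dynamics.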

Example -- Microwave Measurement of Soil Moisture

L-band (1.4 GHz) microwave emissivity is sensitive to soil saturation in upper 5 cm. Brightness temperature decreases for wetter soils.

Objective is to map soil moisture in real time by combining microwave meas. and other data with model predictions (data assimilation).

[Figure: microwave emissivity [-] (0.5 to 1) vs. saturation [-] (0 to 1) for sand, silt, and clay.]

Case Study Area

Aircraft microwave measurements

SGP97 Experiment - Soil Moisture Campaign

Problem Specifications – SGP97 Ensemble Kalman Filter Example

• Hydrologic model: 1D (vertical) NOAH Land Surface Model (NOAA NCEP, Chen et al., 1996) applied at each estimation pixel

• Radiative transfer model: Jackson et al., 1999 model applied at each pixel

• Uncertain model inputs included in ensemble filter:

   Time-varying inputs:

      Precipitation (temporally uncorrelated)

   Time-invariant inputs:

      Porosity (upper bound on moisture content)

      Wilting point (lower bound on moisture content)

      Saturated hydraulic conductivity

      Minimum stomatal resistance

   Random fluctuations are multiplicative and lognormal (mean = 1.0)

• Filter assumes that random fluctuations and measurement errors for different pixels are uncorrelated

• Random measurement errors included in ensemble filter:

   Additive radiobrightness measurement noise
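One way to generate multiplicative lognormal fluctuations with mean 1.0 is sketched below. If ln(f) is normal with mean mu and standard deviation sigma, then the mean of f is exp(mu + sigma^2/2), so choosing mu = -sigma^2/2 gives a mean of exactly 1. The nominal conductivity, the value of `sigma`, and the name `lognormal_multiplier` are assumptions for illustration; the values used in the SGP97 study are not given here.

```python
import math
import random
import statistics

def lognormal_multiplier(rng, sigma):
    """Draw a lognormal factor with mean 1.0: ln(f) ~ N(-sigma^2/2, sigma^2)."""
    return math.exp(rng.gauss(-0.5 * sigma ** 2, sigma))

rng = random.Random(7)
nominal_ksat = 1.0e-5      # hypothetical nominal saturated hydraulic conductivity [m/s]
sigma = 0.5                # assumed standard deviation of ln(f)

factors = [lognormal_multiplier(rng, sigma) for _ in range(50000)]
replicates = [nominal_ksat * f for f in factors]    # one perturbed input per ensemble replicate
sample_mean_factor = statistics.mean(factors)
```

Multiplicative lognormal factors keep positive inputs such as conductivity positive in every replicate, which additive Gaussian perturbations would not guarantee.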

Relevant Time and Space Scales

[Figure: vertical section through the soil column. Soil layers differ in thickness (5 cm and 10 cm layers are labeled). Note the large horizontal-to-vertical scale disparity.]

[Figure: plan view. Estimation pixels (large) are 4.0 km on a side; microwave pixels (small) are 0.8 km on a side.]

[Figure: typical precipitation events, precipitation rate (mm/s, 0 to 0.025) vs. day of year (170 to 195; day 170 = 6/19/97). * = ESTAR observation.]

For problems of continental scale we have ~ 10^5 estimation pixels, 10^5 measurements, 10^6 states.

Some Typical Spatially Variable Model Inputs –SGP97 Example

[Figure: maps of spatially variable model inputs over the estimation region. RTM inputs: clay fraction (0 to 0.1) and sand fraction (0 to 0.8). NOAH inputs: NOAH soil class (0 to 8) and NOAH vegetation class (0 to 12). Meteorological stations and El Reno are marked; scale bar 50 km.]

Estimation region ~ 50 by 200 km (12 by 50 pixels, 4 km on a side)
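The grid dimensions quoted for this example can be cross-checked with a few lines of arithmetic. The sketch uses the 4 km estimation pixel and the 12 by 50 pixel region from this slide, together with the 0.8 km microwave pixel size from the plan view on the scales slide; the variable names are hypothetical.

```python
est_pixel_km = 4.0          # estimation pixel size
microwave_pixel_km = 0.8    # microwave pixel size (from the plan view of pixel scales)
nx, ny = 12, 50             # estimation pixels across and along the region

n_est_pixels = nx * ny                                  # 600 estimation pixels
region_km = (nx * est_pixel_km, ny * est_pixel_km)      # 48 km by 200 km
per_side = round(est_pixel_km / microwave_pixel_km)     # 5 microwave pixels per side
microwave_per_est_pixel = per_side ** 2                 # 25 microwave pixels per estimation pixel
```

The 48 km width is consistent with the quoted "~ 50 km", and each estimation pixel aggregates 25 microwave pixels, which makes this an upscaling problem in the sense described earlier.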

Brightness Temperatures at a Typical Pixel – SGP97 Example

[Figure: Brightness Temp. and Precip Time Series – El Reno. Brightness temperature (deg. K, 200 to 290) and precipitation vs. day of year (170 to 195). Legend: conditional mean, unconditional mean, brightness measurements, individual replicates.]

Moisture Contents at a Typical Pixel – SGP97 Example

[Figure: Moisture Content and Precip. Time Series – El Reno. Moisture content (0.1 to 0.55) and precipitation vs. day of year (170 to 195). Legend: unconditional mean, conditional mean, individual replicates, brightness measurement times, local spatial average of gravimetric measurements.]

Comparison of Some Data Assimilation Options

Direct insertion, nudging, optimal interpolation

• Easy to implement (+)

• Updates do not account for system dynamics or input and measurement statistics (-)

• No information on estimation accuracy (-)

• Computationally efficient (+)

Ensemble Kalman filter

• Well-suited for real-time applications, not optimal for smoothing problems (+/-)

• Provides information on estimation accuracy (+)

• Very flexible, modular, able to accommodate a wide range of model error descriptions (+)

• No need for adjoint model or for linearizations or other approximations during propagation step (+)

• Approach is robust and easy to use (+)

• Update assumes states are jointly normal (-)

• Can be computationally demanding (-)

Extended Kalman filter

• Can be adapted for real-time or smoothing problems (+)

• Provides information on estimation accuracy (+)

• Computationally demanding, limited capability to deal with model errors (-)

• Linearization approximation may be poor, tends to be unstable (-)

Variational methods

• Well-suited for smoothing problems, less convenient for real-time applications (+/-)

• Does not provide information on estimation accuracy (-)

• Difficult to accommodate time-dependent model errors, not robust (-)

• Most efficient forms require derivation of an adjoint model (-)
