data assimilation for complex subsurface flow fields · 2014-05-13 · 2014 politecnico di milano...
2014
POLITECNICO DI MILANO
Department of Civil and Environmental Engineering
Doctoral Programme in Environmental and Infrastructure Engineering
XXVI Cycle
DATA ASSIMILATION FOR COMPLEX SUBSURFACE
FLOW FIELDS
Marco PANZERI
Tutor: Prof. Alberto GUADAGNINI
Advisor: Prof. Monica RIVA
Co-advisor: Dr. Ernesto Luigi DELLA ROSSA
The Chair of the Doctoral Programme: Prof. Alberto GUADAGNINI
TABLE OF CONTENTS
Abstract
Chapter 1. Introduction
1.1 Background
1.2 Objectives and Outline
Chapter 2. Data assimilation with the Kalman Filter
2.1 The filtering problem
2.2 Forward step
2.3 Analysis step
Chapter 3. Kalman Filter coupled with stochastic moment equations of transient groundwater flow
3.1 Extended transient moment equations of groundwater flow
3.2 Data assimilation of groundwater flow data via KF: MC-based EnKF and ME-based approach
3.3 Exploratory synthetic example of data assimilation and parameter estimation
3.4 Comparison between MC-based EnKF and ME-based approach
3.5 Conclusions
Chapter 4. EnKF with complex geology
4.1 Markov Mesh (MM) model
4.2 Theoretical formulation
4.3 Synthetic example
4.4 Conclusions
Appendix A
References
Acknowledgements
Estratto in italiano (Abstract in Italian)
Abstract
Proper modeling of subsurface flow and transport processes is key to the solution of a
wide range of engineering and environmental problems. Relevant applications include, e.g.,
the supply of fresh water for civil and industrial activities, the remediation of contaminated
aquifers or the protection of groundwater sources, the need for enhancing the recovery
efficiency of hydrocarbon reservoirs to face the ever increasing demand for energy resources,
and the quantification of the risk linked to the geological disposal of nuclear waste. Building a
subsurface flow model requires defining the spatial distribution of the input parameters
embedded in the underlying governing equations, such as permeability and porosity. Despite
the key role played by these petrophysical properties when modeling aquifers and oil
reservoirs, our knowledge of the way they are distributed within a domain of interest is scarce
in practical applications and often characterized by a high degree of uncertainty.
A well-established approach to tackle this problem is to work within a stochastic
framework, in which the permeability and the porosity fields are treated as random processes
of space. An inverse and/or data assimilation modeling framework is then employed for
conditioning these spatial distributions relying on either direct or surrogate measurements.
Among the various available inversion (or data assimilation) techniques, we focus on the
Ensemble Kalman Filter (EnKF) approach. EnKF is a data assimilation technique which is
employed to incorporate data into physical system models sequentially and as soon as they
are collected. EnKF is appropriate for large and nonlinear models of the kind required for
realistic subsurface fluid flow simulations and has traditionally entailed the use of a
(numerical) Monte Carlo (MC) approach to generate a collection of interdependent random
model representations.
Despite its increasing popularity, there are several drawbacks that undermine the
range of scenarios under which EnKF is applicable. A critical factor is the size of the
ensemble, i.e., the number of MC simulations employed for (ensemble) moment evaluation.
Whereas accurate estimation of the mean and covariance requires many simulations, working
with large ensemble sizes and assessing MC convergence are computationally demanding.
Another common problem is that EnKF performs optimally only if the system variables (i.e.,
model parameters and state variables) can be described by a joint Gaussian distribution.
Modern reservoir models must explicitly take into account the spatial distribution of
facies, which can be defined as distinctive and non-overlapping units forming the internal
architecture of the host rock system and which are associated with given attributes such as
porosity, permeability and mineralogy. Demarcation of diverse facies in a reservoir model is
usually accomplished through indicator functions. Due to the typically non-Gaussian nature
of the latter, use of EnKF to update complex reservoir models can be fraught with severe
challenges.
The main objectives of this work are: (a) to couple EnKF with stochastic moment
equations (MEs) of transient groundwater flow to circumvent and alleviate problems related
to the finiteness of the ensemble employed in the traditional MC-based EnKF; and (b) to
develop an assimilation algorithm that is conducive to conditioning on a set of measured
production data the spatial distribution of lithofacies and of the associated petrophysical
properties for a collection of hydrocarbon reservoirs.
We propose to circumvent the need for MC through a direct solution of nonlocal
(integrodifferential) stochastic MEs that govern the space-time evolution of conditional
ensemble means (statistical expectations) and covariances of hydraulic heads and fluxes. The
purpose is to combine an approximate form of the stochastic MEs with EnKF in a way that
allows sequential updating of parameters and system states without a need for
computationally intensive MC analyses. We explore the resulting combined algorithm on
synthetic problems of two-dimensional transient groundwater flow toward a well pumping
water from a randomly heterogeneous confined aquifer subject to prescribed boundary
conditions. We investigate the effect of the error variances linked to available log-
conductivity data and the impact of assimilating hydraulic heads during the transient or the
pseudo-steady state regime on the quality of the calibrated mean of log-conductivity and head
fields as well as on the associated estimation variance. We also compare the performances
and accuracies of our ME- and MC-based EnKF on synthetic problems differing from each
other in the variance and (integral) autocorrelation scale of log-conductivity random fields.
We analyze the impact of the number of realizations employed in the MC-based EnKF and
the occurrence of filter inbreeding in the assimilations. We show that embedding MEs in the
EnKF scheme allows for computationally efficient real time estimation of system states and
model parameters avoiding the drawbacks which are commonly encountered in traditional
MC-based applications of EnKF. Our results confirm that a few hundred MC simulations are
not enough to overcome filter inbreeding issues, which have a negative impact on the quality
of log-conductivity estimates as well as on the predicted heads and the associated estimation
variances. Contrariwise, ME-based EnKF obviates the need for repeated simulations and is
demonstrated to be free of inbreeding issues.
We further illustrate a novel data assimilation scheme conducive to updating both
facies and petrophysical properties of a reservoir model set characterized by complex geological
architecture. The spatial distribution of facies is treated by means of a Markov Mesh (MM)
model coupled with a multi-grid approach, according to which geological patterns are
initially reproduced at a coarse scale and are subsequently generated on grids with increasing
resolution. This allows reproducing detailed facies geometries and spatial patterns distributed
on multiple scales. The assimilation algorithm is developed within the context of a history
matching procedure and is based on the integration of the MM model within the EnKF
workflow. We test the methodology by way of a two-dimensional synthetic reservoir model
in the presence of two distinct facies and representing a complex meandering channel system.
We show that the proposed inversion scheme is conducive to an updated collection of facies
and log-permeability fields which maintain the geological architecture displayed by the
reference model, as opposed to the standard EnKF. We test the prediction ability of the
realizations obtained through our procedure by means of two forecast scenarios, in which
diverse flow configurations are considered following the latest assimilation time. In our first
scenario, the same flow setting imposed in the course of the assimilation time is maintained
also during the additional simulation period. A second study is performed by considering the
presence of two additional wells that become operative after the assimilation period. The
approaches tested yield a good estimation of the target production values during the first
scenario, the predictions provided by standard EnKF being characterized by the highest
degree of uncertainty. The performances of the two methodologies are markedly different
when considering the second scenario where our proposed algorithm outperforms the
standard EnKF by providing a superior match between the reference and the predicted
production curves.
1. Introduction
1.1 Background
Our ability to properly model subsurface flow and transport phenomena by making use of
the diverse information content of the often limited amount of data
available has a considerable impact on several engineering, environmental and energy
applications. These include, e.g., the supply of fresh water for civil and industrial activities,
the remediation of contaminated aquifers or the protection of groundwater sources, the need
for enhancing the recovery efficiency in hydrocarbon reservoirs to face the ever increasing
demand for energy resources, and the quantification of the risk linked to the geological disposal
of nuclear waste.
The motion of fluids and contaminants in sedimentary aquifers and fractured rocks is
strongly influenced by the spatial distribution of the physical properties of the geological
media, such as permeability and porosity, which are often characterized by a high degree of
spatial heterogeneity. Despite the key role played by these properties in modeling aquifers
and oil reservoirs, in practical applications our knowledge of the way they are distributed
within a domain of interest is scarce and often characterized by a high degree of uncertainty.
For these reasons, providing reliable predictions of pressure, saturation or solute
concentration values at a given location of the considered domain and taking advantage of
diverse types of information to quantify the uncertainty associated with such predictions are
often complex tasks.
In the last decades, several techniques have been developed for estimating the spatial
distribution of petrophysical properties of underground reservoirs on the basis of either direct
or indirect/surrogate measurements, with the objective of improving our ability to predict
the system response to anthropogenic or natural forcing terms. These techniques are often
referred to as inverse modeling approaches in the groundwater hydrology community or
history matching in the reservoir engineering literature. The quantification of the uncertainty
associated with a model prediction requires the adoption of a probabilistic approach. In this
context the model parameters are treated as spatially correlated random fields and the
resulting governing differential equations become stochastic, thus allowing the quantification
of the space-time evolution of the probability density function of a target state variable. There
are several flavors of inverse modeling procedures which can be employed in the context of
groundwater aquifers and petroleum reservoir modeling. In recent years hydrogeologists and
petroleum reservoir engineers have devoted increasing attention to the development of data
assimilation techniques based on the concepts embedded in the Kalman Filter (KF) approach.
The Kalman Filter (KF) is a well-known data assimilation technique used to incorporate
data into physical system models sequentially and as they are collected. It was originally
introduced by Kalman [Kalman, 1960] to integrate data corrupted by white Gaussian noise in
linear dynamic models whose outputs include additive noise that is also modeled as a
Gaussian random variable. KF entails two steps: (a) a forward modeling (or forecasting) step
that propagates system states in time until new measurements become available, and (b) an
updating step that modifies/updates system states optimally in real time on the basis of such
measurements. Some modern versions of KF update system states (e.g., hydraulic heads or
pressures) and parameters (e.g., permeabilities) jointly based on measurements of one or both
variables [e.g., Vrugt et al., 2005].
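The two KF steps described above can be illustrated with a minimal scalar sketch. It assumes a hypothetical linear model x_k = a x_(k-1) + w, with Gaussian model noise of variance q, and a measurement d = h x + e with error variance r. The function names and the numerical values are illustrative assumptions and are not taken from this work.

```python
def kf_forecast(mean, var, a, q):
    """Forward step: propagate mean and variance through x_k = a * x_(k-1) + w."""
    return a * mean, a * a * var + q


def kf_update(mean, var, d, h, r):
    """Analysis step: condition the forecast on a measurement d = h * x + e."""
    gain = var * h / (h * h * var + r)        # Kalman gain (scalar case)
    new_mean = mean + gain * (d - h * mean)   # shift the mean toward the datum
    new_var = (1.0 - gain * h) * var          # reduce the variance
    return new_mean, new_var


# Hypothetical setting: static state (a = 1, q = 0), prior N(0, 1),
# three measurements with error variance 0.5 assimilated one at a time.
mean, var = 0.0, 1.0
for d in [1.2, 0.9, 1.1]:
    mean, var = kf_forecast(mean, var, a=1.0, q=0.0)
    mean, var = kf_update(mean, var, d, h=1.0, r=0.5)
```

In this linear Gaussian setting the sequential result coincides with conditioning on all three measurements at once, which is the property that makes sequential assimilation attractive.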
Gelb [1974] proposed an Extended Kalman Filter (EKF) to deal with nonlinear
system models. EKF linearizes the model and propagates the first two statistical moments of
target model variables in time. As such it is not suitable for strongly non-linear systems of the
kind encountered in the context of groundwater flow or transport in any but mildly
heterogeneous media. EKF further requires large amounts of computer storage which limits
its use to relatively small-size problems. Evensen [1994] and Burgers et al. [1998] proposed
to overcome these limitations through the use of Monte Carlo (MC) simulation. Their so-
called Ensemble Kalman Filter (EnKF) approach utilizes sample mean values and
covariances to perform the updating. The development of sensors and measuring devices
capable of recording massive amounts of data in real time has rendered EnKF popular among
hydrologists, climate modelers and petroleum reservoir engineers [Oliver and Chen, 2011;
Liu et al., 2012]; assimilating such rich data sets in batch rather than sequential mode, as is
common with classical inverse frameworks such as Maximum Likelihood, would not be
feasible. Applications of EnKF to groundwater and multiphase flow problems include the
pioneering works of McLaughlin [2002] and Naevdal et al. [2005]; recent reviews are
presented by Aanonsen et al. [2009], Oliver and Chen [2011] and Liu et al. [2012].
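As a rough illustration of how EnKF replaces exact moments with sample moments, the sketch below updates a scalar parameter (log-conductivity) and a scalar state (head) from one head measurement, using perturbed observations in the spirit of Burgers et al. [1998]. The forward model `toy_head`, the ensemble size and all numerical values are hypothetical and serve only to make the example self-contained.

```python
import math
import random


def toy_head(y):
    """Hypothetical nonlinear forward model mapping log-conductivity to head."""
    return 10.0 - math.exp(-y)


def sample_var(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def enkf_update(ens_y, ens_h, d, r, rng):
    """Update parameter and state of every member from one head datum d."""
    n = len(ens_y)
    mean_y = sum(ens_y) / n
    mean_h = sum(ens_h) / n
    var_h = sample_var(ens_h)
    cov_yh = sum((y - mean_y) * (h - mean_h)
                 for y, h in zip(ens_y, ens_h)) / (n - 1)
    gain_y = cov_yh / (var_h + r)   # gain for the parameter
    gain_h = var_h / (var_h + r)    # gain for the observed state
    new_y, new_h = [], []
    for y, h in zip(ens_y, ens_h):
        d_pert = d + rng.gauss(0.0, math.sqrt(r))   # perturbed observation
        new_y.append(y + gain_y * (d_pert - h))
        new_h.append(h + gain_h * (d_pert - h))
    return new_y, new_h


rng = random.Random(0)
n_mc = 500
prior_y = [rng.gauss(0.0, 1.0) for _ in range(n_mc)]
prior_h = [toy_head(y) for y in prior_y]   # forecast: one model run per member
post_y, post_h = enkf_update(prior_y, prior_h, d=9.0, r=0.01, rng=rng)
```

Both gains are built from the same finite ensemble that they update, which is the mechanism behind the sampling issues discussed next.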
A crucial factor affecting EnKF is the size of the "ensemble", i.e., the number (NMC)
of MC simulations (sample size) employed for moment evaluation. Whereas accurate estimation of the
mean and covariance requires many simulations, working with large NMC tends to
be computationally demanding. Chen and Zhang [2006] showed that a value of NMC of a few hundred
appears to provide accurate estimates of mean log-conductivity fields. They pointed out,
however, that obtaining covariance estimates of comparable accuracy would require many
more simulations, a task they had not carried through. Efforts to reduce the dimensionality of
the problem through orthogonal decomposition of state variables have been reported by
Zhang et al. [2007] and Zeng et al. [2011, 2012].
Small sample sizes give rise to filter inbreeding [Oliver and Chen, 2011] whereby
EnKF systematically understates parameter and system state estimation errors; rather than
stabilizing as they should, these errors appear to continue decreasing indefinitely with time,
giving a false impression that the quality of the parameter and state estimates likewise keeps
improving. There is no general theory to assess, a priori, the impact that the number NMC of
MC simulations would have on the accuracy of moment estimates. We do know, however,
that the sample mean of a random variable converges to the population mean at a rate
proportional to $1/\sqrt{\mathrm{NMC}}$, the sample width of a normal variable's confidence interval
converges at a rate proportional to 1/NMC for large NMC, and this latter rate is modulated by
Chebyshev's inequality as detailed in Ballio and Guadagnini [2004] and references therein.
This is enough to conclude that increasing NMC by a factor of a few hundred, as is often
done, would likely not lead to marked improvements in accuracy. A practical solution is to
continue running MC simulations till the sample mean and variance stabilize or, if computer
time is at a premium, till their rates of change slow down markedly.
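The practical stopping rule mentioned above can be sketched as follows. The example assumes that each MC run returns a single scalar output, here replaced by a hypothetical random draw, and that stability is judged by comparing the running sample mean and variance at regular checkpoints. The batch size and tolerance are arbitrary illustrative choices.

```python
import random


def run_until_stable(draw, batch=100, tol=0.01, max_runs=100000):
    """Add MC runs until sample mean and variance change by less than tol
    between two consecutive checkpoints, or until max_runs is reached."""
    n, mean, m2 = 0, 0.0, 0.0
    prev_mean, prev_var = None, None
    while n < max_runs:
        x = draw()                     # one (hypothetical) model run
        n += 1
        delta = x - mean
        mean += delta / n              # running mean (Welford update)
        m2 += delta * (x - mean)       # running sum of squared deviations
        if n % batch == 0:
            var = m2 / (n - 1)
            if (prev_mean is not None
                    and abs(mean - prev_mean) < tol
                    and abs(var - prev_var) < tol):
                return n, mean, var
            prev_mean, prev_var = mean, var
    return n, mean, m2 / (n - 1)


rng = random.Random(1)
# Stand-in for a model output with mean 2 and unit variance.
n_runs, mc_mean, mc_var = run_until_stable(lambda: rng.gauss(2.0, 1.0))
```

Stabilization of the running statistics is a heuristic indicator and does not by itself guarantee convergence of the moments.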
van Leeuwen [1999] showed theoretically that filter inbreeding is caused by (a)
updating a given set ("ensemble" or collection) of model output realizations with a gain
computed on the basis of this same set and (b) spurious covariances associated with gains
based on finite numbers NMC of realizations. Remedies suggested in the literature are
generally ad hoc. Houtekamer and Mitchell [1998] proposed splitting the set of MC runs into
two groups and updating each subset with a Kalman gain obtained from the other subset.
Hendricks Franssen and Kinzelbach [2008] proposed alleviating the adverse effects of filter
inbreeding by (a) dampening the amplitude of log-conductivity fluctuations, (b) correcting
the predicted covariance matrix on the basis of a comparison between the predicted ensemble
variance and the average absolute error at measurement locations, and (c) performing a large
number of realizations (in their case NMC = 1000) during the first simulation step and a
subset of realizations (NMC = 100) thereafter; a procedure similar to the latter was also
suggested in Wen and Chen [2007]. To select an optimal subset one would minimize some
measure of differences between cumulative sample distributions of hydraulic heads obtained
in the first step with (say) NMC = 1000 and NMC = 100. This, however, brings about an
artificial reduction in variance, as shown by Hendricks Franssen and Kinzelbach [2008].
Hendricks Franssen and Kinzelbach [2008] obtained best results with a combination of all
three techniques. Hendricks Franssen et al. [2011] observed filter inbreeding when analyzing
variably saturated flow through a randomly heterogeneous porous medium with NMC = 100
even after dampening log-conductivity fluctuations by a factor of 10. Several authors [e.g.,
Wang et al., 2007; Anderson, 2007; Liang et al., 2012; Xu et al., 2013] have noted a reduction
in filter inbreeding effects through covariance localization and covariance inflation.
Covariance localization is achieved upon multiplying each element of the updated state
covariance matrix by an appropriate localization function to reduce the effect of spurious
correlations [Houtekamer and Mitchell, 1998; Furrer and Bengtsson, 2007]. In the covariance
inflation methods, the forecast ensemble is inflated through multiplication of each state by a
constant or variable factor [e.g., Wang and Bishop, 2003; Liang et al., 2012; Xu et al., 2013].
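The two remedies can be illustrated with a short sketch. It assumes a one-dimensional grid, a triangular taper as localization function and a constant inflation factor applied to the deviations of each member from the ensemble mean. These are illustrative choices and not necessarily those adopted in the works cited above.

```python
def localize(cov, coords, length):
    """Element-wise product of a covariance matrix with a distance taper.
    The triangular taper vanishes beyond the distance `length` (assumed form)."""
    n = len(cov)
    return [[cov[i][j] * max(0.0, 1.0 - abs(coords[i] - coords[j]) / length)
             for j in range(n)] for i in range(n)]


def inflate(ensemble, factor):
    """Multiply the deviation of each member from the ensemble mean by `factor`.
    `ensemble` is a list of members, each a list of state values."""
    n = len(ensemble)
    m = len(ensemble[0])
    means = [sum(member[k] for member in ensemble) / n for k in range(m)]
    return [[means[k] + factor * (member[k] - means[k]) for k in range(m)]
            for member in ensemble]


# Hypothetical sample covariance between three nodes located at x = 0, 1, 2.
cov = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.5],
       [0.2, 0.5, 1.0]]
cov_loc = localize(cov, coords=[0.0, 1.0, 2.0], length=2.0)

# Two members, two state variables each; the spread doubles, the mean is kept.
inflated = inflate([[1.0, 2.0], [3.0, 4.0]], factor=2.0)
```

Localization removes the long-range entry of the covariance while leaving the variances untouched, whereas inflation widens the ensemble spread without moving its mean.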
Another common problem encountered in the application of the EnKF is related to the
assumption that the system variables (i.e., model parameters and state variables) can be
described by a joint Gaussian distribution. Most current reservoir models must explicitly
take into account the spatial distribution of facies, which can be defined as distinctive and
non-overlapping units of the host rock system with specified characteristics such as porosity,
permeability and mineralogy. Like petrophysical properties, facies can often be inferred from
well logs at well locations. As is often the case, their spatial distribution between wells is
highly uncertain. A common procedure to distinguish between diverse facies in a reservoir
model is to employ indicator functions, which by their nature cannot be represented by
Gaussian distributions. This implies that using EnKF to update these types of complex
reservoir models can be problematic.
A common approach found in the literature is based on the transformation
of the diverse facies types into intermediate random fields that are described by Gaussian
distributions. Liu and Oliver [2005a, 2005b] used a transformation based on the truncated
pluri-Gaussian method [Le Loc'h and Galli, 1997] and focused on the estimation of the
boundaries between the diverse facies. They did not consider within-facies variability of
attributes (i.e., porosity and permeability) and assigned a deterministic value of permeability
and porosity to each facies type (thus, considering only across-facies variability). They
adopted two Gaussian fields and three thresholds to model the spatial distribution of three
geologic units. The key point of the work of Liu and Oliver [2005a] is to consider two
truncated Gaussian fields with fixed truncation thresholds as static parameters (i.e.,
parameters that do not vary with time during a flow simulation, such as permeability and
porosity in the absence of consolidation processes or geochemical reactions) in the state
vector to be estimated through inversion. This would overcome the problem of updating a
discrete variable (facies distribution in the domain of interest) by representing the latter by
means of two continuously distributed random processes. They applied the truncated pluri-
Gaussian method to match both hard (i.e., direct facies observations) and production data.
Disadvantages of the truncated pluri-Gaussian method include the difficulty in determining
truncation maps and structural properties of the Gaussian random fields that are suitable to
describe the internal architecture of highly complex reservoirs in terms of a small number of
truncation parameters. Moreover, although the underlying model parameters are multivariate
Gaussian, the relationship between observations and model parameters is highly non-linear
and causes the appearance of apparently unphysical updates during the assimilation step. For
this reason, iterative methods, where only the static parameters are updated and the system
states are obtained by re-starting the flow simulation from the previous assimilation time, are
often employed to alleviate these drawbacks [Aanonsen et al., 2009].
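The idea of representing a discrete facies field through continuous Gaussian variables can be sketched as follows. The truncation rule below is a simplified, hypothetical rectangular map with two thresholds. It is not the three-threshold map of Liu and Oliver [2005a] and is meant only to show how a continuous update can change the facies at a node.

```python
def facies_from_gaussians(z1, z2, t1=0.0, t2=0.0):
    """Map two Gaussian values at a node to one of three facies (0, 1, 2).
    Hypothetical truncation map: z1 decides facies 0, z2 splits the rest."""
    if z1 < t1:
        return 0
    if z2 < t2:
        return 1
    return 2


# Values of the two Gaussian fields at three nodes (illustrative numbers).
z1 = [-0.8, 0.1, 0.6]
z2 = [0.3, -0.4, 0.9]
facies_before = [facies_from_gaussians(a, b) for a, b in zip(z1, z2)]

# A small continuous update of z1 (as EnKF would produce) flips the facies
# of the node that sits close to the threshold.
z1_updated = [a - 0.2 for a in z1]
facies_after = [facies_from_gaussians(a, b) for a, b in zip(z1_updated, z2)]
```

The jump in facies caused by a small change in the underlying Gaussian value is one way to picture the strong nonlinearity between model parameters and observations noted above.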
Moreno and Aanonsen [2007] proposed to combine the level set method with the
EnKF. The level set method relies on a suitable level-set function which is an implicit
representation of a given surface that is defined as the set of points at which the function
vanishes. More specifically, if a given facies in a background medium is defined on a certain
domain (support), the level set function is defined as the signed distance to the domain
boundary. This distance is positive inside and negative outside of the boundary separating the
facies from the background medium. The spatial dynamics of the level set function which is
deforming during data acquisition is modeled by the convection equation. Moreno and
Aanonsen [2007] assumed that the velocity field governing the evolution of the level set was
defined as a Gaussian random field and included it in the state vector to be estimated/updated.
Chang et al. [2010] improved the application of level set functions to EnKF by employing a
parameterization based on the concept of representing nodes (also called master
points or pilot points in the literature). In contrast to Moreno and Aanonsen [2007], they
considered only the values of the level set function at a set of so-called representing nodes as
variables of the state vector to be estimated. The values of the level set function at grid nodes
different from the selected representing nodes are obtained by linear interpolation. This
allowed these authors to alleviate the non-uniqueness associated with the identifiability of the
level set function. If the distance between representing nodes is properly chosen, these are
uncorrelated or weakly correlated and can be treated as independent from each other. The
authors applied their methodology to diverse synthetic case studies with two or three different
facies units, where the rock properties such as porosity and permeability were assumed as
constant within the same unit. Although the work of Chang et al. [2010] has introduced
important improvements in the application of the level set method to EnKF, it is not clear
how level set methodologies enable one to capture complex geological constraints of the
kind required for reproducing realistic facies geometries.
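A one-dimensional sketch may help to visualize the representing-node parameterization. It assumes a level set function that is positive inside the facies and negative outside, known only at a few representing nodes and linearly interpolated elsewhere. Node positions and values are hypothetical.

```python
def interp_level_set(node_x, node_phi, x):
    """Piecewise-linear interpolation of level set values between
    consecutive representing nodes (node_x must be increasing)."""
    for k in range(len(node_x) - 1):
        x0, x1 = node_x[k], node_x[k + 1]
        if x0 <= x <= x1:
            w = (x - x0) / (x1 - x0)
            return (1.0 - w) * node_phi[k] + w * node_phi[k + 1]
    raise ValueError("x lies outside the range of the representing nodes")


# Three representing nodes; only their values would enter the state vector.
node_x = [0.0, 4.0, 8.0]
node_phi = [-2.0, 2.0, -2.0]

grid = [float(i) for i in range(9)]
phi = [interp_level_set(node_x, node_phi, x) for x in grid]

# Facies indicator: 1 where the level set function is positive, 0 elsewhere.
facies = [1 if p > 0.0 else 0 for p in phi]
```

Here three numbers control the facies at nine grid nodes, which shows how the parameterization shrinks the number of unknowns.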
Jafarpour and McLaughlin [2008] introduced the use of the Discrete Cosine
Transform (DCT) method to history matching applications in reservoir models with complex
geology. The Discrete Cosine Transform (DCT) is a Fourier-related transform. It uses
orthonormal cosine basis functions to represent an image that, in the case we examine, can
correspond to the spatial distribution of state variables or model parameters. This method was
proposed by Ahmed et al. [1974] for signal decorrelation and has been used in the
context of several applications in other fields (mostly in the context of audio and image
compression, e.g., Rao et al. [1990]). The powerful compression property of DCT allows for
retaining only a few basis functions in comparison to the total number of grid nodes. The
authors modified the EnKF scheme in such a way that the coefficients of the retained cosine
basis functions representing the spatial distribution of the state variables and model
parameters are updated to describe the distribution of the target quantities. Using DCT
parameterization dramatically reduces the dimension of the state vector and can
also mitigate the loss of structural continuity that can be observed in the context of other
approaches when the updating step is performed with reference to each node of the numerical
grid. This result can be achieved because DCT emphasizes large scale (associated with low
frequency component) rather than small scale features. One of the synthetic test cases
presented by Jafarpour and McLaughlin [2008] was a two-facies reservoir model. In this
scenario the methodology was conducive to a correct identification of large scale structures,
in the shape of elongated continuous channels, embedded in the reference fields. These
results highlighted the ability of this parameterization method to account for relatively
complex geological structures within a reservoir model.
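The compression property of the DCT can be illustrated in one dimension. The sketch below implements an orthonormal DCT (type II) and its inverse directly from the cosine basis, then reconstructs a hypothetical log-permeability profile from its first few coefficients. The profile and the number of retained coefficients are illustrative assumptions.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of values."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(coeffs, n):
    """Inverse transform on n nodes using only the leading coefficients given."""
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k, c in enumerate(coeffs)))
    return out


# Hypothetical profile: a high-permeability channel in a low-permeability background.
profile = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 1.0]
coeffs = dct(profile)

exact = idct(coeffs, len(profile))        # all coefficients: exact recovery
smooth = idct(coeffs[:3], len(profile))   # 3 of 8 coefficients: large-scale shape
```

In an EnKF setting of this kind the retained coefficients, rather than the nodal values, would form the part of the state vector that is updated.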
Other proposed schemes include the work of Dovera and Della Rossa [2011], where a
composite medium is described through a multimodal density of system parameters. The
multimodality of prior parameter fields is taken into account through the theory of Gaussian
mixture (GM) models. Gaussian mixture models are based on the idea that the probability
density function (pdf) of the model parameters can be parametrically described as weighted
sums of Gaussian pdfs. The authors derived a novel set of EnKF updating equations and
coupled it with the expectation-maximization (EM) method for the evaluation of the weights
of the prior GM. The authors compared the performance of their method against the
traditional EnKF formulation by means of a synthetic case and concluded that their scheme
allowed obtaining an improved evaluation of the posterior distribution of the forecast
production.
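The notion of a Gaussian mixture density can be made concrete with a few lines of code. The sketch evaluates a pdf defined as a weighted sum of Gaussian pdfs. The two-component example, with its weights, means and standard deviations, is hypothetical and merely mimics a bimodal parameter distribution in a two-facies medium.

```python
import math


def normal_pdf(x, mean, sd):
    """Gaussian probability density function."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def gm_pdf(x, weights, means, sds):
    """Gaussian mixture pdf: weighted sum of Gaussian pdfs (weights sum to 1)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


# Hypothetical bimodal density with two equally weighted modes.
weights, means, sds = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

# Trapezoidal check that the mixture integrates to one over a wide interval.
step = 0.01
xs = [-10.0 + step * i for i in range(2001)]
vals = [gm_pdf(x, weights, means, sds) for x in xs]
total = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```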
Jafarpour and Khodabakhshi [2011] proposed a Probability Conditioning Method
(PCM) for conditioning the facies distribution of a collection of reservoir models in which a
deterministic value of permeability and porosity is assigned to each facies type. They
employed the EnKF scheme to update the sample mean values of the log-permeability field.
The updated mean values are then used to infer information about the distribution of facies
probabilities through the PCM. This method consists of converting a value of log-
permeability mean to a value of probability of facies occurrence through a simple linear
mapping function. The updated probability map is then combined with the snesim algorithm
[Strebelle, 2002] to simulate a new collection of facies realizations. The realizations are
therefore conditioned on the updated probability maps as well as on the production data. The
updated saturation and pressure fields are obtained by re-running the simulation from the
initial time to ensure consistency with the updated permeability fields. This methodology was
used to successfully condition the categorical permeability fields in an ensemble of synthetic
reservoirs with two or three different facies, and was demonstrated to outperform the EnKF
in the quality of the calibrated models and in the accuracy of the model forecast.
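The conversion from an updated mean log-permeability to a facies probability can be sketched with a linear map. The exact mapping used in PCM is not reported here, so the form below, a linear interpolation between the log-permeability values assigned to the two facies clipped to the interval [0, 1], is an assumption. The function name and the numerical values are hypothetical.

```python
def facies_probability(mean_logk, logk_background, logk_channel):
    """Probability of the channel facies from the updated mean log-permeability,
    through an assumed linear map clipped to [0, 1]."""
    p = (mean_logk - logk_background) / (logk_channel - logk_background)
    return min(1.0, max(0.0, p))


# Deterministic log-permeability assigned to each facies (illustrative values).
logk_background, logk_channel = 1.0, 5.0

# Updated ensemble-mean log-permeability at four grid nodes.
updated_mean = [0.0, 2.0, 4.0, 6.0]
prob_map = [facies_probability(m, logk_background, logk_channel)
            for m in updated_mean]
```

A probability map of this kind would then be passed to the facies simulation algorithm as soft conditioning information.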
1.2 Objectives and outline
The main objectives of this work are: (a) to couple the updating step of EnKF with the
stochastic moment equations of transient groundwater flow to circumvent and alleviate
problems related to the finiteness of the ensemble employed in the traditional MC-based
EnKF, and (b) to develop an assimilation algorithm that is conducive to conditioning the
spatial distribution of the facies and of the associated petrophysical properties of a collection
of hydrocarbon reservoirs on a set of measured production data. The dissertation is structured
according to the objectives outlined above.
In Chapter 2, we cast the updating equations of the Kalman Filter in a Bayesian
context. The derivation follows the work of Cohn [1997] and provides the theoretical ground
on which all KF-based assimilation techniques rest. In this Chapter it is shown that
assuming multivariate normal distributions for the prior of the model variables and for the
measurement errors leads to a Gaussian posterior distribution of the
model variables conditioned on the measured data. The mean vector and the covariance
matrix of the posterior distribution are precisely those which are determined through the
updating equations of the KF.
In Chapter 3, we propose to circumvent the need for MC through a direct solution of
nonlocal (integrodifferential) stochastic MEs that govern the space-time evolution of
conditional ensemble means (statistical expectations) and covariances of hydraulic heads and
fluxes [Tartakovsky and Neuman, 1998; Ye et al., 2004]. Such MEs have been used
successfully to analyze steady state and transient flows in randomly heterogeneous media
conditional on measured values of medium properties. Second-order approximations of these
equations have yielded accurate predictions of complex flows in heterogeneous media with
unconditional variances of (natural) log-hydraulic conductivity as high as 4.0 [Guadagnini
and Neuman, 1999].
Hernandez et al. [2003, 2006] and Riva et al. [2009] developed batch geostatistical
inverse algorithms that enable one to condition flow predictions further on measured values
of state variables (heads and fluxes) for steady state and transient flows, respectively. A field
application is described in the work of Bianchi Janetti et al. [2010]. This approach yields
Maximum Likelihood (ML) estimates of hydraulic conductivity, variogram parameters and
measurement error statistics. Parameter estimation entails the minimization of a log-
likelihood function which in turn requires the computation of a sensitivity matrix. The latter
step tends to be computationally intensive, especially in the case of large parameter vectors
[e.g., Alcolea et al., 2006; Riva et al., 2010].
Our purpose is to combine approximate forms of nonlocal, conditional stochastic MEs
with EnKF in a way that allows sequential updating of parameters and system states without
a need for computationally intensive ML or MC analyses. We extend the ME formulation of
Ye et al. [2004] in a way that renders it compatible with EnKF. We explore the resulting
combined algorithm on synthetic problems of two-dimensional transient groundwater flow
toward a well pumping water from a randomly heterogeneous confined aquifer subject to
prescribed head and flux boundary conditions. We investigate the effect of the error variances
linked to the measurements of log-conductivities and the impact of assimilating hydraulic
heads during the transient or the pseudo-steady state regime on the quality of the calibrated
mean of log-conductivity and head fields and on the associated estimation variance. In
addition, we compare the performances and accuracies of our ME- and the traditional MC-
based EnKF on synthetic problems differing from each other in the variance and (integral)
autocorrelation scale of the (natural) logarithm of hydraulic conductivities. We analyze the
impact of the number of realizations employed in the MC-based EnKF and the occurrence of
filter inbreeding in the performed assimilations.
In Chapter 4, we illustrate a novel inversion scheme which allows conditioning the
geological and petrophysical properties of a collection of reservoir realizations on the basis of
a set of production data. First, we present a Markov Mesh model [Stien and Kolbjørnsen,
2011] that is used to (a) describe the spatial distribution of the geological properties and (b)
reproduce their complex spatial arrangement. The MM model is coupled with a multi-grid
approach [Kolbjørnsen et al., 2013], according to which the geological patterns are initially
reproduced at a coarse scale, and are subsequently generated on increasingly finer grids. This
methodology allows reproducing (geological) patterns distributed on different scales. The
proposed inversion scheme is based on a three step algorithm. First, the EnKF scheme is
employed to update the sample mean of the lithofacies spatial distribution. A new collection
of facies realizations is then generated via a Markov Mesh (MM) model. During this step the
equation used to calculate the conditional probability of occurrence of a given lithotype in
each element of the computational grid ensures that the mean facies distribution obtained at
the previous step is honored. In the third step, the petrophysical properties of each reservoir
model in the collection are updated through a proposed modification of the EnKF scheme.
The cross-covariance between production data and log-permeabilities is estimated
considering the updated spatial distribution of lithofacies computed at the previous step.
Updating of log-permeabilities in a given realization relies on the estimation of the sample
cross-covariance between production data and log-permeabilities associated with a given
reference block in the reservoir upon considering only the members of the collection where
the same facies of the reference element considered in the target model realization occurs.
We test the proposed methodology by way of a two-dimensional synthetic reservoir
model in the presence of two distinct facies and representing a complex meandering channel
system. We analyze the accuracy and computational efficiency of our algorithm and
demonstrate its benefit with respect to the standard EnKF in terms of improved prediction
ability and use of information for the quantification of the uncertainty associated with the
forecast production.
2. Data assimilation with the Kalman Filter
In this Chapter we present a derivation of the Kalman Filter (KF) equations in the
context of Bayesian inference theory. The objective is to provide a theoretical framework and
to introduce the basic concepts that will become useful in the following Chapters. The KF
algorithm developed by Kalman [1960] considers linear model dynamics, the output of which
is corrupted with additive Gaussian noise with zero mean and given covariance matrix. Here
we intentionally restrict our discussion to the case of an exact model (i.e., without error) and
focus on techniques that extend the classical KF scheme to models characterized by nonlinear
dynamics of the kind required to model realistic subsurface fluid flow scenarios. In Section
2.1 we define the filtering problem in the framework of data assimilation. The solution of this
problem is achieved by means of two sequential steps, respectively termed forward and
analysis and described in Sections 2.2 and 2.3.
2.1 The filtering problem
The model dynamics describing groundwater and (in general) subsurface multiphase
flow consist of a system of (typically coupled) nonlinear partial differential equations
(PDEs). Let $\mathbf{y}^{T_k}$ be the vector containing the $N_y$ model variables of the system under study
evaluated at time $T_k$ at a finite number of discretization nodes (or elements) of a numerical
grid. This vector can include model parameters (i.e., permeability and porosity), state
variables (i.e., pressure, saturation) and production data (i.e., well flow rates, water cut,
bottom hole pressure). Model dynamics can be described through the non-linear operator $\mathcal{M}$,
which yields the solution at time $T_k$, $\mathbf{y}^{T_k}$, given the model state at an earlier time $T_{k-1}$, $\mathbf{y}^{T_{k-1}}$
$$\mathbf{y}^{T_k} = \mathcal{M}\left(\mathbf{y}^{T_{k-1}}\right) \tag{2.1}$$
Uncertainty associated with model parameters renders the system state $\mathbf{y}^{T_k}$ random, and
allows describing it by means of the probability density function (pdf) $f(\mathbf{y}^{T_k})$. The time
evolution of $f(\mathbf{y}^{t})$ within the time interval $[T_{k-1}, T_k]$ is governed by a set of stochastic
PDEs associated with (in general) random initial condition $f(\mathbf{y}^{T_{k-1}})$.
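We introduce the following model governing system observations
$$\mathbf{d}^{T_k} = \mathbf{H}^{T_k} \mathbf{y}^{T_k} + \boldsymbol{\varepsilon}_d^{T_k} \tag{2.2}$$
where $\mathbf{d}^{T_k}$ is a vector of size $N_d \times 1$ containing all $N_d$ measurements available at time $T_k$, the
matrix $\mathbf{H}^{T_k}$ of size $N_d \times N_y$ is a linear operator mapping the model variables $\mathbf{y}^{T_k}$ into their
observed counterparts and $\boldsymbol{\varepsilon}_d^{T_k}$ is a random vector containing the $N_d$ measurement errors.
Typically, $\boldsymbol{\varepsilon}_d^{T_k}$ is assumed to be normally distributed, unbiased and with known covariance
matrix
$$\left\langle \boldsymbol{\varepsilon}_d^{T_k} \right\rangle = \mathbf{0}_{N_d,1} \tag{2.3}$$
$$\left\langle \left( \boldsymbol{\varepsilon}_d^{T_k} - \left\langle \boldsymbol{\varepsilon}_d^{T_k} \right\rangle \right) \left( \boldsymbol{\varepsilon}_d^{T_k} - \left\langle \boldsymbol{\varepsilon}_d^{T_k} \right\rangle \right)^{+} \right\rangle = \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \tag{2.4}$$
where $\mathbf{0}_{N_d,1}$ is a vector of size $N_d \times 1$ with all elements equal to zero, $\langle \cdot \rangle$ denotes expectation
and the superscript '+' stands for transpose. A common assumption is that the measurement
error vectors at different times are uncorrelated
$$\left\langle \boldsymbol{\varepsilon}_d^{T_k} \left( \boldsymbol{\varepsilon}_d^{T_l} \right)^{+} \right\rangle = \mathbf{0}_{N_d,N_d} \quad \text{for } k \neq l \tag{2.5}$$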
The filtering problem in data assimilation is posed as the problem of describing the
evolution in time of the conditional pdf, $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_k})$, where the matrix $\mathbf{D}^{T_k}$ denotes the set of
all observations available up to time $T_k$, i.e.
$$\mathbf{D}^{T_k} = \left[ \mathbf{d}^{T_1}, \mathbf{d}^{T_2}, \ldots, \mathbf{d}^{T_k} \right] \tag{2.6}$$
The filtering problem is solved by means of two sequential steps. The forward step (Section
2.2) consists in propagating in time the conditional density available at time $T_{k-1}$,
$f(\mathbf{y}^{T_{k-1}} | \mathbf{D}^{T_{k-1}})$, towards the corresponding pdf at time $T_k$, $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}})$. The latter is then
conditioned on the measurement vector $\mathbf{d}^{T_k}$ by means of the analysis step (Section 2.3),
allowing the evaluation of $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}}, \mathbf{d}^{T_k}) = f(\mathbf{y}^{T_k} | \mathbf{D}^{T_k})$.
2.2 Forward step
Suppose one is given the conditional density $f(\mathbf{y}^{T_{k-1}} | \mathbf{D}^{T_{k-1}})$. The forward step requires
the evaluation of the density $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}})$. This entails solving a stochastic PDE within the
time interval $[T_{k-1}, T_k]$ and subject to the random initial conditions embodied in
$f(\mathbf{y}^{T_{k-1}} | \mathbf{D}^{T_{k-1}})$. If the model dynamics are linear, as is assumed in the classical KF, the
evaluation of the mean vector and of the covariance matrix associated with $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}})$ is
straightforward [Cohn, 1997]. This assumption is generally not valid in the field of
groundwater and multiphase flow in porous media, where the model dynamics can be highly
non-linear. In these cases, the solution of the forward step requires a different approach.
A possible strategy is to resort to Monte Carlo (MC) simulation. This technique is
based on representing the pdf $f(\mathbf{y}^{T_{k-1}} | \mathbf{D}^{T_{k-1}})$ through a collection of model realizations.
Propagating each member of the collection within the time interval $[T_{k-1}, T_k]$ using the
forward model operator (2.1) allows approximating the pdf $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}})$ at time $T_k$ and
estimating its statistical moments through the corresponding sample moments.
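
The MC strategy described above can be illustrated with a minimal sketch. It assumes NumPy and a toy nonlinear map standing in for the forward operator of (2.1); the function names `forward_operator` and `mc_forward_step` and the ensemble size are hypothetical choices made only for illustration, not the simulator used in this work.

```python
import numpy as np

def forward_operator(y):
    """Hypothetical nonlinear model: advances one state vector from T_{k-1} to T_k."""
    return y + 0.1 * np.sin(y)

def mc_forward_step(ensemble, model):
    """Propagate each realization (one per column) and return the sample moments."""
    forecast = np.column_stack(
        [model(ensemble[:, i]) for i in range(ensemble.shape[1])]
    )
    mean = forecast.mean(axis=1)   # sample mean of the forward pdf
    cov = np.cov(forecast)         # unbiased sample covariance, 1/(NMC-1)
    return forecast, mean, cov

rng = np.random.default_rng(0)
n_y, nmc = 3, 500                                   # illustrative sizes
updated_ensemble = rng.normal(size=(n_y, nmc))      # stands in for the pdf at T_{k-1}
forecast, mean_f, cov_f = mc_forward_step(updated_ensemble, forward_operator)
```

The cost of this step scales with the number of realizations, since every member requires one full model run.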
Monte Carlo simulations are not the only available strategy for the solution of the
forward step in the presence of non-linear system dynamics. As will be explored in this
dissertation (Chapter 3), an alternative is to formulate a system of (ensemble) moment
equations (MEs) which describe the temporal evolution of the statistical moments (typically,
mean and covariances) of the pdf $f(\mathbf{y}^{t} | \mathbf{D}^{T_{k-1}})$, $T_{k-1} \le t \le T_k$. The format of these moment
equations is determined by the model dynamics expressed by (2.1), and should be derived ad
hoc depending on the specific context. This issue will be further explored in Chapter 3, where
a set of approximate equations describing the temporal evolution of the first and second
moments of the target pdf for a groundwater flow model will be presented and embedded in
the KF scheme.
2.3 Analysis step
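The objective of the analysis step is to calculate the conditional pdf $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_k})$ given
the density $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}})$. Application of Bayes' theorem allows writing
$$f(\mathbf{y}^{T_k} | \mathbf{D}^{T_k}) = f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}}, \mathbf{d}^{T_k}) = \frac{f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}}) \, f(\mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}}, \mathbf{y}^{T_k})}{f(\mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}})} \tag{2.7}$$
Since $\mathbf{d}^{T_k}$ (given $\mathbf{y}^{T_k}$) depends only on $\boldsymbol{\varepsilon}_d^{T_k}$, which in turn is independent of $\mathbf{D}^{T_{k-1}}$ because of
(2.5), the following simplification holds
$$f(\mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}}, \mathbf{y}^{T_k}) = f(\mathbf{d}^{T_k} | \mathbf{y}^{T_k}) \tag{2.8}$$
and (2.7) can be rewritten as
$$f(\mathbf{y}^{T_k} | \mathbf{D}^{T_k}) = \frac{f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}}) \, f(\mathbf{d}^{T_k} | \mathbf{y}^{T_k})}{f(\mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}})} \tag{2.9}$$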
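The function $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}})$ is also termed the forward pdf, and is here denoted as
$f(\mathbf{y}^{f,T_k})$. Following the work of Cohn [1997], we consider this density to be multivariate
Gaussian so that it can be parameterized through the corresponding mean vector and the
covariance matrix. These are respectively defined as
$$\left\langle \mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}} \right\rangle = \left\langle \mathbf{y}^{f,T_k} \right\rangle \tag{2.10}$$
$$\left\langle \left( \mathbf{y}^{f,T_k} - \left\langle \mathbf{y}^{f,T_k} \right\rangle \right) \left( \mathbf{y}^{f,T_k} - \left\langle \mathbf{y}^{f,T_k} \right\rangle \right)^{+} \right\rangle = \boldsymbol{\Sigma}_{yy}^{f,T_k} \tag{2.11}$$
The density function of $\mathbf{y}^{f,T_k}$ is then equal to
$$f(\mathbf{y}^{f,T_k}) = (2\pi)^{-N_y/2} \left| \boldsymbol{\Sigma}_{yy}^{f,T_k} \right|^{-1/2} \exp\left[ -\frac{1}{2} \left( \mathbf{y}^{f,T_k} - \left\langle \mathbf{y}^{f,T_k} \right\rangle \right)^{+} \left( \boldsymbol{\Sigma}_{yy}^{f,T_k} \right)^{-1} \left( \mathbf{y}^{f,T_k} - \left\langle \mathbf{y}^{f,T_k} \right\rangle \right) \right] \tag{2.12}$$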
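where $|\cdot|$ denotes matrix determinant. By virtue of (2.2)-(2.4), the mean vector and the
covariance matrix of the likelihood function $f(\mathbf{d}^{T_k} | \mathbf{y}^{T_k})$ can be written as
$$\left\langle \mathbf{d}^{T_k} | \mathbf{y}^{T_k} \right\rangle = \left\langle \mathbf{H}^{T_k} \mathbf{y}^{T_k} + \boldsymbol{\varepsilon}_d^{T_k} | \mathbf{y}^{T_k} \right\rangle = \mathbf{H}^{T_k} \mathbf{y}^{T_k} \tag{2.13}$$
$$\left\langle \left( \mathbf{d}^{T_k} - \left\langle \mathbf{d}^{T_k} | \mathbf{y}^{T_k} \right\rangle \right) \left( \mathbf{d}^{T_k} - \left\langle \mathbf{d}^{T_k} | \mathbf{y}^{T_k} \right\rangle \right)^{+} | \mathbf{y}^{T_k} \right\rangle = \left\langle \boldsymbol{\varepsilon}_d^{T_k} \left( \boldsymbol{\varepsilon}_d^{T_k} \right)^{+} | \mathbf{y}^{T_k} \right\rangle = \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \tag{2.14}$$
Since $\boldsymbol{\varepsilon}_d^{T_k}$ is considered to be normally distributed, the density $f(\mathbf{d}^{T_k} | \mathbf{y}^{T_k})$ is also Gaussian
and given by
$$f(\mathbf{d}^{T_k} | \mathbf{y}^{T_k}) = (2\pi)^{-N_d/2} \left| \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \right|^{-1/2} \exp\left[ -\frac{1}{2} \left( \mathbf{d}^{T_k} - \mathbf{H}^{T_k} \mathbf{y}^{T_k} \right)^{+} \left( \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \right)^{-1} \left( \mathbf{d}^{T_k} - \mathbf{H}^{T_k} \mathbf{y}^{T_k} \right) \right] \tag{2.15}$$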
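We can then employ (2.2) to define the mean vector and the covariance matrix of the pdf
$f(\mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}})$ as
$$\left\langle \mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}} \right\rangle = \left\langle \mathbf{H}^{T_k} \mathbf{y}^{T_k} + \boldsymbol{\varepsilon}_d^{T_k} | \mathbf{D}^{T_{k-1}} \right\rangle = \mathbf{H}^{T_k} \left\langle \mathbf{y}^{T_k} | \mathbf{D}^{T_{k-1}} \right\rangle = \mathbf{H}^{T_k} \left\langle \mathbf{y}^{f,T_k} \right\rangle \tag{2.16}$$
$$\left\langle \left( \mathbf{d}^{T_k} - \left\langle \mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}} \right\rangle \right) \left( \mathbf{d}^{T_k} - \left\langle \mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}} \right\rangle \right)^{+} | \mathbf{D}^{T_{k-1}} \right\rangle$$
$$= \left\langle \left[ \mathbf{H}^{T_k} \left( \mathbf{y}^{T_k} - \left\langle \mathbf{y}^{f,T_k} \right\rangle \right) + \boldsymbol{\varepsilon}_d^{T_k} \right] \left[ \mathbf{H}^{T_k} \left( \mathbf{y}^{T_k} - \left\langle \mathbf{y}^{f,T_k} \right\rangle \right) + \boldsymbol{\varepsilon}_d^{T_k} \right]^{+} | \mathbf{D}^{T_{k-1}} \right\rangle$$
$$= \mathbf{H}^{T_k} \boldsymbol{\Sigma}_{yy}^{f,T_k} \left( \mathbf{H}^{T_k} \right)^{+} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \tag{2.17}$$
Since $\boldsymbol{\varepsilon}_d^{T_k}$ and $\mathbf{y}^{T_k}$ were assumed to be normally distributed, the density $f(\mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}})$ is also
Gaussian and can be written as
$$f(\mathbf{d}^{T_k} | \mathbf{D}^{T_{k-1}}) = (2\pi)^{-N_d/2} \left| \mathbf{H}^{T_k} \boldsymbol{\Sigma}_{yy}^{f,T_k} \left( \mathbf{H}^{T_k} \right)^{+} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \right|^{-1/2}$$
$$\times \exp\left[ -\frac{1}{2} \left( \mathbf{d}^{T_k} - \mathbf{H}^{T_k} \left\langle \mathbf{y}^{f,T_k} \right\rangle \right)^{+} \left( \mathbf{H}^{T_k} \boldsymbol{\Sigma}_{yy}^{f,T_k} \left( \mathbf{H}^{T_k} \right)^{+} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \right)^{-1} \left( \mathbf{d}^{T_k} - \mathbf{H}^{T_k} \left\langle \mathbf{y}^{f,T_k} \right\rangle \right) \right] \tag{2.18}$$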
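Substitution of (2.12), (2.15) and (2.18) into (2.9) yields the target posterior density
function, $f(\mathbf{y}^{T_k} | \mathbf{D}^{T_k})$, also termed the updated pdf and denoted as $f(\mathbf{y}^{u,T_k})$
$$f(\mathbf{y}^{u,T_k}) = (2\pi)^{-N_y/2} \left| \boldsymbol{\Sigma}_{yy}^{u,T_k} \right|^{-1/2} \exp\left[ -\frac{1}{2} \left( \mathbf{y}^{u,T_k} - \left\langle \mathbf{y}^{u,T_k} \right\rangle \right)^{+} \left( \boldsymbol{\Sigma}_{yy}^{u,T_k} \right)^{-1} \left( \mathbf{y}^{u,T_k} - \left\langle \mathbf{y}^{u,T_k} \right\rangle \right) \right] \tag{2.19}$$
Here, the mean vector $\langle \mathbf{y}^{u,T_k} \rangle$ and the covariance matrix $\boldsymbol{\Sigma}_{yy}^{u,T_k}$ are evaluated using the
following set of equations
$$\left\langle \mathbf{y}^{u,T_k} \right\rangle = \left\langle \mathbf{y}^{f,T_k} \right\rangle + \mathbf{K}^{T_k} \left( \mathbf{d}^{T_k} - \mathbf{H}^{T_k} \left\langle \mathbf{y}^{f,T_k} \right\rangle \right) \tag{2.20}$$
$$\boldsymbol{\Sigma}_{yy}^{u,T_k} = \left( \mathbf{I}_{N_y} - \mathbf{K}^{T_k} \mathbf{H}^{T_k} \right) \boldsymbol{\Sigma}_{yy}^{f,T_k} \tag{2.21}$$
$$\mathbf{K}^{T_k} = \boldsymbol{\Sigma}_{yy}^{f,T_k} \left( \mathbf{H}^{T_k} \right)^{+} \left[ \mathbf{H}^{T_k} \boldsymbol{\Sigma}_{yy}^{f,T_k} \left( \mathbf{H}^{T_k} \right)^{+} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \right]^{-1} \tag{2.22}$$
where $\mathbf{I}_{N_y}$ is the identity matrix of size $N_y \times N_y$, and the matrix $\mathbf{K}^{T_k}$ is called the Kalman
gain. The complete set of details of the mathematical derivation of (2.20)-(2.22) can be found
in Cohn [1997] or in Tarantola [2005].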
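
As a small numerical illustration of (2.20)-(2.22), the sketch below applies the update to a two-variable state of which only the first component is observed. It assumes NumPy; the function name `kf_analysis` and all numerical values are hypothetical and chosen so that the result can be checked by hand.

```python
import numpy as np

def kf_analysis(mean_f, cov_f, d, H, cov_eps):
    """Kalman Filter analysis step: returns updated mean, covariance and gain."""
    S = H @ cov_f @ H.T + cov_eps            # innovation covariance, see (2.17)
    # K = cov_f H^+ S^{-1}, computed with a linear solve (S and cov_f are symmetric)
    K = np.linalg.solve(S, H @ cov_f).T      # Kalman gain (2.22)
    mean_u = mean_f + K @ (d - H @ mean_f)   # updated mean (2.20)
    cov_u = (np.eye(mean_f.size) - K @ H) @ cov_f   # updated covariance (2.21)
    return mean_u, cov_u, K

mean_f = np.array([0.0, 0.0])                # illustrative forward mean
cov_f = np.array([[1.0, 0.5], [0.5, 1.0]])   # illustrative forward covariance
H = np.array([[1.0, 0.0]])                   # only the first variable is measured
d = np.array([1.0])                          # measurement
cov_eps = np.array([[1.0]])                  # measurement error variance

mean_u, cov_u, K = kf_analysis(mean_f, cov_f, d, H, cov_eps)
```

With these values the gain is (0.5, 0.25): the unobserved second variable is corrected through its covariance with the observed one, and both variances decrease.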
The set of equations (2.20)-(2.22) allows defining the updated pdf as a function of the
mean vector and of the covariance matrix of the forward and measurement error density
functions. The updated moments of the target pdf are then used to characterize the initial
conditions in the subsequent forward step, consisting in the evaluation of the density
$f(\mathbf{y}^{T_{k+1}} | \mathbf{D}^{T_k})$. When KF is coupled with MC simulation, the pdf of $\mathbf{y}^{f,T_k}$ is approximated
through a collection of model realizations, as discussed in Section 2.2. In this case equations
(2.20) - (2.22) do not enable one to evaluate directly the updated realizations, which are
employed to approximate the posterior pdf at time $T_k$, $f(\mathbf{y}^{u,T_k})$, and constitute the initial
conditions to the solution of the flow problem during the subsequent forward step. If we
indicate the collection of forward model realizations at time $T_k$ as $\mathbf{y}_i^{f,T_k}$, $i = 1, \ldots, NMC$
($NMC$ being the number of Monte Carlo iterations in the sample), then the updated
realizations, $\mathbf{y}_i^{u,T_k}$, can be calculated through [Evensen, 1994; Burgers et al., 1998]
$$\mathbf{y}_i^{u,T_k} = \mathbf{y}_i^{f,T_k} + \hat{\mathbf{K}}^{T_k} \left( \mathbf{d}_i^{T_k} - \mathbf{H}^{T_k} \mathbf{y}_i^{f,T_k} \right), \quad i = 1, \ldots, NMC \tag{2.23}$$
Here, $\mathbf{d}_i^{T_k}$ is a randomized measurement vector defined as
$$\mathbf{d}_i^{T_k} = \mathbf{d}^{T_k} + \boldsymbol{\varepsilon}_{d,i}^{T_k}, \quad i = 1, \ldots, NMC \tag{2.24}$$
where $\boldsymbol{\varepsilon}_{d,i}^{T_k}$ is a random vector having a Gaussian distribution with zero mean and covariance
matrix $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$. In (2.23), the empirical Kalman gain matrix $\hat{\mathbf{K}}^{T_k}$, given by
$$\hat{\mathbf{K}}^{T_k} = \hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k} \left( \mathbf{H}^{T_k} \right)^{+} \left[ \mathbf{H}^{T_k} \hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k} \left( \mathbf{H}^{T_k} \right)^{+} + \hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k} \right]^{-1} \tag{2.25}$$
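is evaluated employing the empirical covariance matrices, defined as
$$\overline{\mathbf{y}}^{f,T_k} = \frac{1}{NMC} \sum_{i=1}^{NMC} \mathbf{y}_i^{f,T_k} \tag{2.26}$$
$$\hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k} = \frac{1}{NMC - 1} \sum_{i=1}^{NMC} \left( \mathbf{y}_i^{f,T_k} - \overline{\mathbf{y}}^{f,T_k} \right) \left( \mathbf{y}_i^{f,T_k} - \overline{\mathbf{y}}^{f,T_k} \right)^{+} \tag{2.27}$$
$$\overline{\mathbf{d}}^{T_k} = \frac{1}{NMC} \sum_{i=1}^{NMC} \mathbf{d}_i^{T_k} \tag{2.28}$$
$$\hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k} = \frac{1}{NMC - 1} \sum_{i=1}^{NMC} \left( \mathbf{d}_i^{T_k} - \overline{\mathbf{d}}^{T_k} \right) \left( \mathbf{d}_i^{T_k} - \overline{\mathbf{d}}^{T_k} \right)^{+} \tag{2.29}$$
The system (2.23)-(2.29) yields the updating equations used in the Ensemble Kalman Filter
(EnKF).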
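
The ensemble update with randomized measurements can be sketched as follows. This is a minimal illustration assuming NumPy, with realizations stored column-wise; the function name `enkf_update`, the ensemble size and all numerical values are hypothetical.

```python
import numpy as np

def enkf_update(forecast, d, H, cov_eps, rng):
    """Perturbed-observation EnKF update; forecast has shape (N_y, NMC)."""
    nmc = forecast.shape[1]
    # Randomized measurement vectors: one perturbed copy of d per realization
    eps = rng.multivariate_normal(np.zeros(d.size), cov_eps, size=nmc).T
    d_pert = d[:, None] + eps
    # Empirical covariances of the forward ensemble and of the perturbed data
    cov_f = np.atleast_2d(np.cov(forecast))
    cov_e = np.atleast_2d(np.cov(d_pert))
    # Empirical Kalman gain, computed with a linear solve instead of an inverse
    S = H @ cov_f @ H.T + cov_e
    K = np.linalg.solve(S, H @ cov_f).T
    # Update every realization with its own perturbed measurement vector
    updated = forecast + K @ (d_pert - H @ forecast)
    return updated, K, d_pert

rng = np.random.default_rng(1)
forecast = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000
).T                                    # illustrative forward ensemble
H = np.array([[1.0, 0.0]])             # only the first variable is measured
d = np.array([1.0])
cov_eps = np.array([[0.01]])
updated, K, d_pert = enkf_update(forecast, d, H, cov_eps, rng)
```

Because the update is linear in each realization, the sample mean of the updated ensemble equals the forward sample mean corrected by the gain applied to the mean innovation.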
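Equations (2.23) - (2.29) ensure that the elements of the collection $\mathbf{y}_i^{u,T_k}$, $i = 1, \ldots, NMC$ are
realizations of the posterior distribution defined in (2.19). Evaluating the sample mean, $\overline{\mathbf{y}}^{u,T_k}$,
and the sample covariance matrix, $\hat{\boldsymbol{\Sigma}}_{yy}^{u,T_k}$, of the updated model realizations
$$\overline{\mathbf{y}}^{u,T_k} = \overline{\mathbf{y}}^{f,T_k} + \hat{\mathbf{K}}^{T_k} \left( \overline{\mathbf{d}}^{T_k} - \mathbf{H}^{T_k} \overline{\mathbf{y}}^{f,T_k} \right) \tag{2.30}$$
$$\hat{\boldsymbol{\Sigma}}_{yy}^{u,T_k} = \frac{1}{NMC - 1} \sum_{i=1}^{NMC} \left( \mathbf{y}_i^{u,T_k} - \overline{\mathbf{y}}^{u,T_k} \right) \left( \mathbf{y}_i^{u,T_k} - \overline{\mathbf{y}}^{u,T_k} \right)^{+}$$
$$= \frac{1}{NMC - 1} \sum_{i=1}^{NMC} \left[ \left( \mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k} \mathbf{H}^{T_k} \right) \left( \mathbf{y}_i^{f,T_k} - \overline{\mathbf{y}}^{f,T_k} \right) + \hat{\mathbf{K}}^{T_k} \left( \mathbf{d}_i^{T_k} - \overline{\mathbf{d}}^{T_k} \right) \right] \left[ \left( \mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k} \mathbf{H}^{T_k} \right) \left( \mathbf{y}_i^{f,T_k} - \overline{\mathbf{y}}^{f,T_k} \right) + \hat{\mathbf{K}}^{T_k} \left( \mathbf{d}_i^{T_k} - \overline{\mathbf{d}}^{T_k} \right) \right]^{+}$$
$$= \left( \mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k} \mathbf{H}^{T_k} \right) \hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k} \left( \mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k} \mathbf{H}^{T_k} \right)^{+} + \hat{\mathbf{K}}^{T_k} \hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k} \left( \hat{\mathbf{K}}^{T_k} \right)^{+} = \left( \mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k} \mathbf{H}^{T_k} \right) \hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k} \tag{2.31}$$
and comparing (2.30) - (2.31) with (2.20) - (2.22) shows that in the limit of infinite sample
size the empirical moments of the updated collection converge to their corresponding
theoretical counterparts.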
The state vector $\mathbf{y}^{f,T_k}$ contains model parameters, state variables and production data
in most of the KF-based applications performed in the context of data assimilation in
groundwater and subsurface multiphase flow models. In these cases, assuming that the
density of $\mathbf{y}^{f,T_k}$ is multivariate normal (see (2.12)) is in general sub-optimal because of the
non-linear relationship between the elements of the model state vector. For this reason the
solution obtained by means of the updating equations (2.20)-(2.22) can be considered only an
approximation of the true system state. One of the main drawbacks related to this
approximation is the appearance of unphysical updates, for which the updated model
variables do not satisfy mass conservation or saturations are associated with values which can
be negative or larger than unity.
3. Kalman Filter coupled with stochastic moment equations of
transient groundwater flow
This Chapter focuses on data assimilation in models of transient groundwater flow in
randomly heterogeneous media via Kalman Filter. We propose to solve the forward step
entailed in the Kalman Filter scheme through a direct solution of approximate nonlocal
(integrodifferential) moment equations (ME) that govern the space-time evolution of
conditional ensemble means (statistical expectations) and covariances of hydraulic heads and
fluxes. This procedure allows circumventing the need for computationally intensive Monte
Carlo (MC) simulation.
In Section 3.1 we extend the ME formulation of Ye et al. [2004] in a way that renders
it compatible with KF. Section 3.2 describes the key steps of the assimilation procedure
performed using the common MC-based EnKF as well as our new ME-based version. Section
3.3 explores the feasibility and accuracy of the proposed algorithm on a synthetic problem of
two-dimensional transient groundwater flow toward a well pumping water from a randomly
heterogeneous confined aquifer subject to prescribed head and flux boundary conditions. In
Section 3.4 the same flow setting is considered for nine heterogeneous systems differing from
each other in the variance and integral scale of the log-hydraulic conductivity field. A
detailed comparison of the performances and accuracies of ME- and MC-based EnKF is
presented and results and implications are discussed in Section 3.5.
3.1 Extended transient moment equations of groundwater flow
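We consider transient groundwater flow in a saturated domain $\Omega$ governed by
stochastic partial differential equations of mass balance and Darcy's law
$$S_S \frac{\partial h(\mathbf{x},t)}{\partial t} + \nabla_{\mathbf{x}} \cdot \mathbf{q}(\mathbf{x},t) = f(\mathbf{x},t), \quad \mathbf{x} \in \Omega \tag{3.1}$$
$$\mathbf{q}(\mathbf{x},t) = -K(\mathbf{x}) \nabla_{\mathbf{x}} h(\mathbf{x},t), \quad \mathbf{x} \in \Omega \tag{3.2}$$
subject to initial and boundary conditions
$$h(\mathbf{x}, t = 0) = H_0(\mathbf{x}), \quad \mathbf{x} \in \Omega \tag{3.3}$$
$$h(\mathbf{x},t) = H(\mathbf{x},t), \quad \mathbf{x} \in \Gamma_D \tag{3.4}$$
$$-\mathbf{q}(\mathbf{x},t) \cdot \mathbf{n}(\mathbf{x}) = Q(\mathbf{x},t), \quad \mathbf{x} \in \Gamma_N \tag{3.5}$$
where $h(\mathbf{x},t)$ is hydraulic head and $\mathbf{q}(\mathbf{x},t)$ the Darcy flux vector at point $(\mathbf{x},t)$ in space-
time, $K(\mathbf{x})$ is an autocorrelated random field of scalar hydraulic conductivities, $S_S$ is
specific storage treated here as a deterministic constant, $H_0(\mathbf{x})$ is (generally) a random
initial head field, $f(\mathbf{x},t)$ is (generally) a random source function of space and time, $H(\mathbf{x},t)$
and $Q(\mathbf{x},t)$ are (generally) random head and normal flux conditions on Dirichlet boundaries
$\Gamma_D$ and Neumann boundaries $\Gamma_N$, respectively, and $\mathbf{n}$ is a unit outward normal to $\Gamma_N$.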
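The Laplace transform of a function $g(t)$ is defined as
$$\bar{g}(\lambda) = \int_0^{\infty} e^{-\lambda t} g(t) \, dt \tag{3.6}$$
where $\lambda$ is a complex Laplace parameter. Taking the Laplace transform of (3.1) - (3.5)
yields the transformed flow equations
$$S_S \lambda \bar{h}(\mathbf{x},\lambda) + \nabla_{\mathbf{x}} \cdot \bar{\mathbf{q}}(\mathbf{x},\lambda) = \bar{f}(\mathbf{x},\lambda) + S_S H_0(\mathbf{x}), \quad \mathbf{x} \in \Omega \tag{3.7}$$
$$\bar{\mathbf{q}}(\mathbf{x},\lambda) = -K(\mathbf{x}) \nabla_{\mathbf{x}} \bar{h}(\mathbf{x},\lambda), \quad \mathbf{x} \in \Omega \tag{3.8}$$
$$\bar{h}(\mathbf{x},\lambda) = \bar{H}(\mathbf{x},\lambda), \quad \mathbf{x} \in \Gamma_D \tag{3.9}$$
$$-\bar{\mathbf{q}}(\mathbf{x},\lambda) \cdot \mathbf{n}(\mathbf{x}) = \bar{Q}(\mathbf{x},\lambda), \quad \mathbf{x} \in \Gamma_N \tag{3.10}$$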
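
The way the initial head enters (3.7) follows from the standard derivative property of the transform. As a brief worked step, assuming the real part of $\lambda$ is large enough for the boundary term at infinity to vanish, and writing transformed quantities with an overbar (a notational assumption), integration by parts of (3.6) applied to the time derivative of head gives

$$\int_0^{\infty} e^{-\lambda t} \, \frac{\partial h(\mathbf{x},t)}{\partial t} \, dt = \left[ e^{-\lambda t} h(\mathbf{x},t) \right]_0^{\infty} + \lambda \int_0^{\infty} e^{-\lambda t} h(\mathbf{x},t) \, dt = \lambda \, \bar{h}(\mathbf{x},\lambda) - H_0(\mathbf{x})$$

so that the initial condition (3.3), multiplied by specific storage, acts as an additional source term in the transformed mass balance.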
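Each random quantity in (3.7) - (3.10) can be written as the sum of its (conditional)
ensemble mean (statistical expectation) and a zero-mean random fluctuation about that mean
such that
$$K(\mathbf{x}) = \langle K(\mathbf{x}) \rangle + K'(\mathbf{x}) \tag{3.11}$$
$$\bar{h}(\mathbf{x},\lambda) = \langle \bar{h}(\mathbf{x},\lambda) \rangle + \bar{h}'(\mathbf{x},\lambda) \tag{3.12}$$
$$\bar{\mathbf{q}}(\mathbf{x},\lambda) = \langle \bar{\mathbf{q}}(\mathbf{x},\lambda) \rangle + \bar{\mathbf{q}}'(\mathbf{x},\lambda) \tag{3.13}$$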
Ye et al. [2004] present and solve numerically non-local conditional stochastic MEs
satisfied by the mean and covariance of $h$ and $\mathbf{q}$ and by the cross-covariance between $h$ and
$K$ for a special case in which all forcing terms ($f$, $H_0$, $H$, and $Q$) are uncorrelated with
each other and/or with $K$. To embed (3.7) - (3.10) in the KF scheme, the total simulation
period is segmented into a sequence of time intervals according to the number of time steps at
which measurements need to be assimilated. We solve the MEs within each time interval
$[T_{k-1}, T_k]$ and treat the updated moments of $h$ (and flux) at time $T_{k-1}$ as initial condition.
Since these updated initial heads are in general correlated with $K$, the MEs of Ye et al. [2004]
are extended in a way that takes these cross-correlations into account.
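Like Ye et al. [2004], we render the exact MEs workable by expanding them to
second-order in $\sigma_Y$, the conditional standard deviation of (natural) log-conductivity
$Y(\mathbf{x}) = \ln K(\mathbf{x})$, about its conditional mean, $\langle Y(\mathbf{x}) \rangle$. We adopt the notation of Ye et al.
[2004] and approximate the Laplace transform of conditional mean head and flux by their
leading terms up to second-order (denoted by parenthetic superscript) in $\sigma_Y$
$$\langle \bar{h}(\mathbf{x},\lambda) \rangle \approx \langle \bar{h}^{(0)}(\mathbf{x},\lambda) \rangle + \langle \bar{h}^{(2)}(\mathbf{x},\lambda) \rangle \tag{3.14}$$
$$\langle \bar{\mathbf{q}}(\mathbf{x},\lambda) \rangle \approx \langle \bar{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle + \langle \bar{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle \tag{3.15}$$
The system of equations satisfied by the zero-order mean head and flux is given by
$$S_S \lambda \langle \bar{h}^{(0)}(\mathbf{x},\lambda) \rangle + \nabla_{\mathbf{x}} \cdot \langle \bar{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle = \langle \bar{f}(\mathbf{x},\lambda) \rangle + S_S \langle H_0(\mathbf{x}) \rangle, \quad \mathbf{x} \in \Omega \tag{3.16}$$
$$\langle \bar{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle = -K_G(\mathbf{x}) \nabla_{\mathbf{x}} \langle \bar{h}^{(0)}(\mathbf{x},\lambda) \rangle, \quad \mathbf{x} \in \Omega \tag{3.17}$$
$$\langle \bar{h}^{(0)}(\mathbf{x},\lambda) \rangle = \langle \bar{H}(\mathbf{x},\lambda) \rangle, \quad \mathbf{x} \in \Gamma_D \tag{3.18}$$
$$-\langle \bar{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle \cdot \mathbf{n}(\mathbf{x}) = \langle \bar{Q}(\mathbf{x},\lambda) \rangle, \quad \mathbf{x} \in \Gamma_N \tag{3.19}$$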
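Here, $K_G(\mathbf{x}) = \exp \langle Y(\mathbf{x}) \rangle$ is the conditional geometric mean of $K$; $\langle \bar{f} \rangle$, $\langle H_0 \rangle$, $\langle \bar{H} \rangle$ and
$\langle \bar{Q} \rangle$ are ensemble mean (in part Laplace transformed) forcing terms. For simplicity we treat
$f$, $H$ and $Q$ as deterministic. Second-order corrections of head and flux are governed by
$$\langle \bar{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle = -K_G(\mathbf{x}) \left[ \nabla_{\mathbf{x}} \langle \bar{h}^{(2)}(\mathbf{x},\lambda) \rangle + \frac{\sigma_Y^2(\mathbf{x})}{2} \nabla_{\mathbf{x}} \langle \bar{h}^{(0)}(\mathbf{x},\lambda) \rangle \right] + \bar{\mathbf{r}}^{(2)}(\mathbf{x},\lambda) \tag{3.20}$$
$$S_S \lambda \langle \bar{h}^{(2)}(\mathbf{x},\lambda) \rangle + \nabla_{\mathbf{x}} \cdot \langle \bar{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle = 0, \quad \mathbf{x} \in \Omega \tag{3.21}$$
$$\langle \bar{h}^{(2)}(\mathbf{x},\lambda) \rangle = 0, \quad \mathbf{x} \in \Gamma_D \tag{3.22}$$
$$\langle \bar{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle \cdot \mathbf{n}(\mathbf{x}) = 0, \quad \mathbf{x} \in \Gamma_N \tag{3.23}$$
where $\sigma_Y^2(\mathbf{x}) = \langle Y'^2(\mathbf{x}) \rangle$ is the conditional variance of $Y(\mathbf{x})$ and the second-order
transformed residual flux, $\bar{\mathbf{r}}^{(2)}(\mathbf{x},\lambda) = -\langle K'(\mathbf{x}) \nabla_{\mathbf{x}} \bar{h}'(\mathbf{x},\lambda) \rangle^{(2)}$, in (3.20) is evaluated according
to
$$\bar{\mathbf{r}}^{(2)}(\mathbf{x},\lambda) = K_G(\mathbf{x}) \int_{\Omega} K_G(\mathbf{y}) C_Y(\mathbf{x},\mathbf{y}) \nabla_{\mathbf{x}} \nabla_{\mathbf{y}}^{+} \langle \bar{G}^{(0)}(\mathbf{y},\mathbf{x},\lambda) \rangle \nabla_{\mathbf{y}} \langle \bar{h}^{(0)}(\mathbf{y},\lambda) \rangle \, d\mathbf{y}$$
$$- S_S \int_{\Omega} \langle K'(\mathbf{x}) h_0'(\mathbf{y}) \rangle^{(2)} \nabla_{\mathbf{x}} \langle \bar{G}^{(0)}(\mathbf{y},\mathbf{x},\lambda) \rangle \, d\mathbf{y} \tag{3.24}$$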
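where $C_Y(\mathbf{x},\mathbf{y}) = \langle Y'(\mathbf{x}) Y'(\mathbf{y}) \rangle$ is the conditional covariance of $Y$ between points $\mathbf{x}$ and $\mathbf{y}$,
the superscript '+' denoting transpose. The zero-order conditional mean random Green's
function, $\langle \bar{G}^{(0)}(\mathbf{y},\mathbf{x},\lambda) \rangle$, associated with (3.7) - (3.10) is obtained upon writing (3.16) -
(3.19) in terms of $(\mathbf{y},\lambda)$ and solving them subject to homogeneous boundary conditions and a
Dirac delta source at $\mathbf{x}$. The last integral on the right hand side of (3.24), containing the
conditional cross-correlation between hydraulic conductivity and initial head fluctuations $h_0'$,
is new and does not appear in equation (39) of Ye et al. [2004]. For reasons explained earlier,
this term may vanish during the first time interval $[T_0, T_1]$ (in particular when $H_0$ is
deterministic) but not during later intervals. The second-order conditional cross-covariance,
$\bar{u}_{Kh}^{(2)}(\mathbf{x},\mathbf{y},\lambda)$, between $K(\mathbf{x})$ and transformed head $\bar{h}(\mathbf{y},\lambda)$ is evaluated according to
$$\bar{u}_{Kh}^{(2)}(\mathbf{x},\mathbf{y},\lambda) = -K_G(\mathbf{x}) \int_{\Omega} K_G(\mathbf{z}) C_Y(\mathbf{x},\mathbf{z}) \nabla_{\mathbf{z}} \langle \bar{h}^{(0)}(\mathbf{z},\lambda) \rangle \cdot \nabla_{\mathbf{z}} \langle \bar{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle \, d\mathbf{z}$$
$$+ S_S \int_{\Omega} \langle K'(\mathbf{x}) h_0'(\mathbf{z}) \rangle^{(2)} \langle \bar{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle \, d\mathbf{z} \tag{3.25}$$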
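Corresponding equations for the conditional second-moment (variance-covariance) of
associated head prediction errors are evaluated according to
$$\nabla_{\mathbf{x}} \cdot \left[ K_G(\mathbf{x}) \nabla_{\mathbf{x}} \bar{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) + u_{Kh}^{(2)}(\mathbf{x},\mathbf{y},s) \nabla_{\mathbf{x}} \langle \bar{h}^{(0)}(\mathbf{x},\lambda) \rangle \right] = S_S \lambda \bar{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s)$$
$$- S_S \langle h_0'(\mathbf{x}) h'(\mathbf{y},s) \rangle^{(2)}, \quad \mathbf{x} \in \Omega \tag{3.26}$$
$$\bar{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) = 0, \quad \mathbf{x} \in \Gamma_D \tag{3.27}$$
$$\left[ K_G(\mathbf{x}) \nabla_{\mathbf{x}} \bar{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) + u_{Kh}^{(2)}(\mathbf{x},\mathbf{y},s) \nabla_{\mathbf{x}} \langle \bar{h}^{(0)}(\mathbf{x},\lambda) \rangle \right] \cdot \mathbf{n}(\mathbf{x}) = 0, \quad \mathbf{x} \in \Gamma_N \tag{3.28}$$
Here, $\bar{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) = \langle \bar{h}'(\mathbf{x},\lambda) h'(\mathbf{y},s) \rangle^{(2)}$ is the conditional covariance between transformed
and untransformed head fluctuations $\bar{h}'(\mathbf{x},\lambda)$ and $h'(\mathbf{y},s)$. The term on the right hand side of
(3.26) includes the covariance between head $h'(\mathbf{y},s)$ at time $s$ and initial head $h_0'(\mathbf{x})$. This
covariance is rendered by the inverse Laplace transform with respect to $s$ of
$$\langle h_0'(\mathbf{x}) \bar{h}'(\mathbf{y},\lambda) \rangle^{(2)} = -\int_{\Omega} \langle K'(\mathbf{z}) h_0'(\mathbf{x}) \rangle^{(2)} \nabla_{\mathbf{z}} \langle \bar{h}^{(0)}(\mathbf{z},\lambda) \rangle \cdot \nabla_{\mathbf{z}} \langle \bar{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle \, d\mathbf{z}$$
$$+ S_S \int_{\Omega} \langle h_0'(\mathbf{x}) h_0'(\mathbf{z}) \rangle^{(2)} \langle \bar{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle \, d\mathbf{z} \tag{3.29}$$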
Following Ye et al. [2004], we solve the above MEs by a Galerkin finite element method
using bilinear Lagrange interpolation functions. The finite element equations are shown in
Appendix A. Laplace back transformation into the time domain is performed using the
quotient difference algorithm of De Hoog et al. [1982]. The numerical code has been
parallelized to (i) solve (3.14) - (3.24) for different values of $\lambda$ simultaneously, and (ii)
compute the cross-covariances and covariances (3.25) - (3.29) at subsets of grid nodes which
are uniformly distributed among available processors in a cluster.
3.2 Data assimilation of groundwater flow data via KF: MC-based
EnKF and ME-based approach
We consider the model vector
$$\mathbf{y} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{h} \end{bmatrix} \tag{3.30}$$
where the parameter vector $\mathbf{Y}$ contains $N_Y$ log-conductivities and the state vector $\mathbf{h}$ includes
$N_h$ hydraulic head values satisfying (3.1) - (3.5), so that $\mathbf{y}$ has dimension $N_y = N_Y + N_h$. In
our finite element solver of (3.1) - (3.5), described above, $N_Y$ is the number of elements in
which hydraulic conductivity is taken to be uniform and $N_h$ is the number of nodes at which
heads are computed.
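According to the notation introduced in Chapter 2, we denote the model vector $\mathbf{y}$ at
time $T_{k-1}$ conditioned on measurements available up to time $T_{k-1}$, by $\mathbf{y}^{u,T_{k-1}}$. In line with
Tarantola [2005], Cohn [1997] and Woodbury and Ulrych [2000] we consider $\mathbf{y}^{u,T_{k-1}}$ to be
multivariate Gaussian with mean vector
$$\left\langle \mathbf{y}^{u,T_{k-1}} \right\rangle = \begin{bmatrix} \left\langle \mathbf{Y}^{u,T_{k-1}} \right\rangle \\ \left\langle \mathbf{h}^{u,T_{k-1}} \right\rangle \end{bmatrix} \tag{3.31}$$
and covariance matrix
$$\boldsymbol{\Sigma}_{yy}^{u,T_{k-1}} = \begin{bmatrix} \mathbf{C}_Y^{u,T_{k-1}} & \mathbf{u}_{Yh}^{u,T_{k-1}} \\ \left( \mathbf{u}_{Yh}^{u,T_{k-1}} \right)^{+} & \mathbf{C}_h^{u,T_{k-1}} \end{bmatrix} \tag{3.32}$$
Here, $\mathbf{C}_Y^{u,T_{k-1}}$ and $\mathbf{C}_h^{u,T_{k-1}}$ are the conditional covariance matrices of $\mathbf{Y}^{u,T_{k-1}}$ and $\mathbf{h}^{u,T_{k-1}}$,
respectively, and $\mathbf{u}_{Yh}^{u,T_{k-1}}$ is their cross-covariance matrix.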
The forward step entailed in the KF algorithm requires solving the system of
stochastic PDEs (3.1) - (3.5) within the time interval $[T_{k-1}, T_k]$ and with random initial
conditions given by (3.31) - (3.32).
One way of solving the forward step is to rely on Monte Carlo (MC) simulation. As
detailed in Chapter 2, MC requires representing the density function of $\mathbf{y}^{u,T_{k-1}}$ through a
collection of model realizations, $\mathbf{y}_j^{u,T_{k-1}}$, $j = 1, \ldots, NMC$. With this approach equations (3.1) -
(3.5) are solved within the time interval $[T_{k-1}, T_k]$ for each model realization $j$. This is
accomplished by employing the deterministic log-conductivity field and initial head field
contained in $\mathbf{y}_j^{u,T_{k-1}}$. The MC solution yields the collection of forward realizations at time $T_k$,
$\mathbf{y}_j^{f,T_k}$, $j = 1, \ldots, NMC$.
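As an alternative to MC, we propose to solve the forward step directly through the
system of moment equations (3.7) - (3.29). These MEs are solved within the time interval
$[T_{k-1}, T_k]$ upon setting the mean and the covariance of the log-conductivity field equal to
$\langle \mathbf{Y}^{u,T_{k-1}} \rangle$ in (3.31) and $\mathbf{C}_Y^{u,T_{k-1}}$ in (3.32), respectively. This approach requires treating the
initial conditions as random. The initial head field is characterized by mean and covariance
matrix equal to $\langle \mathbf{h}^{u,T_{k-1}} \rangle$ and $\mathbf{C}_h^{u,T_{k-1}}$, respectively, while the cross-covariances between
conductivities and initial heads are set to $\mathbf{u}_{Kh}^{u,T_{k-1}} = K_G \mathbf{u}_{Yh}^{u,T_{k-1}}$. The ME solution yields second-
order approximation of mean and covariance matrix of the forward vector at time $T_k$, $\mathbf{y}^{f,T_k}$
$$\left\langle \mathbf{y}^{f,T_k} \right\rangle = \begin{bmatrix} \left\langle \mathbf{Y}^{f,T_k} \right\rangle \\ \left\langle \mathbf{h}^{f,T_k} \right\rangle \end{bmatrix} \tag{3.33}$$
$$\boldsymbol{\Sigma}_{yy}^{f,T_k} = \begin{bmatrix} \mathbf{C}_Y^{f,T_k} & \mathbf{u}_{Yh}^{f,T_k} \\ \left( \mathbf{u}_{Yh}^{f,T_k} \right)^{+} & \mathbf{C}_h^{f,T_k} \end{bmatrix} \tag{3.34}$$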
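
The scaling by $K_G$ used to pass from the cross-covariance of log-conductivity and head to that of conductivity and head can be motivated by a short expansion. This is a sketch to leading order in $\sigma_Y$, written with primes for zero-mean fluctuations and angle brackets for conditional expectation (notation assumed here). Since $K = \exp(Y) = K_G \exp(Y')$,

$$K' = K - \langle K \rangle \approx K_G \left[ Y' + \tfrac{1}{2} \left( Y'^2 - \sigma_Y^2 \right) \right], \qquad \langle K' h' \rangle \approx K_G \langle Y' h' \rangle$$

where the neglected contribution, $\tfrac{1}{2} K_G \langle Y'^2 h' \rangle$, is a third-order moment.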
The measurements of $\mathbf{Y}$ and/or $\mathbf{h}$ available at time $T_k$ and the covariance matrix of
the corresponding measurement errors are then used in the analysis step of the KF algorithm.
Working with MC, (2.23) - (2.29) allow obtaining the collection of updated model
realizations $\mathbf{y}_j^{u,T_k}$, $j = 1, \ldots, NMC$. In the ME-based approach, (2.20) - (2.22) are used for the
evaluation of the mean, $\langle \mathbf{y}^{u,T_k} \rangle$, and covariance matrix, $\boldsymbol{\Sigma}_{yy}^{u,T_k}$, of the target updated density
function. Figures 3.1 - 3.2 summarize the assimilation algorithms associated with MC and
ME approaches, respectively.
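Figure 3.1. Flow chart of data assimilation through common MC-based EnKF.
[Flow chart: initial conditions at $T_0$ (realizations of $\mathbf{Y}$ and $\mathbf{h}$, $j = 1, \ldots, NMC$); Forecast: MC (3.1) - (3.5), giving the forward realizations at $T_k$; Updating: EnKF (2.23) - (2.29), assimilating the observed data $\mathbf{d}^{T_k}$ with $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$ and giving the updated realizations at $T_k$; the loop is repeated for $k = 1 \ldots n$.]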
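Figure 3.2. Flow chart of data assimilation through embedding of stochastic moment
equations of transient groundwater flow in the KF scheme.
[Flow chart: initial conditions at $T_0$ (mean vector and covariance matrix of $\mathbf{Y}$ and $\mathbf{h}$); Forecast: ME (3.7) - (3.29), giving the forward mean and covariance at $T_k$; Updating: KF (2.20) - (2.22), assimilating the observed data $\mathbf{d}^{T_k}$ with $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$ and giving the updated mean and covariance at $T_k$; the loop is repeated for $k = 1 \ldots n$.]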
3.3 Exploratory synthetic example of data assimilation and
parameter estimation
We explore the feasibility and accuracy of our ME-based approach by way of a two-
dimensional transient flow example. We consider a square domain measuring 40 × 40 (all
quantities are given in consistent space-time units) discretized into grid cells of size 1 × 1.
Each element has uniform hydraulic conductivity, yielding a parameter vector $\mathbf{Y}$ of
dimension $N_Y = 1600$. Head values are prescribed or computed at $N_h = 1681$ nodes, yielding
a head vector $\mathbf{h}$ of similar dimension. Whereas deterministic head values equal to $H_1 = 1.0$
and $H_2 = 0.0$ are prescribed along the left and right boundaries, the top and bottom
boundaries are made impervious (Figure 3.3). Storativity is set equal to a uniform
deterministic value of 0.3. Initial hydraulic heads are deterministic and vary linearly between
the two constant head boundaries. Superimposed on this background gradient is convergent
flow to a centrally located well that starts pumping at a deterministic constant rate $Q_p = 0.3$
at reference time $t = 0$. Mathematically the well is simulated by setting $f(\mathbf{x}, t < 0) = 0$ and
$f(\mathbf{x}, t \ge 0) = -Q_p \, \delta(\mathbf{x} - \mathbf{x}_w)$ in (3.1) where $\delta$ is the Dirac delta function, $\mathbf{x}_w$ are the Cartesian
coordinates of the well and well radius is neglected.
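Figure 3.3. Flow domain, nodes of the computational grid (+), boundary conditions, pumping
well (○), log-conductivity (◊) and hydraulic head (∆) measurement locations.
[Plan view of the domain with axes $x_1$ and $x_2$, both from 0.0 to 40.0; constant head $H_1$ on the left boundary and $H_2$ on the right boundary; impervious top and bottom boundaries; measurement locations numbered 1 to 20.]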
Values of Y in each grid cell are set equal to those generated at element centers using
a sequential Gaussian simulator [SGSIM, Deutsch and Journel, 1998]. The generated values
form a random realization (depicted in Figure 3.4) of a statistically homogeneous and
isotropic multivariate Gaussian field having variance 2.0 and exponential covariance with
integral scale 4.0. This strongly heterogeneous reference field is characterized by spatial
mean and variance equal, respectively, to 0.00 and 1.71. We solve the corresponding
deterministic flow problem through the system (3.1) - (3.5) for a time period of 80 units
($T_{max} = 80.0$) to obtain a corresponding reference head distribution in space-time. Figure 3.5
shows the time evolution of reference head at eight selected measurement points. The vertical
line in Figure 3.5 at $T_k = 30.0$ separates an early transient flow regime from a later pseudo
steady state regime during which heads are seen to vary linearly with log-time.
Figure 3.4. Spatial distribution of log-hydraulic conductivity in the reference model.
We sample the reference $Y$ field, $Y_{ref}$, in nine elements uniformly distributed ($Y_m$,
$m = 1, \ldots, 9$) across the domain and the reference head values at 20 grid points and $N_k$
observation times ($h_n^{T_k}$, $n = 1, \ldots, 20$, $k = 1, \ldots, N_k$). The spatial locations of the measurement
points are indicated in Figure 3.3. The $Y$ and $h$ measurements are corrupted with zero-mean
white Gaussian noise, $\varepsilon_m$ and $\varepsilon_n^{T_k}$, having standard deviations $\sigma_{YE}$ and $\sigma_{hE}$, respectively,
according to
$$Y_m^{*} = Y_m + \varepsilon_m, \quad m = 1, \ldots, 9 \tag{3.35}$$
$$h_n^{*,T_k} = h_n^{T_k} + \varepsilon_n^{T_k}, \quad n = 1, \ldots, 20, \quad k = 1, \ldots, N_k \tag{3.36}$$
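
The corruption of the reference samples with white Gaussian noise can be sketched as below. The sketch assumes NumPy; the helper `perturb` and the placeholder reference values are hypothetical (the actual reference samples come from the reference fields), while the two standard deviations match the 0.1 used for log-conductivity in the first test case and the head error variance of $10^{-4}$.

```python
import numpy as np

def perturb(values, std, rng):
    """Add zero-mean white Gaussian noise with standard deviation std."""
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, std, size=values.shape)

rng = np.random.default_rng(7)
Y_ref_samples = np.zeros(9)                  # placeholder for the nine sampled Y values
h_ref_samples = np.linspace(1.0, 0.0, 20)    # placeholder for heads at 20 points

Y_meas = perturb(Y_ref_samples, 0.1, rng)    # log-conductivity error std 0.1
h_meas = perturb(h_ref_samples, 1e-2, rng)   # head error variance 1e-4
```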
[Figure 3.4: map of $Y$ over the domain; axes $x_1$ and $x_2$ from 0.0 to 40.0; colour scale for $Y$ from -4.0 to 4.0.]
Figure 3.5. Temporal evolution of reference hydraulic head (curves) and noisy measurements
(symbols) at seven locations identified in Figure 3.3.
We consider three case studies (TC1, TC2 and TC3) with diverse values of $N_k$ and
$\sigma_{YE}$. Test case 1 (TC1) considers ten observation times ($T_k$ = 5.0; 10.0; 15.0; 20.0; 25.0;
30.0; 35.0; 40.0; 60.0; 80.0; $k = 1, 2, \ldots, N_k$) and $\sigma_{YE} = 0.1$. In test case 2 (TC2) the
measurement error variance of Y exceeds that in TC1 by one order of magnitude, the standard
deviation being now $\sigma_{YE} = 0.32$. The third test case (TC3) differs from TC1 in that it includes
eleven additional observation times ($T_k$ = 3.0; 7.0; 9.0; 11.0; 13.0; 17.0; 19.0; 21.0; 23.0; 27.0;
29.0). All three test cases consider the measurement error variance of heads, $\sigma_{hE}^2$, equal to
$10^{-4}$.
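As a minimal sketch of how the perturbation in (3.35) - (3.36) and the noise settings of the three test cases could be reproduced, the Python fragment below adds zero-mean Gaussian noise to placeholder reference samples. The placeholder arrays, the random seed and the helper name `corrupt` are illustrative assumptions and not the code used in this work.

```python
import numpy as np

def corrupt(samples, sigma, rng):
    """Add zero-mean white Gaussian noise with standard deviation sigma."""
    return samples + rng.normal(0.0, sigma, size=np.shape(samples))

rng = np.random.default_rng(0)        # assumed seed, for reproducibility only
Y_ref_samples = np.zeros(9)           # placeholder for the nine sampled Y values
h_ref_samples = np.zeros((20, 10))    # placeholder: 20 points x 10 times (TC1)

# Standard deviations of Y measurement errors quoted in the text
sigma_YE = {"TC1": 0.1, "TC2": 0.32, "TC3": 0.1}
sigma_hE = np.sqrt(1.0e-4)            # head error variance equal to 1e-4

Y_star = corrupt(Y_ref_samples, sigma_YE["TC1"], rng)
h_star = corrupt(h_ref_samples, sigma_hE, rng)

# Ratio of Y error variances in TC2 and TC1: about one order of magnitude
variance_ratio = sigma_YE["TC2"] ** 2 / sigma_YE["TC1"] ** 2
```

The variance ratio evaluates to about 10.24, consistent with the statement that the error variance in TC2 exceeds that in TC1 by one order of magnitude.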
In our example the vector $\mathbf{d}^{T_k}$ introduced in (2.2) contains the perturbed samples of
hydraulic head at time $T_k$ as defined in (3.36). The corresponding covariance matrix of head
measurement errors, $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$, is diagonal and homoscedastic with entries
equal to $\sigma_{hE}^2$. Entries $H_{ij}^{T_k}$ of $\mathbf{H}^{T_k}$ are equal to 1 when the i-th element of
$\mathbf{d}^{T_k}$ is a measurement of the j-th entry of $\mathbf{y}^{T_k}$ and 0 otherwise.
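The structure of the observation matrix and of the error covariance described above can be illustrated with a short Python sketch. The state size, the indices of the measured entries and the function name `observation_matrix` are hypothetical and chosen only for illustration.

```python
import numpy as np

def observation_matrix(measured_indices, n_state):
    """Return H with H[i, j] = 1 if datum i measures state entry j, else 0."""
    H = np.zeros((len(measured_indices), n_state))
    H[np.arange(len(measured_indices)), measured_indices] = 1.0
    return H

n_state = 12                          # hypothetical size of the state vector y
measured = np.array([1, 4, 7])        # hypothetical indices of measured entries
H = observation_matrix(measured, n_state)

sigma_hE2 = 1.0e-4                    # head measurement error variance (text value)
Sigma_eps = sigma_hE2 * np.eye(len(measured))   # diagonal, homoscedastic

y = np.arange(n_state, dtype=float)   # dummy state vector
d_noise_free = H @ y                  # H selects the measured entries of y
```

Each row of `H` contains a single unit entry, so that `H @ y` simply extracts the measured components of the state vector.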
The elements of the vector $\mathbf{Y}^{*}$ containing the measurements of log-conductivity as
defined in (3.35) are employed for generating the mean and the covariance matrix of the
log-conductivity field, Y, at the initial time $T_0$. The perturbed samples of log-conductivity are
projected via ordinary kriging onto the centroids of all grid elements assuming knowledge of
the corresponding variogram model and parameters.
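A minimal sketch of an ordinary kriging projection with an exponential covariance is given below. It assumes that measurement noise is accounted for by adding the error variance to the diagonal of the data covariance matrix; the measurement layout, the data values and the function names are hypothetical and do not reproduce the implementation employed here.

```python
import numpy as np

def exp_cov(r, sill, integral_scale):
    """Exponential covariance C(r) = sill * exp(-r / integral_scale)."""
    return sill * np.exp(-r / integral_scale)

def ordinary_kriging(x_data, y_data, x_target, sill, integral_scale, noise_var):
    """Ordinary kriging estimate and variance at x_target from noisy data."""
    n = len(x_data)
    dist = np.linalg.norm(x_data[:, None, :] - x_data[None, :, :], axis=2)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = exp_cov(dist, sill, integral_scale) + noise_var * np.eye(n)
    A[:n, n] = 1.0                    # Lagrange multiplier column
    A[n, :n] = 1.0                    # unbiasedness constraint: weights sum to 1
    c0 = exp_cov(np.linalg.norm(x_data - x_target, axis=1), sill, integral_scale)
    sol = np.linalg.solve(A, np.append(c0, 1.0))
    weights, mu = sol[:n], sol[n]
    estimate = weights @ y_data
    variance = sill - weights @ c0 - mu
    return estimate, variance, weights

# Hypothetical 3 x 3 layout of measurement locations in a 40 x 40 domain
g = np.array([6.5, 19.5, 32.5])
x_data = np.array([[a, b] for a in g for b in g])
y_data = np.linspace(-1.0, 1.0, 9)    # placeholder noisy Y measurements

_, var_at_datum, w = ordinary_kriging(x_data, y_data, x_data[4], 2.0, 4.0, 0.01)
_, var_far, _ = ordinary_kriging(x_data, y_data, np.array([0.0, 0.0]), 2.0, 4.0, 0.01)
```

Under these assumptions the kriging variance at a measurement location is close to the measurement error variance, while far from the data it approaches the unconditional variance.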
Figures 3.6 and 3.7 respectively depict estimates of Y and corresponding variances at
each assimilation step of TC1. The estimates of Y in Figure 3.6 evolve toward a pattern
similar to that of the reference Y field in Figure 3.4. The rate of evolution is fastest at early
time and slowest during the pseudo steady state period beyond $T_k = 30$. A similar phenomenon
was observed by Chen and Zhang [2006] when coupling EnKF with standard MC simulation,
and by Riva et al. [2009] during batch transient inversion of stochastic MEs using maximum
likelihood. Prior to the start of assimilation (at $T_k = 0$) the estimation variance of Y in Figure
3.7 is close to the unconditional reference variance everywhere except near the nine
measurement points, at which it is equal to the measurement error variance. Assimilation
brings about a rapid reduction in this estimation variance at early time and a much slower
reduction at later times.
Figure 3.6. Estimates of log-conductivity Y at initial time $T_k = 0$ and at ten updating times
($T_k$ = 5, 10, 15, 20, 25, 30, 35, 40, 60, 80) for test case TC1. Color scale: Y from -4.0 to 4.0.
Figure 3.7. Estimation variance of Y at initial time $T_k = 0$ and at ten updating times
($T_k$ = 5, 10, 15, 20, 25, 30, 35, 40, 60, 80) for test case TC1. Color scale: $\sigma_Y^2$ from 0.0 to 2.5.
These phenomena are reflected quantitatively in the temporal behaviors of $E_Y$, the average
absolute difference between estimates $\langle Y \rangle^{u,T_k}$ and reference values $Y_{ref}$ at all element
centroids $\mathbf{x}_i$, and of $V_Y$, the average estimation variance $\sigma_Y^{2\,u,t^*}$ at these points,
defined as
$E_Y(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} \left| \langle Y(\mathbf{x}_i) \rangle^{u,t^*} - Y_{ref}(\mathbf{x}_i) \right|$ (3.37)
$V_Y(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} \sigma_Y^{2\,u,t^*}(\mathbf{x}_i)$ (3.38)
where $t^* = T_k / T_{max}$ is normalized time (assimilation takes place at $t^*$ = 0.0625; 0.125; 0.1875;
0.25; 0.3125; 0.375; 0.4375; 0.50; 0.75; 1.00). Indeed, Figure 3.8 demonstrates that $E_Y$ and
$V_Y$ decrease more sharply with $t^*$ at early time than during the later pseudo steady state
period.
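The two metrics in (3.37) - (3.38) and the normalized assimilation times of TC1 can be evaluated as in the following sketch. The function names and the two-element test arrays are assumptions made for illustration only.

```python
import numpy as np

def E_Y(Y_est, Y_ref):
    """Average absolute difference between estimated and reference Y, as in (3.37)."""
    return np.mean(np.abs(np.asarray(Y_est) - np.asarray(Y_ref)))

def V_Y(var_est):
    """Average estimation variance over all element centroids, as in (3.38)."""
    return np.mean(np.asarray(var_est))

# Normalized assimilation times of TC1
T_max = 80.0
T_k = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 60.0, 80.0])
t_star = T_k / T_max

# Tiny illustrative check with two cells
e = E_Y([1.0, -1.0], [0.0, 0.0])
v = V_Y([0.5, 1.5])
```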
Figure 3.8. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values,
and corresponding average estimation variance $V_Y(t^*)$, versus $t^*$ for test case TC1.
Figure 3.9 depicts scatter plots of estimated versus reference Y values at $T_k$ = 0, 15,
30, and 80 together with intervals corresponding to ± two standard deviations,
$\pm 2\sigma_Y^{u,t^*}(\mathbf{x})$, of the estimates about their mean values. More than 90% of the estimates
are seen to lie inside these intervals at each time $T_k$, even as the intervals narrow with
increasing $T_k$. Linear regression lines fitted in Figure 3.9 to the data have slopes that increase
with time from 0.14 at $T_k = 0$ to 0.37 at $T_k = 30$, and coefficients of determination $R^2$ that
likewise increase from 0.18 at $T_k = 0$ to 0.33 at $T_k = 30$. Beyond $T_k = 30$, these variations are
comparatively small. Figure 3.9 also shows that, due to the relatively small standard deviation
of Y measurement errors, estimates of Y at the nine measurement points do not change much
during the assimilation process.
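A minimal sketch of how the slope and the coefficient of determination of such a regression line could be computed is given below. The data are synthetic and noise-free, generated only to check the arithmetic; the function name `regression_summary` is an assumption.

```python
import numpy as np

def regression_summary(Y_ref, Y_est):
    """Least squares line Y_est = slope * Y_ref + intercept and its R^2."""
    slope, intercept = np.polyfit(Y_ref, Y_est, 1)
    residuals = Y_est - (slope * Y_ref + intercept)
    total = Y_est - Y_est.mean()
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum(total ** 2)
    return slope, intercept, r2

# Synthetic example: estimates that are a damped version of the reference field
Y_ref = np.linspace(-2.0, 2.0, 5)
Y_est = 0.37 * Y_ref + 0.10
slope, intercept, r2 = regression_summary(Y_ref, Y_est)
```

A slope well below unity, as found here, indicates that the estimated field is considerably smoother than the reference field.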
Figures 3.10 and 3.11 show that increasing the measurement error variance of Y by
one order of magnitude, as is done in TC2, has only a minor effect on the temporal behavior
of $E_Y$ and $V_Y$. Including eleven additional observation times (TC3) considerably reduces
$E_Y$, while $V_Y$ remains virtually unaffected. This behavior indicates
that increasing the number of assimilation data at early time improves parameter estimates
without underestimating their variance. Figure 3.12 compares slopes of regression lines fitted
to scatter plots of estimated versus reference Y values at various updating times in each test
case. The graph shows that whereas adding noise to log-conductivity measurements causes
their estimates (in terms of this slope) to deteriorate slightly for the scenarios considered,
adding early time measurements renders the estimates markedly more accurate.
Figure 3.9. Scatter plots of estimated and reference Y at four $T_k$ values; corresponding
intervals of ± two standard deviations of Y estimates about their mean (gray lines); and linear
regression fits to the data (black lines), for test case TC1. Y estimates at the nine measurement
locations are highlighted in red. Panels: (a) $T_k = 0$, slope 0.14, constant term of magnitude
0.17, $R^2 = 0.18$; (b) $T_k = 15$, slope 0.32, constant term of magnitude 0.17, $R^2 = 0.31$;
(c) $T_k = 30$, slope 0.37, constant term of magnitude 0.10, $R^2 = 0.33$; (d) $T_k = 80$, slope 0.39,
constant term of magnitude 0.06, $R^2 = 0.34$. Both axes range from -4.0 to 4.0.
Figure 3.10. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values
versus $t^*$ for test cases TC1, TC2 and TC3.
Figure 3.11. Average estimation variance $V_Y(t^*)$ versus $t^*$ for test cases TC1, TC2 and TC3.
Figure 3.12. Slopes of regression lines fitted to scatter plots of estimated versus reference Y
values at various updating times in each test case (TC1, TC2 and TC3; regression line slope
versus $t^*$).
The impact of initial hydraulic heads on estimates of Y is analyzed by repeating the
three test cases described above with stochastic initial heads, the moments of which are
obtained by solving the steady-state MEs [Guadagnini and Neuman, 1999] with kriged mean
permeability and corresponding covariance without pumping. We designate the test cases
corresponding to this initial condition as TCi_S ($i = 1, 2, 3$). It is important to note that in this
case the cross-correlation between K and initial heads in (3.24) - (3.29) does not vanish (not
even during the first time interval). This cross-correlation is provided by the solution of the
steady-state MEs.
In all these test cases the average estimation variance is found to remain unaffected by
the initial head. On the other hand, $E_Y$ is slightly influenced by $H_0$. Figures 3.13 and 3.14
depict the temporal behaviors of $E_Y$ for TC1, TC3, and all considered $H_0$. Results
corresponding to TC2 are qualitatively similar to those of TC1 and are not shown. The effects
of the nature (stochastic or deterministic) of $H_0$ depend on the frequency of head
observation. For TC1, the adoption of a stochastic $H_0$ slightly improves (globally) the
estimated Y field relative to that obtained with a deterministic and linear $H_0$. The opposite
happens in case TC3, even as the difference in $E_Y$ tends to decrease with time (see Figure
3.14). In the case of random $H_0$, increasing the frequency of head observations does not
cause $E_Y$ to decrease significantly in comparison to the case of deterministic $H_0$ (compare
Figures 3.13 and 3.14).
Figure 3.13. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values
versus $t^*$ for test cases TC1 and TC1_S.
Figure 3.14. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values
versus $t^*$ for test cases TC3 and TC3_S.
The analysis illustrated above treats the functional form and parameters of the
variograms used to generate the reference Y field as given. To test the influence of the
variogram model and parameters on our estimates, three additional test cases were performed
using the same reference log-conductivity and head fields and the same conditioning data set
as in TC1. In one test case (TC4), we increased the unconditional variance and integral scale of
Y to 3 and 6, respectively, and decreased them to 1 and 2, respectively, in another (TC5). In
the last test case (TC6), we changed the functional form of the variogram from exponential to
Gaussian, without changing the unconditional variance and integral scale of Y. The results,
shown in Figures 3.15 and 3.16, confirm in part the finding of Chen and Zhang [2006]
that incorrect initial variance and integral scale values of Y have no significant adverse effect
on $E_Y$ or $V_Y$, the latter tending to decrease with diminishing initial sill and integral scale
values. The effect of incorrect initial variance and integral scale of Y on $V_Y$ was found to
diminish with time. This observation might obviate the need to estimate variogram
parameters jointly with Y, as done in the context of steady state and batch transient
geostatistical inversion of MEs by Riva et al. [2009, 2011]. In contrast, adopting an incorrect
variogram model caused $E_Y$, $V_Y$, and correlations between estimated and true Y
values to deteriorate at all times.
Figure 3.15. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values
versus $t^*$ for test cases TC1, TC4, TC5 and TC6.
Figure 3.16. Average estimation variance $V_Y(t^*)$ versus $t^*$ for test cases TC1, TC4, TC5 and
TC6.
3.4 Comparison between MC-based EnKF and ME-based approach
We compare the performances and accuracies of MC-based EnKF and our ME-based
implementation on nine synthetic problems. We adopt the same domain and
computational grid employed in Section 3.3 and the same flow setting depicted in Figure 3.3.
In these synthetic cases deterministic head values of $H_1 = 0.8$ and $H_2 = 0.0$ are prescribed
along the left and right domain boundaries, respectively. These conditions generate a mean
hydraulic gradient of 2% aligned along direction $x_1$. As in Section 3.3, the bottom and top
domain boundaries are taken as impervious. Initial heads are considered as random.
Superimposed on this background gradient is convergent flow to a centrally located well that
starts pumping at a deterministic constant rate $Q_p = 10^{-3}$ at the reference time $t = 0$.
Storativity is set equal to a uniform deterministic value of $10^{-4}$. The nine problems differ from
each other in the variance and integral scale of the reference log-hydraulic conductivity
fields, $Y_{ref}(\mathbf{x}) = \ln K_{ref}(\mathbf{x})$. The latter are generated (Figure 3.17) by sampling statistically
homogeneous and isotropic multivariate Gaussian fields having mean equal to
$\ln(10^{-4}) = -9.21$ and nine exponential variograms with different combinations of sill and
integral scale, $I_Y$, as detailed in Table 3.1. The reference realizations are generated by the
sequential Gaussian simulator SGSIM of Deutsch and Journel [1998]. Included in Table 3.1
are the ratios between domain length scale and $I_Y$, sample variance, as well as sill and
integral scale obtained for each reference realization by fitting, via least squares, an
exponential variogram model to the corresponding sample variogram. The least squares
variogram parameter estimates are seen to differ, generally, from their original field values.
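The least squares fit of an exponential variogram mentioned above can be sketched as follows. The variogram form $\gamma(r) = \mathrm{sill}\,[1 - \exp(-r / I_Y)]$, the grid search over candidate integral scales and the noise-free synthetic sample variogram are assumptions made for illustration; the fitting procedure actually employed is not detailed in the text.

```python
import numpy as np

def exp_variogram(r, sill, integral_scale):
    """Exponential variogram gamma(r) = sill * (1 - exp(-r / integral_scale))."""
    return sill * (1.0 - np.exp(-r / integral_scale))

def fit_exp_variogram(r, gamma, scales):
    """Least squares fit: grid search over integral scale, closed-form sill."""
    best = None
    for I in scales:
        basis = 1.0 - np.exp(-r / I)
        sill = (basis @ gamma) / (basis @ basis)   # linear LS for sill given I
        sse = np.sum((gamma - sill * basis) ** 2)
        if best is None or sse < best[2]:
            best = (sill, I, sse)
    return best

# Noise-free synthetic "sample variogram" with sill 2.0 and integral scale 4.0
r = np.linspace(1.0, 20.0, 20)
gamma = exp_variogram(r, 2.0, 4.0)
scales = np.linspace(0.5, 30.0, 296)               # candidate scales, step 0.1
sill_fit, I_fit, sse = fit_exp_variogram(r, gamma, scales)
```

With a sample variogram computed from a single realization on a finite domain, the fitted values would in general depart from the generating ones, as observed in Table 3.1.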
            Input parameters                                          Least squares fit
Ref. case   Sill    I_Y     Domain side / I_Y    Sample variance     Sill    I_Y
TC1         0.5     4.0     10                   0.43                0.41    3.02
TC2         1.0     4.0     10                   1.08                1.22    6.20
TC3         2.0     4.0     10                   1.80                1.89    3.53
TC4         0.5     10.0    4                    0.34                0.42    6.718
TC5         1.0     10.0    4                    0.89                1.58    15.95
TC6         2.0     10.0    4                    1.62                2.50    15.62
TC7         0.5     20.0    2                    0.39                0.53    17.94
TC8         1.0     20.0    2                    0.66                1.16    23.85
TC9         2.0     20.0    2                    1.40                2.47    22.19
Table 3.1. Variogram input parameters, ratio between domain side and $I_Y$, sample variance,
sill and integral scale obtained by fitting, using least squares, an exponential variogram model
to the corresponding sample variogram.
Both MC- and ME-based EnKF require specifying the variogram parameters for the
initial step. We work with the generating rather than the estimated sill and integral scale to
avoid introducing additional sources of uncertainty in the comparison. Chen and Zhang
[2006] showed that incorrect initial sill and integral scale of Y have only a secondary effect
on the final log-conductivity estimates. On the other hand, Jafarpour and Tarrahi [2011]
found in analyzing flow through a highly anisotropic system that inaccuracies in prescribed
directional integral scales tend to persist throughout MC-based EnKF runs.
We solve numerically the groundwater flow equations (3.1) - (3.5) for the duration of
200 time units ($T_{max} = 200.0$). Similarly to Section 3.3, we sample each reference Y field in
nine elements uniformly distributed across the domain and the reference head fields at 20 grid
points (Figure 3.3) and 10 observation times ($h_n^{T_k}$, $n = 1, \ldots, 20$, $T_k$ = 10.0; 15.0; 20.0; 25.0;
30.0; 50.0; 80.0; 100.0; 150.0; 200.0; $k = 1, 2, \ldots, 10$). This selection of observation times
enables us to sample transient as well as pseudo steady state flow regimes (during which
computed heads vary linearly with log-time), the latter of which develop, in these cases, at
about $T_k = 80$. The log-conductivity and head samples are turned into “measurements” by
corrupting them with white Gaussian noise, $\varepsilon_m$ and $\varepsilon_n^{T_k}$, having zero mean and standard
deviations $\sigma_{YE} = 0.1$ and $\sigma_{hE} = 0.01$, respectively, as defined in (3.35) – (3.36).
The resulting absolute relative differences between reference and measured values
range from 0.0% to 2.6% (with mean 0.8%, mode 1.7%, 5th percentile 0.2% and 95th
percentile 21.5%) for log-conductivity and from 0.0% to 144% (with mean 4.6%, mode
0.6%, 5th percentile 0.0% and 95th percentile 19.4%) for hydraulic head. Large relative errors
(> 50%) in head measurement are thus obtained far from the pumping well, at short times $T_k$,
where $h_n^{T_k}$ are close to zero.
The elements of $\mathbf{d}^{T_k}$, $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$ and $\mathbf{H}^{T_k}$ are defined in the same way as
described in Section 3.3. The perturbed log-conductivity samples included in the vector
$\mathbf{Y}^{*}$ are made available at initial time $T_0$. In the ME-based assimilation $\mathbf{Y}^{*}$ is used for
generating the initial mean and the covariance matrix of the model vector. In the MC-based
EnKF, the initial collection of log-conductivity realizations, $\mathbf{Y}_i^{T_0}$, $i = 1, \ldots, NMC$, is generated
using the true variogram model with parameters listed in Table 3.1. Each $\mathbf{Y}_i^{T_0}$ is conditioned
on the randomized measurement vector $\mathbf{Y}_i^{*}$, obtained by perturbing each element of
$\mathbf{Y}^{*}$ with a Gaussian noise having standard deviation $\sigma_{YE}$. For each MC realization,
$\mathbf{Y}_i^{T_0}$, the initial head vector, $\mathbf{h}_i^{T_0}$, is computed by solving the deterministic steady-state
flow problem (3.1) - (3.5) without pumping.
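The randomization of the log-conductivity measurement vector for each MC realization can be sketched as below. The placeholder measurement values, the seed and the chosen ensemble size are assumptions; conditioning of each realization on its randomized data and the steady-state flow solution are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)        # assumed seed, for reproducibility only
NMC = 1000                            # one of the ensemble sizes considered
sigma_YE = 0.1                        # standard deviation of Y measurement errors

# Placeholder for the nine noisy log-conductivity measurements
Y_star = np.full(9, -9.21)

# One randomized measurement vector per realization (rows), nine entries each
noise = rng.normal(0.0, sigma_YE, size=(NMC, Y_star.size))
Y_star_i = Y_star[None, :] + noise

ensemble_mean = Y_star_i.mean()
ensemble_std = Y_star_i.std()
```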
In most previous applications of MC-based EnKF [Chen and Zhang, 2006; Hendricks
Franssen and Kinzelbach, 2008; Schoeniger et al., 2012; Xu et al., 2013] the number NMC
of Monte Carlo runs did not exceed a few hundred. Recognizing that NMC may have an
impact on the results and that estimates of mean and variance of a random variable converge
at a rate which diminishes with NMC [Ballio and Guadagnini, 2004], we consider here a
series of values NMC = 100; 500; 1,000; 5,000; 10,000; 50,000; 100,000. Figures 3.17 and
3.18, respectively, compare the spatial distributions of updated log-conductivity $\langle Y \rangle^{u,T_k}$, and
corresponding estimation variances $\sigma_Y^{2\,u,T_k}$ (diagonal entries of $\mathbf{C}_Y^{u,T_k}$), at the final
assimilation time ($T_k = 200.0$) for all nine reference cases obtained by ME- and MC-based EnKF.
Values of $\langle Y \rangle^{u,T_k}$ obtained with $NMC \leq 1{,}000$ exhibit more pronounced spatial variabilities
than do those obtained from a larger number of MC realizations. Indeed, as $\langle Y \rangle^{u,T_k}$ represents a
relatively smooth estimate of Y, spatial fluctuations are expected to diminish with increasing
NMC. Results obtained with $NMC > 10{,}000$ are similar to those obtained with
$NMC = 10{,}000$ for all cases examined and are therefore not shown.
Estimation variance is seen to vary locally with NMC, due most likely to filter
inbreeding. The problem seems to disappear at $NMC \geq 1{,}000$ where the spatial distribution
of MC-based variances is quite similar to that of their ME-based counterparts.
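To give a feeling for how slowly sample moments converge with NMC, the sketch below evaluates the textbook standard errors of the sample mean and sample variance for the ensemble sizes considered. It assumes independent Gaussian samples with variance equal to the largest sill in Table 3.1, which is a simplification of the spatially correlated, sequentially updated ensembles used in EnKF.

```python
import math

sill = 2.0                            # variance of the sampled variable (assumed)
nmc_values = [100, 500, 1_000, 5_000, 10_000, 50_000, 100_000]

se_mean = {}                          # standard error of the sample mean
se_var = {}                           # standard error of the sample variance
for nmc in nmc_values:
    se_mean[nmc] = math.sqrt(sill / nmc)
    # For independent Gaussian samples: Var[s^2] = 2 sigma^4 / (n - 1)
    se_var[nmc] = sill * math.sqrt(2.0 / (nmc - 1))

# Increasing NMC by a factor of 100 reduces the error on the mean tenfold
gain = se_mean[100] / se_mean[10_000]
```

Under these assumptions the sampling error decays as the inverse square root of NMC, so that each additional digit of accuracy requires a hundredfold increase in the number of realizations.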
Figure 3.17. Spatial distributions of $\langle Y \rangle^{u,T_k}$ at $T_k = 200$ obtained by ME- and MC-based
EnKF with diverse values of NMC. Reference Y fields are also shown. Columns: ME,
NMC = 10,000, NMC = 1,000, NMC = 500, NMC = 100, Reference field. Rows: TC1 to TC9.
Color scale: Y from -12 to -6.
Figure 3.18. Spatial distributions of $\sigma_Y^{2\,u,T_k}$ at $T_k = 200$ obtained by ME- and MC-based
EnKF with diverse values of NMC. Reference Y fields are also shown. Columns: ME,
NMC = 10,000, NMC = 1,000, NMC = 500, NMC = 100. Rows: TC1 to TC9. The color scale of
the variance differs among test cases.
Figures 3.19 and 3.20, respectively, show temporal behaviors of the average absolute
difference, $E_Y$, as well as the average estimation variance, $V_Y$, defined in (3.37) - (3.38).
Assimilation in these cases takes place at $t^*$ = 0.050, 0.075, 0.100, 0.125, 0.150, 0.250,
0.400, 0.500, 0.750, and 1.00. $E_Y$ and $V_Y$ are seen to increase as the sill of the variogram
increases and as $I_Y$ decreases. The largest difference between MC-based values of $E_Y$ and
$V_Y$ obtained with $NMC = 100$ and with $NMC = 10{,}000$ occurs in TC3 (Figures 3.19c and
3.20c) where the sill is largest and the integral scale smallest. Figure 3.19 shows that whereas
$E_Y$ tends to decrease with NMC, at large NMC its MC- and ME-based values are close. The
only exception is TC9 (associated with the largest sill and $I_Y$, Figure 3.19i) where the curve
obtained with $NMC = 10{,}000$ lies slightly below that obtained with ME-based EnKF. In
TC9, the relative difference between MC- and ME-based results varies between 18% at small
$t^*$ and 10% at large $t^*$. We ascribe this behavior to approximations required to close what
would otherwise be exact moment equations. Inaccuracies associated with these
approximations tend to increase with increasing values of $I_Y$ relative to domain size.
Figure 3.19. $E_Y$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based
results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and
10,000 (dashed-dotted black) are reported.
Figures 3.19 and 3.20 indicate that assimilations done with $NMC = 100$ are generally
associated with (a) large $E_Y$ values that tend to increase with time and (b) small $V_Y$ values
that tend to decrease with time. The two phenomena are symptomatic of filter inbreeding.
Several authors [Hendricks Franssen and Kinzelbach, 2008; Liang et al., 2012; Xu et al.,
2013] suggest analyzing the occurrence of filter inbreeding by plotting the ratio $V_Y / MSE_Y$
versus time where
$MSE_Y(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} \left[ \langle Y(\mathbf{x}_i) \rangle^{u,t^*} - Y_{ref}(\mathbf{x}_i) \right]^2$ (3.39)
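The diagnostic ratio based on (3.38) - (3.39) can be illustrated with the synthetic sketch below. The error magnitudes, the sample size and the function names are assumptions; the example only contrasts a statistically consistent filter with one that underestimates its own variance.

```python
import numpy as np

def mse_Y(Y_est, Y_ref):
    """Mean squared difference between estimated and reference Y, as in (3.39)."""
    return np.mean((Y_est - Y_ref) ** 2)

def variance_to_mse_ratio(var_est, Y_est, Y_ref):
    """Ratio V_Y / MSE_Y; values well below 1 point to filter inbreeding."""
    return np.mean(var_est) / mse_Y(Y_est, Y_ref)

rng = np.random.default_rng(2)        # assumed seed
n = 20_000
Y_ref = rng.normal(0.0, 1.0, n)
Y_est = Y_ref + rng.normal(0.0, 0.5, n)      # true error variance is 0.25

# Consistent filter: reported variance matches the actual error variance
ratio_consistent = variance_to_mse_ratio(np.full(n, 0.25), Y_est, Y_ref)
# Overconfident filter: reported variance is five times too small
ratio_inbred = variance_to_mse_ratio(np.full(n, 0.05), Y_est, Y_ref)
```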
[Figure 3.19 panels: rows $I_Y = 4.0$ (a)-(c), $I_Y = 10.0$ (d)-(f), $I_Y = 20.0$ (g)-(i); columns Sill = 0.5, 1.0, 2.0; vertical axis $E_Y$ (0 to 1.5), horizontal axis $t^*$ (0 to 1).]
Figure 3.20. $V_Y$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based
results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and
10,000 (dashed-dotted black) are reported.
Under ideal conditions, $V_Y / MSE_Y$ should be equal to unity [Liang et al., 2012]. Here we
explore this issue by considering also the quantity
$P_{2\sigma_Y}(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} H\left[ 2\sigma_Y^{u,t^*}(\mathbf{x}_i) - \left| \langle Y(\mathbf{x}_i) \rangle^{u,t^*} - Y_{ref}(\mathbf{x}_i) \right| \right]$ (3.40)
where H is the Heaviside step function, $P_{2\sigma_Y}$ representing percent reference values of Y
lying inside a confidence interval of width equal to $\pm 2\sigma_Y^{u,t^*}(\mathbf{x}_i)$ about
$\langle Y(\mathbf{x}_i) \rangle^{u,t^*}$. Analyses of how $V_Y / MSE_Y$ (Figure 3.21) and $P_{2\sigma_Y}$ (Figure 3.22) evolve
with time lead to similar conclusions. When $NMC < 1{,}000$, $V_Y / MSE_Y$ and $P_{2\sigma_Y}$ decrease
with time, exhibiting a distinct filter inbreeding effect.
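A minimal sketch of the coverage metric in (3.40) is given below. The four-cell example and the function name `coverage_two_sigma` are illustrative assumptions; the value at zero of the Heaviside function is taken equal to 1, so that a reference value lying exactly on the interval bound is counted as inside.

```python
import numpy as np

def coverage_two_sigma(est, sigma, ref):
    """Fraction of cells where the reference lies within est +/- 2 sigma."""
    inside = np.heaviside(2.0 * sigma - np.abs(est - ref), 1.0)
    return np.mean(inside)

# Four illustrative cells with unit estimation standard deviation
est = np.zeros(4)
sigma = np.ones(4)
ref = np.array([0.5, 1.5, 2.5, -3.0])   # the last two fall outside +/- 2 sigma

P = coverage_two_sigma(est, sigma, ref)
```

In this example two of the four reference values fall inside the interval, so that the metric equals 0.5.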
[Figure 3.20 panels: rows $I_Y = 4.0$ (a)-(c), $I_Y = 10.0$ (d)-(f), $I_Y = 20.0$ (g)-(i); columns Sill = 0.5, 1.0, 2.0; vertical axis $V_Y$ (0 to 2), horizontal axis $t^*$ (0 to 1).]
Figure 3.21. Ratio between $V_Y$ and $MSE_Y$ versus $t^*$ for the nine test cases. ME-based (solid
black) and MC-based results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000
(solid gray), and 10,000 (dashed-dotted black) are reported.
No such deterioration with time is exhibited by either MC-based results with $NMC \geq 1{,}000$
or by ME-based outcomes, where $V_Y / MSE_Y$ remains approximately constant and $P_{2\sigma_Y}$ is
larger than 90%. The only exception concerns ME-based results associated with TC9 (see Figures
3.21i and 3.22i) where $P_{2\sigma_Y}$ is slightly smaller than 90% (about 88%). However, even here the
ME-based values of $V_Y / MSE_Y$ and $P_{2\sigma_Y}$ show no systematic decrease with time (as would
happen in the presence of filter inbreeding) but instead diminish rapidly during the first
assimilation period and then stay approximately constant. The rapid early decline is likely
due to spurious updates caused by second-order approximation of the cross-covariance terms.
In contrast to MC-based $P_{2\sigma_Y}$ which, at small NMC, drops to below 40%, a steep
decline in ME-based values is limited to early time.
[Figure 3.21 panels: rows $I_Y = 4.0$ (a)-(c), $I_Y = 10.0$ (d)-(f), $I_Y = 20.0$ (g)-(i); columns Sill = 0.5, 1.0, 2.0; vertical axis $V_Y / MSE_Y$ (0 to 1.5), horizontal axis $t^*$ (0 to 1).]
Figure 3.22. $P_{2\sigma_Y}$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based
results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and
10,000 (dashed-dotted black) are reported.
[Figure 3.22 panels: rows $I_Y = 4.0$ (a)-(c), $I_Y = 10.0$ (d)-(f), $I_Y = 20.0$ (g)-(i); columns Sill = 0.5, 1.0, 2.0; vertical axis $P_{2\sigma_Y}$ (0.4 to 1), horizontal axis $t^*$ (0 to 1).]
Black dots in Figure 3.23 indicate the spatial location of the reference values of Y
which, following the last assimilation period at $t^* = 1.0$, lie outside confidence intervals
having widths equal to $\pm 2\sigma_Y^{u,t^*}(\mathbf{x}_i)$ about $\langle Y(\mathbf{x}_i) \rangle^{u,t^*}$. This confirms the poor
quality of estimates obtained with $NMC < 1{,}000$, even in the weakly heterogeneous settings of
TC1, TC4 and TC7. Remarkably, black dots in Figure 3.23 corresponding to MC- (with
$NMC = 10{,}000$) and ME-based filters have similar spatial distributions. It thus appears that
the two approaches behave similarly in a global (as observed in Figures 3.19 - 3.20) and in a
local sense when NMC is sufficiently large. This behavior can be quantified by analyzing the
percentage of cells in which reference values of Y lie within the 95% confidence intervals
around updated Y values in both the ME- and MC-based solutions. As expected, this metric
is seen to increase as the number of MC realizations grows in all test cases. In case of the
MC approach, average values of this metric in the nine test cases are 94%, 93%, 91%, 87% and 48%
for NMC = 10,000, 1,000, 500, and 100, respectively.
Figure 3.23. Spatial distributions (black squares) of elements in which $Y_{ref}$ lies outside
confidence intervals of width $\pm 2\sigma_Y^{u,T_k}(\mathbf{x}_i)$ about $\langle Y(\mathbf{x}_i) \rangle^{u,T_k}$ when $T_k = 200.0$
for the nine test cases (rows TC1 to TC9). Columns report ME-based results and MC-based
results with NMC = 10,000, 1,000, 500 and 100.
Figures 3.24 and 3.25 depict temporal behaviors of $E_h$ and $V_h$, the hydraulic head
analogues of $E_Y$ and $V_Y$ in (3.37) – (3.38), defined as
$E_h(t^*) = \frac{1}{\tilde{N}_h} \sum_{i=1}^{\tilde{N}_h} \left| \langle h(\mathbf{x}_i) \rangle^{u,t^*} - h_{ref}(\mathbf{x}_i) \right|$ (3.41)
$V_h(t^*) = \frac{1}{\tilde{N}_h} \sum_{i=1}^{\tilde{N}_h} \sigma_h^{2\,u,t^*}(\mathbf{x}_i)$ (3.42)
where $\sigma_h^{2\,u,t^*}(\mathbf{x}_i)$ is the estimation variance of h at node $\mathbf{x}_i$ (i.e., a diagonal component of
$\mathbf{C}_h^{u,t^*}$) and $\tilde{N}_h$ is the number of nodes, $N_h$, minus those located on Dirichlet boundaries
and at the pumping well where, theoretically, $h \to -\infty$ due to the negligible well radius. Figure
3.24 shows that $E_h$ decreases sharply at the first assimilation time and then increases with $t^*$.
The largest rate of increase is associated with MC-based values obtained with $NMC = 100$.
ME-based values of $E_h$ are in general very close to MC-based values obtained with sufficiently
large NMC. On the other hand, the monotonically decreasing temporal trend in $V_Y$ is not
mirrored by the mean estimation variance of h in Figure 3.25. Instead, $V_h$ in Figure 3.25
decreases sharply during the first assimilation step and then increases with time. We attribute
this to the combination of two contrasting effects: (a) the decrease of $V_h$ which is typically
associated with the updating step, and (b) the temporal increase or decrease (depending on
location in the domain; see also Figure 7 of Ye et al. [2004] and results of Riva et al. [2009])
in head variance during the forward steps. Effect (a) dominates during the first assimilation
period, due to the high information content of the measurements (see also Figures 3.19 -
3.20), causing $V_h$ to decrease initially with time. As time increases and pseudo-steady state
conditions are approached, the conditioning head data become less informative (see also Riva
et al. [2009]). This is reflected in Figure 3.19 where $E_Y$ is seen to be almost constant at large
values of $t^*$. Here, effect (b) dominates as manifested by an increase in $V_h$ with time.
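The head metrics in (3.41) - (3.42) differ from their log-conductivity counterparts in that some nodes are excluded from the averages. A minimal sketch of this masking, with a hypothetical five-node example and an assumed function name `head_metrics`, could read as follows.

```python
import numpy as np

def head_metrics(h_est, h_var, h_ref, excluded):
    """E_h and V_h averaged over nodes not on Dirichlet boundaries or at the well."""
    keep = np.ones(h_est.size, dtype=bool)
    keep[excluded] = False
    E_h = np.mean(np.abs(h_est[keep] - h_ref[keep]))
    V_h = np.mean(h_var[keep])
    return E_h, V_h, int(keep.sum())

# Hypothetical five-node example: nodes 0 and 4 are Dirichlet, node 2 is the well
h_est = np.array([0.8, 0.6, -5.0, 0.2, 0.0])
h_ref = np.array([0.8, 0.5, -9.0, 0.3, 0.0])
h_var = np.array([0.0, 0.01, 1.0, 0.03, 0.0])

E_h, V_h, n_kept = head_metrics(h_est, h_var, h_ref, excluded=[0, 2, 4])
```

Excluding the well node prevents the locally unbounded head from dominating the averages.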
Figure 3.24. $E_h$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based
results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and
10,000 (dashed-dotted black) are reported.
[Figure 3.24 panels: rows $I_Y = 4.0$ (a)-(c), $I_Y = 10.0$ (d)-(f), $I_Y = 20.0$ (g)-(i); columns Sill = 0.5, 1.0, 2.0; vertical axis $E_h$ (0 to 0.1), horizontal axis $t^*$ (0 to 1).]
Figure 3.25. $V_h$ versus t* for the nine test cases. ME-based (solid black) and MC-based
results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and
10,000 (dashed-dotted black) are reported.
We close our analysis by plotting in Figure 3.26 the temporal behavior of
$$P_{2\sigma_h}(t^*) = \frac{1}{N_h} \sum_{i=1}^{N_h} H\left( 2\sigma_h^{u,t^*}(\mathbf{x}_i) - \left| h^{u,t^*}(\mathbf{x}_i) - h_{ref}(\mathbf{x}_i) \right| \right) \quad (3.43)$$
representing percent reference h values lying inside a confidence interval of width equal to
$\pm 2\sigma_h^{u,t^*}(\mathbf{x}_i)$ about $h^{u,t^*}(\mathbf{x}_i)$. Figure 3.26 confirms that filter inbreeding associated with
small NMC impacts not only Y but also h.
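A coverage metric of the kind expressed by (3.43) can be sketched as follows. This is only an illustration: the array names `h_mean`, `h_var` and `h_ref` are hypothetical, and the step function is assumed to return 1 when its argument is non-negative.

```python
import numpy as np

def coverage_fraction(h_mean, h_var, h_ref, n_sigma=2.0):
    """Fraction of reference values within n_sigma estimated standard deviations.

    h_mean : updated mean heads at the N_h nodes (hypothetical name)
    h_var  : updated head variances at the same nodes (hypothetical name)
    h_ref  : reference ("true") heads at the same nodes (hypothetical name)
    """
    h_mean, h_var, h_ref = (np.asarray(a, dtype=float) for a in (h_mean, h_var, h_ref))
    # Step-function argument of (3.43): n_sigma * sigma - |mean - reference| >= 0
    inside = np.abs(h_mean - h_ref) <= n_sigma * np.sqrt(h_var)
    return float(inside.mean())
```

When the estimated variances are too small, as happens with filter inbreeding, fewer reference values fall inside the interval and the returned fraction drops.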
[Figure 3.25: panels (a)-(i); columns: Sill = 0.5, 1.0, 2.0; rows labeled by $I_Y$ = 4.0, 10.0, 20.0; horizontal axis t* (0 to 1), vertical axis $V_h$ (0 to 0.01).]
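Figure 3.26. $P_{2\sigma_h}$ versus t* for the nine test cases. ME-based (solid black) and MC-based
results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and
10,000 (dashed-dotted black) are reported.
[Figure 3.26: panels (a)-(i); columns: Sill = 0.5, 1.0, 2.0; rows labeled by $I_Y$ = 4.0, 10.0, 20.0; horizontal axis t* (0 to 1), vertical axis $P_{2\sigma_h}$ (0.4 to 1).]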
Ye et al. [2004] compared the computational time required by ME- and MC-based
forward solutions of a transient groundwater flow problem similar to the one we analyze here
and within a domain which is half the size of the one we consider. They found that, with
NMC = 2,000, the ME-based method required one quarter to one half of the computer time
needed by the MC-based approach to evaluate mean heads and variances. The authors computed
head variances by solving an integral expression (their (47)) in the presence of deterministic
sources, boundary and initial conditions, which does not require computing a complete head
covariance matrix. To conduct a more comprehensive comparison with the MC-based
approach, we opted in this work to compute the complete head covariance matrix at the end
of each time interval $[T_{k-1}, T_k]$. Our updating step requires computing additional terms
appearing in (3.24)-(3.26) and (3.29). As recognized by Ye et al. [2004], the computation of
these terms can have a significant effect on computational time during the forward step.
Indeed we find that, in our case, the ME- and MC-based approaches require 13,650 s and
0.375×NMC s, respectively, of CPU time on 10 parallel 2.80 GHz Intel i7-860 processors. It
follows that CPU time associated with ME-based EnKF is comparable to that of an MC-based
assimilation with NMC ≈ 35,000 realizations. In our test cases the MC approach
converges within NMC = 10,000, but ascertaining this convergence requires a tenfold increase
in NMC (i.e., NMC = 100,000 realizations are required to support convergence
at NMC = 10,000). We therefore conclude that ME-based EnKF constitutes a viable alternative to the
traditional MC-based approach not only in terms of quality but also in terms of computational
efficiency. We believe that it should be possible to improve the computational efficiency of
ME-based EnKF further in the future.
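The CPU-time comparison above reduces to simple arithmetic, sketched below using only the two timings quoted in the text. The function and constant names are illustrative.

```python
# CPU timings quoted in the text (seconds, on the hardware described above)
ME_CPU_S = 13_650.0               # one ME-based assimilation
MC_CPU_S_PER_REALIZATION = 0.375  # MC-based cost per realization

def mc_cpu_time(nmc):
    """CPU time of an MC-based assimilation with nmc realizations."""
    return MC_CPU_S_PER_REALIZATION * nmc

def break_even_nmc():
    """Ensemble size at which the MC-based cost equals the ME-based cost."""
    return ME_CPU_S / MC_CPU_S_PER_REALIZATION

print(break_even_nmc())      # break-even ensemble size
print(mc_cpu_time(10_000))   # cost of the converged MC run
print(mc_cpu_time(100_000))  # cost of the run used to ascertain convergence
```

The break-even value of 36,400 realizations is of the same order as the figure of roughly 35,000 quoted above, and the 100,000-realization run needed to ascertain convergence costs 37,500 s, which exceeds the ME-based cost.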
3.5 Conclusions
In this Chapter we described a novel inversion algorithm for updating, in real time,
model parameters and system states in a groundwater flow model on the basis of information
about log-conductivity and transient head data collected in a randomly heterogeneous aquifer.
The methodology combines an approximate form of stochastic transient groundwater
flow moment equations with the Kalman Filter algorithm and allows sequential updating of
parameters and system states without a need for computationally intensive Maximum
Likelihood (ML) or Monte Carlo (MC) analyses. We explored the feasibility and accuracy of
the novel inversion scheme and compared its performance and computational efficiency
against the common MC-based EnKF on nine different synthetic examples characterized by
different degrees of heterogeneity.
We showed that embedding the MEs in the KF scheme allows computationally
efficient real time estimation of system states and model parameters, avoiding the drawbacks
which are commonly encountered in traditional MC-based applications of EnKF. Our results
confirm an earlier finding by others that a few hundred MC simulations are not enough to
overcome filter inbreeding issues, which have a negative impact on the quality of log-
conductivity estimates as well as predicted heads and associated estimation variances. ME-
based EnKF obviates the need for repeated MC simulations and was demonstrated to be free
of inbreeding issues.
4. EnKF with complex geology
Here we present a methodology conducive to updating geological and petrophysical
properties of a collection of reservoir models characterized by a complex structural
(geological) architecture within the context of a history matching procedure based on the
Ensemble Kalman Filter (EnKF) approach. The associated computational algorithms are
illustrated in all their relevant details.
The (heterogeneous) spatial distribution of facies is handled by means of a Markov
Mesh (MM) model. The latter is adopted because of (a) its ability to reproduce detailed facies
geometries and spatial patterns, and (b) its consistency with the probabilistic Bayesian
framework at the basis of EnKF.
In Section 4.1 we start by outlining a formal definition of the MM model. Section 4.2
illustrates a novel inversion scheme which allows conditioning the geological and the
petrophysical properties of a collection of reservoir realizations on a set of measured
production data. The results obtained on a two-dimensional synthetic test case and a
comparison with those obtained using a standard EnKF are presented in Section 4.3. In
Section 4.4 we discuss our results.
4.1 Markov Mesh (MM) Model
We consider a finite, regular grid G comprising $N_e$ elements arranged in two or more
dimensions and define a sequence of regular grids G1, G2, …, GL. Each of these grids is a
subset of cells in G such that
$$G_1 \subset G_2 \subset \dots \subset G_L, \quad G_L = G \quad (4.1)$$
With this notation, the coarsest and the finest grids are respectively denoted as G1 and GL.
We further introduce the disjoint sets of elements, H1, H2, …, HL, defined as
$$H_l = G_l, \quad l = 1 \quad (4.2)$$
$$H_l = G_l \setminus G_{l-1}, \quad l = 2, 3, \dots, L \quad (4.3)$$
Each set $H_l$, $l > 1$, consists of the cells of $G_l$ that do not appear in the coarser grid level,
$G_{l-1}$. Figure 4.1 depicts the grid refinement that has been selected in our implementation.
Note that the disjoint sets $H_l$, $l = 1, \dots, L$, are defined such that
$$G_l = \bigcup_{k=1}^{l} H_k \quad (4.4)$$
Figure 4.1. Sequence of grids (G1, G2 and G3) onto which the rectangular domain G is
decomposed within a multi-grid approach. Colors indicate the disjoint sets H1, H2 and H3.
Cell numbering corresponds to element indices while arrows indicate the direction followed
by the simulation path.
[Figure 4.1: three panels showing $G_1 = H_1$, $G_2 = H_1 \cup H_2$ and $G_3 = H_1 \cup H_2 \cup H_3 = G$, with cells numbered along the simulation path; legend: $H_1$, $H_2$, $H_3$.]
We then define a path scanning all the elements of the grid G. This path initially visits all
elements of the coarsest subset, $H_1$. It then scans all the elements of $H_2$ and proceeds until
the last subset, $H_L$, is reached. Each of the subsets, $H_l$, $1 \le l \le L$, is scanned starting from
the elements at the top grid layer and proceeding through all the elements of the underlying
layers until the bottom of the domain. In each layer the path starts from the left-bottom corner
and proceeds by paths parallel to the domain diagonal. The sequence of elements defined by
the scanning path allows assigning a label $i = 1, \dots, N_e$ to each cell of the grid G, as
sketched in Figure 4.1. Note that the selections of the grid refinement and of the path are not
unique, and diverse choices are admissible.
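The nested grids and the coarse-to-fine labeling can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the text: the grid is two-dimensional, each level halves the cell spacing of the previous one, and cells within a subset are visited in plain row-major order rather than along the diagonal path described above. The function name `multigrid_path` is hypothetical.

```python
import numpy as np

def multigrid_path(nrows, ncols, n_levels):
    """Return (labels, level) for a coarse-to-fine simulation path.

    level[r, c]  = l such that cell (r, c) belongs to the disjoint set H_l
    labels[r, c] = position i = 1..N_e of the cell along the path
    Assumption: G_l keeps every 2**(n_levels - l)-th cell in each direction.
    """
    level = np.zeros((nrows, ncols), dtype=int)
    # Start from the finest grid; coarser grids overwrite the cells they contain,
    # so each cell ends up with the coarsest level in which it appears.
    for l in range(n_levels, 0, -1):
        stride = 2 ** (n_levels - l)
        level[::stride, ::stride] = l
    rows, cols = np.indices((nrows, ncols))
    # Visit H_1 first, then H_2, ...; row-major order inside each subset (assumed).
    order = np.lexsort((cols.ravel(), rows.ravel(), level.ravel()))
    labels = np.empty(nrows * ncols, dtype=int)
    labels[order] = np.arange(1, nrows * ncols + 1)
    return labels.reshape(nrows, ncols), level

labels, level = multigrid_path(5, 5, 3)
print(level)
print(labels)
```

Every cell receives exactly one label, and all cells of a coarser subset are labeled before any cell of a finer one, which is the property the conditional factorization introduced below relies on.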
We assign an integer value, $s_i \in \{1, \dots, K\}$, corresponding to a facies type (K being the
number of facies occurring in the domain), to each element of the grid. An indicator variable
$s_i^k$ is then defined as
$$s_i^k = \begin{cases} 1 & \text{if } s_i = k \\ 0 & \text{otherwise} \end{cases}, \quad k = 1, \dots, K \quad (4.5)$$
Assuming that the vector $\mathbf{s}$ containing the facies identifiers, $s_i$, in all grid elements forms a
random field, one can describe the probability of observing a given facies distribution within
domain G through the discrete joint probability mass function $\pi(\mathbf{s})$. The latter can be written
as a product of conditional probabilities
$$\pi(\mathbf{s}) = \prod_{i=1}^{N_e} \pi\left(s_i \mid \mathbf{s}_{j<i}\right) \quad (4.6)$$
Here, $\mathbf{s}_{j<i}$ is the vector containing the facies identifiers over all grid elements with index
$j < i$. Let $\partial_i$ be a subset of the cells identified by $j < i$ and let the vector $\mathbf{s}_{\partial_i}$ denote the
facies distribution within the elements of $\partial_i$. Then, the Markov property of the random field $\mathbf{s}$
is expressed by
$$\pi\left(s_i \mid \mathbf{s}_{j<i}\right) = \pi\left(s_i \mid \mathbf{s}_{\partial_i}\right) \quad (4.7)$$
and the joint probability distribution defined in (4.6) can be simplified as
$$\pi(\mathbf{s}) = \prod_{i=1}^{N_e} \pi\left(s_i \mid \mathbf{s}_{\partial_i}\right) \quad (4.8)$$
In our implementation, the subset $\partial_i$ contains all the elements $j < i$ that are contained in a
square of arbitrary size centered at i. Figure 4.2 provides a graphical depiction of a
snapshot of a simulation at the time element i is visited while the algorithm is progressing.
Figure 4.2. Snapshot of a simulation obtained by freezing the algorithm while grid element i
is visited. Colored cells are identified by index j < i and have already been visited by the
simulation path. Red, yellow and green elements belong to the disjoint sets H1, H2 and H3,
respectively. Blocks contoured by the solid line belong to the conditional neighborhood $\partial_i$
of element i. Grey cells are identified by index j > i and will be simulated after element i.
At the core of the method is the way we express the conditional probability included
in (4.8). We employ the method proposed by Stien and Kolbjornsen [2011] where the target
probability is a function of a linear combination of coefficients, in the form
$$\pi\left(s_i \mid \mathbf{s}_{\partial_i}\right) = \frac{\exp\left( \sum_{k_1=1}^{K} s_i^{k_1} \, \mathbf{z}_i^{\top} \boldsymbol{\theta}_{l(i)}^{k_1} \right)}{\sum_{k_2=1}^{K} \exp\left( \mathbf{z}_i^{\top} \boldsymbol{\theta}_{l(i)}^{k_2} \right)} \quad (4.9)$$
Here, $\mathbf{z}_i$ is a vector of size $P \times 1$ and contains a set of coefficients which depend on the
facies distribution $\mathbf{s}_{\partial_i}$; $\boldsymbol{\theta}_{l(i)}^{k_1}$ is a vector of unknown parameters; and the function $l(i)$
allows identifying the subset of cells $H_{l(i)}$ to which the element with label i belongs (i.e.,
$i \in H_{l(i)}$). In our implementation, we follow for the elements of the vector $\mathbf{z}_i$ the same
definitions proposed by Kolbjornsen et al. [2013], where each coefficient is equal to 0 or 1
depending on whether a predefined pattern of facies is reproduced over $\partial_i$ or not.
From definition (4.9) it follows that there exists one vector of model parameters for
each facies and for each grid level, for a total number of $L \times K$ vectors. Estimating these
parameter vectors is accomplished via Maximum Likelihood [Stien and Kolbjornsen, 2011] by
means of an appropriate training image which contains all the important features one desires
to preserve in the collection of realizations.
The MM model introduced above allows drawing a facies realization by following the
sequential path defined previously. For each cell, a value identifying a facies is drawn
according to the conditional probability expressed by (4.9). After all elements have been
visited, the resulting facies generation follows the joint probability distribution (4.8).
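A conditional probability of the form (4.9) is a softmax over the K facies, and a single sequential draw can be sketched as below. The function names are hypothetical; the pattern vector `z` and the parameter array `theta` (one row per facies, for the grid level of the visited cell) are assumed to be supplied by the caller, and their construction from a training image is not shown.

```python
import numpy as np

def facies_probabilities(z, theta):
    """Probabilities of the K facies at one cell, in the spirit of (4.9).

    z     : (P,) vector of 0/1 pattern coefficients for the cell neighborhood
    theta : (K, P) array, row k holding the parameter vector of facies k + 1
    """
    scores = theta @ z                 # z^T theta^k for each facies
    scores = scores - scores.max()     # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def draw_facies(z, theta, rng):
    """Draw a facies identifier in 1..K from the conditional probabilities."""
    p = facies_probabilities(z, theta)
    return int(rng.choice(len(p), p=p)) + 1

rng = np.random.default_rng(0)
z = np.array([1.0, 0.0, 1.0, 1.0])
theta = np.array([[0.5, 0.1, -0.2, 0.3],
                  [-0.5, 0.4, 0.2, -0.1]])
print(facies_probabilities(z, theta), draw_facies(z, theta, rng))
```

Repeating the draw cell by cell along the simulation path, with `z` recomputed from the cells already simulated, yields one realization of the facies field.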
Application of the methodology to field settings requires that all members of a
collection of generated spatial fields of facies share the same volumetric facies proportions.
These values are in fact related to the quantity and mode of displacement of oil in a reservoir
and have a significant impact on strategic planning of production. It is well known that the
statistical framework offered by the MM (a) does not guarantee that the facies proportion
observed in the training image is preserved in the generated realizations and (b) does not
include any tuning parameters which might allow setting a given value of facies proportions.
In this work we propose to overcome this drawback by adopting an
acceptance/rejection (AR) sampling method. Let
$$\bar{s}^k = \frac{1}{N_e} \sum_{i=1}^{N_e} s_i^k, \quad k = 1, \dots, K \quad (4.10)$$
be the volumetric proportion of facies k over a field composed of $N_e$ elements and
characterized by the occurrence of K distinct facies, and let
$$\bar{\mathbf{s}} = \begin{bmatrix} \bar{s}^1 \\ \vdots \\ \bar{s}^K \end{bmatrix} \quad (4.11)$$
be the vector containing all volumetric proportions evaluated through (4.10). We note that $\bar{s}^k$
(k = 1, ..., K) is the sum of $N_e$ random correlated variables (see (4.10)). Therefore, $\bar{s}^k$ is also
a random variable, and $\bar{\mathbf{s}}$ is a random vector. Generating a spatial field of facies through the
sequential algorithm described above allows obtaining a realization of $\bar{\mathbf{s}}$, that we interpret as
a realization drawn from its prior probability density function. In this framework, we can (a)
consider additional information about $\bar{\mathbf{s}}$ in the form of a likelihood function and (b) draw
samples from the corresponding posterior density using a traditional AR algorithm.
As an example, we consider the case where the additional information (which is
available, e.g., in the form of seismic data and/or expert opinion) suggests that $\bar{\mathbf{s}}$ follows a
multi-normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ (i.e., $\bar{\mathbf{s}} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$).
Drawing samples from the resulting posterior density can be accomplished using the
AR algorithm, which comprises the following two steps:
Step 1: generate an unconditional realization of a spatial field of facies and compute
the corresponding vector $\bar{\mathbf{s}}$.
Step 2:
Compute the quantity $\alpha = N(\bar{\mathbf{s}}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) / \max_{\bar{\mathbf{s}}} N(\bar{\mathbf{s}}, \boldsymbol{\mu}, \boldsymbol{\Sigma})$.
Draw a realization u from a uniform distribution over the interval [0, 1].
Accept the current realization of facies if $u \le \alpha$, otherwise reject it.
Return to Step 1.
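The two AR steps can be sketched as follows. Because a multi-normal density attains its maximum at the mean, the acceptance ratio reduces to the exponential of minus one half of the squared Mahalanobis distance. The sketch makes several assumptions that are not in the text: `toy_simulator` is a hypothetical stand-in that draws independent cells rather than a Markov Mesh field, the covariance matrix is taken to be invertible, and the loop stops at the first accepted realization.

```python
import numpy as np

def proportions(field, n_facies):
    """Volumetric proportion of each facies 1..K, as in (4.10)."""
    return np.array([(field == k).mean() for k in range(1, n_facies + 1)])

def acceptance_probability(s_bar, mu, cov):
    """N(s_bar; mu, cov) divided by its maximum value, attained at s_bar = mu."""
    d = s_bar - mu
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

def ar_sample(simulate, mu, cov, rng, max_tries=10_000):
    """Repeat Step 1 and Step 2 until one realization is accepted."""
    for _ in range(max_tries):
        field = simulate(rng)                                   # Step 1
        alpha = acceptance_probability(proportions(field, len(mu)), mu, cov)
        if rng.uniform() <= alpha:                              # Step 2
            return field
    raise RuntimeError("no realization accepted within max_tries")

def toy_simulator(rng, shape=(50, 50), p1=0.34):
    """Hypothetical stand-in for the MM simulator: independent two-facies cells."""
    return np.where(rng.uniform(size=shape) < p1, 1, 2)

mu = np.array([0.34, 0.66])
cov = np.diag([1e-4, 1e-4])   # assumed invertible for this illustration
field = ar_sample(toy_simulator, mu, cov, np.random.default_rng(0))
print(proportions(field, 2))
```

Tightening `cov` makes accepted fields match the target proportions more closely, at the price of a lower acceptance rate.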
4.2 Theoretical formulation
In this Section we describe a novel inversion scheme for conditioning a collection of
facies and log-permeability spatial fields on available production data. For simplicity of
notation we limit the illustration to the case where only two fluid phases (oil and water) are
displaced in the host porous domain. Note that the methodology can readily be extended to
systems characterized by a three-phase fluid flow.
We consider the model vector
$$\mathbf{y} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{p} \\ \mathbf{S}_{wat} \\ \mathbf{w} \end{bmatrix} \quad (4.12)$$
Here, $\mathbf{Y}$ is the vector containing the static variables (e.g., the log-permeability values) and
$\mathbf{p}$, $\mathbf{S}_{wat}$ are the vectors containing the dynamic variables (e.g., pressure and water saturation
values) at the $N_e$ numerical block centers, while $\mathbf{w}$ contains $N_w$ values of production data
(e.g., fluid flow rates at well locations or bottom hole pressure values). Vector $\mathbf{y}$ is therefore
of size $N_y = 3N_e + N_w$.
According to the notation introduced in Chapter 2, we denote by $\mathbf{y}^{u,T_{k-1}}$ the model
vector $\mathbf{y}$ at time $T_{k-1}$ conditioned on measurements available up to time $T_{k-1}$. Adoption of a
Monte Carlo framework enables one to approximate the probability density function (pdf) of
$\mathbf{y}^{u,T_{k-1}}$ by means of a collection of NMC equally likely model realizations of the system,
defined as
$$\mathbf{y}_j^{u,T_{k-1}} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{p} \\ \mathbf{S}_{wat} \\ \mathbf{w} \end{bmatrix}_j^{u,T_{k-1}}, \quad j = 1, \dots, NMC \quad (4.13)$$
The non-linear operator $\mathcal{F}$ allows calculating the corresponding forward vectors at time $T_k$
as
$$\mathbf{y}_j^{f,T_k} = \mathcal{F}\left( \mathbf{y}_j^{u,T_{k-1}} \right), \quad j = 1, \dots, NMC \quad (4.14)$$
which enables representing the pdf of $\mathbf{y}^{f,T_k}$. The function $\mathcal{F}$ in (4.14) represents a
multiphase flow model the solution of which is performed within time interval $[T_{k-1}, T_k]$ by
setting the log-permeability field to $\mathbf{Y}_j^{u,T_{k-1}}$ and considering as initial conditions the pressure
and water saturation fields contained in vectors $\mathbf{p}_j^{u,T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u,T_{k-1}}$, respectively. The boundary
conditions for the flow simulation are here assumed to be known without uncertainty. Since
we consider the case where the static variables (i.e., the log-permeabilities) do not change
during a flow simulation, equality $\mathbf{Y}_j^{f,T_k} = \mathbf{Y}_j^{u,T_{k-1}}$ always holds. We also note that the
particular value of the vector $\mathbf{w}_j^{u,T_{k-1}}$ has no effect on the flow simulation.
We introduce the collection of vectors
$$\mathbf{s}_j^{f,T_k}, \quad j = 1, \dots, NMC \quad (4.15)$$
describing our knowledge about the spatial distribution of the facies over the domain at time
$T_k$ conditioned on measurements available up to time $T_{k-1}$, and the vector
$$\mathbf{v}_j^{f,T_k} = \begin{bmatrix} \mathbf{s}_j^{1,f,T_k} \\ \vdots \\ \mathbf{s}_j^{K,f,T_k} \\ \mathbf{w}_j^{f,T_k} \end{bmatrix}, \quad j = 1, \dots, NMC \quad (4.16)$$
Here, each vector $\mathbf{s}_j^{k,f,T_k}$ describes the prior spatial distribution of the indicator variables
associated with each facies $k = 1, \dots, K$ in a given realization j of the ensemble.
The objective of the data assimilation algorithm is to calculate the updated vectors,
$\mathbf{y}_j^{u,T_k}$ and $\mathbf{s}_j^{u,T_k}$, conditioned on measured values of production data, $\mathbf{w}^{T_k}$, at time $T_k$. Note that
these measurements correspond to randomly perturbed values of $\mathbf{w}^{T_k}$ and are contained in
vector $\mathbf{d}^{T_k}$ as defined in (2.2). The data assimilation scheme is then developed through a four-
step algorithm described in detail in the following.
Step 1
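In the first step we consider the sample average of the quantity defined in (4.16)
$$\bar{\mathbf{v}}^{f,T_k} = \frac{1}{NMC} \sum_{j=1}^{NMC} \mathbf{v}_j^{f,T_k} \quad (4.17)$$
and, instead of updating each individual member of the collection, $\mathbf{v}_j^{f,T_k}$, as done in the EnKF
context (see (2.23) - (2.29)), we use the corresponding averaged equation defined in (2.30) to
evaluate the updated vector $\bar{\mathbf{v}}^{u,T_k}$ through
$$\bar{\mathbf{v}}^{u,T_k} = \bar{\mathbf{v}}^{f,T_k} + \hat{\boldsymbol{\Sigma}}_{\mathbf{vv}}^{f,T_k} (\mathbf{H}^{T_k})^{\top} \left[ \mathbf{H}^{T_k} \hat{\boldsymbol{\Sigma}}_{\mathbf{vv}}^{f,T_k} (\mathbf{H}^{T_k})^{\top} + \hat{\boldsymbol{\Sigma}}_{\boldsymbol{\varepsilon\varepsilon}}^{T_k} \right]^{-1} \left( \bar{\mathbf{d}}^{T_k} - \mathbf{H}^{T_k} \bar{\mathbf{v}}^{f,T_k} \right) \quad (4.18)$$
Here, the quantities $\mathbf{H}^{T_k}$, $\bar{\mathbf{d}}^{T_k}$ and $\hat{\boldsymbol{\Sigma}}_{\boldsymbol{\varepsilon\varepsilon}}^{T_k}$ follow the same definitions respectively given in
(2.2), (2.28) and (2.29) of Chapter 2, while $\hat{\boldsymbol{\Sigma}}_{\mathbf{vv}}^{f,T_k}$ is defined as
$$\hat{\boldsymbol{\Sigma}}_{\mathbf{vv}}^{f,T_k} = \frac{1}{NMC - 1} \sum_{i=1}^{NMC} \left( \mathbf{v}_i^{f,T_k} - \bar{\mathbf{v}}^{f,T_k} \right) \left( \mathbf{v}_i^{f,T_k} - \bar{\mathbf{v}}^{f,T_k} \right)^{\top} \quad (4.19)$$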
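Step 1 can be sketched in a few lines as an update of the ensemble mean following (4.17) - (4.19). The function name `update_mean` and the argument layout (one ensemble member per column) are assumptions made for illustration; the observation matrix, mean data vector and error covariance are taken as given.

```python
import numpy as np

def update_mean(V_f, H, d_bar, R):
    """Update the ensemble mean of v using the sample covariance.

    V_f   : (n, NMC) forecast ensemble, one column per realization
    H     : (n_obs, n) observation matrix
    d_bar : (n_obs,) mean of the (perturbed) measurements
    R     : (n_obs, n_obs) measurement error covariance
    """
    nmc = V_f.shape[1]
    v_bar = V_f.mean(axis=1)                       # sample mean, (4.17)
    A = V_f - v_bar[:, None]                       # ensemble anomalies
    S = A @ A.T / (nmc - 1)                        # sample covariance, (4.19)
    innovation = d_bar - H @ v_bar
    # Solve the linear system rather than forming the inverse explicitly
    return v_bar + S @ H.T @ np.linalg.solve(H @ S @ H.T + R, innovation)  # (4.18)

rng = np.random.default_rng(0)
V_f = rng.normal(size=(5, 200))
H = np.zeros((2, 5)); H[0, 3] = 1.0; H[1, 4] = 1.0   # last two entries are observed
print(update_mean(V_f, H, np.array([0.5, -0.3]), 0.01 * np.eye(2)))
```

For a realistic grid the full covariance matrix would not be formed; only the products with the transposed observation matrix are needed. The dense version is kept here for readability.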
Step 2
The updated vector $\bar{\mathbf{v}}^{u,T_k}$ evaluated at step 1 is now employed to draw a new
collection of updated realizations of spatial fields of facies, $\mathbf{s}_j^{u,T_k}$, $j = 1, \dots, NMC$.
The updating algorithm visits each element of the grid following the same sequential
path used for the generation of each facies field realization. When the algorithm has
progressed to reach a generic element i, we assign a new facies identifier to this element for
each updated grid j of the collection ($s_{i,j}^{u,T_k}$) according to
$$\pi\left( s_i^{u,T_k} \mid \mathbf{s}_{\partial_i}^{u,T_k} \right) = \frac{\exp\left( \sum_{k_1=1}^{K} s_i^{k_1,u,T_k} \, \mathbf{z}_i^{\top} \left( \boldsymbol{\theta}_{l(i)}^{k_1} + \boldsymbol{\lambda}_{l(i)}^{k_1} \right) \right)}{\sum_{k_2=1}^{K} \exp\left( \mathbf{z}_i^{\top} \left( \boldsymbol{\theta}_{l(i)}^{k_2} + \boldsymbol{\lambda}_{l(i)}^{k_2} \right) \right)} \quad (4.20)$$
where the values of the vectors $\boldsymbol{\lambda}_{l(i)}^{k_1}$ are tuned to satisfy
$$\frac{1}{NMC} \sum_{j=1}^{NMC} s_{i,j}^{k,u,T_k} = \bar{s}_i^{k,u,T_k}, \quad k = 1, \dots, K \quad (4.21)$$
The term appearing on the right hand side of (4.21) is a component of the updated vector
$\bar{\mathbf{v}}^{u,T_k}$ (4.18). One can note that (4.20) reduces to (4.9) when $\boldsymbol{\lambda}_{l(i)}^{k_1}$ equals $\mathbf{0}_{P,1}$. In practice,
(4.20) - (4.21) are used to update the probability of the MM reported in (4.9), $\pi(s_i \mid \mathbf{s}_{\partial_i})$,
which is not conditioned on the production data, into the probability $\pi(s_i^{u,T_k} \mid \mathbf{s}_{\partial_i}^{u,T_k})$,
conditioned on the information provided by the measurements available up to time $T_k$.
In our implementation of the algorithm we consider $\boldsymbol{\lambda}_{l(i)}^{k_1}$ as vectors of constant
components, differing from vector to vector (i.e., $\boldsymbol{\lambda}_{l(i)}^{k_1} = \lambda_{l(i)}^{k_1} \mathbf{1}_{P,1}$, with $\lambda_{l(i)}^{k_1}$ and $\mathbf{1}_{P,1}$ respectively
being an unknown scalar and a vector of size P with unit components). Working with this
assumption, the conditional probability (4.20) is a monotonic function of $\lambda_{l(i)}^{k_1}$. Estimation of
$\lambda_{l(i)}^{k_1}$ is readily accomplished through a bisection algorithm. This simple strategy has been
seen to yield good results, in the sense that the updated spatial distribution of facies maintains
spatial correlations which are similar to those observed in the original grids before the
assimilation of production data. Forms of $\boldsymbol{\lambda}_{l(i)}^{k_1}$ with increased complexity can be taken into
account in the procedure.
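The bisection search for the scalar shift can be sketched as below. This is one possible reading of (4.20) - (4.21), not the implementation used here: the ensemble-average probability of a facies (rather than the average of the drawn indicators) is matched to the target, only the shift of one facies is tuned while the others stay at zero, and the names `mean_prob` and `tune_lambda` are hypothetical.

```python
import numpy as np

def mean_prob(lam, Z, theta, k):
    """Ensemble-mean probability of facies index k after shifting theta[k] by lam * 1_P.

    Z     : (NMC, P) 0/1 pattern coefficients of the visited cell, one row per realization
    theta : (K, P) parameter vectors for the grid level of the cell
    """
    scores = Z @ theta.T                          # (NMC, K): z^T theta^k
    scores[:, k] += lam * Z.sum(axis=1)           # z^T (lam * 1_P) = lam * sum(z)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)
    return float(p[:, k].mean())

def tune_lambda(Z, theta, k, target, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection on lam; mean_prob is non-decreasing in lam because Z >= 0."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_prob(mid, Z, theta, k) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

rng = np.random.default_rng(2)
Z = rng.integers(0, 2, size=(40, 6)).astype(float)
Z[:, 0] = 1.0                                     # ensure each pattern vector is non-zero
theta = rng.normal(size=(2, 6))
lam = tune_lambda(Z, theta, k=0, target=0.7)
print(lam, mean_prob(lam, Z, theta, 0))
```

Setting the shift to zero recovers the unconditioned probabilities, consistent with the remark that (4.20) reduces to (4.9) in that case.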
Step 3
Up to this point we have evaluated the vectors $\mathbf{s}_j^{u,T_k}$, $j = 1, \dots, NMC$, which are the
realizations of the spatial fields of facies conditioned on the measurements available up to
time $T_k$. Before updating the model realizations $\mathbf{y}_j^{f,T_k}$, $j = 1, \dots, NMC$, in this third step we
repeat the flow simulations performed within the time interval $[T_{k-1}, T_k]$ upon replacing the
model parameters and the system states updated at time $T_{k-1}$ (namely $\mathbf{Y}_j^{u,T_{k-1}}$, $\mathbf{p}_j^{u,T_{k-1}}$ and
$\mathbf{S}_{wat,j}^{u,T_{k-1}}$) with the corresponding auxiliary fields ($\mathbf{Y}_j^{u',T_{k-1}}$, $\mathbf{p}_j^{u',T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u',T_{k-1}}$) to render them
consistent with the underlying facies fields updated through steps (1) - (2). Each auxiliary
updated realization is then used as input vector in the flow simulation expressed by (4.14) to
obtain the auxiliary forward vectors at time $T_k$, $\mathbf{y}_j^{f',T_k}$. The latter will be further conditioned
on the measurements available at time $T_k$ during step (4) of our algorithm. In practice, one
can note that step 3 allows alleviating the appearance of unphysical updates in the model state
vectors.
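To maintain the same statistics (mean and covariances) of the log-permeability fields
contained in $\mathbf{Y}_j^{u,T_{k-1}}$ also in the auxiliary log-permeability fields, $\mathbf{Y}_j^{u',T_{k-1}}$, one employs
$$\mathbf{Y}_j^{u',T_{k-1}} = \bar{\mathbf{Y}}_j^{u',T_{k-1}} + \mathbf{L}_j^{u',T_{k-1}} \left( \mathbf{L}_j^{u,T_{k-1}} \right)^{-1} \left( \mathbf{Y}_j^{u,T_{k-1}} - \bar{\mathbf{Y}}_j^{u,T_{k-1}} \right), \quad j = 1, \dots, NMC \quad (4.22)$$
Entries $\bar{Y}_{m,j}^{u,T_{k-1}}$ of the vector $\bar{\mathbf{Y}}_j^{u,T_{k-1}}$ are computed by
$$\bar{Y}_{m,j}^{u,T_{k-1}} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} \, Y_{m,l}^{u,T_{k-1}}, \quad m = 1, \dots, N_e \quad (4.23)$$
In (4.23), $k_m$ is the facies identifier at element m of the forward facies grid $\mathbf{s}_j^{f,T_k}$ (i.e.,
$k_m \in \{1, \dots, K\}$)
$$k_m = s_{m,j}^{f,T_k} \quad (4.24)$$
It follows that the indicator variable $s_{m,l}^{k_m,f,T_k}$ in (4.23) is equal to 1 or 0 according to whether
the facies identifiers over element m of ensemble members l and j coincide or not.
Similarly, an entry m of vector $\bar{\mathbf{Y}}_j^{u',T_{k-1}}$, $\bar{Y}_{m,j}^{u',T_{k-1}}$, is evaluated by
$$\bar{Y}_{m,j}^{u',T_{k-1}} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} \, Y_{m,l}^{u,T_{k-1}}, \quad m = 1, \dots, N_e \quad (4.25)$$
In (4.25), $k_m$ is defined as
$$k_m = s_{m,j}^{u,T_k} \quad (4.26)$$
Matrices $\mathbf{L}_j^{u,T_{k-1}}$ and $\mathbf{L}_j^{u',T_{k-1}}$ in (4.22) are the Cholesky decompositions of $\mathbf{C}_{Y,j}^{u,T_{k-1}}$ and $\mathbf{C}_{Y,j}^{u',T_{k-1}}$,
respectively (i.e., they satisfy the equalities $\mathbf{L}_j^{u,T_{k-1}} (\mathbf{L}_j^{u,T_{k-1}})^{\top} = \mathbf{C}_{Y,j}^{u,T_{k-1}}$ and
$\mathbf{L}_j^{u',T_{k-1}} (\mathbf{L}_j^{u',T_{k-1}})^{\top} = \mathbf{C}_{Y,j}^{u',T_{k-1}}$). Entries $[\mathbf{C}_{Y,j}^{u,T_{k-1}}]_{m,n}$ and $[\mathbf{C}_{Y,j}^{u',T_{k-1}}]_{m,n}$ of matrices $\mathbf{C}_{Y,j}^{u,T_{k-1}}$ and $\mathbf{C}_{Y,j}^{u',T_{k-1}}$,
respectively, are computed by means of
$$\left[ \mathbf{C}_{Y,j}^{u,T_{k-1}} \right]_{m,n} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} s_{n,l}^{k_n,f,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} s_{n,l}^{k_n,f,T_k} \left( Y_{m,l}^{u,T_{k-1}} - \bar{Y}_{m,j}^{u,T_{k-1}} \right) \left( Y_{n,l}^{u,T_{k-1}} - \bar{Y}_{n,j}^{u,T_{k-1}} \right)$$
$$m = 1, \dots, N_e, \quad n = 1, \dots, N_e \quad (4.27)$$
$$\left[ \mathbf{C}_{Y,j}^{u',T_{k-1}} \right]_{m,n} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} s_{n,l}^{k_n,u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} s_{n,l}^{k_n,u,T_k} \left( Y_{m,l}^{u,T_{k-1}} - \bar{Y}_{m,j}^{u',T_{k-1}} \right) \left( Y_{n,l}^{u,T_{k-1}} - \bar{Y}_{n,j}^{u',T_{k-1}} \right)$$
$$m = 1, \dots, N_e, \quad n = 1, \dots, N_e \quad (4.28)$$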
Following the procedure described for log-permeabilities (embodied by (4.22) - (4.28)), we
modify the values of pressure and water saturation in $\mathbf{p}_j^{u,T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u,T_{k-1}}$ into the corresponding
auxiliary vectors $\mathbf{p}_j^{u',T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u',T_{k-1}}$.
The updated auxiliary vectors are then used for the evaluation of the auxiliary forward
spatial fields at time $T_k$ through
$$\mathbf{y}_j^{f',T_k} = \mathcal{F}\left( \mathbf{y}_j^{u',T_{k-1}} \right), \quad j = 1, \dots, NMC \quad (4.29)$$
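The facies-conditional averages in (4.23) and (4.25) share the same structure: for each cell and each realization, the mean is taken only over ensemble members that carry the same facies at that cell. A minimal sketch follows, with a hypothetical function name and array layout (cells along rows, realizations along columns). The Cholesky-based rescaling of (4.22) is not included.

```python
import numpy as np

def facies_conditional_mean(Y, S, n_facies):
    """Facies-conditional ensemble mean of Y.

    Y : (N_e, NMC) log-permeability values
    S : (N_e, NMC) facies identifiers in 1..K (forward grids for (4.23),
        updated grids for (4.25))
    Returns Ybar with Ybar[m, j] = mean of Y[m, l] over members l with S[m, l] == S[m, j].
    """
    Ybar = np.zeros_like(Y, dtype=float)
    for k in range(1, n_facies + 1):
        mask = (S == k)                                  # indicator s^k of each cell/member
        count = mask.sum(axis=1, keepdims=True)          # members showing facies k at each cell
        total = (Y * mask).sum(axis=1, keepdims=True)
        mean_k = total / np.maximum(count, 1)            # guard cells where facies k never occurs
        Ybar = np.where(mask, mean_k, Ybar)
    return Ybar

Y = np.array([[1.0, 2.0, 3.0, 4.0]])
S = np.array([[1, 1, 2, 2]])
print(facies_conditional_mean(Y, S, 2))
```

The denominator is never zero where it matters, since realization j itself always belongs to the group of members sharing its facies at the cell.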
Step 4
We propose to evaluate the updated state vectors $\mathbf{y}_j^{u,T_k}$ at time $T_k$ by means of
$$\mathbf{y}_j^{u,T_k} = \mathbf{y}_j^{f',T_k} + \hat{\boldsymbol{\Sigma}}_{\mathbf{yy},j}^{f',T_k} (\mathbf{H}^{T_k})^{\top} \left[ \mathbf{H}^{T_k} \hat{\boldsymbol{\Sigma}}_{\mathbf{yy},j}^{f',T_k} (\mathbf{H}^{T_k})^{\top} + \hat{\boldsymbol{\Sigma}}_{\boldsymbol{\varepsilon\varepsilon}}^{T_k} \right]^{-1} \left( \mathbf{d}_j^{T_k} - \mathbf{H}^{T_k} \mathbf{y}_j^{f',T_k} \right), \quad j = 1, \dots, NMC \quad (4.30)$$
where the entries $[\hat{\boldsymbol{\Sigma}}_{\mathbf{yy},j}^{f',T_k} (\mathbf{H}^{T_k})^{\top}]_{m,n}$ of matrix $\hat{\boldsymbol{\Sigma}}_{\mathbf{yy},j}^{f',T_k} (\mathbf{H}^{T_k})^{\top}$ are computed as
$$\left[ \hat{\boldsymbol{\Sigma}}_{\mathbf{yy},j}^{f',T_k} (\mathbf{H}^{T_k})^{\top} \right]_{m,n} = \left[ \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \left( y_{m,l}^{f',T_k} - \bar{y}_m^{f',T_k} \right) \left( w_{n,l}^{f',T_k} - \bar{w}_n^{f',T_k} \right)$$
$$m = 1, \dots, 3N_e, \quad n = 1, \dots, N_w \quad (4.31)$$
$$\bar{y}_m^{f',T_k} = \left[ \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \, y_{m,l}^{f',T_k}, \quad m = 1, \dots, 3N_e \quad (4.32)$$
$$\bar{w}_n^{f',T_k} = \left[ \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \, w_{n,l}^{f',T_k}, \quad n = 1, \dots, N_w \quad (4.33)$$
Subscript m* in (4.31) - (4.33) indicates the label of the grid element to which the m-th
entry of the state vector belongs (e.g., consistently with definitions given in (4.12), m* equals
$m$, $m - N_e$ or $m - 2N_e$ according to whether the m-th element of the state vector corresponds
to a value of log-permeability, pressure or water saturation, respectively). Equations (4.30) -
(4.33) allow updating the model vector of a given realization by estimating the sample cross-
covariance between production data and the model variable associated with a given block in
the reservoir only on the basis of the members of the collection where the very same facies
value of the element considered in the target model realization occurs. This strategy is
adopted because, when the reservoir model is characterized by the presence of distinct facies
with unknown spatial distribution, the scatter plots between petrophysical parameters (or
system state variables) and production data are typically arranged in clusters, each of which is
associated with a particular facies identifier. A schematic representation of the data
assimilation algorithm described is depicted in Figure 4.3.
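The facies-conditional cross-covariance of (4.31) - (4.33) can be sketched as follows for one target realization. The function name, the array layout and the zero-based index map `cell_of` (playing the role of m*) are assumptions made for illustration; the dense intermediate array is only practical for small toy problems.

```python
import numpy as np

def facies_conditional_cross_cov(y, w, S_u, j, cell_of):
    """Cross-covariance between state entries and production data for realization j.

    y       : (3*N_e, NMC) auxiliary forward states (log-permeability, pressure, saturation)
    w       : (N_w, NMC) simulated production data
    S_u     : (N_e, NMC) updated facies identifiers
    j       : index of the target realization
    cell_of : (3*N_e,) zero-based grid cell of each state entry
    Only members sharing the facies of realization j at the cell contribute.
    """
    same = (S_u[cell_of] == S_u[cell_of, j][:, None]).astype(float)  # indicator weights
    n = same.sum(axis=1, keepdims=True)                              # contributing members
    y_bar = (same * y).sum(axis=1, keepdims=True) / n                # (4.32)
    w_bar = (same @ w.T) / n                                         # (4.33), one row per state entry
    dw = w[None, :, :] - w_bar[:, :, None]                           # (3*N_e, N_w, NMC)
    return np.einsum("ml,ml,mnl->mn", same, y - y_bar, dw) / n       # (4.31)

rng = np.random.default_rng(3)
n_e, n_w, nmc = 4, 2, 30
y = rng.normal(size=(3 * n_e, nmc))
w = rng.normal(size=(n_w, nmc))
S_u = rng.integers(1, 3, size=(n_e, nmc))
cell_of = np.tile(np.arange(n_e), 3)
print(facies_conditional_cross_cov(y, w, S_u, 0, cell_of).shape)
```

When all members carry the same facies everywhere, the result coincides with the ordinary ensemble cross-covariance normalized by NMC, which gives a convenient sanity check.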
Figure 4.3. Flow chart describing the proposed data assimilation algorithm.
[Figure 4.3: flow chart. Initialization at $T_k = T_0$ (geostatistical tools, equilibrium conditions, flow simulator with logk fields and initial conditions as input); Step 1: updated mean facies field; Step 2: updated facies fields; Step 3: auxiliary forward fields obtained through the flow simulator; Step 4: updated logk, pressure and saturation fields; the loop then advances with k = k + 1.]
4.3 Synthetic example
We illustrate the data assimilation scheme described in Section 4.2 and explore its
feasibility and accuracy by way of a transient three-dimensional flow example. The flow
domain is of size 4800 m × 3200 m × 5 m along directions x1, x2, and x3, respectively, and is
discretized into grid cells of uniform size of 50 m × 50 m × 5 m (i.e., 96 × 64 × 1 blocks,
yielding $N_e = 6144$). Note that the grid blocks in this example are arranged into a single
horizontal layer and the ensuing spatial distributions of parameters and state variables are
modeled as two-dimensional random fields.
The reference model of the reservoir is formed by two different facies (i.e., $K = 2$).
The spatial distribution of the facies has been obtained by the Markov Mesh (MM) procedure
described in Section 4.1 and by estimating the parameters of the MM from the training image
depicted in Figure 4.4.
Figure 4.4. Training image employed for the estimation of the parameters in the MM model
example.
The domain is then populated with the petrophysical properties (i.e., the log-permeability),
yielding the field depicted in Figure 4.5. Log-permeabilities are generated at element centers
using a sequential Gaussian simulator, with the statistical parameters listed in Table 4.1.
Figures 4.6 - 4.7 display the sample histograms of the log-permeability (Y) and the
corresponding permeability values (k) in the reference field, respectively. In our example we
consider a permeability tensor $\mathbf{K}$ that is diagonal and displays a vertical anisotropy. It can be
written as the product between the scalar permeability k and a scaling tensor
$$\mathbf{K} = k \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.1 \end{bmatrix} \quad (4.34)$$
Porosity is treated as a deterministic constant over the numerical grid and set equal to 0.2,
yielding a pore volume of $15.36 \times 10^6$ m$^3$.
Figure 4.5. Log-permeability distribution and well locations in the reference model. Injectors
and producers are indicated as Ii and Pi (i = 1, 2, 3), respectively.
[Figure 4.5: map of $Y_{ref}$ (color scale 2 to 9) over $x_1$ [m] (0 to 4800) and $x_2$ [m] (0 to 3200), with wells I1, I2, I3, P1, P2 and P3.]
           Mean   Covariance function   Sill   Nugget   Range x1 [m]   Range x2 [m]
Facies 1   7.1    Exponential           0.10   0.0      800.0          400.0
Facies 2   4.6    Exponential           0.10   0.0      80.0           80.0
Table 4.1. Parameters adopted for the generation of the reference log-permeability field.
Permeability values are expressed in mDarcy.
Figure 4.6. Histogram of the log-permeability values in the reference model.
Figure 4.7. Histogram of the permeability values in the reference model.
We simulate the deterministic transient flow caused by the joint action of six wells: three
producers (P1, P2 and P3) with constant Bottom Hole Pressure (BHP) equal to 30 bar
and three injectors (I1, I2 and I3), injecting water at a constant rate of 30, 230 and 30 m$^3$/day,
respectively. As shown in Figure 4.5, only the two wells I2 and P2 lie in the region of the
domain characterized by the highest permeabilities and are connected by a high-permeability
channel. A constant initial oil saturation of 0.8 is imposed on the system, rendering an initial
total volume of oil of $12.29 \times 10^6$ m$^3$. Initial pressure is set to a uniform value of 100 bar
while all external domain boundaries are impervious. Flow is simulated for a time period of
1200 days using the commercial software ECLIPSE developed by Schlumberger. The
dynamics of this reference model are strongly influenced by the underlying facies distribution
and are typical of a water-flooding environment in which the volume of the injected water is
mainly displaced along the meandering channel.
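The bulk figures quoted for the reference model (number of grid blocks, pore volume and initial oil volume) follow from simple arithmetic on the domain size, porosity and initial oil saturation given above. The short check below uses illustrative variable names.

```python
# Domain and grid data quoted in the text
LX, LY, LZ = 4800.0, 3200.0, 5.0      # domain size [m]
DX, DY, DZ = 50.0, 50.0, 5.0          # cell size [m]
POROSITY = 0.2
INITIAL_OIL_SATURATION = 0.8

n_cells = int(LX / DX) * int(LY / DY) * int(LZ / DZ)   # 96 x 64 x 1 blocks
pore_volume = LX * LY * LZ * POROSITY                  # [m^3]
oil_volume = pore_volume * INITIAL_OIL_SATURATION      # [m^3]

print(n_cells, pore_volume / 1e6, oil_volume / 1e6)
```

The values agree with the 6144 blocks, the pore volume of 15.36 million cubic meters and the initial oil volume of about 12.29 million cubic meters reported for the reference model.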
We sample the reference production curves at twenty times, separated by a fixed lag
of 60 days (i.e., $T_k$ = 60, 120, …, 1200 days), and perturb these with white Gaussian noise
having standard deviation $\sigma_d$ = 5 m³/day for flow rates and 5 bar for pressures,
reflecting measurement errors as described in (2.2). In our example the vector
$\mathbf{d}^{T_k}$ introduced in (2.2) contains these perturbed data, while the covariance matrix of the
corresponding measurement errors, $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$, is diagonal with entries equal to
$\sigma_d^2$. The temporal
behaviors of the reference production curves and of the corresponding conditioning
measurements employed during the assimilation are depicted in Figures 4.8 - 4.9. Reference
values of water production rate are zero at all assimilation steps and therefore are not
included in these figures.
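The perturbation of the reference data can be illustrated with a minimal sketch. The reference values below are hypothetical placeholders (they are not taken from the simulation); only the noise standard deviations of 5 bar and 5 m³/day come from the text, and the random seed is an arbitrary choice.

```python
import numpy as np

# Hypothetical reference data at one assimilation time (illustrative values only).
d_ref = np.array([150.0, 140.0, 160.0,   # WBHP at I1, I2, I3 [bar]
                  40.0, 200.0, 45.0])    # WOPR at P1, P2, P3 [m^3/day]

# Noise standard deviation: 5 bar for pressures, 5 m^3/day for rates (from the text).
sigma_d = np.full(d_ref.size, 5.0)

rng = np.random.default_rng(seed=0)      # arbitrary seed, for reproducibility only

# Perturbed measurement vector: reference value plus zero-mean white Gaussian noise.
d_obs = d_ref + sigma_d * rng.standard_normal(d_ref.size)

# Diagonal measurement-error covariance matrix with entries sigma_d^2.
Sigma_eps = np.diag(sigma_d**2)
```

Because the noise is white, the off-diagonal entries of `Sigma_eps` are zero and only the six variances need to be stored in practice.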
Figure 4.8. Reference (solid lines) and measured (symbols) values of Well Bottom Hole
Pressure (BHP) versus time for injection wells I1 (blue), I2 (green) and I3 (red).
Figure 4.9. Reference (solid lines) and measured (symbols) values of Well Oil Production
Rate (WOPR) versus time for production wells P1 (blue), P2 (green) and P3 (red).
At initial time $T_0$, we generate a collection of spatial fields of facies through the MM
model by using the same parameters employed for the reference field. These realizations are
also conditioned on the volumetric proportion of facies 1 observed in the
reference model (equal to 0.34) following the acceptance/rejection procedure
described in Section 2.2. Each facies field is then populated with log-permeability values
using a Gaussian simulator with the same variogram functions and parameters used for
generating the reference field. Following this procedure, we implicitly assume perfect
knowledge of the statistical models describing the spatial distribution of the geological and
petrophysical properties in the reservoir. This allows testing the feasibility and accuracy
of the proposed algorithm without introducing additional sources of uncertainty in our
analysis. We compare the performance and accuracy of the inversion algorithm described
in Section 4.2 (denoted in the following as Facies-EnKF) against the traditional EnKF. We
consider $N_{MC}$ = 500 and start the assimilation with the same collection of model
realizations for both methodologies.
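The acceptance/rejection conditioning on the facies proportion can be sketched as follows. This is only an illustration under stated assumptions: the function `sample_facies_field` is a hypothetical stand-in that draws independent cells (it is not a Markov Mesh simulator), and the tolerance `tol`, the grid shape and the ensemble size are arbitrary choices, since the acceptance criterion itself is not restated here.

```python
import numpy as np

def sample_facies_field(rng, shape, p1):
    # Hypothetical stand-in for the MM simulator: independent cells, NOT a Markov Mesh model.
    return (rng.random(shape) < p1).astype(int)

def accept_reject(rng, n_accept, shape, target=0.34, tol=0.01, max_trials=100_000):
    """Keep only realizations whose proportion of facies 1 lies within tol of target."""
    accepted = []
    for _ in range(max_trials):
        field = sample_facies_field(rng, shape, target)
        if abs(field.mean() - target) <= tol:   # proportion of cells equal to 1
            accepted.append(field)
            if len(accepted) == n_accept:
                break
    return accepted

rng = np.random.default_rng(1)
ensemble = accept_reject(rng, n_accept=20, shape=(30, 40))
```

Every realization in `ensemble` honors the target proportion to within the chosen tolerance; a tighter tolerance lowers the acceptance rate and increases the number of trial fields that must be generated.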
Figures 4.10 - 4.11 respectively compare the spatial distribution of the mean and the
variance of the log-permeability fields obtained at various assimilation steps using the two
approaches. During the earliest assimilation steps the estimated mean fields obtained with the
two approaches are similar and reflect the plausibility that the meandering channel occurs
within either the upper or the lower part of the domain. During the subsequent updating
steps, when additional data are assimilated into the model, estimates of log-permeability
obtained with Facies-EnKF evolve toward a pattern which is similar to that of the reference
field and enables a correct identification of the position of the channel. Note that the
production curves of injectors I1 and I3 play a key role in facies
identification. These two wells inject water at the same rate but operate at
different pressures, as shown in Figure 4.8, suggesting that one of the two (i.e., I1) is
characterized by a lower injectivity and should be located farther from the high-permeability
channel. The estimated variance field obtained after the latest assimilation step using
Facies-EnKF suggests that the calibrated values of Y are characterized by the highest uncertainty at
locations corresponding to the boundary of the identified channel. These results are not
mirrored by the fields calibrated through EnKF, in which the regions estimated to have
the highest permeabilities correspond only in part to the high-permeability pattern displayed in
the reference field. Figure 4.10 shows that the EnKF algorithm correctly identifies the
presence of high- and low-permeability regions around each well without capturing the global field
architecture. Moreover, EnKF does not preserve the correct geological setting in the
calibrated realizations. This is evident from Figure 4.12, which displays five selected
realizations of log-permeability updated after the latest assimilation step using the two
methodologies. By contrast, the fields estimated through Facies-EnKF preserve the
correct architecture. By visual inspection, one can also note that these fields tend to show a
degree of spatial variability of log-permeabilities similar to that characterizing
the reference model.
Figure 4.10. Estimates of mean log-permeability at initial time and at 6 assimilation steps.
(Panels: rows correspond to time steps 0, 1, 3, 6, 12, 16 and 20; columns to Facies-EnKF and EnKF. Well locations I1-I3 and P1-P3 are marked in each panel.)
Figure 4.11. Estimation variance of log-permeability at initial time and at 6 assimilation steps.
(Panels: rows correspond to time steps 0, 1, 3, 6, 12, 16 and 20; columns to Facies-EnKF and EnKF. Well locations I1-I3 and P1-P3 are marked in each panel.)
Figure 4.12. Five selected realizations of the updated log-permeability field after the latest assimilation step.
(Panels: rows correspond to realizations n. 100, 200, 300, 400 and 500; columns to Facies-EnKF and EnKF. Well locations I1-I3 and P1-P3 are marked in each panel.)
These results are also confirmed by the temporal behaviors of $E_Y$ and $V_Y$ (as defined
in (3.37) and (3.38)) displayed in Figures 4.13 and 4.14, respectively. Values of $E_Y$ obtained
through Facies-EnKF are always smaller than those obtained using a traditional EnKF
algorithm. The curves of $V_Y$ obtained through the two methodologies are similar. However,
while EnKF results in a monotonically decreasing trend of the average estimation
variance, the curve obtained with Facies-EnKF displays some
fluctuations.
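The two summary metrics can be illustrated with a short sketch, based on their verbal description as the average absolute difference between estimated and reference Y values and the average estimation variance. The exact definitions are those of (3.37) and (3.38); details such as the use of the unbiased variance estimator are assumptions here, and the ensemble below is synthetic.

```python
import numpy as np

def average_abs_error(Y_ens, Y_ref):
    """Mean over cells of |ensemble-mean Y - reference Y| (an E_Y-type metric)."""
    return float(np.mean(np.abs(Y_ens.mean(axis=0) - Y_ref)))

def average_variance(Y_ens):
    """Mean over cells of the ensemble variance of Y (a V_Y-type metric)."""
    return float(np.mean(Y_ens.var(axis=0, ddof=1)))  # ddof=1 is an assumption

# Synthetic example: 500 realizations on 200 cells, scattered around a reference field.
rng = np.random.default_rng(2)
Y_ref = rng.normal(5.0, 1.0, size=200)
Y_ens = Y_ref + rng.normal(0.0, 0.5, size=(500, 200))

E_Y = average_abs_error(Y_ens, Y_ref)
V_Y = average_variance(Y_ens)
```

In this synthetic case the ensemble is unbiased, so the error metric is small while the variance metric recovers the imposed spread (0.5² = 0.25).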
Figure 4.13. Average absolute difference, $E_Y$, between estimated and reference Y values
obtained with Facies-EnKF (solid line) and EnKF (dashed line).
Occurrence of filter inbreeding in the performed assimilations is checked by plotting
the temporal behavior of $P_{2\sigma_Y}$ in Figure 4.15. Values of $P_{2\sigma_Y}$ obtained with Facies-EnKF are
always larger than those obtained with a traditional EnKF, reflecting the higher quality of
the Y estimates based on Facies-EnKF. Both approaches display a decreasing temporal trend
of $P_{2\sigma_Y}$. This can be attributed to a systematic underestimation of the error variance in time
and can be due to the occurrence of spurious covariances in the empirical covariance matrix,
which is based on only 500 MC realizations. It is nonetheless remarkable that the dashed
curve displayed in Figure 4.15, obtained with EnKF, decreases at a much higher
rate than the one obtained through Facies-EnKF, suggesting that the proposed
approach is effective in attenuating the occurrence of inbreeding effects during the
assimilation.
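The role of a finite ensemble in generating spurious covariances can be illustrated with a small numerical experiment. The sketch below is not part of the assimilation algorithm: it draws mutually independent variables, whose true cross-correlations are zero, and reports the largest sample correlation obtained with a small and a large number of realizations. The number of variables and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def max_spurious_corr(n_real, n_var=100):
    """Largest absolute off-diagonal sample correlation among independent variables."""
    X = rng.standard_normal((n_real, n_var))      # true cross-correlations are zero
    C = np.corrcoef(X, rowvar=False)              # empirical correlation matrix
    off_diag = C[~np.eye(n_var, dtype=bool)]
    return float(np.abs(off_diag).max())

spurious_small = max_spurious_corr(500)       # ensemble size used in this chapter
spurious_large = max_spurious_corr(20_000)    # much larger ensemble, for comparison
```

The sampling error of each correlation scales roughly as the inverse square root of the number of realizations, so the spurious entries obtained with 500 realizations are markedly larger than those obtained with 20,000.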
Figure 4.14. Average estimation variance, $V_Y$, of Y obtained with Facies-EnKF (solid line)
and EnKF (dashed line).
Figure 4.15. $P_{2\sigma_Y}$ versus time obtained with Facies-EnKF (solid line) and EnKF (dashed
line).
We finally analyze the ability of the updated model realizations to forecast
production during an additional period of 2400 days after the latest assimilation time. This
analysis is performed by re-running the flow simulation for all calibrated log-permeability
fields from time 0 and for a total simulation period of 3600 days. We explore two distinct
scenarios differing from each other in the flow configuration imposed during the additional
time period of 2400 days. These simulations mimic a procedure commonly adopted
in reservoir management, where measurements acquired until a given time (in our
example, the first 1200 days of production) are used to build a calibrated model of the
reservoir. The latter is then typically employed to (a) investigate future production under
diverse scenarios and (b) select the most efficient development strategy amongst a range of
plausible choices. In our first scenario, the same
flow setting imposed in the course of the assimilation period is maintained during the
additional simulation period. In our second scenario, two
additional wells (i.e., one producer and one injector) become operative after 1200 days.
Figures 4.16 - 4.17 depict the bottom hole
pressure at the injectors and the oil and water production rates at the production wells
obtained for the first scenario. Predicted and reference water flow rates at wells P1 and P3 are
not displayed because they are zero at all time steps. These figures show that the predicted
pressure curves and oil production rates obtained from the two sets of calibrated fields are
both in good agreement with the corresponding reference production curves (which have
been obtained from the true, reference reservoir model). It is remarkable that the
model realizations calibrated through EnKF provide such a high-quality prediction even though
the corresponding log-permeability distributions do not honor the correct
geological architecture of the reference reservoir (see Figure 4.12). We also note that the water
flow rate at well P2 predicted through Facies-EnKF is characterized by a smaller uncertainty
than the estimate provided by EnKF. This is also demonstrated in Figure
4.18, which compares the sample histograms of the predicted water flow rate at the final
simulation time (i.e., 3600 days) obtained with the two approaches. The same observation
holds for the predicted field oil production, FOPT. While both data assimilation
approaches yield a good match between the estimated curves and the reference
FOPT production history (see Figure 4.19), the prediction obtained through EnKF at the final
time is characterized by the larger uncertainty (Figure 4.20).
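The ensemble summaries reported in the forecast figures (mean, 10th and 90th percentiles) can be computed as in the following sketch. The two ensembles are synthetic stand-ins with different spreads, and the curve shape and noise levels are arbitrary assumptions; they are not output of the flow simulator.

```python
import numpy as np

def summarize_forecast(curves):
    """curves: array (n_realizations, n_times). Returns mean, P10 and P90 at each time."""
    mean = curves.mean(axis=0)
    p10, p90 = np.percentile(curves, [10, 90], axis=0)
    return mean, p10, p90

rng = np.random.default_rng(4)
times = np.linspace(0.0, 3600.0, 61)                    # days
base = 100.0 * (1.0 - np.exp(-times / 1000.0))          # hypothetical production curve

# Two synthetic ensembles of 500 curves with a narrow and a wide spread.
ens_narrow = base + rng.normal(0.0, 2.0, size=(500, times.size))
ens_wide = base + rng.normal(0.0, 10.0, size=(500, times.size))

mean_n, p10_n, p90_n = summarize_forecast(ens_narrow)
mean_w, p10_w, p90_w = summarize_forecast(ens_wide)

# Width of the P10-P90 band at the final time as a simple measure of predictive uncertainty.
width_narrow = p90_n[-1] - p10_n[-1]
width_wide = p90_w[-1] - p10_w[-1]
```

Comparing the band widths at the final time gives a single number per approach that complements a visual comparison of the histograms.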
Figure 4.16. Time dependence of WBHP values related to the injection wells for the collection of models
updated through Facies-EnKF and EnKF (solid grey) during test scenario 1. Corresponding mean (solid
black), 10th and 90th percentile (dashed black) are also reported. Red curve indicates the reference model
solution.
(Panels: rows correspond to wells I1, I2 and I3; columns to Facies-EnKF and EnKF. Axes: WBHP [bar] versus time [days].)
Figure 4.17. Time dependence of WOPR and WWPR values related to the production wells for the
collection of models updated through Facies-EnKF and EnKF (solid grey) during test scenario 1.
Corresponding mean (solid black), 10th and 90th percentile (dashed black) are also reported. Red curve
indicates the reference model.
(Panels: rows correspond to WOPR at wells P1, P2 and P3 and to WWPR at well P2; columns to Facies-EnKF and EnKF. Axes: rate [m³/day] versus time [days].)
Figure 4.18. Histograms of water production rate values predicted at well P2 at time 3600
days during test scenario 1. Vertical red lines indicate corresponding reference values.
(Panels: Facies-EnKF and EnKF. Axes: relative frequency versus WWPR [m³/day].)
Figure 4.19. Time dependence of FOPT for the collection of models updated through
Facies-EnKF and EnKF (solid grey) during test scenario 1. Corresponding mean (solid black), 10th
and 90th percentile (dashed black) are also reported. Red curve indicates the reference model
solution.
(Panels: Facies-EnKF and EnKF. Axes: FOPT [10⁶ m³] versus time [days].)
Figure 4.20. Histograms of FOPT values predicted at time 3600 days during test scenario 1.
Vertical red lines indicate corresponding reference values.
(Panels: Facies-EnKF and EnKF. Axes: relative frequency versus FOPT [10⁶ m³].)
As mentioned above, in the second scenario we explore the presence of two additional
wells (i.e., one injector, I4, and one producer, P4) that become operative after 1200 days.
The locations of these new wells are depicted in Figure 4.21 and are selected on the basis of
the information provided by the mean Y field updated through Facies-EnKF, which displays
at these positions a high probability of occurrence of the high-permeability facies. In this
second forecast study, wells I4 and P4 operate at a fixed water flow rate of 450 m³/day and at
a constant bottom hole pressure of 30 bar, respectively.
Figure 4.21. Mean log-permeability distribution estimated through Facies-EnKF after the
latest assimilation step. Spatial location of additional wells I4 and P4 is also displayed.
Figures 4.22 - 4.23 compare the production curves predicted through the model
realizations obtained with the two assimilation approaches. These figures highlight the improved
predictive ability of the log-permeability fields updated through Facies-EnKF with respect to
those calibrated with the traditional EnKF approach. All
production curves estimated through Facies-EnKF are characterized by a much smaller
degree of uncertainty than those obtained using EnKF. They also provide a better match of
the reference production values at the additional wells I4 and P4, as shown by the
histograms displayed in Figure 4.24. The improved predictive ability of the model
realizations updated through Facies-EnKF results in a more precise estimation of the total oil
production curve, as shown in Figures 4.25 - 4.26.
Figure 4.22. Time dependence of WBHP values related to the injection wells for the collection of models
updated employing Facies-EnKF and EnKF (solid grey) during test scenario 2. Corresponding mean (solid
black), 10th and 90th percentile (dashed black) are also reported. Red curve indicates the reference model
solution.
(Panels: rows correspond to wells I1, I2, I3 and I4; columns to Facies-EnKF and EnKF. Axes: WBHP [bar] versus time [days].)
Figure 4.23. Time dependence of WOPR and WWPR values related to the production wells for the
collection of models updated employing Facies-EnKF and EnKF (solid grey) during test scenario 2.
Corresponding mean (solid black), 10th and 90th percentile (dashed black) are also reported. Red curve
indicates the reference model solution.
(Panels: rows correspond to WOPR at wells P1, P2, P3 and P4 and to WWPR at wells P2 and P4; columns to Facies-EnKF and EnKF. Axes: rate [m³/day] versus time [days].)
Figure 4.24. Histograms of estimated production values at wells I4 and P4 at time 3600 days for test
scenario 2. Vertical red lines indicate corresponding reference values.
(Panels: rows correspond to WBHP [bar] at well I4, WOPR [m³/day] at well P4 and WWPR [m³/day] at well P4; columns to Facies-EnKF and EnKF. Vertical axes: relative frequency.)
Figure 4.25. Time dependence of FOPT for the collection of models updated through
Facies-EnKF and EnKF (solid grey) during test scenario 2. Corresponding mean (solid black), 10th
and 90th percentile (dashed black) are also reported. Red curve indicates the reference model
solution.
(Panels: Facies-EnKF and EnKF. Axes: FOPT [10⁶ m³] versus time [days].)
Figure 4.26. Histograms of FOPT values predicted at time 3600 days during test scenario 2.
Vertical red lines indicate corresponding reference values.
(Panels: Facies-EnKF and EnKF. Axes: relative frequency versus FOPT [10⁶ m³].)
We conclude our analysis by comparing the computational costs of the two
approaches. Solving one assimilation step on a 2.80 GHz Intel i7-860 processor requires
3,400 s for EnKF and 8,500 s for Facies-EnKF, i.e., a factor of 2.5 more. The proposed algorithm
requires more CPU time than the traditional EnKF because (a) the flow simulations are
iterated and must be solved twice within a single updating step and (b) the facies fields must
be re-generated at each assimilation time, as detailed in Section 4.2.
4.4 Conclusions
In this Chapter we present a novel data assimilation scheme that allows the sequential
assimilation of production data into a complex reservoir model for conditioning its geological
and petrophysical properties.
In the proposed algorithm, a Markov Mesh (MM) model is used to describe the spatial
distribution of the facies within a consistent Bayesian framework. This is then integrated into
a history matching procedure which is based on the EnKF scheme. We test the proposed
methodology by way of a synthetic example corresponding to a reservoir within which two
distinct facies are spatially distributed. We analyze the accuracy and computational efficiency
of our algorithm with respect to the standard EnKF both in terms of history matching quality
and prediction ability of the forecast production.
We show that the proposed inversion scheme yields an updated collection of
facies and log-permeability fields which maintain the type of geological setting displayed
prior to updating. In the example we analyze, the updated fields are still characterized by
a single high-permeability channel whose location, which defines the internal architecture of
the system, is updated as data assimilation progresses in time. On the other hand, the
standard EnKF is not capable of preserving the correct geological scenario.
We test the predictive ability of the realizations obtained through our procedure by
means of two forecast scenarios, in which diverse flow configurations are considered after the
latest assimilation time. In our first scenario, the same flow setting imposed in the course of
the assimilation period is maintained during the additional simulation period. In our second
scenario, two additional wells (i.e., one producer and
one injector) become operative after 1200 days. Both approaches yield a good estimation
of the target production values in the first scenario, the predictions provided by standard
EnKF being characterized by the higher degree of uncertainty. The performances of the two
methodologies differ strongly in the second scenario, where our proposed
algorithm outperforms the standard EnKF by providing a superior match between the
reference and the predicted production curves.
Appendix A
We derive finite element equations which we employ for the numerical solution of the
moment equations (MEs) presented in Chapter 3.
Our MEs represent an extension of the work of Ye [2002] and Ye et al. [2004]. They
have been derived to allow embedding of MEs into the KF framework by considering the
general case when the initial head field is random and statistically correlated with the log-
conductivity field. In this Appendix we limit our discussion only to the equations that differ
from those of Ye et al. [2004] and are used for the numerical evaluation of the second-order
approximations of residual flux, cross-covariances and head covariances (see (3.24) - (3.29)).
An extensive and detailed derivation of the finite element equations for the solution of the
zero- and second-order mean heads, together with a description of the quotient-difference
algorithm employed for the computation of the inverse Laplace transform can be found in Ye
[2002].
A.1 Second-order residual flux
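We discretize the flow domain into $N_e$ elements, each having constant log-conductivity,
and use bilinear Lagrange basis functions to interpolate Laplace-transformed
heads between grid nodes according to

$$\langle h^{(n)}(\mathbf{x},\lambda)\rangle \approx \langle \hat{h}^{(n)}(\mathbf{x},\lambda)\rangle = \sum_{i=1}^{N_g} h_i^{(n)}\,\xi_i(\mathbf{x}), \qquad n = 0, 2 \qquad \text{(A.1)}$$

where $\langle \hat{h}^{(n)}(\mathbf{x},\lambda)\rangle$ is the finite element approximation of $\langle h^{(n)}(\mathbf{x},\lambda)\rangle$, $N_g$ is the number of
grid nodes, $\xi_i(\mathbf{x})$ is a bilinear Lagrange basis function and $h_i^{(n)}$ is the n-th order
approximation of the mean transformed hydraulic head, $\langle h^{(n)}(\mathbf{x},\lambda)\rangle$, at node i, $\lambda$ being the
Laplace parameter.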
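As an illustration of the interpolation in (A.1), the following minimal sketch evaluates bilinear Lagrange basis functions on a single rectangular element. The reference coordinates (s, t) in [-1, 1] × [-1, 1], the counter-clockwise node ordering and the nodal head values are assumptions made for the example only.

```python
def bilinear_basis(s, t):
    """Bilinear Lagrange basis functions on the reference square [-1, 1]^2.

    Nodes are ordered counter-clockwise starting from (-1, -1).
    """
    return [0.25 * (1 - s) * (1 - t),
            0.25 * (1 + s) * (1 - t),
            0.25 * (1 + s) * (1 + t),
            0.25 * (1 - s) * (1 + t)]

def interpolate(nodal_values, s, t):
    """Interpolate nodal values inside the element as the sum of h_i * xi_i."""
    return sum(h * xi for h, xi in zip(nodal_values, bilinear_basis(s, t)))

nodal_heads = [10.0, 12.0, 15.0, 11.0]               # hypothetical nodal heads
h_center = interpolate(nodal_heads, 0.0, 0.0)        # element centre: average of the nodes
h_corner = interpolate(nodal_heads, 1.0, 1.0)        # third node: its nodal value
```

Each basis function equals one at its own node and zero at the others, and the four functions sum to one everywhere in the element, so the interpolated field reproduces the nodal values exactly.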
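We interpolate the transformed zero-order Green's function in a similar way as

$$\langle G^{(0)}(\mathbf{x},\mathbf{y},\lambda)\rangle \approx \langle \hat{G}^{(0)}(\mathbf{x},\mathbf{y},\lambda)\rangle = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} G^{(0)}_{i,j}\,\xi_i(\mathbf{x})\,\xi_j(\mathbf{y}) \qquad \text{(A.2)}$$

where $G^{(0)}_{i,j}$ is the zero-order transformed mean Green's function at node i in the x-plane
due to a source of unit strength at node j in the y-plane. In addition, we interpolate
$\langle K'(\mathbf{x})\,h_0'(\mathbf{y})\rangle^{(2)}$ by

$$\langle K'(\mathbf{x})\,h_0'(\mathbf{y})\rangle^{(2)} \approx \sum_{k=1}^{N_g} \langle K'(\mathbf{x})\,h'_{0,k}\rangle^{(2)}\,\xi_k(\mathbf{y}) \qquad \text{(A.3)}$$

where $\langle K'(\mathbf{x})\,h'_{0,k}\rangle^{(2)}$ is $\langle K'(\mathbf{x})\,h_0'(\mathbf{y})\rangle^{(2)}$ evaluated at node k in the y-plane.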
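Substituting (A.1) - (A.3) into (3.24) yields

$$
\begin{aligned}
\mathbf{r}^{(2)}(\mathbf{x},\lambda) ={}& \int_{\Omega} K_G(\mathbf{x})\,K_G(\mathbf{y})\,C_Y(\mathbf{x},\mathbf{y}) \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} G^{(0)}_{i,j}\,\nabla_{\mathbf{x}}\xi_i(\mathbf{x})\,\nabla_{\mathbf{y}}^{T}\xi_j(\mathbf{y}) \sum_{k=1}^{N_g} h_k^{(0)}\,\nabla_{\mathbf{y}}\xi_k(\mathbf{y})\,d\mathbf{y} \\
&- \int_{\Omega} \sum_{k=1}^{N_g} \langle K'(\mathbf{x})\,h'_{0,k}\rangle^{(2)}\,\xi_k(\mathbf{y})\,S_S(\mathbf{y}) \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} G^{(0)}_{i,j}\,\nabla_{\mathbf{x}}\xi_i(\mathbf{x})\,\xi_j(\mathbf{y})\,d\mathbf{y}
\end{aligned}
\qquad \text{(A.4)}
$$

Expressing the domain integrals in (A.4) as a summation of integrals over each grid element,
the residual flux at point $\mathbf{x}_e$ inside element e can be written as

$$
\begin{aligned}
\mathbf{r}^{(2)e}(\mathbf{x}_e,\lambda) ={}& K_G^{e} \sum_{e'=1}^{N_e} K_G^{e'}\,C_Y^{e,e'} \sum_{i=1}^{M_e}\sum_{j=1}^{M_{e'}}\sum_{k=1}^{M_{e'}} G^{(0)ee'}_{i,j}\,h_k^{(0)e'}\,\nabla_{\mathbf{x}}\xi_i^{e}(\mathbf{x}_e) \int_{e'} \nabla_{\mathbf{y}}^{T}\xi_j^{e'}(\mathbf{y})\,\nabla_{\mathbf{y}}\xi_k^{e'}(\mathbf{y})\,d\mathbf{y} \\
&- \sum_{e'=1}^{N_e} S_S^{e'} \sum_{k=1}^{M_{e'}}\sum_{i=1}^{M_e}\sum_{j=1}^{M_{e'}} \langle K'^{e}\,h'^{e'}_{0,k}\rangle^{(2)}\,G^{(0)ee'}_{i,j}\,\nabla_{\mathbf{x}}\xi_i^{e}(\mathbf{x}_e) \int_{e'} \xi_k^{e'}(\mathbf{y})\,\xi_j^{e'}(\mathbf{y})\,d\mathbf{y}
\end{aligned}
\qquad \text{(A.5)}
$$

where $M_e$ and $M_{e'}$ respectively are the number of nodes of elements e and e', $C_Y^{e,e'}$ is the
covariance between log-conductivity in elements e and e', $G^{(0)ee'}_{i,j}$ is the zero-order
transformed mean Green's function at node i of element e due to a source of unit strength at
node j of element e', and $h_k^{(0)e'}$ is the value of $\langle h^{(0)}(\mathbf{y},\lambda)\rangle$ at node k of element e'.
Rearranging terms in (A.5) leads to the compact form

$$
\begin{aligned}
\mathbf{r}^{(2)e}(\mathbf{x}_e,\lambda) ={}& K_G^{e} \sum_{i=1}^{M_e} \nabla_{\mathbf{x}}\xi_i^{e}(\mathbf{x}_e) \sum_{e'=1}^{N_e} K_G^{e'}\,C_Y^{e,e'} \sum_{j=1}^{M_{e'}}\sum_{k=1}^{M_{e'}} G^{(0)ee'}_{i,j}\,h_k^{(0)e'}\,\Psi^{e'e'}_{jk} \\
&- \sum_{i=1}^{M_e} \nabla_{\mathbf{x}}\xi_i^{e}(\mathbf{x}_e) \sum_{e'=1}^{N_e} S_S^{e'} \sum_{k=1}^{M_{e'}}\sum_{j=1}^{M_{e'}} \langle K'^{e}\,h'^{e'}_{0,k}\rangle^{(2)}\,G^{(0)ee'}_{i,j}\,\Theta^{e'e'}_{jk}
\end{aligned}
\qquad \text{(A.6)}
$$

where $\Psi^{e'e'}_{jk}$ and $\Theta^{e'e'}_{jk}$ are defined as

$$\Psi^{e'e'}_{jk} = \int_{e'} \nabla_{\mathbf{y}}^{T}\xi_j^{e'}(\mathbf{y})\,\nabla_{\mathbf{y}}\xi_k^{e'}(\mathbf{y})\,d\mathbf{y} \qquad \text{(A.7)}$$

$$\Theta^{e'e'}_{jk} = \int_{e'} \xi_j^{e'}(\mathbf{y})\,\xi_k^{e'}(\mathbf{y})\,d\mathbf{y} \qquad \text{(A.8)}$$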
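The two element integrals of (A.7) and (A.8), i.e., the integral of the product of basis-function gradients and the integral of the product of basis functions, can be evaluated numerically as in the sketch below. It assumes a rectangular element with bilinear basis functions and uses 2 × 2 Gauss-Legendre quadrature, which is exact for these integrands on rectangles; the element size of 50 × 50 and the function names are hypothetical.

```python
import numpy as np

def shape_functions(s, t):
    """Bilinear basis functions and their derivatives in reference coordinates (s, t)."""
    xi = 0.25 * np.array([(1 - s) * (1 - t), (1 + s) * (1 - t),
                          (1 + s) * (1 + t), (1 - s) * (1 + t)])
    dxi_ds = 0.25 * np.array([-(1 - t), (1 - t), (1 + t), -(1 + t)])
    dxi_dt = 0.25 * np.array([-(1 - s), -(1 + s), (1 + s), (1 - s)])
    return xi, dxi_ds, dxi_dt

def element_integrals(dx, dy):
    """Return (gradient-product matrix, basis-product matrix) for a dx-by-dy rectangle."""
    g = 1.0 / np.sqrt(3.0)            # 2x2 Gauss-Legendre points, all weights equal to 1
    jac = 0.25 * dx * dy              # Jacobian determinant of the map from [-1, 1]^2
    grad_grad = np.zeros((4, 4))
    mass = np.zeros((4, 4))
    for s in (-g, g):
        for t in (-g, g):
            xi, ds, dt = shape_functions(s, t)
            gx = ds * (2.0 / dx)      # derivative with respect to the physical x coordinate
            gy = dt * (2.0 / dy)      # derivative with respect to the physical y coordinate
            grad_grad += jac * (np.outer(gx, gx) + np.outer(gy, gy))
            mass += jac * np.outer(xi, xi)
    return grad_grad, mass

grad_grad, mass = element_integrals(dx=50.0, dy=50.0)
```

Two quick checks follow from the partition-of-unity property of the basis: every row of `grad_grad` sums to zero, and the entries of `mass` sum to the element area.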
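Following a similar approach, the domain integral included in the term
$R_n = \int_{\Omega} \mathbf{r}^{(2)}(\mathbf{x},\lambda)\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x}$ that appears on the right hand side of the system used to solve
for the second-order mean head (see (3.11) of Ye [2002]) is discretized into

$$R_n = \sum_{e=1}^{N_e} \int_{e} \mathbf{r}^{(2)e}(\mathbf{x},\lambda)\cdot\nabla_{\mathbf{x}}\xi_n^{e}(\mathbf{x})\,d\mathbf{x} \qquad \text{(A.9)}$$

Substituting (A.6) into (A.9) yields

$$
\begin{aligned}
R_n ={}& \sum_{e=1}^{N_e} \int_{e} K_G^{e} \sum_{i=1}^{M_e} \nabla_{\mathbf{x}}^{T}\xi_i^{e}(\mathbf{x})\,\nabla_{\mathbf{x}}\xi_n^{e}(\mathbf{x}) \sum_{e'=1}^{N_e} K_G^{e'}\,C_Y^{e,e'} \sum_{j=1}^{M_{e'}}\sum_{k=1}^{M_{e'}} G^{(0)ee'}_{i,j}\,h_k^{(0)e'}\,\Psi^{e'e'}_{jk}\,d\mathbf{x} \\
&- \sum_{e=1}^{N_e} \int_{e} \sum_{i=1}^{M_e} \nabla_{\mathbf{x}}^{T}\xi_i^{e}(\mathbf{x})\,\nabla_{\mathbf{x}}\xi_n^{e}(\mathbf{x}) \sum_{e'=1}^{N_e} S_S^{e'} \sum_{k=1}^{M_{e'}}\sum_{j=1}^{M_{e'}} \langle K'^{e}\,h'^{e'}_{0,k}\rangle^{(2)}\,G^{(0)ee'}_{i,j}\,\Theta^{e'e'}_{jk}\,d\mathbf{x}
\end{aligned}
\qquad \text{(A.10)}
$$

that can be also written as

$$
\begin{aligned}
R_n ={}& \sum_{e=1}^{N_e} K_G^{e} \sum_{i=1}^{M_e} \Psi^{ee}_{in} \sum_{e'=1}^{N_e} K_G^{e'}\,C_Y^{e,e'} \sum_{j=1}^{M_{e'}}\sum_{k=1}^{M_{e'}} G^{(0)ee'}_{i,j}\,h_k^{(0)e'}\,\Psi^{e'e'}_{jk} \\
&- \sum_{e=1}^{N_e} \sum_{i=1}^{M_e} \Psi^{ee}_{in} \sum_{e'=1}^{N_e} S_S^{e'} \sum_{j=1}^{M_{e'}}\sum_{k=1}^{M_{e'}} \langle K'^{e}\,h'^{e'}_{0,k}\rangle^{(2)}\,G^{(0)ee'}_{i,j}\,\Theta^{e'e'}_{jk}
\end{aligned}
\qquad \text{(A.11)}
$$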
A.2 Second-order cross-covariance between head and conductivity
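We denote as $u^{(2)}_{j_y}(\mathbf{x},\lambda)$ the cross-covariance $u^{(2)}_{Kh}(\mathbf{x},\mathbf{y},\lambda)$ between hydraulic
conductivity at point x and transformed hydraulic head at node $j_y$ of the y-plane and
interpolate the zero-order mean Green's function as

$$\langle G^{(0)}(\mathbf{z},\mathbf{y},\lambda)\rangle \approx \langle \hat{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda)\rangle = \sum_{i=1}^{N_g} G^{(0)}_{i,j_y}\,\xi_i(\mathbf{z}) \qquad \text{(A.12)}$$

Substituting (A.1), (A.3) and (A.12) into (3.25) yields

$$
\begin{aligned}
u^{(2)}_{j_y}(\mathbf{x},\lambda) ={}& -\int_{\Omega} K_G(\mathbf{x})\,K_G(\mathbf{z})\,C_Y(\mathbf{x},\mathbf{z}) \sum_{i=1}^{N_g} h_i^{(0)}\,\nabla_{\mathbf{z}}^{T}\xi_i(\mathbf{z}) \sum_{k=1}^{N_g} G^{(0)}_{k,j_y}\,\nabla_{\mathbf{z}}\xi_k(\mathbf{z})\,d\mathbf{z} \\
&+ \int_{\Omega} S_S(\mathbf{z}) \sum_{i=1}^{N_g} \langle K'(\mathbf{x})\,h'_{0,i}\rangle^{(2)}\,\xi_i(\mathbf{z}) \sum_{k=1}^{N_g} G^{(0)}_{k,j_y}\,\xi_k(\mathbf{z})\,d\mathbf{z}
\end{aligned}
\qquad \text{(A.13)}
$$

Expressing the domain integral in (A.13) as a summation of integrals over each grid
element allows writing the cross-covariance between hydraulic conductivity at element e and
transformed head at node $j_y$ as

$$
\begin{aligned}
u^{(2)e}_{j_y}(\mathbf{x},\lambda) ={}& -K_G^{e} \sum_{e'=1}^{N_e} K_G^{e'}\,C_Y^{e,e'} \sum_{i=1}^{M_{e'}}\sum_{k=1}^{M_{e'}} G^{(0)e'}_{k,j_y}\,h_i^{(0)e'}\,\Psi^{e'e'}_{ik} \\
&+ \sum_{e'=1}^{N_e} S_S^{e'} \sum_{i=1}^{M_{e'}}\sum_{k=1}^{M_{e'}} G^{(0)e'}_{k,j_y}\,\langle K'^{e}\,h'^{e'}_{0,i}\rangle^{(2)}\,\Theta^{e'e'}_{ik}
\end{aligned}
\qquad \text{(A.14)}
$$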
A.3 Second-order head head covariance
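We interpolate the second-order covariance, $\langle h_0'(\mathbf{x})\,h_0'(\mathbf{z})\rangle^{(2)}$, between initial
hydraulic heads at node m associated with vector position x and vector position z as

$$\langle h'_{0,m}\,h_0'(\mathbf{z})\rangle^{(2)} \approx \sum_{i=1}^{N_g} \langle h'_{0,m}\,h'_{0,i}\rangle^{(2)}\,\xi_i(\mathbf{z}) \qquad \text{(A.15)}$$

and evaluate the covariance between initial head at node m and the transformed head at node
$j_y$ at vector position y, $\langle h'_{0,m}\,h'_{j_y}\rangle^{(2)}$, upon substituting (A.1), (A.3), (A.12) and (A.15) into
(3.29)

$$
\begin{aligned}
\langle h'_{0,m}\,h'_{j_y}\rangle^{(2)} ={}& -\int_{\Omega} \sum_{i=1}^{N_g} h_i^{(0)}\,\nabla_{\mathbf{z}}^{T}\xi_i(\mathbf{z})\,\langle K'(\mathbf{z})\,h'_{0,m}\rangle^{(2)} \sum_{k=1}^{N_g} G^{(0)}_{k,j_y}\,\nabla_{\mathbf{z}}\xi_k(\mathbf{z})\,d\mathbf{z} \\
&+ \int_{\Omega} S_S(\mathbf{z}) \sum_{i=1}^{N_g} \langle h'_{0,m}\,h'_{0,i}\rangle^{(2)}\,\xi_i(\mathbf{z}) \sum_{k=1}^{N_g} G^{(0)}_{k,j_y}\,\xi_k(\mathbf{z})\,d\mathbf{z}
\end{aligned}
\qquad \text{(A.16)}
$$

The domain integral in (A.16) is then decomposed into a sum of integrals over each grid
element as

$$
\begin{aligned}
\langle h'_{0,m}\,h'_{j_y}\rangle^{(2)} ={}& -\sum_{e=1}^{N_e} \langle K'^{e}\,h'_{0,m}\rangle^{(2)} \sum_{i=1}^{M_e}\sum_{k=1}^{M_e} h_i^{(0)e}\,G^{(0)e}_{k,j_y}\,\Psi^{ee}_{ik} \\
&+ \sum_{e=1}^{N_e} S_S^{e} \sum_{i=1}^{M_e}\sum_{k=1}^{M_e} \langle h'_{0,m}\,h'^{e}_{0,i}\rangle^{(2)}\,G^{(0)e}_{k,j_y}\,\Theta^{ee}_{ik}
\end{aligned}
\qquad \text{(A.17)}
$$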
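Numerical solution of (3.26) - (3.28) is performed by approximating the second-order head
covariance $C_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s)$ through

$$C_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s) \approx \hat{C}_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s) = \sum_{m=1}^{N_g} C^{(2)}_{m,j_y}(\lambda,s)\,\xi_m(\mathbf{x}) \qquad \text{(A.18)}$$

where $\hat{C}_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s)$ is the finite element approximation of $C_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s)$ and $C^{(2)}_{m,j_y}(\lambda,s)$
is the covariance between transformed head at node m associated with vector position x and
head at node $j_y$ associated with vector position y.
Galerkin orthogonalization of (3.26) yields

$$
\begin{aligned}
&\int_{\Omega} \left[ K_G(\mathbf{x})\,\nabla_{\mathbf{x}}\hat{C}_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s) + u^{(2)}_{Kh}(\mathbf{x},\mathbf{y},s)\,\nabla_{\mathbf{x}}\langle h^{(0)}(\mathbf{x},\lambda)\rangle \right]\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad + \lambda \int_{\Omega} S_S(\mathbf{x})\,\hat{C}_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s)\,\xi_n(\mathbf{x})\,d\mathbf{x} \\
&= \int_{\Omega} S_S(\mathbf{x})\,\langle h_0'(\mathbf{x})\,h'(\mathbf{y},s)\rangle^{(2)}\,\xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad + \int_{\Gamma} \left[ K_G(\mathbf{x})\,\nabla_{\mathbf{x}}\hat{C}_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s) + u^{(2)}_{Kh}(\mathbf{x},\mathbf{y},s)\,\nabla_{\mathbf{x}}\langle h^{(0)}(\mathbf{x},\lambda)\rangle \right]\cdot\mathbf{n}(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x},
\qquad n = 1, \ldots, N_g
\end{aligned}
\qquad \text{(A.19)}
$$

By virtue of (3.27) - (3.28), equation (A.19) becomes

$$
\begin{aligned}
&\int_{\Omega} K_G(\mathbf{x})\,\nabla_{\mathbf{x}}\hat{C}_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s)\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} + \int_{\Omega} u^{(2)}_{Kh}(\mathbf{x},\mathbf{y},s)\,\nabla_{\mathbf{x}}\langle h^{(0)}(\mathbf{x},\lambda)\rangle\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad + \lambda \int_{\Omega} S_S(\mathbf{x})\,\hat{C}_h^{(2)}(\mathbf{x},\mathbf{y},\lambda,s)\,\xi_n(\mathbf{x})\,d\mathbf{x} \\
&= \int_{\Omega} S_S(\mathbf{x})\,\langle h_0'(\mathbf{x})\,h'(\mathbf{y},s)\rangle^{(2)}\,\xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad + \int_{\Gamma_D} u^{(2)}_{Kh}(\mathbf{x},\mathbf{y},s)\,\nabla_{\mathbf{x}}\langle h^{(0)}(\mathbf{x},\lambda)\rangle\cdot\mathbf{n}(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x},
\qquad n = 1, \ldots, N_g
\end{aligned}
\qquad \text{(A.20)}
$$

We interpolate $\langle h_0'(\mathbf{x})\,h'(\mathbf{y},s)\rangle^{(2)}$ as

$$\langle h_0'(\mathbf{x})\,h'(\mathbf{y},s)\rangle^{(2)} \approx \sum_{m=1}^{N_g} \langle h'_{0,m}\,h'_{j_y}(s)\rangle^{(2)}\,\xi_m(\mathbf{x}) \qquad \text{(A.21)}$$

where $\langle h'_{0,m}\,h'_{j_y}(s)\rangle^{(2)}$ is the second-order approximation of the covariance between initial
head at node m associated with vector position x and head at node $j_y$ associated with vector
position y and is evaluated by taking the inverse Laplace transform of $\langle h'_{0,m}\,h'_{j_y}\rangle^{(2)}$ (A.17).
Substituting (A.1), (A.14), (A.18) and (A.21) into (A.20) yields

$$
\begin{aligned}
&\int_{\Omega} K_G(\mathbf{x}) \sum_{m=1}^{N_g} C^{(2)}_{m,j_y}(\lambda,s)\,\nabla_{\mathbf{x}}\xi_m(\mathbf{x})\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} + \int_{\Omega} u^{(2)}_{j_y}(\mathbf{x},s) \sum_{m=1}^{N_g} h_m^{(0)}\,\nabla_{\mathbf{x}}\xi_m(\mathbf{x})\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad + \lambda \int_{\Omega} S_S(\mathbf{x}) \sum_{m=1}^{N_g} C^{(2)}_{m,j_y}(\lambda,s)\,\xi_m(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x} \\
&= \int_{\Omega} S_S(\mathbf{x}) \sum_{m=1}^{N_g} \langle h'_{0,m}\,h'_{j_y}(s)\rangle^{(2)}\,\xi_m(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad + \int_{\Gamma_D} u^{(2)}_{j_y}(\mathbf{x},s) \sum_{m=1}^{N_g} h_m^{(0)}\,\nabla_{\mathbf{x}}\xi_m(\mathbf{x})\cdot\mathbf{n}(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x},
\qquad n = 1, \ldots, N_g
\end{aligned}
\qquad \text{(A.22)}
$$

Rearranging terms and defining

$$A_{nm} = \int_{\Omega} K_G(\mathbf{x})\,\nabla_{\mathbf{x}}\xi_m(\mathbf{x})\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} \qquad \text{(A.23)}$$

$$D_{nm} = \int_{\Omega} S_S(\mathbf{x})\,\xi_m(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x} \qquad \text{(A.24)}$$

$$F_{nm,j_y} = \int_{\Omega} u^{(2)}_{j_y}(\mathbf{x},s)\,\nabla_{\mathbf{x}}\xi_m(\mathbf{x})\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} = \sum_{e=1}^{N_e} u^{(2)e}_{j_y}(\mathbf{x},s) \int_{e} \nabla_{\mathbf{x}}\xi_m(\mathbf{x})\cdot\nabla_{\mathbf{x}}\xi_n(\mathbf{x})\,d\mathbf{x} \qquad \text{(A.25)}$$

$$T_n = \int_{\Gamma_D} u^{(2)}_{j_y}(\mathbf{x},s) \sum_{m=1}^{N_g} h_m^{(0)}\,\nabla_{\mathbf{x}}\xi_m(\mathbf{x})\cdot\mathbf{n}(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x} \qquad \text{(A.26)}$$

where $u^{(2)e}_{j_y}(\mathbf{x},s)$ is obtained by taking the inverse Laplace transform of $u^{(2)e}_{j_y}(\mathbf{x},\lambda)$
calculated through (A.14), equation (A.22) can finally be written as

$$\sum_{m=1}^{N_g} \left( A_{nm} + \lambda D_{nm} \right) C^{(2)}_{m,j_y}(\lambda,s) = T_n - \sum_{m=1}^{N_g} F_{nm,j_y}\,h_m^{(0)} + \sum_{m=1}^{N_g} D_{nm}\,\langle h'_{0,m}\,h'_{j_y}(s)\rangle^{(2)}, \qquad n = 1, \ldots, N_g \qquad \text{(A.27)}$$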
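For a given value of the Laplace parameter, (A.27) is a linear algebraic system for the nodal covariances. The sketch below shows how such a system could be solved once the matrices are assembled, assuming it takes the form (A + λD) C = T − F h + D c₀, with c₀ collecting the nodal covariances between initial and current heads. All matrices and vectors are small synthetic placeholders (a one-dimensional stiffness-like matrix, a lumped storage matrix and random entries for F), not quantities assembled from the moment equations.

```python
import numpy as np

def solve_head_covariance(A, D, F, T, h0_mean, cov_h0_h, lam):
    """Solve (A + lam * D) C = T - F @ h0_mean + D @ cov_h0_h for the nodal vector C."""
    rhs = T - F @ h0_mean + D @ cov_h0_h
    return np.linalg.solve(A + lam * D, rhs)

n = 5                                                  # number of grid nodes (toy size)
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) # synthetic conductance-type matrix
D = 0.5 * np.eye(n)                                    # synthetic (lumped) storage matrix
rng = np.random.default_rng(5)
F = 0.1 * rng.standard_normal((n, n))                  # synthetic cross-covariance matrix
T = np.zeros(n)                                        # no Dirichlet-boundary contribution
h0_mean = np.linspace(10.0, 12.0, n)                   # synthetic zero-order nodal heads
cov_h0_h = 0.2 * np.ones(n)                            # synthetic initial-head covariances
lam = 0.01                                             # one value of the Laplace parameter

C = solve_head_covariance(A, D, F, T, h0_mean, cov_h0_h, lam)
```

In an actual application the solve would be repeated for each value of the Laplace parameter required by the numerical inversion and for each reference node, so factorizing the left-hand-side matrix once per parameter value would be a natural design choice.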
References
Aanonsen, S.I., G. Nævdal, D.S. Oliver, A.C. Reynolds, and B. Vallès (2009), Ensemble
Kalman filter in reservoir engineering – a review, SPE J., 14(3), 393-412,
doi:10.2118/117274-PA.
Ahmed, N., T. Natarajan, and K.R. Rao (1974), Discrete Cosine Transform, IEEE T. Comput.,
C-23(1), 90-93, doi:10.1109/T-C.1974.223784.
Alcolea, A., J. Carrera, and A. Medina (2006), Pilot points method incorporating prior
information for solving the groundwater flow inverse problem, Adv. Water Resour.,
29(11), 1678-1689, doi: 10.1016/j.advwatres.2005.12.009.
Anderson, J.L. (2007), An adaptive covariance inflation error correction algorithm for
ensemble filters. Tellus Series, 59(2), 210-224, doi: 10.1111/j.1600-0870.2006.00216.x.
Ballio, F., and A. Guadagnini (2004), Convergence assessment of numerical Monte Carlo
simulations in groundwater hydrology, Water Resour. Res., 40(4), W04603,
doi:10.1029/2003WR002876.
Bianchi Janetti, E., M. Riva, S. Straface, and A. Guadagnini (2010), Stochastic
characterization of the Montalto Uffugo research site (Italy) by geostatistical inversion
of moment equations of groundwater flow, J. Hydrol., 381(1-2), 42-51, doi:
10.1016/j.jhydrol.2009.11.023.
Burgers, G., P.J. van Leeuwen, and G. Evensen (1998), Analysis Scheme in the Ensemble
Kalman Filter, Mon. Weather Rev., 126(6), 1719-1724, doi:10.1175/1520-
0493(1998)126<1719:ASITEK>2.0.CO;2.
Chang, H., D. Zhang, and Z. Lu (2010), History matching of facies distribution with the EnKF
and level set parameterization, J. Comput. Phys., 229(20), 8011-8030, doi:
10.1016/j.jcp.2010.07.005.
Chen, Y., and D. Zhang (2006), Data assimilation for transient flow in geologic formations
via ensemble Kalman filter, Adv. Water Resour., 29(8), 1107-1122,
doi:10.1016/j.advwatres.2005.09.007.
Cohn, S.E. (1997), An Introduction to Estimation Theory, Journal of the Meteorological
Society of Japan, 75(1B), 257-288.
De Hoog, F. R., J. H. Knight, and A. N. Stokes (1982), An improved method for numerical
inversion of Laplace transform, SIAM J. Sci. Stat. Comput., 3(3), 357– 366, doi:
10.1137/0903022.
Deutsch, C.V., and A.G. Journel (1998), GSLIB, geostatistical software library and user’s
guide, 2nd ed., Oxford University Press, New York, ISBN-10:0195100158.
Dovera, L., and E. Della Rossa (2011), Multimodal ensemble Kalman filtering for Gaussian
mixture models, Computat. Geosci. 15(2), 307-323, doi: 10.1007/s10596-010-9205-3.
Evensen, G. (1994), Sequential data assimilation with a nonlinear quasi-geostrophic model
using Monte Carlo methods to forecast error statistics, J. Geophys. Res., 99(C5),
10143-10162, doi:10.1029/94JC00572.
Furrer, R., and T. Bengtsson (2007), Estimation of high-dimensional prior and posterior
covariance matrices in Kalman filter variants, J. Multivar. Anal., 98(2), 227–255,
doi:10.1016/j.jmva.2006.08.003.
Gelb, A. (1974), Applied optimal estimation, The MIT Press, Cambridge, Mass., ISBN-
10:0262570483.
Guadagnini, A., and S.P. Neuman (1999), Nonlocal and localized analyses of conditional
mean steady state flow in bounded, randomly nonuniform domains: 2. Computational
examples, Water Resour. Res., 35(10), 3019 – 3039, doi:10.1029/1999WR900159.
Hendricks Franssen, H.-J., and W. Kinzelbach (2008), Real-time groundwater flow modeling
with the Ensemble Kalman Filter: Joint estimation of states and parameters and the
filter inbreeding problem, Water Resour. Res., 44, W09408,
doi:10.1029/2007WR006505.
Hendricks Franssen, H.-J., H.P. Kaiser, U. Kuhlmann, G. Bauser, F. Stauffer, R. Muller, and
W. Kinzelbach (2011), Operational real-time modeling with ensemble Kalman filter of
variably saturated subsurface flow including stream-aquifer interaction and parameter
updating, Water Resour. Res., 47, W02532, doi:10.1029/2010WR009480.
Hernandez, A.F., S.P. Neuman, A. Guadagnini, and J. Carrera (2003), Conditioning mean
steady state flow on hydraulic head and conductivity through geostatistical inversion,
Stochastic Environ. Res. Risk Assess., 17(5), 329-338, doi:10.1007/s00477-003-0154-4.
Hernandez, A.F., S.P. Neuman, A. Guadagnini, and J. Carrera (2006), Inverse stochastic
moment analysis of steady state flow in randomly heterogeneous media, Water Resour.
Res., 42(5), W05425, doi:10.1029/2005WR004449.
Houtekamer, P.L., and H.L. Mitchell (1998), Data Assimilation Using an Ensemble Kalman
Filter Technique, Mon. Weather Rev., 126(3), 796-811,
doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.
Jafarpour, B., and D.B. McLaughlin (2008), History matching with an ensemble Kalman
filter and discrete cosine parameterization, Computat. Geosci., 12(2), 227–244, doi:
10.1007/s10596-008-9080-3.
Jafarpour, B., and M. Khodabakhshi (2011), A Probability Conditioning Method (PCM) for
nonlinear flow data integration into multipoint statistical facies simulation, Math.
Geosci., 43(2), 133-164, doi: 10.1007/s11004-011-9316-y.
Jafarpour, B., and M. Tarrahi (2012), Assessing the performance of the ensemble Kalman
filter for subsurface flow data integration under variogram uncertainty, Water Resour.
Res., 47, W05537, doi:10.1029/2010WR009090.
Kalman, R.E. (1960), A New Approach to Linear Filtering and Prediction Problems, J. Basic
Eng., 82(D), 35-45, doi:10.1115/1.3662552.
Kolbjørnsen, O., M. Stien, H. Kjønsberg, B. Fjellvoll, and P. Abrahamsen (2013), Using
Multiple Grids in Markov Mesh Facies Modeling, Math. Geosci.,
doi:10.1007/s11004-013-9499-5.
Le Loc’h, G., and A. Galli (1997), Truncated plurigaussian method: theoretical and practical
points of view. E.Y. Baafi and N.A. Schofield (eds), Geostatistics Wollongong ’96, 1,
211-222, Dordrecht, Kluwer Academic Press.
Liang, X., X. Zheng, S. Zhang, G. Wu, Y. Dai, and Y. Li (2012), Maximum Likelihood
estimation of inflation factors on error covariance matrices for ensemble Kalman filter
assimilation, Q. J. R. Meteorol. Soc., 138(662), 263-273, doi:10.1002/qj.912.
Liu, N. and D.S. Oliver (2005a), Ensemble Kalman filter for automatic history matching of
geologic facies, J. Petrol. Sci. Eng., 47(3-4), 147–161, doi:
10.1016/j.petrol.2005.03.006.
Liu, N. and D.S. Oliver (2005b), Critical Evaluation of the Ensemble Kalman Filter on
History Matching of Geologic Facies, SPE Reserv. Eval. Eng., 8(6), 470-477,
doi:10.2118/92867-PA.
Liu, Y., A.H. Weerts, M. Clark, H.-J. Hendricks Franssen, S. Kumar, H. Moradkhani, D.J.
Seo, D. Schwanenberg, P. Smith, A.I.J.M. van Dijk, N. van Velzen, M. He, H. Lee, S.J.
Noh, O. Rakovec, and P. Restrepo (2012), Advancing data assimilation in operational
hydrologic forecasting: progresses, challenges, and emerging opportunities, Hydrol.
Earth Syst. Sci., 16(10), 3863-3887, doi:10.5194/hess-16-3863-2012.
McLaughlin, D.B. (2002), An integrated approach to hydrologic data assimilation:
interpolation, smoothing, and filtering, Adv. Water Resour., 25(8-12), 1275-1286,
doi:10.1016/S0309-1708(02)00055-6.
Moreno, D. and S.I. Aanonsen (2007), Stochastic facies modeling using the level set method,
Petroleum Geostatistics, 10–14 September 2007, Cascais, Portugal, A16, Extended
Abstracts Book, EAGE Publications BV, Utrecht, The Netherlands.
Naevdal, G., L.M. Johnsen, S.I. Aanonsen, and E.H. Vefring (2005), Reservoir monitoring
and continuous model updating using ensemble Kalman filter, SPE J., 10(1), 66-74,
doi:10.2118/84372-PA.
Oliver, D.S., and Y. Chen (2011), Recent progress on reservoir history matching: a review,
Computat. Geosci., 15(1), 185-221, doi:10.1007/s10596-010-9194-2.
Rao, K.R., and P. Yip (1990), Discrete Cosine Transform: Algorithms, Advantages,
Applications, Academic Press, Boston.
Riva, M., A. Guadagnini, S. P. Neuman, E. Bianchi Janetti, and B. Malama (2009), Inverse
analysis of stochastic moment equations for transient flow in randomly heterogeneous
media, Adv. Water Resour., 32(10), 1495-1507, doi:10.1016/j.advwatres.2009.07.003.
Riva, M., A. Guadagnini, F. De Gaspari, and A. Alcolea (2010), Exact sensitivity matrix and
influence of the number of pilot points in the geostatistical inversion of moment
equations of groundwater flow, Water Resour. Res., 46, W11513,
doi:10.1029/2009WR008476.
Riva, M., M. Panzeri, A. Guadagnini, and S.P. Neuman (2011), Role of model selection
criteria in geostatistical inverse estimation of statistical data- and model-parameters,
Water Resour. Res., 47, W07502, doi:10.1029/2011WR010480.
Schoeniger, A., W. Nowak, and H.-J. Hendricks Franssen (2012), Parameter estimation by
ensemble Kalman filters with transformed data: Approach and application to hydraulic
tomography, Water Resour. Res., 48, W04502, doi:10.1029/2011WR010462.
Stien, M., and O. Kolbjørnsen (2011), Facies modeling using a Markov Mesh model
specification, Math. Geosci., 43(6), 611-624, doi: 10.1007/s11004-011-9350-9.
Strebelle, S. (2002), Conditional simulation of complex geological structures using multiple-
point statistics, Math. Geol., 34(1), 1-21, doi: 10.1023/A:1014009426274.
Tarantola, A. (2005), Inverse Problem Theory, Society for Industrial and Applied
Mathematics, Philadelphia, ISBN-10: 0898715725.
Tartakovsky, D.M., and S.P. Neuman (1998), Transient flow in bounded randomly
heterogeneous domains: 1. Exact conditional moment equations and recursive
approximations, Water Resour. Res., 34(1), 1-12, doi:10.1029/97WR02118.
Ye, M. (2002), Parallel finite element Laplace transform algorithm for transient flow in
bounded randomly heterogeneous domains, Ph.D. dissertation, Univ. of Ariz., Tucson.
Ye, M., S.P. Neuman, A. Guadagnini, and D.M. Tartakovsky (2004), Nonlocal and localized
analyses of conditional mean transient flow in bounded, randomly heterogeneous
porous media, Water Resour. Res., 40(5), W05104, doi:10.1029/2003WR00209.
van Leeuwen, P.J. (1999), Comment on “Data Assimilation Using an Ensemble Kalman Filter
Technique”, Mon. Weather Rev., 127(6), 1374-1377,
doi:10.1175/1520-0493(1999)127<1374:CODAUA>2.0.CO;2.
Vrugt, J., C.G.H. Diks, H.V. Gupta, W. Bouten, and J.M. Verstraten (2005), Improved
treatment of uncertainty in hydrologic modeling: combining the strengths of global
optimization and data assimilation, Water Resour. Res., 41(1), W01017,
doi:10.1029/2004WR003059.
Wang, X., and C.H. Bishop (2003), A comparison of breeding and ensemble transform
Kalman filter ensemble forecast schemes, J. Atmos. Sci., 60(9), 1140-1158,
doi:10.1175/1520-0469(2003)060<1140:ACOBAE>2.0.CO;2.
Wang, X., T.A. Hamill, J.S. Whitaker, and C.H. Bishop (2007), A comparison of hybrid
ensemble transform Kalman Filter-optimum interpolation and ensemble square root
filter analysis scheme, Mon. Weather Rev., 135(3), 1055-1076,
doi:10.1175/MWR3307.1.
Wen, X.-H., and W.H. Chen (2007), Some practical issues on real-time reservoir updating
using ensemble Kalman filter, SPE J., 12(2), 156-166, doi:10.2118/111571-PA.
Woodbury, A.D., and T.J. Ulrych (2000), A full-Bayesian approach to the groundwater
inverse problem for steady state flow, Water Resour. Res., 36(8), 2081-2093,
doi:10.1029/2000WR900086.
Xu, T., J.J. Gómez-Hernández, H. Zhou, and L. Li (2013), The power of transient
piezometric head data in inverse modeling: an application of the localized normal-score
EnKF with covariance inflation in a heterogeneous bimodal hydraulic conductivity
field, Adv. Water Resour., 54, 100-118, doi:10.1016/j.advwatres.2013.01.006.
Zeng, L., H. Chang, and D. Zhang (2011), A Probabilistic Collocation-Based Kalman Filter
for History Matching, SPE J., 16(2), 294-306, doi:10.2118/140737-PA.
Zeng, L., L. Shi, D. Zhang, and L. Wu (2012), A sparse grid Bayesian method for
contaminant source identification, Adv. Water Resour., 37, 1-9,
doi:10.1016/j.advwatres.2011.09.011.
Zhang, D., Z. Lu, and Y. Chen (2007), Dynamic Reservoir Data Assimilation With an
Efficient, Dimension-Reduced Kalman Filter, SPE J., 12(1), 108-117,
doi:10.2118/95277-PA.
Acknowledgements
This work was supported in part by a grant provided by Eni S.p.a. - Exploration and
Production Division through the project “History Matching per la caratterizzazione delle
facies di reservoir mediante tecniche di inversione stocastica”. Schlumberger is
acknowledged for allowing the use of the software ECLIPSE for research purposes.
Extended abstract (translated from the Italian)
Accurate modeling of flow and transport in porous media is essential for addressing a
wide range of engineering and environmental problems. Relevant applications include, for
example: water supply for civil and industrial use; remediation of contaminated soils and
aquifers; protection of pumping wells; improving the efficiency of hydrocarbon recovery
from oil reservoirs, to meet a growing demand for energy resources while natural
availability declines; and quantifying the risk associated with underground storage of
radioactive material.
Building a subsurface flow and transport model requires defining the spatial distribution
of the parameters that appear in the governing equations, typically permeability and
porosity. This distribution is generally highly uncertain. In this context a probabilistic
model is needed, in which the system parameters are treated as stochastic processes,
possibly conditioned on the available measurements through inverse modeling or data
assimilation techniques.
Among the many methodologies described in the literature, this work uses the Ensemble
Kalman Filter (EnKF). EnKF assimilates data of different types into dynamic models
sequentially, as the measurements are acquired. It can handle models of large spatial
dimension with the nonlinear dynamics typical of subsurface flow and transport modeling.
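A minimal sketch of a single EnKF analysis step is given here for orientation. It uses the stochastic (perturbed-observation) variant and assumes a linear observation operator; the function name `enkf_analysis`, its arguments and the toy numbers are hypothetical and are not taken from the thesis.

```python
import numpy as np

def enkf_analysis(ensemble, obs, obs_operator, obs_cov, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    ensemble     : (n_state, n_members) forecast ensemble
    obs          : (n_obs,) measurement vector
    obs_operator : (n_obs, n_state) linear observation matrix H (assumed linear)
    obs_cov      : (n_obs, n_obs) measurement-error covariance R
    """
    n_members = ensemble.shape[1]
    # Forecast anomalies about the ensemble mean
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    predicted = obs_operator @ ensemble
    pred_anom = obs_operator @ anomalies
    # Sample estimates of C H^T and of H C H^T + R
    cov_xy = anomalies @ pred_anom.T / (n_members - 1)
    cov_yy = pred_anom @ pred_anom.T / (n_members - 1) + obs_cov
    # One perturbed copy of the observations per ensemble member
    noise = rng.multivariate_normal(np.zeros(obs.size), obs_cov, size=n_members).T
    perturbed = obs[:, None] + noise
    # Apply the gain by solving a linear system instead of forming an inverse
    return ensemble + cov_xy @ np.linalg.solve(cov_yy, perturbed - predicted)

# Toy usage: three state variables, only the first one is observed
rng = np.random.default_rng(0)
forecast = rng.normal(0.0, 1.0, size=(3, 200))
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[0.01]])
analysis = enkf_analysis(forecast, np.array([2.0]), H, R, rng)
```

In this toy case the observed component of the analysis ensemble is pulled toward the measured value and its spread shrinks, while unobserved components change only through their sample covariance with the observed one.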
Despite its growing popularity, several drawbacks limit the range of applicability of
EnKF. Traditionally, EnKF relies on a Monte Carlo (MC) approach to generate a set of
realizations of the stochastic process under study. A critical factor is the number of MC
realizations used to approximate the statistical moments of the variables of interest.
While a large number of realizations is needed to obtain good estimates of the means and
covariances of the quantities analyzed, the associated computational cost can prevent its
use in applications of practical interest. Moreover, EnKF yields optimal results only if
the system variables (i.e., model parameters and state variables) can be described by a
multivariate Gaussian distribution, whereas the subsurface is generally complex and
realistic models must account for different facies, that is, distinct geological units
each characterized by specific mineralogical and petrophysical properties. The spatial
distribution of facies, typically described through indicator functions, considerably
influences the dynamic behavior of the system. Because indicator functions are
non-Gaussian, using EnKF to update the spatial distribution of facies is, in most cases, a
source of error.
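The difficulty with indicator functions can be illustrated with a toy example. The sketch below applies a Kalman-type linear update to a binary facies ensemble and counts how many updated values are no longer 0 or 1. The 1-D grid, the channel length, the observation and its error variance are all assumed values chosen for illustration, and observation perturbations are omitted for clarity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_members = 20, 50

# Hypothetical binary facies ensemble: one 5-cell channel (1) in background (0) per member
ensemble = np.zeros((n_cells, n_members))
for j in range(n_members):
    start = rng.integers(0, n_cells - 5)
    ensemble[start:start + 5, j] = 1.0

# Assumed datum: channel facies observed at cell 10 with a small error variance
obs_cell, obs_value, obs_var = 10, 1.0, 0.01

# Kalman-type linear update built from sample covariances
anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
cov_with_obs = anomalies @ anomalies[obs_cell] / (n_members - 1)
gain = cov_with_obs / (cov_with_obs[obs_cell] + obs_var)
updated = ensemble + np.outer(gain, obs_value - ensemble[obs_cell])

# Updated values that are neither 0 nor 1 no longer identify a facies
is_binary = np.isclose(updated, 0.0) | np.isclose(updated, 1.0)
fraction_non_binary = 1.0 - is_binary.mean()
print(f"non-binary fraction after update: {fraction_non_binary:.2f}")
```

Every member that did not already honor the datum ends up with fractional indicator values, so the updated fields need some additional step (truncation, re-parameterization, or a facies model) before they can be read as geology.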
The main objectives of this thesis are: (a) to embed the stochastic moment equations of
groundwater flow within EnKF procedures, thereby avoiding the use of MC techniques; and
(b) to develop an algorithm capable of conditioning the spatial distribution of facies and
of their petrophysical properties on production data.
The first part of the thesis proposes to avoid MC simulations by directly solving the
stochastic groundwater flow equations that govern the space-time evolution of the first
two statistical moments (means and covariances) of hydraulic heads (h) and fluxes. The new
methodology was tested on a synthetic groundwater flow problem in a heterogeneous confined
aquifer subject to mixed boundary conditions and containing a pumping well. The analysis
considered the effect of (a) measurement errors in the log-conductivity (Y) data, (b) the
time interval of the available head data, and (c) the statistical properties of the Y
field on the quality of the calibrated Y and h fields and on the associated estimation
uncertainties. The performance and accuracy of the new procedure were also compared with
those of the traditional MC-based technique. Using the moment equations within EnKF was
shown to allow efficient, real-time estimation of model parameters and state variables
while avoiding the problems typically associated with the MC-based approach. The results
confirm that using only a few hundred MC realizations, as is often done in the literature,
gives rise to filter inbreeding, which degrades the quality of the Y and h estimates and
of their estimation uncertainties.
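For reference, the standard Kalman analysis step can be written directly in terms of the first two moments, which is the form a moment-equation approach can feed without sampling. The notation below is assumed here for illustration and is not taken from the thesis chapters: \langle x \rangle^f and C^f denote the forecast mean and covariance of the joint vector of Y and h, H is a linear observation matrix, R is the measurement-error covariance, d is the data vector, and the superscript a marks updated (analysis) quantities.

```latex
\begin{aligned}
\mathbf{K} &= \mathbf{C}^{f}\mathbf{H}^{T}
              \left(\mathbf{H}\mathbf{C}^{f}\mathbf{H}^{T}+\mathbf{R}\right)^{-1},\\
\langle\mathbf{x}\rangle^{a} &= \langle\mathbf{x}\rangle^{f}
              +\mathbf{K}\left(\mathbf{d}-\mathbf{H}\langle\mathbf{x}\rangle^{f}\right),\\
\mathbf{C}^{a} &= \left(\mathbf{I}-\mathbf{K}\mathbf{H}\right)\mathbf{C}^{f}.
\end{aligned}
```

In an MC-based EnKF the forecast moments are replaced by sample statistics of the ensemble, which is where the sampling error associated with small ensembles enters.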
The second part of the work presents a new assimilation algorithm that updates the
distribution of facies and petrophysical properties for an ensemble of realizations of a
subsurface system with a complex geological architecture. The facies distribution is
described by a Markov Mesh (MM) model coupled with a multi-grid approach, in which
geological patterns are first reproduced at a coarser scale and then refined at higher
resolution. This technique reproduces in detail complex geometries with spatial
correlations distributed over several scales. The assimilation algorithm is based on
integrating the MM model into the EnKF scheme. The methodology was tested on a synthetic
reservoir model with two facies representing a meandering fluvial system. The results were
also compared with those obtained with a standard EnKF approach. The proposed assimilation
scheme is shown to provide an ensemble of permeability and facies fields that preserves
the geological architecture of the reference model, in contrast to what is obtained with
traditional EnKF. The predictive capability of the calibrated reservoir models was tested
through two scenarios with different flow configurations, considered after the end of the
assimilation period. The first scenario keeps the same configuration adopted during the
assimilation period, while the second investigates the effect of two additional wells that
become operational after data assimilation. In the first scenario, both the traditional
approach and the one proposed here provide a good prediction of the production data,
although the estimation uncertainty obtained with the new methodology is smaller. The
analysis of the second scenario instead revealed a marked difference between the
performance of the two assimilation methods and confirmed the clear superiority of the
proposed algorithm over standard EnKF in providing a prediction of the production data
consistent with the synthetic reference model.
First of all, I wish to thank Monica and Alberto for their constant help over these years.
Thanks also to Ernesto and Laura for their support and for the opportunity to work with
them.
Finally, I thank all my friends and my family for the support I have always received.