data assimilation for complex subsurface flow fields · 2014-05-13 · 2014 politecnico di milano...

2014

POLITECNICO DI MILANO

Department of Civil and Environmental Engineering

Doctoral Programme in Environmental and Infrastructure Engineering

XXVI Cycle

DATA ASSIMILATION FOR COMPLEX SUBSURFACE FLOW FIELDS

Marco PANZERI

Tutor: Prof. Alberto GUADAGNINI

Advisor: Prof. Monica RIVA

Co-advisor: Dr. Ernesto Luigi DELLA ROSSA

The Chair of the Doctoral Programme: Prof. Alberto GUADAGNINI

TABLE OF CONTENTS

Abstract

Chapter 1. Introduction

1.1 Background

1.2 Objectives and Outline

Chapter 2. Data assimilation with the Kalman Filter

2.1 The filtering problem

2.2 Forward step

2.3 Analysis step

Chapter 3. Kalman Filter coupled with stochastic moment equations of transient groundwater flow

3.1 Extended transient moment equations of groundwater flow

3.2 Data assimilation of groundwater flow data via KF: MC-based EnKF and ME-based approach

3.3 Exploratory synthetic example of data assimilation and parameter estimation

3.4 Comparison between MC-based EnKF and ME-based approach

3.5 Conclusions

Chapter 4. EnKF with complex geology

4.1 Markov Mesh (MM) model

4.2 Theoretical formulation

4.3 Synthetic example

4.4 Conclusions

Appendix A

References

Acknowledgements

Estratto in italiano

Abstract

Proper modeling of subsurface flow and transport processes is key to the solution of a

wide range of engineering and environmental problems. Relevant applications include, e.g.,

the supply of fresh water for civil and industrial activities, the remediation of contaminated

aquifers or the protection of groundwater sources, the need for enhancing the recovery

efficiency of hydrocarbon reservoirs to face the ever-increasing demand for energy resources,

and the quantification of the risk linked to the geological disposal of nuclear waste. Building a

subsurface flow model requires defining the spatial distribution of the input parameters

embedded in the underlying governing equations, such as permeability and porosity. Despite

the key role played by these petrophysical properties when modeling aquifers and oil

reservoirs, our knowledge of the way they are distributed within a domain of interest is scarce

in practical applications and often characterized by a high degree of uncertainty.

A well-established approach to tackle this problem is to work within a stochastic

framework, in which the permeability and the porosity fields are treated as random processes

of space. An inverse and/or data assimilation modeling framework is then employed for

conditioning these spatial distributions relying on either direct or surrogate measurements.

Among the various available inversion (or data assimilation) techniques, we focus on the

Ensemble Kalman Filter (EnKF) approach. EnKF is a data assimilation technique which is

employed to incorporate data into physical system models sequentially and as soon as they

are collected. EnKF is appropriate for large and nonlinear models of the kind required for

realistic subsurface fluid flow simulations and has traditionally entailed the use of a

(numerical) Monte Carlo (MC) approach to generate a collection of interdependent random

model representations.

Despite its increasing popularity, there are several drawbacks that undermine the

range of scenarios under which EnKF is applicable. A critical factor is the size of the

ensemble, i.e., the number of MC simulations employed for (ensemble) moment evaluation.

Whereas accurate estimation of the mean and covariance requires many simulations, working

with large ensemble sizes and assessing MC convergence is computationally demanding.

Another common problem is that EnKF performs optimally only if the system variables (i.e.,

model parameters and state variables) can be described by a joint Gaussian distribution.

Modern reservoir models must explicitly account for the spatial distribution of

facies, which can be defined as distinctive and non-overlapping units forming the internal

architecture of the host rock system and which are associated with given attributes such as

porosity, permeability and mineralogy. Demarcation of diverse facies in a reservoir model is

usually accomplished through indicator functions. Due to the typically non-Gaussian nature

of the latter, use of EnKF to update complex reservoir models can be fraught with severe

challenges.

The main objectives of this work are: (a) to couple EnKF with stochastic moment

equations (MEs) of transient groundwater flow to circumvent and alleviate problems related

to the finiteness of the ensemble employed in the traditional MC-based EnKF; and (b) to

develop an assimilation algorithm that is conducive to conditioning on a set of measured

production data the spatial distribution of lithofacies and of the associated petrophysical

properties for a collection of hydrocarbon reservoirs.

We propose to circumvent the need for MC through a direct solution of nonlocal

(integrodifferential) stochastic MEs that govern the space-time evolution of conditional

ensemble means (statistical expectations) and covariances of hydraulic heads and fluxes. The

purpose is to combine an approximate form of the stochastic MEs with EnKF in a way that

allows sequential updating of parameters and system states without a need for

computationally intensive MC analyses. We explore the resulting combined algorithm on

synthetic problems of two-dimensional transient groundwater flow toward a well pumping

water from a randomly heterogeneous confined aquifer subject to prescribed boundary

conditions. We investigate the effect of the error variances linked to available log-

conductivity data and the impact of assimilating hydraulic heads during the transient or the

pseudo-steady state regime on the quality of the calibrated mean of log-conductivity and head

fields as well as on the associated estimation variance. We also compare the performances

and accuracies of our ME- and MC-based EnKF on synthetic problems differing from each

other in the variance and (integral) autocorrelation scale of log-conductivity random fields.

We analyze the impact of the number of realizations employed in the MC-based EnKF and

the occurrence of filter inbreeding in the assimilations. We show that embedding MEs in the

EnKF scheme allows for computationally efficient real time estimation of system states and

model parameters avoiding the drawbacks which are commonly encountered in traditional

MC-based applications of EnKF. Our results confirm that a few hundred MC simulations are

not enough to overcome filter inbreeding issues, which have a negative impact on the quality

of log-conductivity estimates as well as on the predicted heads and the associated estimation

variances. Contrariwise, ME-based EnKF obviates the need for repeated simulations and is

demonstrated to be free of inbreeding issues.

We further illustrate a novel data assimilation scheme conducive to updating both

facies and petrophysical properties of a reservoir model set characterized by complex geological

architecture. The spatial distribution of facies is treated by means of a Markov Mesh (MM)

model coupled with a multi-grid approach, according to which geological patterns are

initially reproduced at a coarse scale and are subsequently generated on grids with increasing

resolution. This allows reproducing detailed facies geometries and spatial patterns distributed

on multiple scales. The assimilation algorithm is developed within the context of a history

matching procedure and is based on the integration of the MM model within the EnKF

workflow. We test the methodology by way of a two-dimensional synthetic reservoir model

in the presence of two distinct facies and representing a complex meandering channel system.

We show that the proposed inversion scheme is conducive to an updated collection of facies

and log-permeability fields which maintain the geological architecture displayed by the

reference model, as opposed to the standard EnKF. We test the prediction ability of the

realizations obtained through our procedure by means of two forecast scenarios, in which

diverse flow configurations are considered following the latest assimilation time. In our first

scenario, the same flow setting imposed in the course of the assimilation time is maintained

also during the additional simulation period. A second study is performed by considering the

presence of two additional wells that become operative after the assimilation period. The

approaches tested yield a good estimation of the target production values during the first

scenario, the predictions provided by standard EnKF being characterized by the highest

degree of uncertainty. The performances of the two methodologies are markedly different

when considering the second scenario where our proposed algorithm outperforms the

standard EnKF by providing a superior match between the reference and the predicted

production curves.

1. Introduction

1.1 Background

Our ability to properly model subsurface flow and transport phenomena upon

making use of diverse information content associated with the often limited amount of data

available has a considerable impact on several engineering, environmental and energy

applications. These include, e.g., the supply of fresh water for civil and industrial activities,

the remediation of contaminated aquifers or the protection of groundwater sources, the need

for enhancing the recovery efficiency in hydrocarbon reservoirs to face the ever-increasing

demand for energy resources, and the quantification of the risk linked to the geological disposal

of nuclear waste.

The motion of fluids and contaminants in sedimentary aquifers and fractured rocks is

strongly influenced by the spatial distribution of the physical properties of the geological

media, such as permeability and porosity, which are often characterized by a high degree of

spatial heterogeneity. Despite the key role played by these properties in modeling aquifers

and oil reservoirs, in practical applications our knowledge of the way they are distributed

within a domain of interest is scarce and often characterized by a high degree of uncertainty.

For these reasons, providing reliable predictions of pressure, saturation or solute

concentration values at a given location of the considered domain and taking advantage of

diverse types of information to quantify the uncertainty associated with such predictions are

often complex tasks.

In the last decades, several techniques have been developed for estimating the spatial

distribution of petrophysical properties of underground reservoirs on the basis of either direct

or indirect/surrogate measurements, with the objective of improving our ability to predict

the system response to anthropogenic or natural forcing terms. These techniques are often

referred to as inverse modeling approaches in the groundwater hydrology community or

history matching in the reservoir engineering literature. The quantification of the uncertainty

associated with a model prediction requires the adoption of a probabilistic approach. In this

context the model parameters are treated as spatially correlated random fields and the

resulting governing differential equations become stochastic, thus allowing the quantification

of the space-time evolution of the probability density function of a target state variable. There

are several flavors of inverse modeling procedures which can be employed in the context of

groundwater aquifers and petroleum reservoir modeling. In recent years hydrogeologists and

petroleum reservoir engineers have devoted increasing attention to the development of data

assimilation techniques based on the concepts embedded in the Kalman Filter (KF) approach.

The KF is a well-known data assimilation technique used to incorporate

data into physical system models sequentially and as they are collected. It was originally

introduced by Kalman [Kalman, 1960] to integrate data corrupted by white Gaussian noise in

linear dynamic models the outputs of which include additive noise which is also modeled as a

Gaussian random variable. KF entails two steps: (a) a forward modeling (or forecasting) step

that propagates system states in time until new measurements become available, and (b) an

updating step that modifies/updates system states optimally in real time on the basis of such

measurements. Some modern versions of KF update system states (e.g., hydraulic heads or

pressures) and parameters (e.g., permeabilities) jointly based on measurements of one or both

variables [e.g., Vrugt et al., 2005].
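
The two steps described above can be illustrated with a minimal sketch for a linear model with additive Gaussian noise. The symbols F (model operator), Q (model error covariance), H (observation operator), R (measurement error covariance) and d (data vector), as well as the function names, are assumptions introduced here for illustration and are not notation taken from this chapter.

```python
import numpy as np

def kf_forecast(x, P, F, Q):
    """Forward step: propagate the state mean x and covariance P in time."""
    x_f = F @ x
    P_f = F @ P @ F.T + Q
    return x_f, P_f

def kf_analysis(x_f, P_f, d, H, R):
    """Updating step: condition the forecast on the measurements d."""
    S = H @ P_f @ H.T + R                # covariance of the innovation d - H x_f
    K = np.linalg.solve(S, H @ P_f).T    # Kalman gain, P_f H^T S^{-1}
    x_a = x_f + K @ (d - H @ x_f)        # updated mean
    P_a = (np.eye(len(x_f)) - K @ H) @ P_f   # updated covariance
    return x_a, P_a
```

For a scalar state with unit prior variance, unit measurement error variance and a direct observation of the state, the gain equals 0.5, so the updated mean lies halfway between the forecast and the datum and the variance is halved.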

Gelb [1974] proposed an Extended Kalman Filter (EKF) to deal with nonlinear

system models. EKF linearizes the model and propagates the first two statistical moments of

target model variables in time. As such it is not suitable for strongly non-linear systems of the

kind encountered in the context of groundwater flow or transport in any but mildly

heterogeneous media. EKF further requires large amounts of computer storage which limits

its use to relatively small-size problems. Evensen [1994] and Burgers et al. [1998] proposed

to overcome these limitations through the use of Monte Carlo (MC) simulation. Their so-

called Ensemble Kalman Filter (EnKF) approach utilizes sample mean values and

covariances to perform the updating. The development of sensors and measuring devices

capable of recording massive amounts of data in real time has rendered EnKF popular among

hydrologists, climate modelers and petroleum reservoir engineers [Oliver and Chen, 2011;

Liu et al., 2012]; assimilating such rich data sets in batch rather than sequential mode, as is

common with classical inverse frameworks such as Maximum Likelihood, would not be

feasible. Applications of EnKF to groundwater and multiphase flow problems include the

pioneering works of McLaughlin [2002] and Naevdal et al. [2005]; recent reviews are

presented by Aanonsen et al. [2009], Oliver and Chen [2011] and Liu et al. [2012].
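
A minimal sketch of an EnKF updating step based on sample moments is given below, assuming a linear observation operator H and the perturbed-observation variant of the filter. The function name, the argument layout (one ensemble member per column) and the use of an explicit matrix inverse are choices made here for illustration only.

```python
import numpy as np

def enkf_analysis(X, d, H, R, rng):
    """Update a forecast ensemble X of shape (n_state, n_ens) with data d."""
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)           # ensemble anomalies
    C = A @ A.T / (n_ens - 1)                       # sample covariance
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)    # gain from sample moments
    # Each member is updated with its own noisy copy of the data (assumption:
    # perturbed-observation scheme with zero-mean Gaussian noise of covariance R).
    noise = rng.multivariate_normal(np.zeros(len(d)), R, size=n_ens).T
    D = d[:, None] + noise
    return X + K @ (D - H @ X)
```

Because the gain is computed from the same finite ensemble that it updates, this sketch is also subject to the sampling issues discussed in the following paragraphs.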

A crucial factor affecting EnKF is the size of the "ensemble", i.e., the number (NMC)

of MC simulations (sample size) employed for moment evaluation. Whereas accurate estimation of the

mean and covariance requires many simulations, working with large NMC tends to

be computationally demanding. Chen and Zhang [2006] showed that a few hundred MC simulations

appear to provide accurate estimates of mean log-conductivity fields. They pointed out,

however, that obtaining covariance estimates of comparable accuracy would require many

more simulations, a task they had not carried through. Efforts to reduce the dimensionality of

the problem through orthogonal decomposition of state variables have been reported by

Zhang et al. [2007] and Zeng et al. [2011, 2012].

Small sample sizes give rise to filter inbreeding [Oliver and Chen, 2011] whereby

EnKF systematically understates parameter and system state estimation errors; rather than

stabilizing as they should, these errors appear to continue decreasing indefinitely with time,

giving a false impression that the quality of the parameter and state estimates likewise keeps

improving. There is no general theory to assess, a priori, the impact that the number NMC of

MC simulations would have on the accuracy of moment estimates. We do know, however,

that the sample mean of a random variable converges to the population mean at a rate

proportional to $1/\sqrt{NMC}$, the sample width of a normal variable's confidence interval

converges at a rate proportional to $1/NMC$ for large NMC, and this latter rate is modulated by

Chebyshev's inequality as detailed in Ballio and Guadagnini [2004] and references therein.

This is enough to conclude that increasing NMC by a factor of a few hundred, as is often

done, would likely not lead to marked improvements in accuracy. A practical solution is to

continue running MC simulations till the sample mean and variance stabilize or, if computer

time is at a premium, till their rates of change slow down markedly.
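
The slow convergence of sample moments can be checked numerically. The sketch below, which assumes independent standard normal samples and uses a hypothetical helper name, estimates the root-mean-square error of the sample mean for a given number of realizations; the error is expected to shrink roughly as the inverse square root of the sample size.

```python
import numpy as np

def sample_mean_rmse(n_mc, n_repeats=2000, seed=0):
    """RMSE of the sample mean of n_mc standard normal draws (true mean is 0)."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n_repeats, n_mc))
    return np.sqrt(np.mean(samples.mean(axis=1) ** 2))

# Increasing the sample size 16-fold should reduce the error only about 4-fold.
ratio = sample_mean_rmse(100) / sample_mean_rmse(1600)
```

Under these assumptions, a sixteen-fold increase in the number of realizations buys only about a four-fold reduction in the error of the mean, which illustrates why moderate increases in NMC bring limited gains.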

van Leeuwen [1999] showed theoretically that filter inbreeding is caused by (a)

updating a given set ("ensemble" or collection) of model output realizations with a gain

computed on the basis of this same set and (b) spurious covariances associated with gains

based on finite numbers NMC of realizations. Remedies suggested in the literature are

generally ad hoc. Houtekamer and Mitchell [1998] proposed splitting the set of MC runs into

two groups and updating each subset with a Kalman gain obtained from the other subset.

Hendricks Franssen and Kinzelbach [2008] proposed alleviating the adverse effects of filter

inbreeding by (a) dampening the amplitude of log-conductivity fluctuations, (b) correcting

the predicted covariance matrix on the basis of a comparison between the predicted ensemble

variance and the average absolute error at measurement locations, and (c) performing a large

number of realizations (in their case NMC = 1000) during the first simulation step and a

subset of realizations (NMC = 100) thereafter; a procedure similar to the latter was also

suggested in Wen and Chen [2007]. To select an optimal subset one would minimize some

measure of differences between cumulative sample distributions of hydraulic heads obtained

in the first step with (say) NMC = 1000 and NMC = 100. This, however, brings about an

artificial reduction in variance, as shown by Hendricks Franssen and Kinzelbach [2008].

Hendricks Franssen and Kinzelbach [2008] obtained best results with a combination of all

three techniques. Hendricks Franssen et al. [2011] observed filter inbreeding when analyzing

variably saturated flow through a randomly heterogeneous porous medium with NMC = 100

even after dampening log-conductivity fluctuations by a factor of 10. Several authors [e.g.,

Wang et al., 2007; Anderson, 2007; Liang et al., 2012; Xu et al., 2013] have noted a reduction

in filter inbreeding effects through covariance localization and covariance inflation.

Covariance localization is achieved upon multiplying each element of the updated state

covariance matrix by an appropriate localization function to reduce the effect of spurious

correlations [Houtekamer and Mitchell, 1998; Furrer and Bengtsson, 2007]. In the covariance

inflation methods, the forecast ensemble is inflated through multiplication of each state by a

constant or variable factor [e.g., Wang and Bishop, 2003; Liang et al., 2012; Xu et al., 2013].
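
Both remedies can be sketched in a few lines. The example below assumes a one-dimensional grid, a Gaussian-shaped taper as the localization function and a constant inflation factor applied to the deviations of each member from the ensemble mean; these are illustrative choices and not the specific functions adopted in the works cited above.

```python
import numpy as np

def localize(C, coords, length):
    """Element-wise product of covariance C with a distance-based taper."""
    dist = np.abs(coords[:, None] - coords[None, :])
    taper = np.exp(-0.5 * (dist / length) ** 2)   # assumed taper shape
    return C * taper

def inflate(X, factor):
    """Inflate the spread of ensemble X (n_state, n_ens) about its mean."""
    mean = X.mean(axis=1, keepdims=True)
    return mean + factor * (X - mean)
```

The taper leaves variances untouched and damps covariances between distant nodes, while inflation by a factor f multiplies the ensemble variance by f squared without changing the ensemble mean.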

Another common problem encountered in the application of the EnKF is related to the

assumption that the system variables (i.e., model parameters and state variables) can be

described by a joint Gaussian distribution. Most current reservoir models must explicitly

account for the spatial distribution of facies, which can be defined as distinctive and

non-overlapping units of the host rock system with specified characteristics such as porosity,

permeability and mineralogy. Like petrophysical properties, facies can often be inferred from

well logs at well locations. As is often the case, their spatial distribution between wells is

highly uncertain. A common procedure to distinguish between diverse facies in a reservoir

model is to employ indicator functions, which by their nature cannot be represented by

Gaussian distributions. This implies that using EnKF to update these types of complex

reservoir models can be problematic.

A common approach found in the literature is based on the transformation

of the diverse facies types into intermediate random fields that are described by Gaussian

distributions. Liu and Oliver [2005a, 2005b] used a transformation based on the truncated

pluri-Gaussian method [Le Loc'h and Galli, 1997] and focused on the estimation of the

boundaries between the diverse facies. They did not consider within-facies variability of

attributes (i.e., porosity and permeability) and assigned a deterministic value of permeability

and porosity to each facies type (thus, considering only across-facies variability). They

adopted two Gaussian fields and three thresholds to model the spatial distribution of three

geologic units. The key point of the work of Liu and Oliver [2005a] is to consider two

truncated Gaussian fields with fixed truncation thresholds as static parameters (i.e.,

parameters that do not vary with time during a flow simulation, such as permeability and

porosity in the absence of consolidation processes or geochemical reactions) in the state

vector to be estimated through inversion. This would overcome the problem of updating a

discrete variable (facies distribution in the domain of interest) by representing the latter by

means of two continuously distributed random processes. They applied the truncated pluri-

Gaussian method to match both hard (i.e., direct facies observations) and production data.

Disadvantages of the truncated pluri-Gaussian method include the difficulty in determining

truncation maps and structural properties of the Gaussian random fields that are suitable to

describe the internal architecture of highly complex reservoirs in terms of a small number of

truncation parameters. Moreover, although the underlying model parameters are multivariate

Gaussian, the relationship between observations and model parameters is highly non-linear

and causes the appearance of apparently unphysical updates during the assimilation step. For

this reason, iterative methods, where only the static parameters are updated and the system

states are obtained by re-starting the flow simulation from the previous assimilation time, are

often employed to alleviate these drawbacks [Aanonsen et al., 2009].
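
The idea of representing a discrete facies field through continuous Gaussian fields can be sketched as follows. The truncation rule used here, with two axis-aligned thresholds t1 and t2, is a deliberately simplified assumption and does not reproduce the truncation map of Liu and Oliver [2005a].

```python
import numpy as np

def truncate(z1, z2, t1=0.0, t2=0.0):
    """Map two Gaussian fields to facies codes 0, 1 or 2 (simplified rule)."""
    # Facies 0 where z1 falls below t1; elsewhere z2 separates facies 1 and 2.
    return np.where(z1 < t1, 0, np.where(z2 < t2, 1, 2))

rng = np.random.default_rng(0)
z1 = rng.standard_normal((50, 50))   # uncorrelated here; real fields are correlated
z2 = rng.standard_normal((50, 50))
facies = truncate(z1, z2)
```

In an assimilation setting the continuous fields z1 and z2, rather than the facies codes, would be placed in the state vector and updated, and the facies map would be recovered by truncation after each update.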

Moreno and Aanonsen [2007] proposed to combine the level set method with the

EnKF. The level set method relies on a suitable level-set function which is an implicit

representation of a given surface that is defined as the set of points at which the function

vanishes. More specifically, if a given facies in a background medium is defined on a certain

domain (support), the level set function is defined as the signed distance to the domain

boundary. This distance is positive inside and negative outside of the boundary separating the

facies from the background medium. The spatial dynamics of the level set function which is

deforming during data acquisition is modeled by the convection equation. Moreno and

Aanonsen [2007] assumed that the velocity field governing the evolution of the level set was

defined as a Gaussian random field and included it in the state vector to be estimated/updated.

Chang et al. [2010] improved the application of level set functions to EnKF by employing a

parameterization based on the concept of representing nodes (also called master

points or pilot points in the literature). Contrary to Moreno and Aanonsen [2007], they

considered only the values of the level set function at a set of so-called representing nodes as

variables of the state vector to be estimated. The values of the level set function at grid nodes

different from the selected representing nodes are obtained by linear interpolation. This

allowed these authors to alleviate the non-uniqueness associated with the identifiability of the

level set function. If the distance between representing nodes is properly chosen, these are

uncorrelated or weakly correlated and can be treated as independent from each other. The

authors applied their methodology to diverse synthetic case studies with two or three different

facies units, where the rock properties such as porosity and permeability were assumed as

constant within the same unit. Although the work of Chang et al. [2010] has introduced

important improvements in the application of the level set method to EnKF, it is not clear

how level set methodologies enable one to capture complex geological constraints of the

kind required for reproducing realistic facies geometries.
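
A minimal sketch of a level set representation is given below for the simplest possible geometry, a circular facies body on a regular grid, for which the signed distance to the boundary is available in closed form. The geometry, grid size and function name are assumptions made for illustration; the sign convention follows the text (positive inside, negative outside).

```python
import numpy as np

def signed_distance_circle(xx, yy, center, radius):
    """Signed distance to a circular boundary: positive inside, negative outside."""
    return radius - np.hypot(xx - center[0], yy - center[1])

xx, yy = np.meshgrid(np.arange(20.0), np.arange(20.0), indexing="ij")
phi = signed_distance_circle(xx, yy, center=(10.0, 10.0), radius=5.0)

# The facies boundary is the zero level of phi; the indicator follows from its sign.
facies = (phi > 0).astype(int)
```

Updating the continuous function phi and recovering the facies indicator from its sign avoids updating a discrete variable directly.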

Jafarpour and McLaughlin [2008] introduced the use of the Discrete Cosine

Transform (DCT) method to history matching applications in reservoir models with complex

geology. The Discrete Cosine Transform (DCT) is a Fourier-related transform. It uses

orthonormal cosine basis functions to represent an image that, in the case we examine, can

correspond to the spatial distribution of state variables or model parameters. This method was

proposed by Ahmed et al. [1974] for signal decorrelation and has been used in

several applications in other fields (mostly audio and image

compression, e.g., Rao et al. [1990]). The powerful compression property of DCT allows for

retaining only a few basis functions in comparison to the total number of grid nodes. The

authors modified the EnKF scheme in such a way that the coefficients of the retained cosine

basis functions representing the spatial distribution of the state variables and model

parameters are updated to describe the distribution of the target quantities. Using DCT

parameterization dramatically reduces the dimension of the state vector and can

also mitigate the loss of structural continuity that can be observed in the context of other

approaches when the updating step is performed with reference to each node of the numerical

grid. This result can be achieved because DCT emphasizes large scale (associated with low

frequency component) rather than small scale features. One of the synthetic test cases

presented by Jafarpour and McLaughlin [2008] was a two-facies reservoir model. In this

scenario the methodology was conducive to a correct identification of large scale structures,

in the shape of elongated continuous channels, embedded in the reference fields. These

results highlighted the ability of this parameterization method to account for relatively

complex geological structures within a reservoir model.
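
The compression property can be illustrated with a short sketch that builds an orthonormal DCT basis explicitly and retains only a square block of low-frequency coefficients. The helper names and the rule for selecting the retained coefficients are assumptions made here; Jafarpour and McLaughlin [2008] may select the retained basis functions differently.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: rows are cosine basis functions."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)
    return D

def dct_compress(field, n_keep):
    """Keep the n_keep x n_keep lowest-frequency coefficients of a 2-D field."""
    Dr, Dc = dct_matrix(field.shape[0]), dct_matrix(field.shape[1])
    coeffs = Dr @ field @ Dc.T               # forward 2-D transform
    kept = np.zeros_like(coeffs)
    kept[:n_keep, :n_keep] = coeffs[:n_keep, :n_keep]
    return kept, Dr.T @ kept @ Dc            # retained coefficients, reconstruction
```

In a DCT-based filter the retained coefficients, rather than the nodal values, would form the state vector, so that a field defined on thousands of nodes is described by a few tens of unknowns.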

Other proposed schemes include the work of Dovera and Della Rossa [2011], where a

composite medium is described through a multimodal density of system parameters. The

multimodality of prior parameter fields is taken into account through the theory of Gaussian

mixture (GM) models. Gaussian mixture models are based on the idea that the probability

density function (pdf) of the model parameters can be parametrically described as weighted

sums of Gaussian pdfs. The authors derived a novel set of EnKF updating equations and

coupled it with the expectation-maximization (EM) method for the evaluation of the weights

of the prior GM. The authors compared the performance of their method against the

traditional EnKF formulation by means of a synthetic case and concluded that their scheme

allowed obtaining an improved evaluation of the posterior distribution of the forecast

production.
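
For reference, a Gaussian mixture prior of the kind mentioned above can be written in generic form as follows. The symbols (parameter vector m, number of components K, weights w_k, means and covariances of each component) are introduced here for illustration and are not necessarily the notation of Dovera and Della Rossa [2011].

```latex
p(\mathbf{m}) = \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(\mathbf{m};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1
```

With K = 1 the expression reduces to the single Gaussian prior assumed by the traditional EnKF.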

Jafarpour and Khodabakhshi [2011] proposed a Probability Conditioning Method

(PCM) for conditioning the facies distribution of a collection of reservoir models in which a

deterministic value of permeability and porosity is assigned to each facies type. They

employed the EnKF scheme to update the sample mean values of the log-permeability field.

The updated mean values are then used to infer information about the distribution of facies

probabilities through the PCM. This method consists of converting a value of log-

permeability mean to a value of probability of facies occurrence through a simple linear

mapping function. The updated probability map is then combined with the snesim algorithm

[Strebelle, 2002] to simulate a new collection of facies realizations. The realizations are

therefore conditioned on the updated probability maps as well as on the production data. The

updated saturation and pressure fields are obtained by re-running the simulation from the

initial time to ensure consistency with the updated permeability fields. This methodology was

used to successfully condition the categorical permeability fields in an ensemble of synthetic

reservoirs with two or three different facies, and was demonstrated to outperform the EnKF

in the quality of the calibrated models and in the accuracy of the model forecast.
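
The linear mapping at the core of the PCM can be sketched as below. The two end-member log-permeability values and the clipping of the result to the interval [0, 1] are assumptions made for this illustration; the precise form of the mapping used by Jafarpour and Khodabakhshi [2011] is not reproduced here.

```python
import numpy as np

def facies_probability(mean_logk, logk_low, logk_high):
    """Linearly map an updated mean log-permeability to a facies probability."""
    p = (mean_logk - logk_low) / (logk_high - logk_low)
    return np.clip(p, 0.0, 1.0)   # assumed: probabilities bounded to [0, 1]

# Mean log-permeability values between two hypothetical facies end members.
prob = facies_probability(np.array([1.0, 3.0, 5.0, 7.0]), 1.0, 5.0)
```

Nodes whose updated mean lies midway between the two end members receive a probability of 0.5, while values at or beyond an end member are assigned probabilities of 0 or 1.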

1.2 Objectives and Outline

The main objectives of this work are: (a) to couple the updating step of EnKF with the

stochastic moment equations of transient groundwater flow to circumvent and alleviate

problems related to the finiteness of the ensemble employed in the traditional MC-based

EnKF, and (b) to develop an assimilation algorithm that is conducive to conditioning the

spatial distribution of the facies and of the associated petrophysical properties of a collection

of hydrocarbon reservoirs on a set of measured production data. The dissertation is structured

according to the objectives outlined above.

In Chapter 2, we cast the updating equations of the Kalman Filter in a Bayesian

context. The derivation follows the work of Cohn [1997] and provides the theoretical ground

on which all KF-based assimilation techniques are based. In this Chapter it is shown that

assuming that the prior distribution of the model variables and the distribution of the

measurement errors be multivariate normal leads to a Gaussian posterior distribution of the

model variables conditioned on the measured data. The mean vector and the covariance

matrix of the posterior distribution are precisely those which are determined through the

updating equations of the KF.

In Chapter 3, we propose to circumvent the need for MC through a direct solution of

nonlocal (integrodifferential) stochastic MEs that govern the space-time evolution of

conditional ensemble means (statistical expectations) and covariances of hydraulic heads and

fluxes [Tartakovsky and Neuman, 1998; Ye et al., 2004]. Such MEs have been used

successfully to analyze steady state and transient flows in randomly heterogeneous media

conditional on measured values of medium properties. Second-order approximations of these

equations have yielded accurate predictions of complex flows in heterogeneous media with

unconditional variances of (natural) log-hydraulic conductivity as high as 4.0 [Guadagnini

and Neuman, 1999].

Hernandez et al. [2003, 2006] and Riva et al. [2009] developed batch geostatistical

inverse algorithms that enable one to condition flow predictions further on measured values

of state variables (heads and fluxes) for steady state and transient flows, respectively. A field

application is described in the work of Bianchi Janetti et al. [2010]. This approach yields

Maximum Likelihood (ML) estimates of hydraulic conductivity, variogram parameters and

measurement error statistics. Parameter estimation entails the minimization of a log-


likelihood function which in turn requires the computation of a sensitivity matrix. The latter

step tends to be computationally intensive, especially in the case of large parameter vectors

[e.g., Alcolea et al., 2006; Riva et al., 2010].

Our purpose is to combine approximate forms of nonlocal, conditional stochastic MEs

with EnKF in a way that allows sequential updating of parameters and system states without

a need for computationally intensive ML or MC analyses. We extend the ME formulation of

Ye et al. [2004] in a way that renders it compatible with EnKF. We explore the resulting

combined algorithm on synthetic problems of two-dimensional transient groundwater flow

toward a well pumping water from a randomly heterogeneous confined aquifer subject to

prescribed head and flux boundary conditions. We investigate the effect of the error variances

linked to the measurements of log-conductivities and the impact of assimilating hydraulic

heads during the transient or the pseudo-steady state regime on the quality of the calibrated

mean of log-conductivity and head fields and on the associated estimation variance. In

addition, we compare the performances and accuracies of our ME- and the traditional MC-

based EnKF on synthetic problems differing from each other in the variance and (integral)

autocorrelation scale of the (natural) logarithm of hydraulic conductivities. We analyze the

impact of the number of realizations employed in the MC-based EnKF and the occurrence of

filter inbreeding in the performed assimilations.

In Chapter 4, we illustrate a novel inversion scheme which allows conditioning the

geological and petrophysical properties of a collection of reservoir realizations on the basis of

a set of production data. First, we present a Markov Mesh model [Stien and Kolbjørnsen,

2011] that is used to (a) describe the spatial distribution of the geological properties and (b)

reproduce their complex spatial arrangement. The MM model is coupled with a multi-grid

approach [Kolbjørnsen et al., 2013], according to which the geological patterns are initially

reproduced at a coarse scale, and are subsequently generated on increasingly finer grids. This


methodology allows reproducing (geological) patterns distributed on different scales. The

proposed inversion scheme is based on a three-step algorithm. First, the EnKF scheme is

employed to update the sample mean of the lithofacies spatial distribution. A new collection

of facies realizations is then generated via a Markov Mesh (MM) model. During this step the

equation used to calculate the conditional probability of occurrence of a given lithotype in

each element of the computational grid ensures that the mean facies distribution obtained at

the previous step is honored. In the third step, the petrophysical properties of each reservoir

model in the collection are updated through a proposed modification of the EnKF scheme.

The cross-covariance between production data and log-permeabilities is estimated

considering the updated spatial distribution of lithofacies computed at the previous step.

Updating of log-permeabilities in a given realization relies on the estimation of the sample

cross-covariance between production data and log-permeabilities associated with a given

reference block in the reservoir upon considering only the members of the collection where

the same facies of the reference element considered in the target model realization occurs.

We test the proposed methodology by way of a two-dimensional synthetic reservoir

model in the presence of two distinct facies and representing a complex meandering channel

system. We analyze the accuracy and computational efficiency of our algorithm and

demonstrate its benefit with respect to the standard EnKF in terms of improved prediction

ability and use of information for the quantification of the uncertainty associated with the

forecast production.


2. Data assimilation with the Kalman Filter

In this Chapter we present a derivation of the Kalman Filter (KF) equations in the

context of Bayesian inference theory. The objective is to provide a theoretical framework and

to introduce the basic concepts that will become useful in the following Chapters. The KF

algorithm developed by Kalman [1960] considers linear model dynamics, the output of which

is corrupted with additive Gaussian noise with zero mean and given covariance matrix. Here

we intentionally restrict our discussion to the case of an exact model (i.e., without error) and

focus on techniques that extend the classical KF scheme to models characterized by nonlinear

dynamics of the kind required to model realistic subsurface fluid flow scenarios. In Section

2.1 we define the filtering problem in the framework of data assimilation. The solution of this

problem is achieved by means of two sequential steps, respectively termed forward and

analysis and described in Sections 2.2 and 2.3.

2.1 The filtering problem

The model dynamics describing groundwater and (in general) subsurface multiphase flow consist of a system of (typically coupled) nonlinear partial differential equations (PDEs). Let $\mathbf{y}^{T_k}$ be the vector containing the $N_y$ model variables of the system under study evaluated at time $T_k$ at a finite number of discretization nodes (or elements) of a numerical grid. This vector can include model parameters (e.g., permeability and porosity), state variables (e.g., pressure, saturation) and production data (e.g., well flow rates, water cut, bottom hole pressure). Model dynamics can be described through the non-linear operator $\mathcal{F}$, which yields the solution at time $T_k$, $\mathbf{y}^{T_k}$, given the model state at an earlier time $T_{k-1}$, $\mathbf{y}^{T_{k-1}}$

$$\mathbf{y}^{T_k} = \mathcal{F}\left(\mathbf{y}^{T_{k-1}}\right) \tag{2.1}$$


Uncertainty associated with model parameters renders the system state $\mathbf{y}^{T_k}$ random, and allows describing it by means of the probability density function (pdf) $f\left(\mathbf{y}^{T_k}\right)$. The time evolution of $f\left(\mathbf{y}^{t}\right)$ within the time interval $[T_{k-1}, T_k]$ is governed by a set of stochastic PDEs associated with (in general) random initial condition $f\left(\mathbf{y}^{T_{k-1}}\right)$.

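We introduce the following model governing system observations

$$\mathbf{d}^{T_k} = \mathbf{H}^{T_k}\mathbf{y}^{T_k} + \boldsymbol{\varepsilon}_d^{T_k} \tag{2.2}$$

where $\mathbf{d}^{T_k}$ is a vector of size $N_d \times 1$ containing all $N_d$ measurements available at time $T_k$, the matrix $\mathbf{H}^{T_k}$ of size $N_d \times N_y$ is a linear operator mapping the model variables $\mathbf{y}^{T_k}$ into their observed counterparts and $\boldsymbol{\varepsilon}_d^{T_k}$ is a random vector containing the $N_d$ measurement errors. Typically, $\boldsymbol{\varepsilon}_d^{T_k}$ is assumed to be normally distributed, unbiased and with known covariance matrix

$$\left\langle \boldsymbol{\varepsilon}_d^{T_k} \right\rangle = \mathbf{0}_{N_d,1} \tag{2.3}$$

$$\left\langle \left(\boldsymbol{\varepsilon}_d^{T_k} - \left\langle\boldsymbol{\varepsilon}_d^{T_k}\right\rangle\right)\left(\boldsymbol{\varepsilon}_d^{T_k} - \left\langle\boldsymbol{\varepsilon}_d^{T_k}\right\rangle\right)^{+} \right\rangle = \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \tag{2.4}$$

where $\mathbf{0}_{N_d,1}$ is a vector of size $N_d \times 1$ with all elements equal to zero, $\langle\cdot\rangle$ denotes expectation and the superscript '+' stands for transpose. A common assumption is that the measurement error vectors at different times are uncorrelated

$$\left\langle \boldsymbol{\varepsilon}_d^{T_l}\, \boldsymbol{\varepsilon}_d^{T_k +} \right\rangle = \mathbf{0}_{N_d,N_d} \quad \text{for } k \neq l \tag{2.5}$$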

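The filtering problem in data assimilation is posed as the problem of describing the evolution in time of the conditional pdf, $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_k}\right)$, where the matrix $\mathbf{D}^{T_k}$ denotes the set of all observations available up to time $T_k$, i.e.

$$\mathbf{D}^{T_k} = \left[\mathbf{d}^{T_1}, \mathbf{d}^{T_2}, \dots, \mathbf{d}^{T_k}\right] \tag{2.6}$$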


The filtering problem is solved by means of two sequential steps. The forward step (Section 2.2) consists of propagating in time the conditional density available at time $T_{k-1}$, $f\left(\mathbf{y}^{T_{k-1}} \mid \mathbf{D}^{T_{k-1}}\right)$, towards the corresponding pdf at time $T_k$, $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$. The latter is then conditioned on the measurement vector $\mathbf{d}^{T_k}$ by means of the analysis step (Section 2.3), allowing the evaluation of $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}, \mathbf{d}^{T_k}\right) = f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_k}\right)$.

2.2 Forward step

Suppose one is given the conditional density $f\left(\mathbf{y}^{T_{k-1}} \mid \mathbf{D}^{T_{k-1}}\right)$. The forward step requires the evaluation of the density $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$. This entails solving a stochastic PDE within the time interval $[T_{k-1}, T_k]$ and subject to the random initial conditions embodied in $f\left(\mathbf{y}^{T_{k-1}} \mid \mathbf{D}^{T_{k-1}}\right)$. If the model dynamics are linear, as is assumed in the classical KF, the evaluation of the mean vector and of the covariance matrix associated with $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$ is straightforward [Cohn, 1997]. This assumption is generally not valid in the field of groundwater and multiphase flow in porous media, where the model dynamics can be highly non-linear. In these cases, the solution of the forward step requires a different approach.

A possible strategy is to resort to Monte Carlo (MC) simulation. This technique is based on representing the pdf $f\left(\mathbf{y}^{T_{k-1}} \mid \mathbf{D}^{T_{k-1}}\right)$ through a collection of model realizations. Propagating each member of the collection within the time interval $[T_{k-1}, T_k]$ using the forward model operator (2.1) allows approximating the pdf $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$ at time $T_k$ and estimating its statistical moments through the corresponding sample moments.
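The MC forward step described above can be illustrated with a minimal sketch. The operator `forward_model` is a hypothetical scalar stand-in for (2.1), chosen only because it is nonlinear; the ensemble size, mean and standard deviation are likewise assumptions made for illustration.

```python
import random
import statistics


def forward_model(y_prev, dt=1.0):
    """Hypothetical nonlinear operator standing in for (2.1).

    A logistic-type step, chosen only because it is nonlinear;
    it is not a groundwater or multiphase flow model.
    """
    return y_prev + dt * y_prev * (1.0 - y_prev)


def mc_forward_step(ensemble_prev):
    """Propagate every realization from T_{k-1} to T_k."""
    return [forward_model(y) for y in ensemble_prev]


# Collection of realizations representing the pdf at T_{k-1}
# (scalar state, Gaussian with assumed mean 0.3 and std 0.05).
rng = random.Random(0)
ensemble_prev = [rng.gauss(0.3, 0.05) for _ in range(500)]

ensemble_fwd = mc_forward_step(ensemble_prev)

# Sample moments approximate the moments of the forward pdf at T_k.
mean_fwd = statistics.mean(ensemble_fwd)
var_fwd = statistics.variance(ensemble_fwd)  # unbiased, 1/(NMC-1)

# Nonlinearity check: propagating the mean differs from the mean of
# the propagated realizations.
mean_prev = statistics.mean(ensemble_prev)
gap = forward_model(mean_prev) - mean_fwd
```

For this particular quadratic operator the gap equals the population variance of the initial ensemble, which is one way to see why the moments of the forward pdf cannot be obtained by simply propagating the prior mean when the dynamics are nonlinear.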

Monte Carlo simulations are not the only available strategy for the solution of the forward step in the presence of non-linear system dynamics. As will be explored in this dissertation (Chapter 3), an alternative could be formulating a system of (ensemble) moment equations (MEs) which describe the temporal evolution of the statistical moments (typically, mean and covariances) of the pdf $f\left(\mathbf{y}^{t} \mid \mathbf{D}^{T_{k-1}}\right)$, $T_{k-1} \le t \le T_k$. The format of these moment equations is determined by the model dynamics expressed by (2.1), and should be derived ad hoc depending on the specific context. This issue will be further explored in Chapter 3, where a set of approximated equations describing the temporal evolution of the first and second moment of the target pdf for a groundwater flow model will be presented and embedded in the KF scheme.

2.3 Analysis step

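The objective of the analysis step is to calculate the conditional pdf $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_k}\right)$ given the density $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$. Application of Bayes' theorem allows writing

$$f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_k}\right) = f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}, \mathbf{d}^{T_k}\right) = \frac{f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right) f\left(\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}, \mathbf{y}^{T_k}\right)}{f\left(\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)} \tag{2.7}$$

Since $\mathbf{d}^{T_k}$ (given $\mathbf{y}^{T_k}$) depends only on $\boldsymbol{\varepsilon}_d^{T_k}$, which in turn is independent of $\mathbf{D}^{T_{k-1}}$ because of (2.5), the following simplification holds

$$f\left(\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}, \mathbf{y}^{T_k}\right) = f\left(\mathbf{d}^{T_k} \mid \mathbf{y}^{T_k}\right) \tag{2.8}$$

and (2.7) can be rewritten as

$$f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_k}\right) = \frac{f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right) f\left(\mathbf{d}^{T_k} \mid \mathbf{y}^{T_k}\right)}{f\left(\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)} \tag{2.9}$$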

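The function $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$ is also termed the forward pdf, and is here denoted as $f\left(\mathbf{y}^{f,T_k}\right)$. Following the work of Cohn [1997], we consider this density to be multivariate Gaussian so that it can be parameterized through the corresponding mean vector and the covariance matrix. These are respectively defined as

$$\left\langle \mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}} \right\rangle = \left\langle \mathbf{y}^{f,T_k} \right\rangle \tag{2.10}$$

$$\left\langle \left(\mathbf{y}^{f,T_k} - \left\langle\mathbf{y}^{f,T_k}\right\rangle\right)\left(\mathbf{y}^{f,T_k} - \left\langle\mathbf{y}^{f,T_k}\right\rangle\right)^{+} \right\rangle = \boldsymbol{\Sigma}_{yy}^{f,T_k} \tag{2.11}$$

The density function of $\mathbf{y}^{f,T_k}$ is then equal to

$$f\left(\mathbf{y}^{f,T_k}\right) = (2\pi)^{-N_y/2} \left|\boldsymbol{\Sigma}_{yy}^{f,T_k}\right|^{-1/2} \exp\left\{-\frac{1}{2}\left(\mathbf{y}^{f,T_k} - \left\langle\mathbf{y}^{f,T_k}\right\rangle\right)^{+} \left(\boldsymbol{\Sigma}_{yy}^{f,T_k}\right)^{-1} \left(\mathbf{y}^{f,T_k} - \left\langle\mathbf{y}^{f,T_k}\right\rangle\right)\right\} \tag{2.12}$$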

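where $|\cdot|$ denotes matrix determinant. By virtue of (2.2)-(2.4), the mean vector and the covariance matrix of the likelihood function $f\left(\mathbf{d}^{T_k} \mid \mathbf{y}^{T_k}\right)$ can be written as

$$\left\langle \mathbf{d}^{T_k} \mid \mathbf{y}^{T_k} \right\rangle = \left\langle \mathbf{H}^{T_k}\mathbf{y}^{T_k} + \boldsymbol{\varepsilon}_d^{T_k} \mid \mathbf{y}^{T_k} \right\rangle = \mathbf{H}^{T_k}\mathbf{y}^{T_k} \tag{2.13}$$

$$\left\langle \left(\mathbf{d}^{T_k} - \left\langle\mathbf{d}^{T_k} \mid \mathbf{y}^{T_k}\right\rangle\right)\left(\mathbf{d}^{T_k} - \left\langle\mathbf{d}^{T_k} \mid \mathbf{y}^{T_k}\right\rangle\right)^{+} \mid \mathbf{y}^{T_k} \right\rangle = \left\langle \boldsymbol{\varepsilon}_d^{T_k}\, \boldsymbol{\varepsilon}_d^{T_k +} \mid \mathbf{y}^{T_k} \right\rangle = \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k} \tag{2.14}$$

Since $\boldsymbol{\varepsilon}_d^{T_k}$ is considered to be normally distributed, the density $f\left(\mathbf{d}^{T_k} \mid \mathbf{y}^{T_k}\right)$ is also Gaussian and given by

$$f\left(\mathbf{d}^{T_k} \mid \mathbf{y}^{T_k}\right) = (2\pi)^{-N_d/2} \left|\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}\right|^{-1/2} \exp\left\{-\frac{1}{2}\left(\mathbf{d}^{T_k} - \mathbf{H}^{T_k}\mathbf{y}^{T_k}\right)^{+} \left(\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}\right)^{-1} \left(\mathbf{d}^{T_k} - \mathbf{H}^{T_k}\mathbf{y}^{T_k}\right)\right\} \tag{2.15}$$

We can then employ (2.2) to define the mean vector and the covariance matrix of the pdf $f\left(\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$ as

$$\left\langle \mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}} \right\rangle = \left\langle \mathbf{H}^{T_k}\mathbf{y}^{T_k} + \boldsymbol{\varepsilon}_d^{T_k} \mid \mathbf{D}^{T_{k-1}} \right\rangle = \mathbf{H}^{T_k}\left\langle \mathbf{y}^{T_k} \mid \mathbf{D}^{T_{k-1}} \right\rangle = \mathbf{H}^{T_k}\left\langle \mathbf{y}^{f,T_k} \right\rangle \tag{2.16}$$

$$\begin{aligned}
&\left\langle \left(\mathbf{d}^{T_k} - \left\langle\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right\rangle\right)\left(\mathbf{d}^{T_k} - \left\langle\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right\rangle\right)^{+} \mid \mathbf{D}^{T_{k-1}} \right\rangle \\
&\quad = \left\langle \left[\mathbf{H}^{T_k}\left(\mathbf{y}^{T_k} - \left\langle\mathbf{y}^{f,T_k}\right\rangle\right) + \boldsymbol{\varepsilon}_d^{T_k}\right]\left[\mathbf{H}^{T_k}\left(\mathbf{y}^{T_k} - \left\langle\mathbf{y}^{f,T_k}\right\rangle\right) + \boldsymbol{\varepsilon}_d^{T_k}\right]^{+} \mid \mathbf{D}^{T_{k-1}} \right\rangle \\
&\quad = \mathbf{H}^{T_k}\boldsymbol{\Sigma}_{yy}^{f,T_k}\mathbf{H}^{T_k +} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}
\end{aligned} \tag{2.17}$$

Since $\boldsymbol{\varepsilon}_d^{T_k}$ and $\mathbf{y}^{T_k}$ were assumed to be normally distributed, the density $f\left(\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right)$ is also Gaussian and can be written as

$$\begin{aligned}
f\left(\mathbf{d}^{T_k} \mid \mathbf{D}^{T_{k-1}}\right) = {} & (2\pi)^{-N_d/2} \left|\mathbf{H}^{T_k}\boldsymbol{\Sigma}_{yy}^{f,T_k}\mathbf{H}^{T_k +} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}\right|^{-1/2} \\
& \times \exp\left\{-\frac{1}{2}\left(\mathbf{d}^{T_k} - \mathbf{H}^{T_k}\left\langle\mathbf{y}^{f,T_k}\right\rangle\right)^{+} \left(\mathbf{H}^{T_k}\boldsymbol{\Sigma}_{yy}^{f,T_k}\mathbf{H}^{T_k +} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}\right)^{-1} \left(\mathbf{d}^{T_k} - \mathbf{H}^{T_k}\left\langle\mathbf{y}^{f,T_k}\right\rangle\right)\right\}
\end{aligned} \tag{2.18}$$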

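Substitution of (2.12), (2.15) and (2.18) into (2.9) yields the target posterior density function, $f\left(\mathbf{y}^{T_k} \mid \mathbf{D}^{T_k}\right)$, also termed the updated pdf and denoted as $f\left(\mathbf{y}^{u,T_k}\right)$

$$f\left(\mathbf{y}^{u,T_k}\right) = (2\pi)^{-N_y/2} \left|\boldsymbol{\Sigma}_{yy}^{u,T_k}\right|^{-1/2} \exp\left\{-\frac{1}{2}\left(\mathbf{y}^{u,T_k} - \left\langle\mathbf{y}^{u,T_k}\right\rangle\right)^{+} \left(\boldsymbol{\Sigma}_{yy}^{u,T_k}\right)^{-1} \left(\mathbf{y}^{u,T_k} - \left\langle\mathbf{y}^{u,T_k}\right\rangle\right)\right\} \tag{2.19}$$

Here, the mean vector $\left\langle\mathbf{y}^{u,T_k}\right\rangle$ and the covariance matrix $\boldsymbol{\Sigma}_{yy}^{u,T_k}$ are evaluated using the following set of equations

$$\left\langle\mathbf{y}^{u,T_k}\right\rangle = \left\langle\mathbf{y}^{f,T_k}\right\rangle + \mathbf{K}^{T_k}\left(\mathbf{d}^{T_k} - \mathbf{H}^{T_k}\left\langle\mathbf{y}^{f,T_k}\right\rangle\right) \tag{2.20}$$

$$\boldsymbol{\Sigma}_{yy}^{u,T_k} = \left(\mathbf{I}_{N_y} - \mathbf{K}^{T_k}\mathbf{H}^{T_k}\right)\boldsymbol{\Sigma}_{yy}^{f,T_k} \tag{2.21}$$

$$\mathbf{K}^{T_k} = \boldsymbol{\Sigma}_{yy}^{f,T_k}\mathbf{H}^{T_k +}\left(\mathbf{H}^{T_k}\boldsymbol{\Sigma}_{yy}^{f,T_k}\mathbf{H}^{T_k +} + \boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}\right)^{-1} \tag{2.22}$$

where $\mathbf{I}_{N_y}$ is the identity matrix of size $N_y \times N_y$, and the matrix $\mathbf{K}^{T_k}$ is called the Kalman gain. The complete set of details of the mathematical derivation of (2.20)-(2.22) can be found in Cohn [1997] or in Tarantola [2005].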
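As an illustration of the updating equations (2.20)-(2.22), the following minimal sketch applies them to a two-variable state with a single scalar observation, so that the matrix inverse in (2.22) reduces to a division. The function name `kalman_update` and all numerical values are assumptions made for this example; the state is loosely thought of as one log-conductivity value and one head value, of which only the head is measured.

```python
def kalman_update(y_f, S_f, H, d, r):
    """Analysis step (2.20)-(2.22) for one scalar observation (N_d = 1).

    y_f : forecast mean, list of length N_y
    S_f : forecast covariance, N_y x N_y nested list
    H   : observation operator, a single row of length N_y
    d   : measured value
    r   : measurement error variance (Sigma_ee is 1 x 1)
    """
    n = len(y_f)
    # Sigma_f H^+ (column of length N_y) and H Sigma_f (row of length N_y)
    SHt = [sum(S_f[i][j] * H[j] for j in range(n)) for i in range(n)]
    HS = [sum(H[i] * S_f[i][j] for i in range(n)) for j in range(n)]
    # H Sigma_f H^+ + Sigma_ee is a scalar here
    innov_var = sum(H[i] * SHt[i] for i in range(n)) + r
    K = [s / innov_var for s in SHt]                      # (2.22)
    innov = d - sum(H[i] * y_f[i] for i in range(n))
    y_u = [y_f[i] + K[i] * innov for i in range(n)]       # (2.20)
    S_u = [[S_f[i][j] - K[i] * HS[j] for j in range(n)]   # (2.21)
           for i in range(n)]
    return y_u, S_u, K


# Assumed forecast moments of y = [Y, h]; Y and h are negatively correlated.
y_f = [0.0, 1.0]
S_f = [[1.0, -0.5], [-0.5, 0.5]]
H = [0.0, 1.0]          # only the head is observed

y_u, S_u, K = kalman_update(y_f, S_f, H, d=1.4, r=0.1)
```

In this sketch the unmeasured first component is updated only through its cross-covariance with the measured one, and both variances decrease after the update.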

The set of equations (2.20)-(2.22) allows defining the updated pdf as a function of the mean vector and of the covariance matrix of the forward and measurement error density functions. The updated moments of the target pdf are then used to characterize the initial conditions in the subsequent forward step, consisting in the evaluation of the density $f\left(\mathbf{y}^{T_{k+1}} \mid \mathbf{D}^{T_k}\right)$. When KF is coupled with MC simulation, the pdf of $\mathbf{y}^{f,T_k}$ is approximated through a collection of model realizations, as discussed in Section 2.2. In this case equations (2.20) - (2.22) do not enable one to evaluate directly the updated realizations, which are employed to approximate the posterior pdf at time $T_k$, $f\left(\mathbf{y}^{u,T_k}\right)$, and constitute the initial conditions to the solution of the flow problem during the subsequent forward step. If we indicate the collection of forward model realizations at time $T_k$ as $\mathbf{y}_i^{f,T_k}$, $i = 1, \dots, NMC$ ($NMC$ being the number of Monte Carlo realizations in the sample), then the updated realizations, $\mathbf{y}_i^{u,T_k}$, can be calculated through [Evensen, 1994; Burgers et al., 1998]

$$\mathbf{y}_i^{u,T_k} = \mathbf{y}_i^{f,T_k} + \hat{\mathbf{K}}^{T_k}\left(\mathbf{d}_i^{T_k} - \mathbf{H}^{T_k}\mathbf{y}_i^{f,T_k}\right) \qquad i = 1, \dots, NMC \tag{2.23}$$

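Here, $\mathbf{d}_i^{T_k}$ is a randomized measurement vector defined as

$$\mathbf{d}_i^{T_k} = \mathbf{d}^{T_k} + \boldsymbol{\varepsilon}_{d,i}^{T_k} \qquad i = 1, \dots, NMC \tag{2.24}$$

where $\boldsymbol{\varepsilon}_{d,i}^{T_k}$ is a random vector having a Gaussian distribution with zero mean and covariance matrix $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$. In (2.23), the empirical Kalman gain matrix $\hat{\mathbf{K}}^{T_k}$, given by

$$\hat{\mathbf{K}}^{T_k} = \hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k}\mathbf{H}^{T_k +}\left(\mathbf{H}^{T_k}\hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k}\mathbf{H}^{T_k +} + \hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k}\right)^{-1} \tag{2.25}$$

is evaluated employing the empirical mean vectors and covariance matrices, defined as

$$\bar{\mathbf{y}}^{f,T_k} = \frac{1}{NMC}\sum_{i=1}^{NMC}\mathbf{y}_i^{f,T_k} \tag{2.26}$$

$$\hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k} = \frac{1}{NMC-1}\sum_{i=1}^{NMC}\left(\mathbf{y}_i^{f,T_k} - \bar{\mathbf{y}}^{f,T_k}\right)\left(\mathbf{y}_i^{f,T_k} - \bar{\mathbf{y}}^{f,T_k}\right)^{+} \tag{2.27}$$

$$\bar{\mathbf{d}}^{T_k} = \frac{1}{NMC}\sum_{i=1}^{NMC}\mathbf{d}_i^{T_k} \tag{2.28}$$

$$\hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k} = \frac{1}{NMC-1}\sum_{i=1}^{NMC}\left(\mathbf{d}_i^{T_k} - \bar{\mathbf{d}}^{T_k}\right)\left(\mathbf{d}_i^{T_k} - \bar{\mathbf{d}}^{T_k}\right)^{+} \tag{2.29}$$

The system (2.23)-(2.29) yields the updating equations used in the Ensemble Kalman Filter (EnKF).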
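The ensemble update with randomized measurements can be sketched as follows for a single scalar observation, in which case the inverse in the empirical gain is a division. The function name `enkf_update`, the synthetic forecast ensemble and all numerical values are assumptions introduced for illustration only.

```python
import random


def enkf_update(ens_f, H, d, r, rng):
    """EnKF update with perturbed observations, one scalar datum.

    ens_f : forecast realizations, each a list of N_y floats
    H     : observation operator, a single row of length N_y
    d     : measured value; r : measurement error variance
    """
    nmc, n = len(ens_f), len(ens_f[0])
    # Randomized measurements d_i = d + eps_i
    d_i = [d + rng.gauss(0.0, r ** 0.5) for _ in range(nmc)]
    # Sample mean of the forecast ensemble
    y_bar = [sum(y[j] for y in ens_f) / nmc for j in range(n)]
    # Predicted data H y_i and their sample mean
    hy = [sum(H[j] * y[j] for j in range(n)) for y in ens_f]
    hy_bar = sum(hy) / nmc
    # Sigma_yy H^+ and H Sigma_yy H^+ from sample covariances
    SHt = [
        sum((y[j] - y_bar[j]) * (g - hy_bar) for y, g in zip(ens_f, hy))
        / (nmc - 1)
        for j in range(n)
    ]
    HSHt = sum((g - hy_bar) ** 2 for g in hy) / (nmc - 1)
    # Sample variance of the randomized measurements
    d_bar = sum(d_i) / nmc
    r_hat = sum((x - d_bar) ** 2 for x in d_i) / (nmc - 1)
    # Empirical Kalman gain and realization-wise update
    K = [s / (HSHt + r_hat) for s in SHt]
    ens_u = [
        [y[j] + K[j] * (di - g) for j in range(n)]
        for y, di, g in zip(ens_f, d_i, hy)
    ]
    return ens_u, K


def column_mean(ens, j):
    return sum(y[j] for y in ens) / len(ens)


def column_var(ens, j):
    m = column_mean(ens, j)
    return sum((y[j] - m) ** 2 for y in ens) / (len(ens) - 1)


# Synthetic forecast ensemble of y = [Y, h] (assumed toy relation).
rng = random.Random(1)
ens_f = []
for _ in range(2000):
    Y = rng.gauss(0.0, 1.0)
    h = 1.0 - 0.5 * Y + rng.gauss(0.0, 0.05)
    ens_f.append([Y, h])

ens_u, K_hat = enkf_update(ens_f, H=[0.0, 1.0], d=1.4, r=0.01, rng=rng)
```

In this toy setting the updated ensemble mean of the observed component moves toward the measurement and its spread shrinks, while the unobserved component is corrected through the sample cross-covariance.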

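Equations (2.23) - (2.29) ensure that the elements of the collection $\mathbf{y}_i^{u,T_k}$, $i = 1, \dots, NMC$ are realizations of the posterior distribution defined in (2.19). Evaluating the sample mean, $\bar{\mathbf{y}}^{u,T_k}$, and the sample covariance matrix, $\hat{\boldsymbol{\Sigma}}_{yy}^{u,T_k}$, of the updated model realizations

$$\bar{\mathbf{y}}^{u,T_k} = \bar{\mathbf{y}}^{f,T_k} + \hat{\mathbf{K}}^{T_k}\left(\bar{\mathbf{d}}^{T_k} - \mathbf{H}^{T_k}\bar{\mathbf{y}}^{f,T_k}\right) \tag{2.30}$$

$$\begin{aligned}
\hat{\boldsymbol{\Sigma}}_{yy}^{u,T_k} &= \frac{1}{NMC-1}\sum_{i=1}^{NMC}\left(\mathbf{y}_i^{u,T_k} - \bar{\mathbf{y}}^{u,T_k}\right)\left(\mathbf{y}_i^{u,T_k} - \bar{\mathbf{y}}^{u,T_k}\right)^{+} \\
&= \frac{1}{NMC-1}\sum_{i=1}^{NMC}\left[\left(\mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k}\mathbf{H}^{T_k}\right)\left(\mathbf{y}_i^{f,T_k} - \bar{\mathbf{y}}^{f,T_k}\right) + \hat{\mathbf{K}}^{T_k}\left(\mathbf{d}_i^{T_k} - \bar{\mathbf{d}}^{T_k}\right)\right] \\
&\qquad\qquad \times \left[\left(\mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k}\mathbf{H}^{T_k}\right)\left(\mathbf{y}_i^{f,T_k} - \bar{\mathbf{y}}^{f,T_k}\right) + \hat{\mathbf{K}}^{T_k}\left(\mathbf{d}_i^{T_k} - \bar{\mathbf{d}}^{T_k}\right)\right]^{+} \\
&= \left(\mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k}\mathbf{H}^{T_k}\right)\hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k}\left(\mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k}\mathbf{H}^{T_k}\right)^{+} + \hat{\mathbf{K}}^{T_k}\hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k}\hat{\mathbf{K}}^{T_k +} = \left(\mathbf{I}_{N_y} - \hat{\mathbf{K}}^{T_k}\mathbf{H}^{T_k}\right)\hat{\boldsymbol{\Sigma}}_{yy}^{f,T_k}
\end{aligned} \tag{2.31}$$

and comparing (2.30) - (2.31) with (2.20) - (2.22) shows that in the limit of infinite sample size the empirical moments of the updated collection converge to their corresponding theoretical counterparts.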

The state vector $\mathbf{y}^{f,T_k}$ contains model parameters, state variables and production data in most of the KF-based applications performed in the context of data assimilation in groundwater and subsurface multiphase flow models. In these cases, assuming that the density of $\mathbf{y}^{f,T_k}$ is multivariate normal (see (2.12)) is in general sub-optimal because of the non-linear relationship between the elements of the model state vector. For this reason the solution obtained by means of the updating equations (2.20)-(2.22) can be considered only an approximation of the true system state. One of the main drawbacks related to this approximation is the appearance of unphysical updates, for which the updated model variables do not satisfy mass conservation or saturations are associated with values which can be negative or larger than unity.
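A small numerical sketch shows how a linear Gaussian update can leave the physical range of a bounded variable. The function `scalar_kalman_update` and all numbers below are hypothetical and are not taken from any case studied in this work; they only illustrate that nothing in the update formula enforces the bounds of a saturation.

```python
def scalar_kalman_update(x_f, cov_xd, var_d, r, innovation):
    """Update one state variable from one observed quantity.

    x_f        : forecast mean of the state variable
    cov_xd     : forecast covariance between the variable and the datum
    var_d      : forecast variance of the predicted datum
    r          : measurement error variance
    innovation : measured minus predicted datum
    """
    gain = cov_xd / (var_d + r)
    return x_f + gain * innovation


# Hypothetical water saturation close to its upper bound.
sw_forecast = 0.95
sw_updated = scalar_kalman_update(
    sw_forecast, cov_xd=0.02, var_d=0.05, r=0.01, innovation=0.5
)

# The update is linear in the innovation and ignores the bound [0, 1].
is_physical = 0.0 <= sw_updated <= 1.0
```

With these assumed values the updated saturation exceeds unity, which is the kind of unphysical update referred to above.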


3. Kalman Filter coupled with stochastic moment equations of transient groundwater flow

This Chapter focuses on data assimilation in models of transient groundwater flow in

randomly heterogeneous media via Kalman Filter. We propose to solve the forward step

entailed in the Kalman Filter scheme through a direct solution of approximate nonlocal

(integrodifferential) moment equations (ME) that govern the space-time evolution of

conditional ensemble means (statistical expectations) and covariances of hydraulic heads and

fluxes. This procedure allows circumventing the need for computationally intensive Monte

Carlo (MC) simulation.

In Section 3.1 we extend the ME formulation of Ye et al. [2004] in a way that renders

it compatible with KF. Section 3.2 describes the key steps of the assimilation procedure

performed using the common MC-based EnKF as well as our new ME-based version. Section

3.3 explores the feasibility and accuracy of the proposed algorithm on a synthetic problem of

two-dimensional transient groundwater flow toward a well pumping water from a randomly

heterogeneous confined aquifer subject to prescribed head and flux boundary conditions. In

Section 3.4 the same flow setting is considered for nine heterogeneous systems differing from

each other in the variance and integral scale of the log-hydraulic conductivity field. A

detailed comparison of the performances and accuracies of ME- and MC-based EnKF is

presented and results and implications are discussed in Section 3.5.

3.1 Extended transient moment equations of groundwater flow

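We consider transient groundwater flow in a saturated domain $\Omega$ governed by stochastic partial differential equations of mass balance and Darcy's law

$$S_S \frac{\partial h(\mathbf{x},t)}{\partial t} + \nabla \cdot \mathbf{q}(\mathbf{x},t) = f(\mathbf{x},t) \qquad \mathbf{x} \in \Omega \tag{3.1}$$

$$\mathbf{q}(\mathbf{x},t) = -K(\mathbf{x}) \nabla h(\mathbf{x},t) \qquad \mathbf{x} \in \Omega \tag{3.2}$$

subject to initial and boundary conditions

$$h(\mathbf{x}, t = 0) = H_0(\mathbf{x}) \qquad \mathbf{x} \in \Omega \tag{3.3}$$

$$h(\mathbf{x},t) = H(\mathbf{x},t) \qquad \mathbf{x} \in \Gamma_D \tag{3.4}$$

$$-\mathbf{q}(\mathbf{x},t) \cdot \mathbf{n}(\mathbf{x}) = Q(\mathbf{x},t) \qquad \mathbf{x} \in \Gamma_N \tag{3.5}$$

where $h(\mathbf{x},t)$ is hydraulic head and $\mathbf{q}(\mathbf{x},t)$ the Darcy flux vector at point $(\mathbf{x},t)$ in space-time, $K(\mathbf{x})$ is an autocorrelated random field of scalar hydraulic conductivities, $S_S$ is specific storage treated here as a deterministic constant, $H_0(\mathbf{x})$ is (generally) a random initial head field, $f(\mathbf{x},t)$ is (generally) a random source function of space and time, $H(\mathbf{x},t)$ and $Q(\mathbf{x},t)$ are (generally) random head and normal flux conditions on Dirichlet boundaries $\Gamma_D$ and Neumann boundaries $\Gamma_N$, respectively, and $\mathbf{n}$ is a unit outward normal to $\Gamma_N$.

The Laplace transform of a function $g(t)$ is defined as

$$\tilde{g}(\lambda) = \int_0^{\infty} e^{-\lambda t} g(t)\, dt \tag{3.6}$$

where $\lambda$ is a complex Laplace parameter. Taking the Laplace transform of (3.1) - (3.5) yields the transformed flow equations

$$S_S \lambda \tilde{h}(\mathbf{x},\lambda) + \nabla \cdot \tilde{\mathbf{q}}(\mathbf{x},\lambda) = \tilde{f}(\mathbf{x},\lambda) + S_S H_0(\mathbf{x}) \qquad \mathbf{x} \in \Omega \tag{3.7}$$

$$\tilde{\mathbf{q}}(\mathbf{x},\lambda) = -K(\mathbf{x}) \nabla \tilde{h}(\mathbf{x},\lambda) \qquad \mathbf{x} \in \Omega \tag{3.8}$$

$$\tilde{h}(\mathbf{x},\lambda) = \tilde{H}(\mathbf{x},\lambda) \qquad \mathbf{x} \in \Gamma_D \tag{3.9}$$

$$-\tilde{\mathbf{q}}(\mathbf{x},\lambda) \cdot \mathbf{n}(\mathbf{x}) = \tilde{Q}(\mathbf{x},\lambda) \qquad \mathbf{x} \in \Gamma_N \tag{3.10}$$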
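The way the initial head enters the transformed mass balance can be made explicit with one intermediate step. Applying the definition of the Laplace transform to the storage term and integrating by parts, under the assumption that $e^{-\lambda t} h(\mathbf{x},t)$ vanishes as $t \to \infty$ for the values of $\lambda$ considered, gives

$$\int_0^{\infty} e^{-\lambda t}\, \frac{\partial h(\mathbf{x},t)}{\partial t}\, dt = \Big[ e^{-\lambda t} h(\mathbf{x},t) \Big]_0^{\infty} + \lambda \int_0^{\infty} e^{-\lambda t} h(\mathbf{x},t)\, dt = \lambda \tilde{h}(\mathbf{x},\lambda) - H_0(\mathbf{x})$$

so that the transformed mass balance reads $S_S \left[ \lambda \tilde{h}(\mathbf{x},\lambda) - H_0(\mathbf{x}) \right] + \nabla \cdot \tilde{\mathbf{q}}(\mathbf{x},\lambda) = \tilde{f}(\mathbf{x},\lambda)$. Moving the initial head to the right hand side yields the source-like term $S_S H_0(\mathbf{x})$, which is why a random initial head acts as an additional random forcing in the transformed equations.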


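Each random quantity in (3.7) - (3.10) can be written as the sum of its (conditional) ensemble mean (statistical expectation) and a zero-mean random fluctuation about that mean such that

$$K(\mathbf{x}) = \langle K(\mathbf{x}) \rangle + K'(\mathbf{x}) \tag{3.11}$$

$$\tilde{h}(\mathbf{x},\lambda) = \langle \tilde{h}(\mathbf{x},\lambda) \rangle + \tilde{h}'(\mathbf{x},\lambda) \tag{3.12}$$

$$\tilde{\mathbf{q}}(\mathbf{x},\lambda) = \langle \tilde{\mathbf{q}}(\mathbf{x},\lambda) \rangle + \tilde{\mathbf{q}}'(\mathbf{x},\lambda) \tag{3.13}$$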

Ye et al. [2004] present and solve numerically non-local conditional stochastic MEs satisfied by the mean and covariance of $h$ and $\mathbf{q}$ and by the cross-covariance between $h$ and $K$ for a special case in which all forcing terms ($f$, $H_0$, $H$, and $Q$) are uncorrelated with each other and/or with $K$. To embed (3.7) - (3.10) in the KF scheme, the total simulation period is segmented into a sequence of time intervals according to the number of time steps at which measurements need to be assimilated. We solve the MEs within each time interval $[T_{k-1}, T_k]$ and treat the updated moments of $h$ (and flux) at time $T_{k-1}$ as initial condition. Because these updated heads are generally correlated with $K$, the initial head of each interval is cross-correlated with conductivity. The MEs of Ye et al. are therefore extended in a way that takes these cross-correlations into account.

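Like Ye et al. [2004], we render the exact MEs workable by expanding them to second-order in $\sigma_Y$, the conditional standard deviation of (natural) log-conductivity $Y(\mathbf{x}) = \ln K(\mathbf{x})$, about its conditional mean, $\langle Y(\mathbf{x}) \rangle$. We adopt the notation of Ye et al. [2004] and approximate the Laplace transform of conditional mean head and flux by their leading terms up to second-order (denoted by parenthetic superscript) in $\sigma_Y$

$$\langle \tilde{h}(\mathbf{x},\lambda) \rangle \approx \langle \tilde{h}^{(0)}(\mathbf{x},\lambda) \rangle + \langle \tilde{h}^{(2)}(\mathbf{x},\lambda) \rangle \tag{3.14}$$

$$\langle \tilde{\mathbf{q}}(\mathbf{x},\lambda) \rangle \approx \langle \tilde{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle + \langle \tilde{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle \tag{3.15}$$

The system of equations satisfied by the zero-order mean head and flux is given by

$$S_S \lambda \langle \tilde{h}^{(0)}(\mathbf{x},\lambda) \rangle + \nabla \cdot \langle \tilde{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle = \langle \tilde{f}(\mathbf{x},\lambda) \rangle + S_S \langle H_0(\mathbf{x}) \rangle \qquad \mathbf{x} \in \Omega \tag{3.16}$$

$$\langle \tilde{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle = -K_G(\mathbf{x}) \nabla \langle \tilde{h}^{(0)}(\mathbf{x},\lambda) \rangle \qquad \mathbf{x} \in \Omega \tag{3.17}$$

$$\langle \tilde{h}^{(0)}(\mathbf{x},\lambda) \rangle = \langle \tilde{H}(\mathbf{x},\lambda) \rangle \qquad \mathbf{x} \in \Gamma_D \tag{3.18}$$

$$-\langle \tilde{\mathbf{q}}^{(0)}(\mathbf{x},\lambda) \rangle \cdot \mathbf{n}(\mathbf{x}) = \langle \tilde{Q}(\mathbf{x},\lambda) \rangle \qquad \mathbf{x} \in \Gamma_N \tag{3.19}$$

Here, $K_G(\mathbf{x}) = \exp \langle Y(\mathbf{x}) \rangle$ is the conditional geometric mean of $K$; $\langle \tilde{f} \rangle$, $\langle H_0 \rangle$, $\langle \tilde{H} \rangle$ and $\langle \tilde{Q} \rangle$ are ensemble mean (in part Laplace transformed) forcing terms. For simplicity we treat $f$, $H$ and $Q$ as deterministic. Second-order corrections of head and flux are governed by

$$\langle \tilde{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle = -K_G(\mathbf{x}) \left[ \nabla \langle \tilde{h}^{(2)}(\mathbf{x},\lambda) \rangle + \frac{\sigma_Y^2(\mathbf{x})}{2} \nabla \langle \tilde{h}^{(0)}(\mathbf{x},\lambda) \rangle \right] + \tilde{\mathbf{r}}^{(2)}(\mathbf{x},\lambda) \qquad \mathbf{x} \in \Omega \tag{3.20}$$

$$S_S \lambda \langle \tilde{h}^{(2)}(\mathbf{x},\lambda) \rangle + \nabla \cdot \langle \tilde{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle = 0 \qquad \mathbf{x} \in \Omega \tag{3.21}$$

$$\langle \tilde{h}^{(2)}(\mathbf{x},\lambda) \rangle = 0 \qquad \mathbf{x} \in \Gamma_D \tag{3.22}$$

$$\langle \tilde{\mathbf{q}}^{(2)}(\mathbf{x},\lambda) \rangle \cdot \mathbf{n}(\mathbf{x}) = 0 \qquad \mathbf{x} \in \Gamma_N \tag{3.23}$$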

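where $\sigma_Y^2(\mathbf{x}) = \langle Y'^2(\mathbf{x}) \rangle$ is the conditional variance of $Y(\mathbf{x})$ and the second-order transformed residual flux, $\tilde{\mathbf{r}}^{(2)}(\mathbf{x},\lambda) = -\langle K'(\mathbf{x}) \nabla \tilde{h}'(\mathbf{x},\lambda) \rangle^{(2)}$, in (3.20) is evaluated according to

$$\begin{aligned}
\tilde{\mathbf{r}}^{(2)}(\mathbf{x},\lambda) = {} & \int_{\Omega} K_G(\mathbf{x}) K_G(\mathbf{y})\, C_Y(\mathbf{x},\mathbf{y})\, \nabla_{\mathbf{x}} \nabla_{\mathbf{y}}^{+} \langle \tilde{G}^{(0)}(\mathbf{y},\mathbf{x},\lambda) \rangle\, \nabla_{\mathbf{y}} \langle \tilde{h}^{(0)}(\mathbf{y},\lambda) \rangle\, d\mathbf{y} \\
& - \int_{\Omega} \langle K'(\mathbf{x})\, h_0'(\mathbf{y}) \rangle^{(2)}\, S_S\, \nabla_{\mathbf{x}} \langle \tilde{G}^{(0)}(\mathbf{y},\mathbf{x},\lambda) \rangle\, d\mathbf{y}
\end{aligned} \tag{3.24}$$

where $C_Y(\mathbf{x},\mathbf{y}) = \langle Y'(\mathbf{x}) Y'(\mathbf{y}) \rangle$ is the conditional covariance of $Y$ between points $\mathbf{x}$ and $\mathbf{y}$, the superscript '+' denoting transpose. The zero-order conditional mean random Green's function, $\langle \tilde{G}^{(0)}(\mathbf{y},\mathbf{x},\lambda) \rangle$, associated with (3.7) - (3.10) is obtained upon writing (3.16) - (3.19) in terms of $(\mathbf{y},\lambda)$ and solving them subject to homogeneous boundary conditions and a Dirac delta source at $\mathbf{x}$. The last integral on the right hand side of (3.24), containing the conditional cross-correlation between hydraulic conductivity and initial head fluctuations $h_0'$, is new and does not appear in equation (39) of Ye et al. [2004]. For reasons explained earlier, this term may vanish during the first time interval $[T_0, T_1]$ (in particular when $H_0$ is deterministic) but not during later intervals. The second-order conditional cross-covariance, $\tilde{u}_{Kh}^{(2)}(\mathbf{x},\mathbf{y},\lambda)$, between $K(\mathbf{x})$ and transformed head $\tilde{h}(\mathbf{y},\lambda)$ is evaluated according to

$$\begin{aligned}
\tilde{u}_{Kh}^{(2)}(\mathbf{x},\mathbf{y},\lambda) = {} & -\int_{\Omega} K_G(\mathbf{x}) K_G(\mathbf{z})\, C_Y(\mathbf{x},\mathbf{z})\, \nabla_{\mathbf{z}} \langle \tilde{h}^{(0)}(\mathbf{z},\lambda) \rangle \cdot \nabla_{\mathbf{z}} \langle \tilde{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle\, d\mathbf{z} \\
& + \int_{\Omega} S_S\, \langle K'(\mathbf{x})\, h_0'(\mathbf{z}) \rangle^{(2)}\, \langle \tilde{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle\, d\mathbf{z}
\end{aligned} \tag{3.25}$$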

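Corresponding equations for the conditional second-moment (variance-covariance) of associated head prediction errors are evaluated according to

$$\begin{aligned}
& \nabla_{\mathbf{x}} \cdot \left[ K_G(\mathbf{x}) \nabla_{\mathbf{x}} \tilde{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) + u_{Kh}^{(2)}(\mathbf{x},\mathbf{y},s)\, \nabla_{\mathbf{x}} \langle \tilde{h}^{(0)}(\mathbf{x},\lambda) \rangle \right] - S_S \lambda\, \tilde{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) \\
& \qquad = -S_S \langle h_0'(\mathbf{x})\, h'(\mathbf{y},s) \rangle^{(2)} \qquad \mathbf{x} \in \Omega
\end{aligned} \tag{3.26}$$

$$\tilde{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) = 0 \qquad \mathbf{x} \in \Gamma_D \tag{3.27}$$

$$\left[ K_G(\mathbf{x}) \nabla_{\mathbf{x}} \tilde{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) + u_{Kh}^{(2)}(\mathbf{x},\mathbf{y},s)\, \nabla_{\mathbf{x}} \langle \tilde{h}^{(0)}(\mathbf{x},\lambda) \rangle \right] \cdot \mathbf{n}(\mathbf{x}) = 0 \qquad \mathbf{x} \in \Gamma_N \tag{3.28}$$

Here, $\tilde{C}_h^{(2)}(\mathbf{x},\lambda,\mathbf{y},s) = \langle \tilde{h}'(\mathbf{x},\lambda)\, h'(\mathbf{y},s) \rangle^{(2)}$ is the conditional covariance between transformed and untransformed head fluctuations $\tilde{h}'(\mathbf{x},\lambda)$ and $h'(\mathbf{y},s)$. The term on the right hand side of (3.26) includes the covariance between head $h'(\mathbf{y},s)$ at time $s$ and initial head $h_0'(\mathbf{x})$. This covariance is rendered by the inverse Laplace transform, back to time $s$, of

$$\begin{aligned}
\langle h_0'(\mathbf{x})\, \tilde{h}'(\mathbf{y},\lambda) \rangle^{(2)} = {} & -\int_{\Omega} \langle h_0'(\mathbf{x})\, K'(\mathbf{z}) \rangle^{(2)}\, \nabla_{\mathbf{z}} \langle \tilde{h}^{(0)}(\mathbf{z},\lambda) \rangle \cdot \nabla_{\mathbf{z}} \langle \tilde{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle\, d\mathbf{z} \\
& + \int_{\Omega} S_S\, \langle h_0'(\mathbf{x})\, h_0'(\mathbf{z}) \rangle^{(2)}\, \langle \tilde{G}^{(0)}(\mathbf{z},\mathbf{y},\lambda) \rangle\, d\mathbf{z}
\end{aligned} \tag{3.29}$$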


Following Ye et al. [2004], we solve the above MEs by a Galerkin finite element method using bilinear Lagrange interpolation functions. The finite element equations are shown in Appendix A. Laplace back transformation into the time domain is performed using the quotient difference algorithm of De Hoog et al. [1982]. The numerical code has been parallelized to (i) solve (3.14) - (3.24) for different values of $\lambda$ simultaneously, and (ii) compute the cross-covariances and covariances (3.25) - (3.29) at subsets of grid nodes which are uniformly distributed among available processors in a cluster.

3.2 Data assimilation of groundwater flow data via KF: MC-based EnKF and ME-based approach

We consider the model vector

$$\mathbf{y} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{h} \end{bmatrix} \tag{3.30}$$

where the parameter vector $\mathbf{Y}$ contains $N_Y$ log-conductivities and the state vector $\mathbf{h}$ includes $N_h$ hydraulic head values satisfying (3.1) - (3.5), so that $\mathbf{y}$ has dimension $N_y = N_Y + N_h$. In our finite element solver of (3.1) - (3.5), described above, $N_Y$ is the number of elements in which hydraulic conductivity is taken to be uniform and $N_h$ is the number of nodes at which heads are computed.

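According to the notation introduced in Chapter 2, we denote the model vector $\mathbf{y}$ at time $T_{k-1}$ conditioned on measurements available up to time $T_{k-1}$, by $\mathbf{y}^{u,T_{k-1}}$. In line with Tarantola [2005], Cohn [1997] and Woodbury and Ulrych [2000] we consider $\mathbf{y}^{u,T_{k-1}}$ to be multivariate Gaussian with mean vector

$$\left\langle \mathbf{y}^{u,T_{k-1}} \right\rangle = \begin{bmatrix} \left\langle \mathbf{Y}^{u,T_{k-1}} \right\rangle \\ \left\langle \mathbf{h}^{u,T_{k-1}} \right\rangle \end{bmatrix} \tag{3.31}$$

and covariance matrix

$$\boldsymbol{\Sigma}_{yy}^{u,T_{k-1}} = \begin{bmatrix} \mathbf{C}_Y^{u,T_{k-1}} & \mathbf{u}_{Yh}^{u,T_{k-1}} \\ \mathbf{u}_{Yh}^{u,T_{k-1} +} & \mathbf{C}_h^{u,T_{k-1}} \end{bmatrix} \tag{3.32}$$

Here, $\mathbf{C}_Y^{u,T_{k-1}}$ and $\mathbf{C}_h^{u,T_{k-1}}$ are the conditional covariance matrices of $\mathbf{Y}^{u,T_{k-1}}$ and $\mathbf{h}^{u,T_{k-1}}$, respectively, and $\mathbf{u}_{Yh}^{u,T_{k-1}}$ is their cross-covariance matrix.

The forward step entailed in the KF algorithm requires solving the system of stochastic PDEs (3.1) - (3.5) within the time interval $[T_{k-1}, T_k]$ and with random initial conditions given by (3.31) - (3.32).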

One way of solving the forward step is to rely on Monte Carlo (MC) simulation. As detailed in Chapter 2, MC requires representing the density function of $\mathbf{y}^{u,T_{k-1}}$ through a collection of model realizations, $\mathbf{y}_j^{u,T_{k-1}}$, $j = 1, \dots, NMC$. With this approach equations (3.1) - (3.5) are solved within the time interval $[T_{k-1}, T_k]$ for each model realization $j$. This is accomplished by employing the deterministic log-conductivity field and initial head field contained in $\mathbf{y}_j^{u,T_{k-1}}$. The MC solution yields the collection of forward realizations at time $T_k$, $\mathbf{y}_j^{f,T_k}$, $j = 1, \dots, NMC$.

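As an alternative to MC, we propose to solve the forward step directly through the system of moment equations (3.7) - (3.29). These MEs are solved within the time interval $[T_{k-1}, T_k]$ upon setting the mean and the covariance of the log-conductivity field equal to $\left\langle \mathbf{Y}^{u,T_{k-1}} \right\rangle$ in (3.31) and $\mathbf{C}_Y^{u,T_{k-1}}$ in (3.32), respectively. This approach requires treating the initial conditions as random. The initial head field is characterized by mean and covariance matrix equal to $\left\langle \mathbf{h}^{u,T_{k-1}} \right\rangle$ and $\mathbf{C}_h^{u,T_{k-1}}$, respectively, while the cross-covariances between conductivities and initial heads are set to $\mathbf{u}_{Kh}^{u,T_{k-1}} = K_G\, \mathbf{u}_{Yh}^{u,T_{k-1}}$. The ME solution yields second-order approximation of mean and covariance matrix of the forward vector at time $T_k$, $\mathbf{y}^{f,T_k}$

$$\left\langle \mathbf{y}^{f,T_k} \right\rangle = \begin{bmatrix} \left\langle \mathbf{Y}^{f,T_k} \right\rangle \\ \left\langle \mathbf{h}^{f,T_k} \right\rangle \end{bmatrix} \tag{3.33}$$

$$\boldsymbol{\Sigma}_{yy}^{f,T_k} = \begin{bmatrix} \mathbf{C}_Y^{f,T_k} & \mathbf{u}_{Yh}^{f,T_k} \\ \mathbf{u}_{Yh}^{f,T_k +} & \mathbf{C}_h^{f,T_k} \end{bmatrix} \tag{3.34}$$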
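The relation between the conductivity-head and log-conductivity-head cross-covariances used to initialize the MEs can be motivated by a short expansion, sketched here under the assumption that fluctuations $Y'$ are small enough for a series expansion to be meaningful. Writing $K = \exp(\langle Y \rangle + Y') = K_G \exp(Y')$ and expanding,

$$K(\mathbf{x}) = K_G(\mathbf{x}) \left[ 1 + Y'(\mathbf{x}) + \frac{Y'^2(\mathbf{x})}{2} + \dots \right], \qquad \langle K(\mathbf{x}) \rangle = K_G(\mathbf{x}) \left[ 1 + \frac{\sigma_Y^2(\mathbf{x})}{2} + \dots \right]$$

so that $K'(\mathbf{x}) = K(\mathbf{x}) - \langle K(\mathbf{x}) \rangle = K_G(\mathbf{x}) \left[ Y'(\mathbf{x}) + \tfrac{1}{2}\left( Y'^2(\mathbf{x}) - \sigma_Y^2(\mathbf{x}) \right) + \dots \right]$. Multiplying by an initial head fluctuation $h_0'(\mathbf{y})$ and taking expectations gives

$$\langle K'(\mathbf{x})\, h_0'(\mathbf{y}) \rangle = K_G(\mathbf{x})\, \langle Y'(\mathbf{x})\, h_0'(\mathbf{y}) \rangle + \text{higher-order terms}$$

Retaining terms up to second order in $\sigma_Y$ leaves only the first term on the right hand side, which is the element-wise counterpart of multiplying the cross-covariance of $\mathbf{Y}$ and $\mathbf{h}$ by the geometric mean conductivity.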

The measurements of $\mathbf{Y}$ and/or $\mathbf{h}$ available at time $T_k$ and the covariance matrix of the corresponding measurement errors are then used in the analysis step of the KF algorithm. Working with MC, (2.23) - (2.29) allow obtaining the collection of updated model realizations $\mathbf{y}_j^{u,T_k}$, $j = 1, \dots, NMC$. In the ME-based approach, (2.20) - (2.22) are used for the evaluation of the mean, $\left\langle \mathbf{y}^{u,T_k} \right\rangle$, and covariance matrix, $\boldsymbol{\Sigma}_{yy}^{u,T_k}$, of the target updated density function. Figures 3.1 - 3.2 summarize the assimilation algorithms associated with MC and ME approaches, respectively.

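Figure 3.1. Flow chart of data assimilation through common MC-based EnKF.

[Flow chart: initial conditions at $T_0$ (realizations of $\mathbf{Y}$ and $\mathbf{h}$, $j = 1, \dots, NMC$); forecast via MC solution of (3.1) - (3.5), yielding the forward realizations at $T_k$; updating via EnKF (2.23) - (2.29) with observed data $\mathbf{d}^{T_k}$ and $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$, yielding the updated realizations; assimilation loop over $k = 1 \dots n$.]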

Figure 3.2. Flow chart of data assimilation through embedding of stochastic moment equations of transient groundwater flow in the KF scheme.

[Flow chart: initial conditions at $T_0$ (mean vector of $\mathbf{Y}$ and $\mathbf{h}$ and covariance matrix with blocks $\mathbf{C}_Y$, $\mathbf{u}_{Yh}$, $\mathbf{C}_h$); forecast via ME (3.7) - (3.29), yielding the forward mean and covariance at $T_k$; updating via KF (2.20) - (2.22) with observed data $\mathbf{d}^{T_k}$ and $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$, yielding the updated mean and covariance; assimilation loop over $k = 1 \dots n$.]

3.3 Exploratory synthetic example of data assimilation and parameter estimation

We explore the feasibility and accuracy of our ME-based approach by way of a two-dimensional transient flow example. We consider a square domain measuring 40 × 40 (all quantities are given in consistent space-time units) discretized into grid cells of size 1 × 1. Each element has uniform hydraulic conductivity, yielding a parameter vector $\mathbf{Y}$ of dimension $N_Y = 1600$. Head values are prescribed or computed at $N_h = 1681$ nodes, yielding a head vector $\mathbf{h}$ of similar dimension. Whereas deterministic head values equal to $H_1 = 1.0$ and $H_2 = 0.0$ are prescribed along the left and right boundaries, the top and bottom boundaries are made impervious (Figure 3.3). Storativity is set equal to a uniform deterministic value of 0.3. Initial hydraulic heads are deterministic and vary linearly between the two constant head boundaries. Superimposed on this background gradient is convergent flow to a centrally located well that starts pumping at a deterministic constant rate $Q_p = 0.3$ at reference time $t = 0$. Mathematically the well is simulated by setting $f(\mathbf{x}, t < 0) = 0$ and $f(\mathbf{x}, t \ge 0) = -Q_p\, \delta(\mathbf{x} - \mathbf{x}_w)$ in (3.1) where $\delta$ is the Dirac delta function, $\mathbf{x}_w$ are the Cartesian coordinates of the well and well radius is neglected.
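The dimensions of the parameter and head vectors follow directly from the grid layout, as the short sketch below shows. Variable names are illustrative; only the domain size, cell size and boundary heads are taken from the text, and the initial head is evaluated along one row of nodes under the assumption that the linear variation is along the direction joining the two constant-head boundaries.

```python
n_side = 40      # elements per side: domain 40 x 40, cells 1 x 1
cell = 1.0

# One log-conductivity value per element, one head value per node.
N_Y = n_side * n_side
N_h = (n_side + 1) * (n_side + 1)

# Deterministic initial head: linear between the left (H1) and right (H2)
# constant-head boundaries, evaluated at the nodes of one grid row.
H1, H2 = 1.0, 0.0
length = n_side * cell
x_nodes = [i * cell for i in range(n_side + 1)]
h0_profile = [H1 + (H2 - H1) * x / length for x in x_nodes]
```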

Figure 3.3. Flow domain, nodes of the computational grid (+), boundary conditions, pumping well (○), log-conductivity (◊) and hydraulic head (∆) measurement locations.

[Figure: axes $x_1$ and $x_2$ ranging from 0.0 to 40.0; impervious top and bottom boundaries; constant heads $H_1$ and $H_2$ on the left and right boundaries; measurement locations numbered 1 to 20.]

Values of $Y$ in each grid cell are set equal to those generated at element centers using a sequential Gaussian simulator [SGSIM, Deutsch and Journel, 1998]. The generated values form a random realization (depicted in Figure 3.4) of a statistically homogeneous and isotropic multivariate Gaussian field having variance 2.0 and exponential covariance with integral scale 4.0. This strongly heterogeneous reference field is characterized by spatial mean and variance equal, respectively, to 0.00 and 1.71. We solve the corresponding deterministic flow problem through the system (3.1) - (3.5) for a time period of 80 units ($T_{max} = 80.0$) to obtain a corresponding reference head distribution in space-time. Figure 3.5 shows the time evolution of reference head at eight selected measurement points. The vertical line in Figure 3.5 at $T_k = 30.0$ separates an early transient flow regime from a later pseudo-steady state regime during which heads are seen to vary linearly with log-time.

Figure 3.4. Spatial distribution of log-hydraulic conductivity in the reference model.

We sample the reference Y field, $Y_{ref}$, in nine elements uniformly distributed across the domain ($Y_m$, $m = 1, \dots, 9$) and the reference head values at 20 grid points and $N_k$ observation times ($h_n^{T_k}$, $n = 1, \dots, 20$, $k = 1, \dots, N_k$). The spatial locations of the measurement points are indicated in Figure 3.3. The Y and h measurements are corrupted with zero-mean white Gaussian noise, $\varepsilon_m$ and $\varepsilon_n^{T_k}$, having standard deviations $\sigma_{YE}$ and $\sigma_{hE}$, respectively, according to

$\tilde{Y}_m = Y_m + \varepsilon_m$, $m = 1, \dots, 9$ (3.35)

$\tilde{h}_n^{T_k} = h_n^{T_k} + \varepsilon_n^{T_k}$, $n = 1, \dots, 20$, $k = 1, \dots, N_k$ (3.36)
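As an illustration of (3.35) - (3.36), the following minimal Python sketch corrupts a set of reference samples with zero-mean white Gaussian noise. The function name `corrupt_samples`, the random seed and the numerical sample values are hypothetical and serve only to show the operation; they are not the values used in this study.

```python
import random


def corrupt_samples(values, sigma, rng):
    """Return values perturbed by independent zero-mean Gaussian noise.

    Mirrors the structure of (3.35)-(3.36): measurement = reference + noise.
    """
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical reference samples (illustrative numbers only).
rng = random.Random(42)  # fixed seed so that the sketch is reproducible
Y_ref_samples = [-1.2, 0.3, 0.8, -0.5, 1.1, 0.0, -2.0, 0.6, 1.5]  # nine Y samples
h_ref_samples = [0.75, 0.60, 0.42, 0.31]  # a few head samples

sigma_YE = 0.1   # standard deviation of Y measurement errors
sigma_hE = 0.01  # standard deviation of head errors (variance 1e-4)

Y_meas = corrupt_samples(Y_ref_samples, sigma_YE, rng)
h_meas = corrupt_samples(h_ref_samples, sigma_hE, rng)

for y_ref, y_obs in zip(Y_ref_samples, Y_meas):
    print(f"Y_ref = {y_ref:6.2f}   Y_meas = {y_obs:6.2f}")
```

Setting `sigma` to zero returns the reference samples unchanged, which is a convenient sanity check of the perturbation step.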


Figure 3.5. Temporal evolution of reference hydraulic head (curves) and noisy measurements

(symbols) at seven locations identified in Figure 3.3.

We consider three case studies (TC1, TC2 and TC3) with diverse values of $N_k$ and $\sigma_{YE}$. Test case 1 (TC1) considers ten observation times ($T_k$ = 5.0; 10.0; 15.0; 20.0; 25.0; 30.0; 35.0; 40.0; 60.0; 80.0; $k = 1, 2, \dots, N_k$) and $\sigma_{YE} = 0.1$. In test case 2 (TC2) the measurement error variance of Y exceeds that in TC1 by one order of magnitude, the standard deviation being now $\sigma_{YE} = 0.32$. The third test case (TC3) differs from TC1 in that it includes eleven additional observation times ($T_k$ = 3.0; 7.0; 9.0; 11.0; 13.0; 17.0; 19.0; 21.0; 23.0; 27.0; 29.0). All three test cases consider the measurement error variance of heads, $\sigma_{hE}^2$, equal to $10^{-4}$.

In our example the vector $\mathbf{d}^{T_k}$ introduced in (2.2) contains the perturbed sample of hydraulic head at time $T_k$ as defined in (3.36). The corresponding covariance matrix of head measurement errors, $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$, is diagonal and homoscedastic with entries equal to $\sigma_{hE}^2$. Entries $H_{ij}^{T_k}$ of $\mathbf{H}^{T_k}$ are equal to 1 when the i-th element of $\mathbf{d}^{T_k}$ is a measurement of the j-th entry of $\mathbf{y}^{T_k}$ and 0 otherwise.
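The structure of the observation matrix and of the measurement error covariance described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the six-entry state vector and the measured indices are hypothetical and are not taken from the implementation used in this work.

```python
def build_observation_matrix(measured_indices, n_state):
    """Return H as a list of rows: H[i][j] = 1 if datum i observes state entry j, else 0."""
    H = [[0.0] * n_state for _ in measured_indices]
    for i, j in enumerate(measured_indices):
        H[i][j] = 1.0
    return H


def build_error_covariance(n_data, sigma_hE):
    """Return a diagonal, homoscedastic covariance matrix with entries sigma_hE**2."""
    variance = sigma_hE ** 2
    return [[variance if i == j else 0.0 for j in range(n_data)] for i in range(n_data)]


# Hypothetical example: a state vector with six entries, two of which are measured.
measured_indices = [2, 5]
H = build_observation_matrix(measured_indices, n_state=6)
Sigma = build_error_covariance(len(measured_indices), sigma_hE=0.01)

# Applying H to a state vector simply extracts the measured entries.
y = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
Hy = [sum(h_ij * y_j for h_ij, y_j in zip(row, y)) for row in H]
print(Hy)
```

Because each row of the matrix contains a single unit entry, multiplying it by the state vector selects the entries that are observed.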

The elements of the vector $\tilde{\mathbf{Y}}$ containing the measurements of log-conductivity as defined in (3.35) are employed for generating the mean and the covariance matrix of the log-conductivity field, Y, at the initial time $T_0$. The perturbed samples of log-conductivity are projected via ordinary kriging onto the centroids of all grid elements assuming knowledge of the corresponding variogram model and parameters.

Figures 3.6 and 3.7 respectively depict estimates of Y and corresponding variances at each assimilation step of TC1. The estimates of Y in Figure 3.6 evolve toward a pattern similar to that of the reference Y field in Figure 3.4. The rate of evolution is fastest at early time and slowest during the pseudo steady state period, from $T_k = 30$ onward. A similar phenomenon was observed by Chen and Zhang [2006] when coupling EnKF with standard MC simulation, and by Riva et al. [2009] during batch transient inversion of stochastic MEs using maximum likelihood. Prior to the start of assimilation (at $T_k = 0$) the estimation variance of Y in Figure 3.7 is close to the unconditional reference variance everywhere except near the nine measurement points, at which it is equal to the measurement error variance. Assimilation brings about a rapid reduction in this estimation variance at early time and a much slower reduction at later times.

Figure 3.6. Estimates of log-conductivity Y at initial time $T_k = 0$ and at ten updating times ($T_k$ = 5, 10, 15, 20, 25, 30, 35, 40, 60 and 80) for test case TC1.

Figure 3.7. Estimation variance of Y, $\sigma_Y^2$, at initial time $T_k = 0$ and at ten updating times ($T_k$ = 5, 10, 15, 20, 25, 30, 35, 40, 60 and 80) for test case TC1.

These phenomena are reflected quantitatively in the temporal behaviors of $E_Y$, the average absolute difference between estimates $\langle Y \rangle^{u,T_k}$ and reference values $Y_{ref}$ at all element centroids $\mathbf{x}_i$, and of $V_Y$, the average estimation variance $\sigma_Y^{2,u,t^*}$ at these points, defined as

$E_Y(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} \left| \langle Y(\mathbf{x}_i) \rangle^{u,t^*} - Y_{ref}(\mathbf{x}_i) \right|$ (3.37)

$V_Y(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} \sigma_Y^{2,u,t^*}(\mathbf{x}_i)$ (3.38)

where $t^* = T_k / T_{max}$ is normalized time (assimilation takes place at $t^*$ = 0.0625; 0.125; 0.1875; 0.25; 0.3125; 0.375; 0.4375; 0.50; 0.75; 1.00). Indeed, Figure 3.8 demonstrates that $E_Y$ and $V_Y$ decrease more sharply with $t^*$ at early time than during the later pseudo steady state period.
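The two metrics in (3.37) - (3.38), together with the normalized assimilation times, can be evaluated as in the following minimal Python sketch. The function names and the four-element toy fields are hypothetical; only the observation times of TC1 and the value of the final time are taken from the text.

```python
def average_absolute_difference(estimates, reference):
    """E_Y: mean absolute difference between estimated and reference values."""
    return sum(abs(e - r) for e, r in zip(estimates, reference)) / len(estimates)


def average_estimation_variance(variances):
    """V_Y: mean of the estimation variances over all elements."""
    return sum(variances) / len(variances)


# Normalized assimilation times t* = T_k / T_max for TC1.
T_max = 80.0
T_k = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 60.0, 80.0]
t_star = [T / T_max for T in T_k]

# Hypothetical four-element example (illustrative numbers only).
Y_est = [0.5, -1.0, 0.0, 2.0]
Y_ref = [1.0, -1.5, 0.5, 1.0]
var_Y = [0.4, 0.6, 1.0, 2.0]

E_Y = average_absolute_difference(Y_est, Y_ref)
V_Y = average_estimation_variance(var_Y)
print(t_star)
print(E_Y, V_Y)
```

In an actual assimilation run the two functions would be evaluated after each updating step, yielding one value of each metric per normalized time.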

Figure 3.8. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values, and corresponding average estimation variance $V_Y(t^*)$, versus $t^*$ for test case TC1.

Figure 3.9 depicts scatter plots of estimated versus reference Y values at $T_k$ = 0, 15, 30, and 80 together with intervals corresponding to ± two standard deviations, $\pm 2 \sigma_Y^{u,t^*}(\mathbf{x})$, of the estimates about their mean values. More than 90% of the estimates are seen to lie inside these intervals at each time $T_k$, even as the intervals narrow with increasing $T_k$. Linear regression lines fitted in Figure 3.9 to the data have slopes that increase with time from 0.14 at $T_k = 0$ to 0.37 at $T_k = 30$, and coefficients of determination $R^2$ that likewise increase from 0.18 at $T_k = 0$ to 0.33 at $T_k = 30$. Beyond $T_k = 30$, these variations are comparatively small. Figure 3.9 also shows that, due to the relatively small standard deviation of Y measurement errors, estimates of Y at the nine measurement points do not change much during the assimilation process.
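The slope and coefficient of determination of a regression line of estimated versus reference values can be obtained with ordinary least squares, as in the minimal Python sketch below. The function name `linear_fit` and the five data pairs are hypothetical and do not reproduce the data of Figure 3.9.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, r_squared


# Hypothetical reference (x) and estimated (y) log-conductivity values.
Y_ref = [-2.0, -1.0, 0.0, 1.0, 2.0]
Y_est = [-0.5, -0.5, 0.0, 0.5, 0.5]

slope, intercept, r_squared = linear_fit(Y_ref, Y_est)
print(slope, intercept, r_squared)
```

A slope well below one, as in this toy example, indicates estimates that are smoother than the reference field; a slope approaching one indicates that the estimates reproduce the reference variability.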

Figures 3.10 and 3.11 show that increasing the measurement error variance of Y by one order of magnitude, as is done in TC2, has only a minor effect on the temporal behavior of $E_Y$ and $V_Y$. Including eleven additional observation times (TC3) yields considerably smaller values of $E_Y$, while $V_Y$ remains virtually unaffected. This behavior indicates that increasing the number of assimilation data at early time improves parameter estimates without underestimating their variance. Figure 3.12 compares slopes of regression lines fitted to scatter plots of estimated versus reference Y values at various updating times in each test case. The graph shows that whereas adding noise to log-conductivity measurements causes their estimates (in terms of this slope) to deteriorate slightly for the scenarios considered, adding early time measurements renders the estimates markedly more accurate.

Figure 3.9. Scatter plots of estimated and reference Y at four $T_k$ values: (a) $T_k = 0$, (b) $T_k = 15$, (c) $T_k = 30$ and (d) $T_k = 80$; corresponding intervals of ± two standard deviations of Y estimates about their mean (gray lines); and linear regression fits to the data (black lines), for test case TC1. Y estimates at the nine measurement locations are highlighted in red. The fitted regression lines have slopes 0.14, 0.32, 0.37 and 0.39 and coefficients of determination $R^2$ = 0.18, 0.31, 0.33 and 0.34 in panels (a) to (d), respectively.

Figure 3.10. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values versus $t^*$ for test cases TC1, TC2 and TC3.

Figure 3.11. Average estimation variance $V_Y(t^*)$ versus $t^*$ for test cases TC1, TC2 and TC3.

Figure 3.12. Slopes of regression lines fitted to scatter plots of estimated versus reference Y values at various updating times in each test case.

The impact of initial hydraulic heads on estimates of Y is analyzed by repeating the

three test cases described above with stochastic initial heads, the moments of which are

obtained by solving the steady-state MEs [Guadagnini and Neuman, 1999] with kriged mean

permeability and corresponding covariance without pumping. We designate the test cases

corresponding to this initial condition as TCi_S ($i = 1, 2, 3$). It is important to note that in this

case the cross-correlation between K and initial heads in (3.24) - (3.29) does not vanish (not

even during the first time interval). This cross-correlation is provided by the solution of the

steady-state MEs.

In all these test cases the average estimation variance is found to remain unaffected by the initial head. On the other hand, $E_Y$ is slightly influenced by $H_0$. Figures 3.13 and 3.14 depict the temporal behaviors of $E_Y$ for TC1, TC3, and all considered $H_0$. Results corresponding to TC2 are qualitatively similar to those of TC1 and are not shown. The effects of the nature (stochastic or deterministic) of $H_0$ depend on the frequency of head observation. For TC1, the adoption of a stochastic $H_0$ slightly improves the global quality of the Y estimates relative to those obtained with a deterministic and linear $H_0$. The opposite happens in case TC3, even as the difference in $E_Y$ tends to decrease with time (see Figure 3.14). In the case of random $H_0$, increasing the frequency of head observations does not cause $E_Y$ to decrease significantly in comparison to the case of deterministic $H_0$ (compare Figures 3.13 and 3.14).

Figure 3.13. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values versus $t^*$ for test cases TC1 and TC1_S.

Figure 3.14. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values versus $t^*$ for test cases TC3 and TC3_S.

The analysis illustrated above treats the functional form and parameters of the variograms used to generate the reference Y field as given. To test the influence of the variogram model and parameters on our estimates, three additional test cases were performed using the same reference log-conductivity and head fields and the same conditioning data set of TC1. In one test case (TC4), we increased the unconditional variance and integral scale of Y to 3 and 6, respectively, and decreased them to 1 and 2, respectively, in another (TC5). In the last test case (TC6), we changed the functional form of the variogram from exponential to Gaussian, without changing the unconditional variance and integral scale of Y. The results, shown in Figures 3.15 and 3.16, confirm in part the finding of Chen and Zhang [2006] that incorrect initial variance and integral scale values of Y have no significant adverse effect on $E_Y$ or $V_Y$, the latter tending to decrease with diminishing initial sill and integral scale values. The effect of incorrect initial variance and integral scale of Y on $V_Y$ was found to diminish with time. This observation might obviate the need to estimate variogram parameters jointly with Y, as done in the context of steady state and batch transient geostatistical inversion of MEs by Riva et al. [2009, 2011]. In contrast, adopting an incorrect variogram model caused $E_Y$, $V_Y$, and correlations between estimated and true Y values to deteriorate at all times.

Figure 3.15. Average absolute difference $E_Y(t^*)$ between estimated and reference Y values versus $t^*$ for test cases TC1, TC4, TC5 and TC6.

Figure 3.16. Average estimation variance $V_Y(t^*)$ versus $t^*$ for test cases TC1, TC4, TC5 and TC6.

3.4 Comparison between MC-based EnKF and ME-based approach

We compare the performances and accuracies of MC-based EnKF and our ME-based implementation on nine synthetic problems. We adopt the identical domain and computational grid employed in Section 3.3 and the same flow setting depicted in Figure 3.3. In these synthetic cases deterministic head values of $H_1 = 0.8$ and $H_2 = 0.0$ are prescribed along the left and right domain boundaries, respectively. These conditions generate a mean hydraulic gradient of 2% aligned along direction $x_1$. As in Section 3.3, the bottom and top domain boundaries are taken as impervious. Initial heads are considered as random. Superimposed on this background gradient is convergent flow to a centrally located well that starts pumping at a deterministic constant rate $Q_p = 10^{-3}$ at the reference time $t = 0$. Storativity is set equal to a uniform deterministic value of $10^{-4}$. The nine problems differ from each other in the variance and integral scale of the reference log-hydraulic conductivity fields, $Y_{ref}(\mathbf{x}) = \ln K_{ref}(\mathbf{x})$. The latter are generated (Figure 3.17) by sampling statistically homogeneous and isotropic multivariate Gaussian fields having mean equal to $\ln(10^{-4}) = -9.21$ and nine exponential variograms with different combinations of sill and integral scale, $I_Y$, as detailed in Table 3.1. The reference realizations are generated by the sequential Gaussian simulator SGSIM of Deutsch and Journel [1998]. Included in Table 3.1 are the ratios between domain length scale and $I_Y$, sample variance, as well as sill and integral scale obtained for each reference realization by fitting, via least squares, an exponential variogram model to the corresponding sample variogram. The least squares variogram parameter estimates are seen to differ, generally, from their original field values.
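A least squares fit of an exponential variogram model to a sample variogram can be sketched as follows. This is an illustrative sketch only and not the fitting procedure used to produce Table 3.1: it assumes the model form $\gamma(s) = \mathrm{sill} \, [1 - \exp(-s / I_Y)]$, exploits the fact that the model is linear in the sill for a fixed integral scale, and selects the integral scale by a simple grid search. The function names, the candidate grid and the synthetic sample variogram are hypothetical.

```python
import math


def exponential_variogram(lag, sill, integral_scale):
    """Assumed model form: sill * (1 - exp(-lag / integral_scale))."""
    return sill * (1.0 - math.exp(-lag / integral_scale))


def fit_exponential_variogram(lags, gammas, candidate_scales):
    """Least-squares estimate of (sill, integral scale, sum of squared errors).

    For each candidate integral scale the optimal sill has a closed form;
    the candidate with the smallest sum of squared errors is returned.
    """
    best = None
    for scale in candidate_scales:
        g = [1.0 - math.exp(-lag / scale) for lag in lags]
        sill = sum(gi * yi for gi, yi in zip(g, gammas)) / sum(gi * gi for gi in g)
        sse = sum((sill * gi - yi) ** 2 for gi, yi in zip(g, gammas))
        if best is None or sse < best[2]:
            best = (sill, scale, sse)
    return best


# Synthetic, noise-free "sample variogram" generated with sill 1.0 and I_Y 4.0.
lags = [float(k) for k in range(1, 21)]
gammas = [exponential_variogram(lag, 1.0, 4.0) for lag in lags]

candidate_scales = [0.5 * k for k in range(1, 61)]  # 0.5, 1.0, ..., 30.0
sill_fit, scale_fit, sse = fit_exponential_variogram(lags, gammas, candidate_scales)
print(sill_fit, scale_fit, sse)
```

With noise-free synthetic data the generating parameters are recovered; with a sample variogram computed from a single realization the fitted values generally differ from the generating ones.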

| Ref. case | Sill (input) | $I_Y$ (input) | Domain side / $I_Y$ | Sample variance | Sill (least squares fit) | $I_Y$ (least squares fit) |
|---|---|---|---|---|---|---|
| TC1 | 0.5 | 4.0 | 10 | 0.43 | 0.41 | 3.02 |
| TC2 | 1.0 | 4.0 | 10 | 1.08 | 1.22 | 6.20 |
| TC3 | 2.0 | 4.0 | 10 | 1.80 | 1.89 | 3.53 |
| TC4 | 0.5 | 10.0 | 4 | 0.34 | 0.42 | 6.718 |
| TC5 | 1.0 | 10.0 | 4 | 0.89 | 1.58 | 15.95 |
| TC6 | 2.0 | 10.0 | 4 | 1.62 | 2.50 | 15.62 |
| TC7 | 0.5 | 20.0 | 2 | 0.39 | 0.53 | 17.94 |
| TC8 | 1.0 | 20.0 | 2 | 0.66 | 1.16 | 23.85 |
| TC9 | 2.0 | 20.0 | 2 | 1.40 | 2.47 | 22.19 |

Table 3.1. Variogram input parameters, ratio between domain side and $I_Y$, sample variance, sill and integral scale obtained by fitting, using least squares, an exponential variogram model to the corresponding sample variogram.

Both MC- and ME-based EnKF require specifying the variogram parameters for the

initial step. We work with the generating rather than the estimated sill and integral scale to

avoid introducing additional sources of uncertainty in the comparison. Chen and Zhang

[2006] showed that incorrect initial sill and integral scale of Y have only a secondary effect

on the final log-conductivity estimates. On the other hand, Jafarpour and Tarrahi [2011]

found in analyzing flow through a highly anisotropic system that inaccuracies in prescribed

directional integral scales tend to persist throughout MC-based EnKF runs.

We solve numerically the groundwater flow equations (3.1) - (3.5) for the duration of 200 time units ($T_{max} = 200.0$). Similarly to Section 3.3, we sample each reference Y field in nine elements uniformly distributed across the domain and the reference head fields at 20 grid points (Figure 3.3) and 10 observation times ($h_n^{T_k}$, $n = 1, \dots, 20$; $T_k$ = 10.0; 15.0; 20.0; 25.0; 30.0; 50.0; 80.0; 100.0; 150.0; 200.0; $k = 1, 2, \dots, 10$). This selection of observation times enables us to sample transient as well as pseudo steady state flow regimes (during which computed heads vary linearly with log-time), the latter of which develop, in these cases, from $T_k = 80$ onward. The log-conductivity and head samples are turned into "measurements" by corrupting them with white Gaussian noise, $\varepsilon_m$ and $\varepsilon_n^{T_k}$, having zero mean and standard deviations $\sigma_{YE} = 0.1$ and $\sigma_{hE} = 0.01$, respectively, as defined in (3.35) - (3.36).

The resulting absolute relative differences between reference and measured values range from 0.0% to 2.6% (with mean 0.8%, mode 1.7%, 5th percentile 0.2% and 95th percentile 21.5%) for log-conductivity and from 0.0% to 144% (with mean 4.6%, mode 0.6%, 5th percentile 0.0% and 95th percentile 19.4%) for hydraulic head. Large relative errors (> 50%) in head measurement are thus obtained far from the pumping well, at short times $T_k$, where $h_n^{T_k}$ are close to zero.

The elements of $\mathbf{d}^{T_k}$, $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$ and $\mathbf{H}^{T_k}$ are defined in the same way as described in Section 3.3. The perturbed log-conductivity samples included in the vector $\tilde{\mathbf{Y}}$ are made available at initial time $T_0$. In the ME-based assimilation $\tilde{\mathbf{Y}}$ is used for generating the initial mean and the covariance matrix of the model vector. In the MC-based EnKF, the initial collection of log-conductivity realizations, $\mathbf{Y}_i^{T_0}$, $i = 1, \dots, NMC$, is generated using the true variogram model with parameters listed in Table 3.1. Each $\mathbf{Y}_i^{T_0}$ is conditioned on the randomized measurement vector $\tilde{\mathbf{Y}}_i$, obtained by perturbing each element of $\tilde{\mathbf{Y}}$ with a Gaussian noise having standard deviation $\sigma_{YE}$. For each MC realization, $\mathbf{Y}_i^{T_0}$, the initial head vector, $\mathbf{h}_i^{T_0}$, is computed by solving the deterministic steady-state flow problem (3.1) - (3.5) without pumping.

In most previous applications of MC-based EnKF [Chen and Zhang, 2006; Hendricks Franssen and Kinzelbach, 2008; Schoeniger et al., 2012; Xu et al., 2013] the number NMC of Monte Carlo runs did not exceed a few hundred. Recognizing that NMC may have an impact on the results and that estimates of mean and variance of a random variable converge at a rate which diminishes with NMC [Ballio and Guadagnini, 2004], we consider here a series of values NMC = 100; 500; 1,000; 5,000; 10,000; 50,000; 100,000. Figures 3.17 and 3.18, respectively, compare the spatial distributions of updated log-conductivity, $\langle Y \rangle^{u,T_k}$, and corresponding estimation variances, $\sigma_Y^{2,u,T_k}$ (diagonal entries of $\mathbf{C}_Y^{u,T_k}$), at the final assimilation time ($T_k = 200.0$) for all nine reference cases obtained by ME- and MC-based EnKF. Values of $\langle Y \rangle^{u,T_k}$ obtained with $NMC \le 1{,}000$ exhibit more pronounced spatial variability than do those obtained from a larger number of MC realizations. Indeed, as $\langle Y \rangle^{u,T_k}$ represents a relatively smooth estimate of Y, spatial fluctuations are expected to diminish with increasing NMC. Results obtained with $NMC > 10{,}000$ are similar to those obtained with $NMC = 10{,}000$ for all cases examined and are therefore not shown.

Estimation variance is seen to vary locally with NMC, due most likely to filter inbreeding. The problem seems to disappear at $NMC \ge 1{,}000$, where the spatial distribution of MC-based variances is quite similar to that of their ME-based counterparts.

Figure 3.17. Spatial distributions of $\langle Y \rangle^{u,T_k}$ at $T_k = 200$ obtained by ME- and MC-based EnKF with diverse values of NMC. Reference Y fields are also shown. Rows correspond to test cases TC1 to TC9; columns correspond to ME, NMC = 10,000, NMC = 1,000, NMC = 500, NMC = 100 and the reference field. The color scale of Y ranges from -12 to -6.

Figure 3.18. Spatial distributions of $\sigma_Y^{2,u,T_k}$ at $T_k = 200$ obtained by ME- and MC-based EnKF with diverse values of NMC. Reference Y fields are also shown. Rows correspond to test cases TC1 to TC9.

Figures 3.19 and 3.20, respectively, show temporal behaviors of the average absolute difference, $E_Y$, as well as the average estimation variance, $V_Y$, defined in (3.37) - (3.38). Assimilation in these cases takes place at $t^*$ = 0.050, 0.075, 0.100, 0.125, 0.150, 0.250, 0.400, 0.500, 0.750 and 1.00. $E_Y$ and $V_Y$ are seen to increase as the sill of the variogram increases and as $I_Y$ decreases. The largest difference between MC-based values of $E_Y$ and $V_Y$ obtained with NMC = 100 and with NMC = 10,000 occurs in TC3 (Figures 3.19c and 3.20c), where the sill is largest and the integral scale smallest. Figure 3.19 shows that whereas $E_Y$ tends to decrease with NMC, at large NMC its MC- and ME-based values are close. The only exception is TC9 (associated with the largest sill and $I_Y$, Figure 3.19i), where the curve obtained with NMC = 10,000 lies slightly below that obtained with ME-based EnKF. In TC9, the relative difference between MC- and ME-based results varies between 18% at small $t^*$ and 10% at large $t^*$. We ascribe this behavior to approximations required to close what would otherwise be exact moment equations. Inaccuracies associated with these approximations tend to increase with increasing values of $I_Y$ relative to domain size.

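Figure 3.19. $E_Y$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and 10,000 (dashed-dotted black) are reported. Columns of panels correspond to sill = 0.5, 1.0 and 2.0; rows correspond to $I_Y$ = 4.0 (a-c), 10.0 (d-f) and 20.0 (g-i).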

Figures 3.19 and 3.20 indicate that assimilations done with NMC = 100 are generally associated with (a) large $E_Y$ values that tend to increase with time and (b) small $V_Y$ values that tend to decrease with time. The two phenomena are symptomatic of filter inbreeding. Several authors [Hendricks Franssen and Kinzelbach, 2008; Liang et al., 2012; Xu et al., 2013] suggest analyzing the occurrence of filter inbreeding by plotting the ratio $V_Y / MSE_Y$ versus time, where

$MSE_Y(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} \left[ \langle Y(\mathbf{x}_i) \rangle^{u,t^*} - Y_{ref}(\mathbf{x}_i) \right]^2$ (3.39)

Figure 3.20. $V_Y$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and 10,000 (dashed-dotted black) are reported.



Under ideal conditions, $V_Y / MSE_Y$ should be equal to unity [Liang et al., 2012]. Here we explore this issue by considering also the quantity

$P_{2\sigma_Y}(t^*) = \frac{1}{N_Y} \sum_{i=1}^{N_Y} H\left( 2 \sigma_Y^{u,t^*}(\mathbf{x}_i) - \left| \langle Y(\mathbf{x}_i) \rangle^{u,t^*} - Y_{ref}(\mathbf{x}_i) \right| \right)$ (3.40)

where H is the Heaviside step function, $P_{2\sigma_Y}$ representing percent reference values of Y lying inside a confidence interval of width equal to $\pm 2 \sigma_Y^{u,t^*}(\mathbf{x}_i)$ about $\langle Y(\mathbf{x}_i) \rangle^{u,t^*}$. Analyses of how $V_Y / MSE_Y$ (Figure 3.21) and $P_{2\sigma_Y}$ (Figure 3.22) evolve with time lead to similar conclusions. When $NMC < 1{,}000$, $V_Y / MSE_Y$ and $P_{2\sigma_Y}$ decrease with time, exhibiting a distinct filter inbreeding effect.
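The two filter inbreeding diagnostics discussed above, namely the ratio between average estimation variance and mean squared error and the fraction of reference values lying within ± two standard deviations of the estimates, can be computed as in the following minimal Python sketch. The function names and the four-element toy data are hypothetical; the convention that a reference value lying exactly on the interval boundary counts as inside is an assumption of this sketch.

```python
def mean_squared_error(estimates, reference):
    """MSE_Y: mean squared difference between estimated and reference values."""
    return sum((e - r) ** 2 for e, r in zip(estimates, reference)) / len(estimates)


def average_variance(variances):
    """V_Y: mean estimation variance."""
    return sum(variances) / len(variances)


def fraction_within_two_sigma(estimates, variances, reference):
    """Fraction of reference values inside estimate +/- 2 standard deviations."""
    inside = sum(
        1
        for e, v, r in zip(estimates, variances, reference)
        if abs(e - r) <= 2.0 * v ** 0.5  # boundary counted as inside (assumption)
    )
    return inside / len(estimates)


# Hypothetical four-element example (illustrative numbers only).
Y_est = [0.0, 1.0, -1.0, 2.0]
Y_ref = [0.5, 1.0, 1.0, 2.5]
var_Y = [0.25, 0.25, 0.25, 0.01]

ratio = average_variance(var_Y) / mean_squared_error(Y_est, Y_ref)
coverage = fraction_within_two_sigma(Y_est, var_Y, Y_ref)
print(ratio, coverage)
```

In this toy example the variance is much smaller than the squared error and only half of the reference values fall inside the intervals, which is the kind of signature associated with filter inbreeding: the filter is overconfident about estimates that are in fact poor.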

Figure 3.21. Ratio between $V_Y$ and $MSE_Y$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and 10,000 (dashed-dotted black) are reported.

No such deterioration with time is exhibited by either MC-based results with $NMC \ge 1{,}000$ or by ME-based outcomes, where $V_Y / MSE_Y$ remains approximately constant and $P_{2\sigma_Y}$ larger than 90%. The only exception concerns ME-based results associated with TC9 (see Figures 3.21i and 3.22i) where $P_{2\sigma_Y}$ is slightly smaller than 90% (≈ 88%). However, even here the ME-based values of $V_Y / MSE_Y$ and $P_{2\sigma_Y}$ show no systematic decrease with time (as would happen in the presence of filter inbreeding) but instead diminish rapidly during the first assimilation period and then stay approximately constant. The rapid early decline is likely due to spurious updates caused by second-order approximation of the cross-covariance terms. In contrast to MC-based $P_{2\sigma_Y}$ which, at small NMC, drops down to below 40%, a steep decline in ME-based values is limited to early time.

Figure 3.22. $P_{2\sigma_Y}$ versus $t^*$ for the nine test cases. ME-based (solid black) and MC-based results with NMC = 100 (dashed gray), 500 (dashed-dotted gray), 1,000 (solid gray), and 10,000 (dashed-dotted black) are reported.

Black dots in Figure 3.23 indicate the spatial location of the reference values of Y which, following the last assimilation period at $t^* = 1.0$, lie outside confidence intervals having widths equal to $\pm 2 \sigma_Y^{u,t^*}(\mathbf{x}_i)$ about $\langle Y(\mathbf{x}_i) \rangle^{u,t^*}$. This confirms the poor quality of estimates obtained with $NMC < 1{,}000$, even in the weakly heterogeneous settings of TC1, TC4 and TC7. Remarkably, black dots in Figure 3.23 corresponding to MC- (with NMC = 10,000) and ME-based filters have similar spatial distributions. It thus appears that the two approaches behave similarly in a global (as observed in Figures 3.19 - 3.20) and in a local sense when NMC is sufficiently large. This behavior can be quantified by analyzing the percentage of cells in which reference values of Y lie within the 95% confidence intervals around updated Y values in both the ME- and MC-based solutions. As expected, this metric is seen to increase as the number of MC realizations grows in all test cases. In case of the MC approach, average values of this metric in the nine test cases are 94%, 93%, 91%, 87% and 48% for NMC = 10,000, 1,000, 500, and 100, respectively.

Figure 3.23. Spatial distributions (black squares) of elements in which $Y_{ref}$ lies outside confidence intervals of width $\pm 2 \sigma_Y^{u,T_k}(\mathbf{x}_i)$ about $\langle Y(\mathbf{x}_i) \rangle^{u,T_k}$ when $T_k = 200.0$ for the nine test cases. Rows correspond to test cases TC1 to TC9; columns correspond to ME, NMC = 10,000, NMC = 1,000, NMC = 500 and NMC = 100.

Figures 3.24 and 3.25 depict temporal behaviors of $E_h$ and $V_h$, the hydraulic head analogues of $E_Y$ and $V_Y$ in (3.37) - (3.38), defined as

$E_h(t^*) = \frac{1}{N_h'} \sum_{i=1}^{N_h'} \left| \langle h(\mathbf{x}_i) \rangle^{u,t^*} - h_{ref}(\mathbf{x}_i) \right|$ (3.41)

$V_h(t^*) = \frac{1}{N_h'} \sum_{i=1}^{N_h'} \sigma_h^{2,u,t^*}(\mathbf{x}_i)$ (3.42)

where $\sigma_h^{2,u,t^*}(\mathbf{x}_i)$ is the estimation variance of h at node $\mathbf{x}_i$ (i.e., a diagonal component of $\mathbf{C}_h^{u,t^*}$) and $N_h'$ is the number of nodes, $N_h$, minus those located on Dirichlet boundaries and at the pumping well where, theoretically, $h \to -\infty$ due to the negligible well radius. Figure 3.24 shows that $E_h$ decreases sharply at the first assimilation time and then increases with $t^*$. The largest rate of increase is associated with MC-based values obtained with NMC = 100. ME-based values of $E_h$ are in general very close to MC-based values obtained with sufficiently large NMC. On the other hand, the monotonically decreasing temporal trend in $V_Y$ is not mirrored by the mean estimation variance of h in Figure 3.25. Instead, $V_h$ in Figure 3.25 decreases sharply during the first assimilation step and then increases with time. We attribute this to the combination of two contrasting effects: (a) the decrease of $V_h$ which is typically associated with the updating step, and (b) the temporal increase or decrease (depending on location in the domain; see also Figure 7 of Ye et al. [2004] and results of Riva et al. [2009]) in head variance during the forward steps. Effect (a) dominates during the first assimilation period, due to the high information content of the measurements (see also Figures 3.19 - 3.20), causing $V_h$ to decrease initially with time. As time increases and pseudo-steady state conditions are approached, the conditioning head data become less informative (see also Riva et al. [2009]). This is reflected in Figure 3.19 where $E_Y$ is seen to be almost constant at large values of $t^*$. Here, effect (b) dominates as manifested by an increase in $V_h$ with time.

Figure 3.24. $E_h$ versus t* for the nine test cases. ME-based (solid black) and MC-based results. Panels (a) to (i): columns correspond to Sill = 0.5, 1.0 and 2.0, rows to $I_Y$ = 4.0, 10.0 and 20.0; horizontal axes: t (0 to 1); vertical axes: $E_h$ (0 to 0.1).

Figure 3.25. $V_h$ versus t* for the nine test cases. ME-based (solid black) and MC-based results. Panels (a) to (i): columns correspond to Sill = 0.5, 1.0 and 2.0, rows to $I_Y$ = 4.0, 10.0 and 20.0; horizontal axes: t (0 to 1); vertical axes: $V_h$ (0 to 0.01).

We close our analysis by plotting in Figure 3.26 the temporal behavior of

$$P_{2\sigma_h}(t^*) = \frac{1}{\tilde{N}_h} \sum_{i=1}^{\tilde{N}_h} H\left( 2\sigma_h^{u,t^*}(\mathbf{x}_i) - \left| \langle h(\mathbf{x}_i) \rangle^{u,t^*} - h_{ref}(\mathbf{x}_i) \right| \right) \qquad (3.43)$$

representing the percentage of reference h values lying inside a confidence interval of width equal to $\pm 2\sigma_h^{u,t^*}(\mathbf{x}_i)$ about $\langle h(\mathbf{x}_i) \rangle^{u,t^*}$. Figure 3.26 confirms that filter inbreeding associated with small NMC impacts not only Y but also h.
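As an illustration of how the three head-related metrics discussed above can be evaluated from filter output, the following minimal sketch computes them with NumPy. It assumes that $E_h$ is a mean absolute deviation from the reference heads, that $V_h$ is the mean estimation variance, and that the coverage metric counts nodes whose reference head lies within two standard deviations of the estimate. The function name `head_metrics` and its arguments are hypothetical; nodes on Dirichlet boundaries and at the pumping well are assumed to have been removed beforehand.

```python
import numpy as np

def head_metrics(h_mean, h_var, h_ref):
    """Return (E_h, V_h, P_2sigma) at one assimilation time.

    h_mean : estimated mean heads at the retained nodes
    h_var  : estimation variances at the same nodes
    h_ref  : reference heads at the same nodes
    """
    h_mean = np.asarray(h_mean, dtype=float)
    h_var = np.asarray(h_var, dtype=float)
    h_ref = np.asarray(h_ref, dtype=float)

    abs_err = np.abs(h_mean - h_ref)
    e_h = abs_err.mean()                      # mean absolute error
    v_h = h_var.mean()                        # mean estimation variance
    # Fraction of nodes with the reference inside the +/- 2 sigma band
    p_2sigma = np.mean(abs_err <= 2.0 * np.sqrt(h_var))
    return e_h, v_h, p_2sigma

# Small illustrative data set (values are arbitrary)
e_h, v_h, p_2sigma = head_metrics(
    h_mean=[1.0, 2.0, 3.0, 4.0],
    h_var=[0.01, 0.01, 0.01, 0.01],
    h_ref=[1.1, 2.0, 3.5, 4.0],
)
print(e_h, v_h, p_2sigma)
```

A coverage value well below the nominal level of about 95% for a Gaussian error is the signature of an underestimated variance, which is how filter inbreeding shows up in this metric.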

Figure 3.26. $P_{2\sigma_h}$ versus t* for the nine test cases. ME-based (solid black) and MC-based results. Panels (a) to (i): columns correspond to Sill = 0.5, 1.0 and 2.0, rows to $I_Y$ = 4.0, 10.0 and 20.0; horizontal axes: t (0 to 1); vertical axes: $P_{2\sigma_h}$ (0.4 to 1).

Ye et al. [2004] compared the computational time required by ME- and MC-based forward solutions of a transient groundwater flow problem similar to the one we analyze here, within a domain which is half the size of the one we consider. They found that, with NMC = 2,000, the ME-based method required one quarter to one half of the computer time needed by the MC-based approach to evaluate mean heads and variances. The authors computed head variances by solving an integral expression (their (47)) in the presence of deterministic sources, boundary and initial conditions, which does not require computing a complete head covariance matrix. To conduct a more comprehensive comparison with the MC-based approach, we opted in this work to compute the complete head covariance matrix at the end of each time interval $[T_{k-1}, T_k]$. Our updating step requires computing additional terms appearing in (3.24)-(3.26) and (3.29). As recognized by Ye et al. [2004], the computation of these terms can have a significant effect on computational time during the forward step. Indeed we find that, in our case, the ME- and MC-based approaches require 13,650 s and 0.375 × NMC s, respectively, of CPU time on 10 parallel 2.80 GHz Intel i7-860 processors. It follows that the CPU time associated with ME-based EnKF is comparable to that associated with MC-based assimilations with NMC ≈ 35,000. Considering that in our test cases the MC approach converges within NMC = 10,000, which however requires a tenfold increase in NMC to ascertain convergence (i.e., NMC = 100,000 realizations are required to support convergence at NMC = 10,000), we conclude that ME-based EnKF constitutes a viable alternative to the traditional MC-based approach not only in terms of quality but also in terms of computational efficiency. We believe that it should be possible to improve the computational efficiency of ME-based EnKF further in the future.
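The break-even ensemble size implied by the two CPU times quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two figures reported in the text (13,650 s for the ME-based run and 0.375 s per MC realization); the variable names are illustrative.

```python
# CPU times quoted in the text
cpu_me_s = 13_650.0                 # ME-based EnKF, total [s]
cpu_mc_per_realization_s = 0.375    # MC-based EnKF, per realization [s]

# Ensemble size at which the MC-based run costs as much as the ME-based one
break_even_nmc = cpu_me_s / cpu_mc_per_realization_s

# Cost of the NMC = 100,000 run needed to ascertain convergence
cpu_mc_convergence_s = 100_000 * cpu_mc_per_realization_s

print(break_even_nmc)          # 36400.0
print(cpu_mc_convergence_s)    # 37500.0
```

The break-even value of 36,400 realizations is of the same order as the figure of about 35,000 given in the text, and the 100,000-realization convergence check costs roughly 2.7 times the ME-based run.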

3.5 Conclusions

In this Chapter we described a novel inversion algorithm for updating in real time model parameters and system states in a groundwater flow model on the basis of information about log-conductivity and transient head data collected in a randomly heterogeneous aquifer.

The methodology combines an approximate form of the stochastic transient groundwater flow moment equations with the Kalman Filter algorithm and allows sequential updating of parameters and system states without a need for computationally intensive Maximum Likelihood (ML) or Monte Carlo (MC) analyses. We explored the feasibility and accuracy of the novel inversion scheme and compared its performance and computational efficiency against the common MC-based EnKF on nine synthetic examples characterized by different degrees of heterogeneity.

We showed that embedding the MEs in the KF scheme allows computationally efficient real time estimation of system states and model parameters, avoiding the drawbacks which are commonly encountered in traditional MC-based applications of EnKF. Our results confirm an earlier finding by others that a few hundred MC simulations are not enough to overcome filter inbreeding issues, which have a negative impact on the quality of log-conductivity estimates as well as on predicted heads and associated estimation variances. ME-based EnKF obviates the need for repeated MC simulations and was demonstrated to be free of inbreeding issues.


4. EnKF with complex geology

Here we present a methodology conducive to updating geological and petrophysical

properties of a collection of reservoir models characterized by a complex structural

(geological) architecture within the context of a history matching procedure based on the

Ensemble Kalman Filter (EnKF) approach. The associated computational algorithms are

illustrated in all their relevant details.

The (heterogeneous) spatial distribution of facies is handled by means of a Markov

Mesh (MM) model. The latter is adopted because of (a) its ability to reproduce detailed facies

geometries and spatial patterns, and (b) its consistency with the probabilistic Bayesian

framework at the basis of EnKF.

In Section 4.1 we start by outlining a formal definition of the MM model. Section 4.2

illustrates a novel inversion scheme which allows conditioning the geological and the

petrophysical properties of a collection of reservoir realizations on a set of measured

production data. The results obtained on a two-dimensional synthetic test case and a

comparison with those obtained using a standard EnKF are presented in Section 4.3. In

Section 4.4 we discuss our results.

4.1 Markov Mesh (MM) Model

We consider a finite, regular grid G comprising $N_e$ elements arranged in two or more dimensions and define a sequence of regular grids $G_1, G_2, \dots, G_L$. Each of these grids is a subset of cells in G such that

$$G_1 \subset G_2 \subset \dots \subset G_L, \quad G_L = G \qquad (4.1)$$

With this notation, the coarsest and the finest grids are respectively denoted as $G_1$ and $G_L$. We further introduce the disjoint sets of elements, $H_1, H_2, \dots, H_L$, defined as

$$H_l = G_l, \quad l = 1 \qquad (4.2)$$

$$H_l = G_l \setminus G_{l-1}, \quad l = 2, 3, \dots, L \qquad (4.3)$$

Each set $H_l$, $l > 1$, consists of the cells of $G_l$ that do not appear in the coarser grid level, $G_{l-1}$. Figure 4.1 depicts the grid refinement that has been selected in our implementation. Note that the disjoint sets $H_l$, $l = 1, \dots, L$, are defined such that

$$G_l = \bigcup_{k=1}^{l} H_k \qquad (4.4)$$
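The nested grids and the associated disjoint sets can be sketched with plain Python sets. The refinement rule used here (each coarser level keeps every second cell in each direction) is an assumption made for illustration only; the function names `nested_grids` and `disjoint_sets` are hypothetical.

```python
def nested_grids(nx, ny, n_levels):
    """Return [G_1, ..., G_L] as sets of (ix, iy) cell indices.

    Assumption: level l keeps the cells whose indices are multiples of
    2**(L - l), so that G_1 is the coarsest grid and G_L the full grid.
    """
    grids = []
    for level in range(1, n_levels + 1):
        stride = 2 ** (n_levels - level)
        grids.append({(ix, iy)
                      for ix in range(0, nx, stride)
                      for iy in range(0, ny, stride)})
    return grids

def disjoint_sets(grids):
    """Return [H_1, ..., H_L] with H_1 = G_1 and H_l = G_l minus G_(l-1)."""
    sets = [set(grids[0])]
    for coarse, fine in zip(grids, grids[1:]):
        sets.append(fine - coarse)
    return sets

grids = nested_grids(8, 8, 3)
H = disjoint_sets(grids)
print([len(g) for g in grids])   # [4, 16, 64]
print([len(h) for h in H])       # [4, 12, 48]
```

By construction, the union of the first l disjoint sets recovers the grid at level l, and no cell belongs to two different sets.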

Figure 4.1. Sequence of grids ($G_1$, $G_2$ and $G_3$) into which the rectangular domain G is decomposed within a multi-grid approach. Colors indicate the disjoint sets $H_1$, $H_2$ and $H_3$. Cell numbering corresponds to element indices while arrows indicate the direction followed by the simulation path. Panels: $G_1 = H_1$; $G_2 = H_1 \cup H_2$; $G_3 = H_1 \cup H_2 \cup H_3 = G$.

We then define a path scanning all the elements of the grid G. This path initially visits all elements of the coarsest subset, $H_1$. It then scans all the elements of $H_2$ and proceeds until the last subset, $H_L$, is reached. Each of the subsets, $H_l$, $1 \le l \le L$, is scanned starting from the elements at the top grid layer and proceeding through all the elements of the underlying layers until the bottom of the domain. In each layer the path starts from the left-bottom corner and proceeds by paths parallel to the domain diagonal. The sequence of elements defined by the scanning path allows assigning a label $i = 1, \dots, N_e$ to each cell of the grid G, as sketched in Figure 4.1. Note that the selections of the grid refinement and of the path are not unique, and diverse choices are admissible.
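A possible labelling of the cells along such a path is sketched below for a single layer. Two choices are assumptions of this example and not taken from the text: coarser levels keep every second cell in each direction, and "parallel to the domain diagonal" is read as sweeping lines of constant ix + iy starting from the bottom-left corner. The function name `scan_path` is hypothetical.

```python
def scan_path(nx, ny, n_levels):
    """Return a dict mapping each cell (ix, iy) to its label i = 1, ..., N_e."""
    visited = set()
    path = []
    for level in range(1, n_levels + 1):
        stride = 2 ** (n_levels - level)
        # Cells of H_level: cells of G_level not visited at coarser levels
        new_cells = [(ix, iy)
                     for ix in range(0, nx, stride)
                     for iy in range(0, ny, stride)
                     if (ix, iy) not in visited]
        # Sweep along lines of constant ix + iy, starting at the bottom-left
        new_cells.sort(key=lambda c: (c[0] + c[1], c[0]))
        path.extend(new_cells)
        visited.update(new_cells)
    return {cell: label for label, cell in enumerate(path, start=1)}

labels = scan_path(4, 4, 2)
print(labels[(0, 0)], labels[(0, 2)], labels[(2, 0)], labels[(2, 2)])  # 1 2 3 4
print(labels[(0, 1)])                                                 # 5
```

All cells of the coarsest set receive the lowest labels, so that any cell with label j < i has already been simulated when cell i is visited.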

We assign an integer value, $s_i \in \{1, \dots, K\}$, corresponding to a facies type (K being the number of facies occurring in the domain), to each element of the grid. An indicator variable $s_i^k$ is then defined as

$$s_i^k = \begin{cases} 1 & \text{if } s_i = k \\ 0 & \text{otherwise} \end{cases}, \quad k = 1, \dots, K \qquad (4.5)$$

Assuming that the vector $\mathbf{s}$ containing the facies identifiers, $s_i$, in all grid elements forms a random field, one can describe the probability of observing a given facies distribution within domain G through the discrete joint probability mass function $\pi(\mathbf{s})$. The latter can be written as a product of conditional probabilities

$$\pi(\mathbf{s}) = \prod_{i=1}^{N_e} \pi\left(s_i \mid \mathbf{s}_{j<i}\right) \qquad (4.6)$$

Here, $\mathbf{s}_{j<i}$ is the vector containing the facies identifiers over all grid elements with index $j < i$. Let $\nu_i$ be a subset of the cells identified by $j < i$ and let the vector $\mathbf{s}_{\nu_i}$ denote the facies distribution within the elements of $\nu_i$. Then, the Markov property of the random field $\mathbf{s}$ is expressed by

$$\pi\left(s_i \mid \mathbf{s}_{j<i}\right) = \pi\left(s_i \mid \mathbf{s}_{\nu_i}\right) \qquad (4.7)$$

and the joint probability distribution defined in (4.6) can be simplified as

$$\pi(\mathbf{s}) = \prod_{i=1}^{N_e} \pi\left(s_i \mid \mathbf{s}_{\nu_i}\right) \qquad (4.8)$$

In our implementation, the subset $\nu_i$ contains all the elements $j < i$ that are contained in a square of arbitrary size centered at i. Figure 4.2 provides a graphical depiction of a snapshot of a simulation at the time element i is visited while the algorithm is progressing.

Figure 4.2. Snapshot of a simulation obtained by freezing the algorithm while grid element i is visited. Colored cells are identified by index j < i and have already been visited by the simulation path. Red, yellow and green elements belong to the disjoint sets $H_1$, $H_2$ and $H_3$, respectively. Blocks contoured by the solid line belong to the conditional neighborhood $\nu_i$ of element i. Grey cells are identified by index j > i and will be simulated after element i.

At the core of the method is the way we express the conditional probability included in (4.8). We employ the method proposed by Stien and Kolbjornsen [2011], where the target probability is a function of a linear combination of coefficients, in the form

$$\pi\left(s_i \mid \mathbf{s}_{\nu_i}\right) = \frac{\exp\left( \sum_{k=2}^{K} s_i^k \, \mathbf{z}_i \boldsymbol{\theta}_{l(i)}^{k-1} \right)}{1 + \sum_{k=2}^{K} \exp\left( \mathbf{z}_i \boldsymbol{\theta}_{l(i)}^{k-1} \right)} \qquad (4.9)$$

Here, $\mathbf{z}_i$ is a vector of size $1 \times P$ and contains a set of coefficients which depend on the facies distribution $\mathbf{s}_{\nu_i}$; $\boldsymbol{\theta}_{l(i)}^{k-1}$ is a vector of unknown parameters; and the function $l(i)$ identifies the subset of cells $H_{l(i)}$ to which the element with label i belongs (i.e., $i \in H_{l(i)}$). In our implementation, we adopt for the elements of the vector $\mathbf{z}_i$ the definitions proposed by Kolbjornsen et al. [2013], where each coefficient is equal to 0 or 1 depending on whether a predefined pattern of facies is reproduced over $\nu_i$ or not.

From definition (4.9) it follows that there exists one vector of model parameters for each facies and for each grid level, for a total number of $L \times K$ vectors. Estimating these parameter vectors is accomplished via Maximum Likelihood [Stien and Kolbjornsen, 2011] by means of an appropriate training image which contains all the important features one desires to preserve in the collection of realizations.

The MM model introduced above allows drawing a facies realization by following the

sequential path defined previously. For each cell, a value identifying a facies is drawn

according to the conditional probability expressed by (4.9). After all elements have been

visited, the resulting facies generation follows the joint probability distribution (4.8).
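The sequential drawing of a facies realization can be sketched as follows. This is a strongly simplified stand-in and not the implementation used here: it assumes two facies, a plain raster path instead of the multi-grid path, a single parameter vector shared by all cells, a multinomial-logit conditional probability with facies 1 as the reference class, and three hypothetical pattern coefficients (a constant, "left neighbor is facies 2", "lower neighbor is facies 2"). The names `conditional_probs`, `simulate_facies` and `thetas` are illustrative.

```python
import numpy as np

def conditional_probs(z, thetas):
    """Multinomial-logit probabilities of facies 1..K given coefficients z.

    thetas has shape (K - 1, P); facies 1 is the reference class (logit 0).
    """
    logits = np.concatenate(([0.0], thetas @ z))
    logits -= logits.max()            # numerical stabilization
    weights = np.exp(logits)
    return weights / weights.sum()

def simulate_facies(nx, ny, thetas, rng):
    """Visit cells along a raster path and draw a facies for each of them."""
    n_facies = thetas.shape[0] + 1
    field = np.zeros((ny, nx), dtype=int)   # 0 marks cells not yet visited
    for iy in range(ny):
        for ix in range(nx):
            left = field[iy, ix - 1] if ix > 0 else 0
            below = field[iy - 1, ix] if iy > 0 else 0
            # Hypothetical 0/1 pattern coefficients for this sketch
            z = np.array([1.0, float(left == 2), float(below == 2)])
            p = conditional_probs(z, thetas)
            field[iy, ix] = rng.choice(np.arange(1, n_facies + 1), p=p)
    return field

thetas = np.array([[-1.0, 1.5, 1.5]])       # illustrative parameter values
field = simulate_facies(12, 8, thetas, np.random.default_rng(0))
print(field)
```

Because each cell is drawn conditionally on cells visited earlier along the path, the product of the conditional probabilities used in the loop is the joint probability of the generated field, in analogy with (4.8).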

Application of the methodology to field settings requires that all members of a

collection of generated spatial fields of facies share the same volumetric facies proportions.

These values are in fact related to the quantity and mode of displacement of oil in a reservoir

and have a significant impact on strategic planning of production. It is well known that the

statistical framework offered by the MM (a) does not guarantee that the facies proportion


observed in the training image is preserved in the generated realizations and (b) does not

include any tuning parameters which might allow setting a given value of facies proportions.

In this work we propose to overcome this drawback by adopting an acceptance/rejection (AR) sampling method. Let

$$\bar{s}^k = \frac{1}{N_e} \sum_{i=1}^{N_e} s_i^k, \quad k = 1, \dots, K \qquad (4.10)$$

be the volumetric proportion of facies k over a field composed of $N_e$ elements and characterized by the occurrence of K distinct facies, and let

$$\bar{\mathbf{s}} = \begin{bmatrix} \bar{s}^1 \\ \vdots \\ \bar{s}^K \end{bmatrix} \qquad (4.11)$$

be the vector containing all volumetric proportions evaluated through (4.10). We note that $\bar{s}^k$ (k = 1, ..., K) is the sum of $N_e$ correlated random variables (see (4.10)). Therefore, $\bar{s}^k$ is also a random variable, and $\bar{\mathbf{s}}$ is a random vector. Generating a spatial field of facies through the sequential algorithm described above allows obtaining a realization of $\bar{\mathbf{s}}$, which we interpret as a realization drawn from its prior probability density function. In this framework, we can (a) consider additional information about $\bar{\mathbf{s}}$ in the form of a likelihood function and (b) draw samples from the corresponding posterior density using a traditional AR algorithm.

As an example, we consider the case where the additional information (which is available, e.g., in the form of seismic data and/or expert opinion) suggests that $\bar{\mathbf{s}}$ follows a multi-normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ (i.e., $\bar{\mathbf{s}} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$).

Drawing samples from the resulting posterior density can be accomplished using the AR algorithm described according to the following two steps:

Step 1: generate an unconditional realization of a spatial field of facies and compute the corresponding vector $\bar{\mathbf{s}}$.

Step 2:

Compute the quantity $\alpha = N(\bar{\mathbf{s}}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) / \max_{\bar{\mathbf{s}}} N(\bar{\mathbf{s}}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$.

Draw a realization u from a uniform distribution over the interval [0, 1].

Accept the current realization of facies if $u \le \alpha$, otherwise reject it.

Return to Step 1.
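The two steps above can be sketched as follows. Since a Gaussian density attains its maximum at its mean, the acceptance ratio reduces to the exponential of minus one half of the squared Mahalanobis distance between the sampled proportions and the mean vector. In this sketch the unconditional facies generator `draw_field` is a hypothetical placeholder (independent random cells) standing in for the MM simulator, and the covariance matrix is assumed to be non-singular; all names and numerical values are illustrative.

```python
import numpy as np

def acceptance_ratio(s_bar, mu, cov):
    """Gaussian likelihood of s_bar divided by its maximum (attained at mu)."""
    d = np.asarray(s_bar, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

def ar_sample(draw_field, n_facies, mu, cov, rng, max_tries=10_000):
    """Repeat Steps 1 and 2 until a facies field is accepted."""
    for _ in range(max_tries):
        field = draw_field(rng)                                # Step 1
        s_bar = np.array([(field == k).mean()
                          for k in range(1, n_facies + 1)])
        alpha = acceptance_ratio(s_bar, mu, cov)               # Step 2
        if rng.uniform() <= alpha:
            return field, s_bar
    raise RuntimeError("no realization accepted within max_tries")

def draw_field(rng):
    # Hypothetical stand-in for an unconditional MM realization (2 facies)
    return rng.integers(1, 3, size=(20, 20))

mu = np.array([0.5, 0.5])
cov = np.diag([0.05 ** 2, 0.05 ** 2])
field, s_bar = ar_sample(draw_field, 2, mu, cov, np.random.default_rng(1))
print(s_bar)
```

Calling `ar_sample` repeatedly yields a collection of fields whose facies proportions are distributed according to the posterior implied by the prior (the generator) and the Gaussian likelihood.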

4.2 Theoretical formulation

In this Section we describe a novel inversion scheme for conditioning a collection of

facies and log-permeability spatial fields on available production data. For simplicity of

notation we limit the illustration to the case where only two fluid phases (oil and water) are

displaced in the host porous domain. Note that the methodology can readily be extended to

systems characterized by a three-phase fluid flow.

We consider the model vector

$$\mathbf{y} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{p} \\ \mathbf{S}_{wat} \\ \mathbf{w} \end{bmatrix} \qquad (4.12)$$

Here, $\mathbf{Y}$, $\mathbf{p}$ and $\mathbf{S}_{wat}$ are vectors containing static (e.g., the log-permeability values) and dynamic (e.g., pressure and water saturation values) variables at the $N_e$ numerical block centers, respectively, while $\mathbf{w}$ contains $N_w$ values of production data (e.g., fluid flow rates at well locations or bottom hole pressure values). Vector $\mathbf{y}$ is therefore of size $N_y = 3N_e + N_w$.

According to the notation introduced in Chapter 2, we denote by $\mathbf{y}^{u,T_{k-1}}$ the model vector $\mathbf{y}$ at time $T_{k-1}$ conditioned on measurements available up to time $T_{k-1}$. Adoption of a Monte Carlo framework enables one to approximate the probability density function (pdf) of $\mathbf{y}^{u,T_{k-1}}$ by means of a collection of NMC equally likely model realizations of the system, defined as

$$\mathbf{y}_j^{u,T_{k-1}} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{p} \\ \mathbf{S}_{wat} \\ \mathbf{w} \end{bmatrix}_j^{u,T_{k-1}}, \quad j = 1, \dots, NMC \qquad (4.13)$$

The non-linear operator $\mathcal{F}(\cdot)$ allows calculating the corresponding forward vectors at time $T_k$ as

$$\mathbf{y}_j^{f,T_k} = \mathcal{F}\left( \mathbf{y}_j^{u,T_{k-1}} \right), \quad j = 1, \dots, NMC \qquad (4.14)$$

which enables representing the pdf of $\mathbf{y}^{f,T_k}$. The function $\mathcal{F}$ in (4.14) represents a multiphase flow model, which is solved within the time interval $[T_{k-1}, T_k]$ by setting the log-permeability field to $\mathbf{Y}_j^{u,T_{k-1}}$ and considering as initial conditions the pressure and water saturation fields contained in vectors $\mathbf{p}_j^{u,T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u,T_{k-1}}$, respectively. The boundary conditions for the flow simulation are here assumed to be known without uncertainty. Since we consider the case where the static variables (i.e., the log-permeabilities) do not change during a flow simulation, the equality $\mathbf{Y}_j^{f,T_k} = \mathbf{Y}_j^{u,T_{k-1}}$ always holds. We also note that the particular value of the vector $\mathbf{w}_j^{u,T_{k-1}}$ has no effect on the flow simulation.

We introduce the collection of vectors

$$\mathbf{s}_j^{f,T_k}, \quad j = 1, \dots, NMC \qquad (4.15)$$

describing our knowledge about the spatial distribution of the facies over the domain at time $T_k$ conditioned on measurements available up to time $T_{k-1}$, and the vector

$$\mathbf{v}_j^{f,T_k} = \begin{bmatrix} \mathbf{s}^1 \\ \vdots \\ \mathbf{s}^K \\ \mathbf{w} \end{bmatrix}_j^{f,T_k}, \quad j = 1, \dots, NMC \qquad (4.16)$$

Here, each vector $\mathbf{s}_j^{k,f,T_k}$ describes the prior spatial distribution of the indicator variables associated with each facies $k = 1, \dots, K$ in a given realization j of the ensemble.

The objective of the data assimilation algorithm is to calculate the updated vectors, $\mathbf{y}_j^{u,T_k}$ and $\mathbf{s}_j^{u,T_k}$, conditioned on measured values of production data, $\mathbf{w}^{T_k}$, at time $T_k$. Note that these measurements correspond to randomly perturbed values of $\mathbf{w}^{T_k}$ and are contained in vector $\mathbf{d}^{T_k}$ as defined in (2.2). The data assimilation scheme is then developed through a four-step algorithm described in detail in the following.

Step 1

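In the first step we consider the sample average of the quantity defined in (4.16)

$$\bar{\mathbf{v}}^{f,T_k} = \frac{1}{NMC} \sum_{j=1}^{NMC} \mathbf{v}_j^{f,T_k} \qquad (4.17)$$

and, instead of updating each individual member of the collection, $\mathbf{v}_j^{f,T_k}$, as done in the EnKF context (see (2.23) - (2.29)), we use the corresponding averaged equation defined in (2.30) to evaluate the updated vector $\bar{\mathbf{v}}^{u,T_k}$ through

$$\bar{\mathbf{v}}^{u,T_k} = \bar{\mathbf{v}}^{f,T_k} + \hat{\boldsymbol{\Sigma}}_{vv}^{f,T_k} \left(\mathbf{H}^{T_k}\right)^{\top} \left[ \mathbf{H}^{T_k} \hat{\boldsymbol{\Sigma}}_{vv}^{f,T_k} \left(\mathbf{H}^{T_k}\right)^{\top} + \hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k} \right]^{-1} \left( \bar{\mathbf{d}}^{T_k} - \mathbf{H}^{T_k} \bar{\mathbf{v}}^{f,T_k} \right) \qquad (4.18)$$

Here, the quantities $\mathbf{H}^{T_k}$, $\bar{\mathbf{d}}^{T_k}$ and $\hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k}$ follow the same definitions respectively given in (2.2), (2.28) and (2.29) of Chapter 2, while $\hat{\boldsymbol{\Sigma}}_{vv}^{f,T_k}$ is defined as

$$\hat{\boldsymbol{\Sigma}}_{vv}^{f,T_k} = \frac{1}{NMC - 1} \sum_{i=1}^{NMC} \left( \mathbf{v}_i^{f,T_k} - \bar{\mathbf{v}}^{f,T_k} \right) \left( \mathbf{v}_i^{f,T_k} - \bar{\mathbf{v}}^{f,T_k} \right)^{\top} \qquad (4.19)$$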
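The update of the ensemble mean in Step 1 has the structure of a standard Kalman update applied to the sample mean and sample covariance of the forward collection. A minimal NumPy sketch is given below; the function name `update_mean`, the array layout (one ensemble member per column) and the toy numbers are assumptions of this example.

```python
import numpy as np

def update_mean(V_f, H, d_mean, R):
    """Kalman update of the ensemble mean.

    V_f    : (Nv, NMC) forward collection, one member per column
    H      : (Nd, Nv) matrix selecting the observed components
    d_mean : (Nd,) mean of the (perturbed) measurements
    R      : (Nd, Nd) measurement error covariance
    """
    n_mc = V_f.shape[1]
    v_mean = V_f.mean(axis=1)                     # sample mean
    A = V_f - v_mean[:, None]                     # anomalies
    S = A @ A.T / (n_mc - 1)                      # sample covariance
    innovation = d_mean - H @ v_mean
    # Solve (H S H^T + R) x = innovation instead of forming the inverse
    x = np.linalg.solve(H @ S @ H.T + R, innovation)
    return v_mean + S @ H.T @ x

rng = np.random.default_rng(0)
V_f = rng.normal(size=(5, 50))                    # toy collection
H = np.array([[0.0, 0.0, 0.0, 0.0, 1.0]])         # last entry is observed
d_mean = np.array([2.0])
v_u = update_mean(V_f, H, d_mean, R=np.array([[0.01]]))
print(v_u)
```

With a vanishing measurement error the updated observed component coincides with the measurement, which provides a quick sanity check of the implementation.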

Step 2

The updated vector $\bar{\mathbf{v}}^{u,T_k}$ evaluated at step 1 is now employed to draw a new collection of updated realizations of spatial fields of facies, $\mathbf{s}_j^{u,T_k}$, $j = 1, \dots, NMC$.

The updating algorithm visits each element of the grid following the same sequential path used for the generation of each facies field realization. When the algorithm has progressed to reach a generic element i, we assign a new facies identifier to this element for each updated grid j of the collection ($s_{i,j}^{u,T_k}$) according to

$$\pi\left(s_i^{u,T_k} \mid \mathbf{s}_{\nu_i}^{u,T_k}\right) = \frac{\exp\left( \sum_{k=2}^{K} s_i^{k,u,T_k} \, \mathbf{z}_i \left( \boldsymbol{\theta}_{l(i)}^{k-1} + \boldsymbol{\lambda}_{l(i)}^{k-1} \right) \right)}{1 + \sum_{k=2}^{K} \exp\left( \mathbf{z}_i \left( \boldsymbol{\theta}_{l(i)}^{k-1} + \boldsymbol{\lambda}_{l(i)}^{k-1} \right) \right)} \qquad (4.20)$$

where the values of the vectors $\boldsymbol{\lambda}_{l(i)}^{k-1}$ are tuned to satisfy

$$\frac{1}{NMC} \sum_{j=1}^{NMC} s_{i,j}^{k,u,T_k} = \bar{s}_i^{k,u,T_k}, \quad k = 1, \dots, K \qquad (4.21)$$

The term appearing on the right hand side of (4.21) is a component of the updated vector $\bar{\mathbf{v}}^{u,T_k}$ in (4.18). One can note that (4.20) reduces to (4.9) when $\boldsymbol{\lambda}_{l(i)}^{k-1}$ equals $\mathbf{0}_{P,1}$. In practice, (4.20) - (4.21) are used to update the probability of the MM reported in (4.9), $\pi(s_i \mid \mathbf{s}_{\nu_i})$, which is not conditioned on the production data, into the probability $\pi(s_i^{u,T_k} \mid \mathbf{s}_{\nu_i}^{u,T_k})$, conditioned on the information provided by the measurements available up to time $T_k$.

In our implementation of the algorithm we consider $\boldsymbol{\lambda}_{l(i)}^{k-1}$ as vectors of constant components, differing from vector to vector (i.e., $\boldsymbol{\lambda}_{l(i)}^{k-1} = \lambda_{l(i)}^{k-1} \mathbf{1}_{P,1}$, with $\lambda_{l(i)}^{k-1}$ and $\mathbf{1}_{P,1}$ respectively being an unknown scalar and a vector of size P with unit components). Under this assumption, the conditional probability (4.20) is a monotonic function of $\lambda_{l(i)}^{k-1}$. Estimation of $\lambda_{l(i)}^{k-1}$ is readily accomplished through a bisection algorithm. This simple strategy has been seen to yield good results, in the sense that the updated spatial distribution of facies maintains spatial correlations which are similar to those observed in the original grids before the assimilation of production data. Forms of $\boldsymbol{\lambda}_{l(i)}^{k-1}$ with increased complexity can be taken into account in the procedure.
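The bisection search for the scalar shift can be sketched for the two-facies case. The setting below is a simplification chosen for illustration: for each ensemble member the probability of facies 2 at the current cell is written as a logistic function of a + λ c, where a stands for the product of the coefficient vector with the parameter vector and c for the sum of the coefficients (non-negative, and positive for at least one member). The shift λ is then tuned so that the ensemble-averaged probability matches a target proportion. The names `mean_prob`, `solve_lambda`, `a` and `c` are hypothetical.

```python
import numpy as np

def mean_prob(lam, a, c):
    """Ensemble-averaged probability of facies 2 for a given shift lam."""
    return float(np.mean(1.0 / (1.0 + np.exp(-(a + lam * c)))))

def solve_lambda(a, c, target, lo=-50.0, hi=50.0, tol=1e-10, max_iter=200):
    """Bisection on lam; mean_prob is non-decreasing in lam because c >= 0."""
    if not (mean_prob(lo, a, c) <= target <= mean_prob(hi, a, c)):
        raise ValueError("target proportion cannot be bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean_prob(mid, a, c) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

a = np.array([0.2, -0.4, 1.0])     # illustrative values of z times theta
c = np.array([1.0, 2.0, 1.0])      # illustrative sums of the coefficients
lam = solve_lambda(a, c, target=0.3)
print(lam, mean_prob(lam, a, c))
```

When the target equals the proportion obtained without any shift, the search returns a value of λ close to zero, consistent with (4.20) reducing to (4.9) for a null shift.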

Step 3

Up to this point we have evaluated the vectors $\mathbf{s}_j^{u,T_k}$, $j = 1, \dots, NMC$, which are the realizations of the spatial fields of facies conditioned on the measurements available up to time $T_k$. Before updating the model realizations $\mathbf{y}_j^{f,T_k}$, $j = 1, \dots, NMC$, in this third step we repeat the flow simulations performed within the time interval $[T_{k-1}, T_k]$ upon replacing the model parameters and the system states updated at time $T_{k-1}$ (namely $\mathbf{Y}_j^{u,T_{k-1}}$, $\mathbf{p}_j^{u,T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u,T_{k-1}}$) with the corresponding auxiliary fields ($\mathbf{Y}_j^{u',T_{k-1}}$, $\mathbf{p}_j^{u',T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u',T_{k-1}}$) to render them consistent with the underlying facies fields updated through steps (1) - (2). Each auxiliary updated realization is then used as input vector in the flow simulation expressed by (4.14) to obtain the auxiliary forward vectors at time $T_k$, $\mathbf{y}_j^{f',T_k}$. The latter are further conditioned on the measurements available at time $T_k$ during step (4) of our algorithm. In practice, step 3 alleviates the appearance of unphysical updates in the model state vectors.

To transfer the statistics (mean and covariances) of the log-permeability fields contained in $\mathbf{Y}_j^{u,T_{k-1}}$ to the auxiliary log-permeability fields, $\mathbf{Y}_j^{u',T_{k-1}}$, one employs

$$\mathbf{Y}_j^{u',T_{k-1}} = \bar{\mathbf{Y}}_j^{u',T_{k-1}} + \mathbf{L}_j^{u',T_{k-1}} \left( \mathbf{L}_j^{u,T_{k-1}} \right)^{-1} \left( \mathbf{Y}_j^{u,T_{k-1}} - \bar{\mathbf{Y}}_j^{u,T_{k-1}} \right), \quad j = 1, \dots, NMC \qquad (4.22)$$

Entries $\bar{Y}_{m,j}^{u,T_{k-1}}$ of the vector $\bar{\mathbf{Y}}_j^{u,T_{k-1}}$ are computed by

$$\bar{Y}_{m,j}^{u,T_{k-1}} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} \, Y_{m,l}^{u,T_{k-1}}, \quad m = 1, \dots, N_e \qquad (4.23)$$

In (4.23), $k_m$ is the facies identifier at element m of the forward facies grid $\mathbf{s}_j^{f,T_k}$ (i.e., $k_m \in \{1, \dots, K\}$)

$$k_m = s_{m,j}^{f,T_k} \qquad (4.24)$$

It follows that the indicator variable $s_{m,l}^{k_m,f,T_k}$ in (4.23) is equal to 1 or 0 according to whether the facies identifiers at element m of ensemble members l and j coincide or not. Similarly, an entry m of vector $\bar{\mathbf{Y}}_j^{u',T_{k-1}}$, $\bar{Y}_{m,j}^{u',T_{k-1}}$, is evaluated by

$$\bar{Y}_{m,j}^{u',T_{k-1}} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} \, Y_{m,l}^{u,T_{k-1}}, \quad m = 1, \dots, N_e \qquad (4.25)$$

In (4.25), $k_m$ is defined as

$$k_m = s_{m,j}^{u,T_k} \qquad (4.26)$$

Matrices $\mathbf{L}_j^{u,T_{k-1}}$ and $\mathbf{L}_j^{u',T_{k-1}}$ in (4.22) are the Cholesky decompositions of $\mathbf{C}_{Y,j}^{u,T_{k-1}}$ and $\mathbf{C}_{Y,j}^{u',T_{k-1}}$, respectively (i.e., they satisfy the equalities $\mathbf{L}_j^{u,T_{k-1}} (\mathbf{L}_j^{u,T_{k-1}})^{\top} = \mathbf{C}_{Y,j}^{u,T_{k-1}}$ and $\mathbf{L}_j^{u',T_{k-1}} (\mathbf{L}_j^{u',T_{k-1}})^{\top} = \mathbf{C}_{Y,j}^{u',T_{k-1}}$). Entries $[\mathbf{C}_{Y,j}^{u,T_{k-1}}]_{m,n}$ and $[\mathbf{C}_{Y,j}^{u',T_{k-1}}]_{m,n}$ of matrices $\mathbf{C}_{Y,j}^{u,T_{k-1}}$ and $\mathbf{C}_{Y,j}^{u',T_{k-1}}$, respectively, are computed by means of

$$\left[ \mathbf{C}_{Y,j}^{u,T_{k-1}} \right]_{m,n} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} s_{n,l}^{k_n,f,T_k} - 1 \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,f,T_k} s_{n,l}^{k_n,f,T_k} \left( Y_{m,l}^{u,T_{k-1}} - \bar{Y}_{m,j}^{u,T_{k-1}} \right) \left( Y_{n,l}^{u,T_{k-1}} - \bar{Y}_{n,j}^{u,T_{k-1}} \right)$$

$$m = 1, \dots, N_e, \quad n = 1, \dots, N_e \qquad (4.27)$$

$$\left[ \mathbf{C}_{Y,j}^{u',T_{k-1}} \right]_{m,n} = \left[ \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} s_{n,l}^{k_n,u,T_k} - 1 \right]^{-1} \sum_{l=1}^{NMC} s_{m,l}^{k_m,u,T_k} s_{n,l}^{k_n,u,T_k} \left( Y_{m,l}^{u,T_{k-1}} - \bar{Y}_{m,j}^{u',T_{k-1}} \right) \left( Y_{n,l}^{u,T_{k-1}} - \bar{Y}_{n,j}^{u',T_{k-1}} \right)$$

$$m = 1, \dots, N_e, \quad n = 1, \dots, N_e \qquad (4.28)$$

Following the procedure described for log-permeabilities (embodied by (4.22) - (4.28)), we modify the values of pressure and water saturation in $\mathbf{p}_j^{u,T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u,T_{k-1}}$ to obtain the corresponding auxiliary vectors $\mathbf{p}_j^{u',T_{k-1}}$ and $\mathbf{S}_{wat,j}^{u',T_{k-1}}$.

The updated auxiliary vectors are then used for the evaluation of the auxiliary forward spatial fields at time $T_k$ through

$$\mathbf{y}_j^{f',T_k} = \mathcal{F}\left( \mathbf{y}_j^{u',T_{k-1}} \right), \quad j = 1, \dots, NMC \qquad (4.29)$$
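The transformation used in Step 3 to build the auxiliary log-permeability fields is a whitening of the fluctuations with one Cholesky factor followed by a re-coloring with another. The sketch below illustrates this mechanism with NumPy for given means and covariance matrices; the function name `recolor` and the toy numbers are assumptions of this example, and both covariance matrices are assumed to be positive definite.

```python
import numpy as np

def recolor(Y, mean_old, cov_old, mean_new, cov_new):
    """Map Y so that fluctuations with covariance cov_old acquire cov_new."""
    L_old = np.linalg.cholesky(cov_old)     # L_old @ L_old.T == cov_old
    L_new = np.linalg.cholesky(cov_new)     # L_new @ L_new.T == cov_new
    # Whitening step: solve L_old @ x = (Y - mean_old)
    whitened = np.linalg.solve(L_old, Y - mean_old)
    return mean_new + L_new @ whitened

mean_old = np.array([7.0, 4.5])
cov_old = np.array([[1.0, 0.3],
                    [0.3, 0.5]])
mean_new = np.array([6.8, 4.9])
cov_new = np.array([[2.0, -0.4],
                    [-0.4, 1.0]])

Y = np.array([7.4, 4.1])
Y_aux = recolor(Y, mean_old, cov_old, mean_new, cov_new)
print(Y_aux)
```

The linear map applied to the fluctuations, the new factor times the inverse of the old one, turns the old covariance exactly into the new one, while a field equal to the old mean is mapped onto the new mean.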

Step 4

We propose to evaluate the updated state vectors $\mathbf{y}_j^{u,T_k}$ at time $T_k$ by means of

$$\mathbf{y}_j^{u,T_k} = \mathbf{y}_j^{f',T_k} + \hat{\boldsymbol{\Sigma}}_{yy,j}^{f',T_k} \left(\mathbf{H}^{T_k}\right)^{\top} \left[ \mathbf{H}^{T_k} \hat{\boldsymbol{\Sigma}}_{yy,j}^{f',T_k} \left(\mathbf{H}^{T_k}\right)^{\top} + \hat{\boldsymbol{\Sigma}}_{\varepsilon\varepsilon}^{T_k} \right]^{-1} \left( \mathbf{d}_j^{T_k} - \mathbf{H}^{T_k} \mathbf{y}_j^{f',T_k} \right), \quad j = 1, \dots, NMC \qquad (4.30)$$

where the entries $[\hat{\boldsymbol{\Sigma}}_{yy,j}^{f',T_k} (\mathbf{H}^{T_k})^{\top}]_{m,n}$ of matrix $\hat{\boldsymbol{\Sigma}}_{yy,j}^{f',T_k} (\mathbf{H}^{T_k})^{\top}$ are computed as

$$\left[ \hat{\boldsymbol{\Sigma}}_{yy,j}^{f',T_k} \left(\mathbf{H}^{T_k}\right)^{\top} \right]_{m,n} = \left[ \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} - 1 \right]^{-1} \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \left( y_{m,l}^{f',T_k} - \bar{y}_{m,j}^{f',T_k} \right) \left( w_{n,l}^{f',T_k} - \bar{w}_{n,j}^{f',T_k} \right)$$

$$m = 1, \dots, 3N_e, \quad n = 1, \dots, N_w \qquad (4.31)$$

$$\bar{y}_{m,j}^{f',T_k} = \left[ \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \, y_{m,l}^{f',T_k}, \quad m = 1, \dots, 3N_e \qquad (4.32)$$

$$\bar{w}_{n,j}^{f',T_k} = \left[ \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \right]^{-1} \sum_{l=1}^{NMC} s_{m^*,l}^{k_{m^*},u,T_k} \, w_{n,l}^{f',T_k}, \quad n = 1, \dots, N_w \qquad (4.33)$$

Subscript $m^*$ in (4.31) - (4.33) indicates the label of the grid element to which the m-th entry of the state vector belongs (e.g., consistently with the definitions given in (4.12), $m^*$ equals m, $m - N_e$ or $m - 2N_e$ according to whether the m-th element of the state vector corresponds to a value of log-permeability, pressure or water saturation, respectively). Equations (4.30) - (4.33) allow updating the model vector of a given realization by estimating the sample cross-covariance between production data and the model variable associated with a given block in the reservoir only on the basis of the members of the collection where the very same facies value of the element considered in the target model realization occurs. This strategy is adopted because, when the reservoir model is characterized by the presence of distinct facies with unknown spatial distribution, the scatter plots between petrophysical parameters (or system state variables) and production data are typically arranged in clusters, each of which is associated with a particular facies identifier. A schematic representation of the data assimilation algorithm described is depicted in Figure 4.3.
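The facies-conditional cross-covariance at the heart of Step 4 can be sketched for a single state-vector entry as follows. Only the ensemble members that share, at the cell of interest, the facies of the target realization j contribute to the means and to the cross-covariance. The normalization by the number of contributing members minus one is an assumption of this sketch, as are the function name `masked_cross_cov` and the toy data.

```python
import numpy as np

def masked_cross_cov(y_m, w, facies_m, j):
    """Cross-covariance between one state entry and the production data.

    y_m      : (NMC,) values of the state entry across the ensemble
    w        : (NMC, Nw) simulated production data across the ensemble
    facies_m : (NMC,) updated facies identifier at the associated cell
    j        : index of the target realization
    """
    mask = facies_m == facies_m[j]        # members sharing the facies of j
    n = int(mask.sum())
    if n < 2:
        raise ValueError("fewer than two members share this facies")
    dy = y_m[mask] - y_m[mask].mean()
    dw = w[mask] - w[mask].mean(axis=0)
    return dy @ dw / (n - 1)

y_m = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
w = np.array([[1.0], [2.0], [3.0], [0.0], [5.0]])
facies_m = np.array([1, 1, 1, 2, 2])

print(masked_cross_cov(y_m, w, facies_m, j=0))   # members 0, 1, 2 only
print(masked_cross_cov(y_m, w, facies_m, j=3))   # members 3, 4 only
```

In this toy example the two facies clusters yield cross-covariances of 1 and 25, whereas pooling all five members would blend the two clusters into a single, less meaningful estimate.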

Figure 4.3. Flow chart describing the proposed data assimilation algorithm. Elements of the chart: initialization at $T_k = T_0$ (geostatistical tools for $\mathbf{s}^{T_0}$ and $\mathbf{Y}_j^{T_0}$; equilibrium conditions for $\mathbf{p}_j^{T_0}$ and $\mathbf{S}_{wat,j}^{T_0}$); flow simulator (input: log k fields and initial conditions) yielding the forward vectors $\mathbf{y}_j^{f,T_k}$ and $\mathbf{v}_j^{f,T_k}$; Step 1 (updated mean facies field); Step 2 (updated facies fields $\mathbf{s}_j^{u,T_k}$); Step 3 (auxiliary fields $\mathbf{Y}_j^{u',T_{k-1}}$, $\mathbf{p}_j^{u',T_{k-1}}$, $\mathbf{S}_{wat,j}^{u',T_{k-1}}$ and auxiliary forward fields $\mathbf{y}_j^{f',T_k}$ obtained through the flow simulator); Step 4 (updated log k, pressure and saturation fields $\mathbf{y}_j^{u,T_k}$); k = k + 1.

4.3 Synthetic example

We illustrate the data assimilation scheme described in Section 4.2 and explore its feasibility and accuracy by way of a transient three-dimensional flow example. The flow domain is of size 4800 m × 3200 m × 5 m along directions x1, x2, and x3, respectively, and is discretized into grid cells of uniform size of 50 m × 50 m × 5 m (i.e., 96 × 64 × 1 blocks, yielding $N_e = 6144$). Note that the grid blocks in this example are arranged into a single horizontal layer and the ensuing spatial distributions of parameters and state variables are modeled as two-dimensional random fields.

The reference model of the reservoir is formed by two different facies (i.e., $K = 2$). The spatial distribution of the facies has been obtained by the Markov Mesh (MM) procedure described in Section 4.1 and by estimating the parameters of the MM from the training image depicted in Figure 4.4.

Figure 4.4. Training image employed for the estimation of the parameters in the MM model

example.

The domain is then populated with the petrophysical properties (i.e., the log-permeability),

yielding the field depicted in Figure 4.5. Log-permeabilities are generated at element centers

200 400 600 800 1000

200

400

600

800

1

2

85

using a sequential Gaussian simulator, with the statistical parameters listed in Table 4.1.

Figures 4.6 - 4.7 display the sample histograms of the log-permeability (Y ) and the

corresponding permeability values (k) in the reference field, respectively. In our example we

consider a permeability tensor Κ that is diagonal and displays a vertical anisotropy. It can be

written as the product between the scalar permeability k and a scaling tensor

$$\mathbf{K} = k \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.1 \end{bmatrix} \qquad (4.34)$$

Porosity is treated as a deterministic constant over the numerical grid and set equal to 0.2,
yielding a pore volume of $15.36 \times 10^6\ \mathrm{m^3}$.
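The block counts and volumes quoted in this section follow from simple arithmetic. The short Python sketch below reproduces them from the domain size, block size, porosity and initial oil saturation stated in the text; the constant and function names are illustrative and not part of the thesis.

```python
# Grid and volume bookkeeping for the synthetic reservoir (values from the text).
LX, LY, LZ = 4800.0, 3200.0, 5.0   # domain size along x1, x2, x3 [m]
DX, DY, DZ = 50.0, 50.0, 5.0       # grid-block size [m]
POROSITY = 0.2                      # uniform porosity [-]
INITIAL_OIL_SATURATION = 0.8        # uniform initial oil saturation [-]


def grid_shape():
    """Number of grid blocks along each direction."""
    return (round(LX / DX), round(LY / DY), round(LZ / DZ))


def pore_volume():
    """Pore volume [m^3]: bulk volume times porosity."""
    return LX * LY * LZ * POROSITY


def initial_oil_volume():
    """Initial oil in place [m^3]: pore volume times oil saturation."""
    return pore_volume() * INITIAL_OIL_SATURATION


nx, ny, nz = grid_shape()
print(nx, ny, nz, nx * ny * nz)       # block counts and their product
print(pore_volume() / 1e6)            # pore volume in 10^6 m^3
print(initial_oil_volume() / 1e6)     # initial oil volume in 10^6 m^3
```

The initial oil volume evaluates to about 12.29 × 10^6 m^3 once rounded to two decimals.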

Figure 4.5. Log-permeability distribution and well locations in the reference model. Injectors

and producers are indicated as Ii and Pi (i = 1, 2, 3), respectively.

            Mean   Covariance function   Sill   Nugget   Range x1 [m]   Range x2 [m]
Facies 1    7.1    Exponential           0.10   0.0      800.0          400.0
Facies 2    4.6    Exponential           0.10   0.0      80.0           80.0

Table 4.1. Parameters adopted for the generation of the reference log-permeability field.

Permeability values are expressed in mDarcy.

[Figure 4.5 labels: horizontal axis $x_1$ [m], vertical axis $x_2$ [m]; color scale $Y_{ref}$; wells I1, I2, I3, P1, P2 and P3.]

Figure 4.6. Histogram of the log-permeability values in the reference model.

Figure 4.7. Histogram of the permeability values in the reference model.

We simulate the deterministic transient flow caused by the joint action of six wells: three
producers (P1, P2 and P3) operating at a constant Bottom Hole Pressure (BHP) of 30 bar
and three injectors (I1, I2 and I3), injecting water at constant rates of 30, 230 and 30 $\mathrm{m^3/day}$,
respectively. As shown in Figure 4.5, only wells I2 and P2 lie in the region of the
domain characterized by the highest permeabilities, and they are connected by a high-permeability
channel. A constant initial oil saturation of 0.8 is imposed on the system, rendering an initial
total volume of oil of $12.29 \times 10^6\ \mathrm{m^3}$. Initial pressure is set to a uniform value of 100 bar,
while all external domain boundaries are impervious. Flow is simulated for a time period of


1200 days using the commercial software ECLIPSE developed by Schlumberger. The

dynamics of this reference model are strongly influenced by the underlying facies distribution

and are typical of a water-flooding environment in which the volume of the injected water is

mainly displaced along the meandering channel.

We sample the reference production curves at twenty times, separated by a fixed lag
of 60 days (i.e., $T_k = 60, 120, \ldots, 1200$ days), and perturb them with white Gaussian noise
having standard deviation $\sigma_d = 5\ \mathrm{m^3\,day^{-1}}$ for flow rates and 5 bar for pressures,
reflecting measurement errors as described in (2.2). In our example the vector
$\mathbf{d}^{T_k}$ introduced in (2.2) contains these perturbed data, while the covariance matrix of the
corresponding measurement errors, $\boldsymbol{\Sigma}_{\varepsilon\varepsilon}^{T_k}$, is diagonal with entries equal to $\sigma_d^2$. The temporal
behaviors of the reference production curves and of the corresponding conditioning
measurements employed during the assimilation are depicted in Figures 4.8 - 4.9. Reference
values of water production rate are zero at all assimilation steps and therefore are not
included in these figures.
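The generation of the conditioning data can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the thesis: the function names, the composition of the data vector (three injector pressures followed by three producer oil rates) and the numerical reference values are assumptions made only for the example, while the noise levels and sampling times are those stated in the text.

```python
import random

SIGMA_RATE = 5.0      # std of flow-rate errors [m^3/day], from the text
SIGMA_PRESSURE = 5.0  # std of pressure errors [bar], from the text


def assimilation_times(lag_days=60, n_times=20):
    """Return the measurement times T_k = 60, 120, ..., 1200 days."""
    return [lag_days * (k + 1) for k in range(n_times)]


def perturb(reference, sigmas, rng):
    """Add independent zero-mean Gaussian noise to each reference value."""
    return [value + rng.gauss(0.0, sigma) for value, sigma in zip(reference, sigmas)]


def error_covariance(sigmas):
    """Build the diagonal measurement-error covariance with entries sigma**2."""
    n = len(sigmas)
    return [[sigmas[i] ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical reference data at one time: three injector pressures [bar]
# followed by three producer oil rates [m^3/day]. Values are placeholders.
reference = [150.0, 140.0, 160.0, 40.0, 200.0, 45.0]
sigmas = [SIGMA_PRESSURE] * 3 + [SIGMA_RATE] * 3
rng = random.Random(0)  # fixed seed so that the sketch is reproducible
data = perturb(reference, sigmas, rng)
covariance = error_covariance(sigmas)
```

Because the noise is white, the off-diagonal entries of the covariance matrix are zero and each diagonal entry equals the squared standard deviation of the corresponding measurement.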

Figure 4.8. Reference (solid lines) and measured (symbols) values of Well Bottom Hole

Pressure (BHP) versus time for injection wells I1 (blue), I2 (green) and I3 (red).

[Figure 4.8 axes: time [days] versus WBHP [bar].]

Figure 4.9. Reference (solid lines) and measured (symbols) values of Well Oil Production

Rate (WOPR) versus time for production wells P1 (blue), P2 (green) and P3 (red).

At initial time T0, we generate a collection of spatial fields of facies through the MM

model by using the same parameters employed for the reference field. These realizations are

also conditioned on the value of the volumetric proportion of facies 1 observed in the

reference model (the latter being equal to 0.34) following the acceptance/rejection procedure

described in Section 2.2. Each facies field is then populated by the log-permeability values

using a Gaussian simulator with the same variogram functions and parameters used for

generating the reference field. Following this procedure, we implicitly assume a perfect

knowledge of the statistical models describing the spatial distribution of the geological and of

the petrophysical properties in the reservoir. This allows testing the feasibility and accuracy

of the proposed algorithm without introducing additional sources of uncertainty in our

analysis. We compare the performance and accuracy of the inversion algorithm described
in Section 4.2 (denoted in the following as Facies-EnKF) against those of the traditional EnKF. We
consider $N_{MC} = 500$ realizations and start the assimilation with the same collection of model
realizations for both methodologies.
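The conditioning of the initial facies fields on the observed proportion of facies 1 can be illustrated with a generic acceptance/rejection loop. The Python sketch below is only indicative: `toy_draw_field` is a stand-in that draws independent cells and is NOT a Markov Mesh simulator, and the acceptance tolerance is an assumed value, since neither the sampler nor the tolerance is specified in this section.

```python
import random

TARGET_PROPORTION = 0.34  # proportion of facies 1 in the reference model (from the text)
TOLERANCE = 0.01          # assumed acceptance tolerance (not given in this section)


def facies_proportion(field, facies=1):
    """Fraction of cells of a 2D field that belong to the given facies."""
    cells = [cell for row in field for cell in row]
    return sum(1 for cell in cells if cell == facies) / len(cells)


def toy_draw_field(rng, nx=20, ny=10):
    """Stand-in sampler with independent cells; hypothetical, not a Markov Mesh model."""
    return [[1 if rng.random() < TARGET_PROPORTION else 2 for _ in range(nx)]
            for _ in range(ny)]


def sample_conditioned(draw_field, n_fields, rng, target=TARGET_PROPORTION,
                       tol=TOLERANCE, max_draws=100000):
    """Keep drawing candidate fields and accept those honoring the target proportion."""
    accepted, draws = [], 0
    while len(accepted) < n_fields:
        if draws >= max_draws:
            raise RuntimeError("too many rejected candidates")
        candidate = draw_field(rng)
        draws += 1
        if abs(facies_proportion(candidate) - target) <= tol:
            accepted.append(candidate)
    return accepted


fields = sample_conditioned(toy_draw_field, 5, random.Random(1))
```

In an actual application the stand-in sampler would be replaced by the facies simulator, and each accepted field would then be populated with log-permeability values.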

Figures 4.10 - 4.11 respectively compare the spatial distribution of the mean and the

variance of the log-permeability fields obtained at various assimilation steps using the two


approaches. During the earliest assimilation steps the estimated mean fields obtained with the

two approaches are similar and reflect the plausibility for the meandering channel to occur

both within the upper and the lower parts of the domain. During the subsequent updating

steps, when additional data are assimilated into the model, estimates of log-permeability

obtained with Facies-EnKF evolve toward a pattern which is similar to that of the reference

field and enables a correct identification of the position of the channel. Note that the

production curves of injectors I1 and I3 play a key role towards an appropriate facies

identification. This is so because these two wells inject the same water flow rate but work at

different pressure values, as shown in Figure 4.8, suggesting that one of the two (i.e., I1) is
characterized by a lower injectivity and should be located farther from the high-permeability
channel. The estimated variance field obtained after the latest assimilation step using Facies-EnKF
suggests that the calibrated values of Y are characterized by the highest uncertainty at
locations corresponding to the boundary of the identified channel. These results are not
mirrored by the fields calibrated through EnKF, in which the estimated regions associated
with the highest permeabilities correspond only in part to the high-permeability pattern displayed in
the reference field. Figure 4.10 shows that the EnKF algorithm correctly identifies the
presence of high- and low-permeability regions around each well without capturing the global field
architecture. Moreover, EnKF does not preserve the correct geological setting in the

calibrated realizations. This is evident from Figure 4.12, which displays five selected

realizations of log-permeability updated after the latest assimilation step using the two

methodologies. On the contrary, the fields estimated through Facies-EnKF preserve the

correct architecture. By visual inspection, one can also note that these fields tend to show a

degree of spatial variability of log-permeabilities which is similar to the one characterizing

the reference model.

[Figure 4.10 panels: rows correspond to time steps 0, 1, 3, 6, 12, 16 and 20; columns to Facies-EnKF and EnKF. Wells I1-I3 and P1-P3 are marked in each panel.]

Figure 4.10. Estimates of mean log-permeability at initial time and at 6 assimilation steps.

[Figure 4.11 panels: rows correspond to time steps 0, 1, 3, 6, 12, 16 and 20; columns to Facies-EnKF and EnKF. Wells I1-I3 and P1-P3 are marked in each panel.]

Figure 4.11. Estimation variance of log-permeability at initial time and at 6 assimilation steps.

[Figure 4.12 panels: rows correspond to realizations n. 100, 200, 300, 400 and 500; columns to Facies-EnKF and EnKF. Wells I1-I3 and P1-P3 are marked in each panel.]

Figure 4.12. Five selected realizations of the updated log-permeability field after the latest assimilation step.

These results are also confirmed by the temporal behaviors of $E_Y$ and $V_Y$ (as defined
in (3.37) and (3.38)) displayed in Figures 4.13 - 4.14, respectively. Values of $E_Y$ obtained
through Facies-EnKF are always smaller than those obtained using a traditional EnKF
algorithm. The curves of $V_Y$ obtained through the two methodologies are similar. However,
while the EnKF results in a monotonically decreasing trend of the average estimation
variance, the curve obtained with Facies-EnKF displays some
fluctuations.
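The two summary metrics can be computed from an ensemble of log-permeability fields as sketched below in Python. The formal definitions are those of (3.37) and (3.38), which lie outside this section; the sketch assumes that the first metric is the cell-averaged absolute difference between the ensemble mean and the reference field and that the second is the cell-averaged sample variance (with an N - 1 denominator). Function names are illustrative.

```python
from statistics import mean, variance


def ensemble_mean_field(ensemble):
    """Cell-by-cell mean over realizations (each realization is a flat list)."""
    return [mean(values) for values in zip(*ensemble)]


def ensemble_variance_field(ensemble):
    """Cell-by-cell sample variance over realizations (N - 1 denominator, assumed)."""
    return [variance(values) for values in zip(*ensemble)]


def average_absolute_error(ensemble, reference):
    """Average over cells of |ensemble mean - reference value|."""
    estimate = ensemble_mean_field(ensemble)
    return mean(abs(e - r) for e, r in zip(estimate, reference))


def average_estimation_variance(ensemble):
    """Average over cells of the ensemble variance."""
    return mean(ensemble_variance_field(ensemble))


# Tiny illustrative ensemble: two realizations of a two-cell field.
toy_ensemble = [[1.0, 2.0], [3.0, 4.0]]
toy_reference = [2.0, 2.0]
error = average_absolute_error(toy_ensemble, toy_reference)
spread = average_estimation_variance(toy_ensemble)
```

A small error together with a spread that does not collapse too quickly is the behavior one would hope for from a well-performing filter.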

Figure 4.13. Average absolute difference, $E_Y$, between estimated and reference Y values
obtained with Facies-EnKF (solid line) and EnKF (dashed line).

Occurrence of filter inbreeding in the performed assimilations is checked by plotting
the temporal behavior of $P_{2\sigma_Y}$ in Figure 4.15. Values of $P_{2\sigma_Y}$ obtained with Facies-EnKF are
always larger than those obtained with a traditional EnKF, reflecting the higher quality of
the Y estimates based on Facies-EnKF. Both approaches display a decreasing temporal trend
of $P_{2\sigma_Y}$. This can be attributed to a systematic underestimation of the error variance in time
and can be due to the occurrence of spurious covariances in the empirical covariance matrix,
which is based on only 500 MC realizations. It is nonetheless remarkable that the dashed
curve displayed in Figure 4.15 and obtained with EnKF decreases at a much higher
rate than the one obtained through Facies-EnKF, suggesting that the proposed
approach is capable of attenuating inbreeding effects during the
assimilation.

Figure 4.14. Average estimation variance ($V_Y$) of Y obtained with Facies-EnKF (solid line)
and EnKF (dashed line).

Figure 4.15. $P_{2\sigma_Y}$ versus time obtained with Facies-EnKF (solid line) and EnKF (dashed
line).

We finally analyze the ability of the updated model realizations to forecast
production during an additional period of 2400 days after the latest assimilation time. This
analysis is performed by re-running the flow simulation for all calibrated log-permeability
fields from time 0 and for a total simulation period of 3600 days. These simulations mimic a
procedure which is commonly adopted in the management of a reservoir, where measurements
acquired until a given time (in our example, the first 1200 days of production) are used to
build a calibrated model of the reservoir. The latter is then typically employed to
(a) investigate the future production under diverse scenarios and (b) select the most
efficient development strategy amongst a range of plausible choices. In this context, we
explore two scenarios that differ from each other in the flow configuration imposed during
the additional period of 2400 days. In our first scenario, the same
flow setting imposed in the course of the assimilation time is maintained during the
additional simulation period. In the second scenario, two
additional wells (i.e., one producer and one injector) become operative after 1200 days.

Figures 4.16 - 4.17 depict the production curves corresponding to the bottom hole
pressure at injectors and to the production rates of oil and water at the production wells
obtained for the first scenario. Predicted and reference water flow rates at wells P1 and P3 are
not displayed because they are zero at all time steps. These figures show that the predicted
pressure curves and oil production rates obtained from the two sets of calibrated fields are
both in good agreement with the corresponding reference production curves (which have
been obtained from the true, reference reservoir model). It is remarkable that the
model realizations calibrated through EnKF provide such a high-quality prediction even though
the corresponding log-permeability distributions do not honor the correct
geological architecture of the reference reservoir (see Figure 4.12). We also note that the water
flow rates at well P2 predicted through Facies-EnKF are characterized by a smaller uncertainty
than those provided by EnKF. This is also demonstrated in Figure
4.18, which compares the sample histograms of the predicted water flow rate at the final
simulation time (i.e., 3600 days) obtained with the two approaches. The same observation
holds for the predicted field oil production, FOPT. While both data assimilation
approaches yield a good match between the estimated curves and the reference
FOPT production history (see Figure 4.19), the prediction obtained through EnKF at the final
time is characterized by the larger uncertainty (Figure 4.20).
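The mean curve and the 10th and 90th percentile curves used to summarize the ensemble predictions can be obtained as in the following Python sketch. It relies on `statistics.quantiles` from the standard library with the "inclusive" interpolation rule, which is one of several possible percentile conventions and is an assumption here; the helper names are illustrative.

```python
from statistics import mean, quantiles


def prediction_band(curves):
    """Summarize an ensemble of curves sampled at common times.

    curves: list of realizations, each a list of values (one per time).
    Returns one (mean, p10, p90) tuple per time.
    """
    summary = []
    for values in zip(*curves):
        deciles = quantiles(values, n=10, method="inclusive")
        summary.append((mean(values), deciles[0], deciles[-1]))
    return summary


def band_width(summary):
    """Width of the 10th-90th percentile band at each time, a simple measure of spread."""
    return [p90 - p10 for _, p10, p90 in summary]


# Illustrative ensemble: 101 realizations of a single-time "curve" with values 0..100.
toy_curves = [[float(i)] for i in range(101)]
toy_summary = prediction_band(toy_curves)
```

Comparing the band widths obtained from two ensembles at the final simulation time gives a compact way of stating which set of predictions is less uncertain.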

[Figure 4.16 panels: rows correspond to wells I1, I2 and I3; columns to Facies-EnKF and EnKF; axes: time [days] versus WBHP [bar].]

Figure 4.16. Time dependence of WBHP values related to the injection wells for the collection of models
updated through Facies-EnKF and EnKF (solid grey) during test scenario 1. Corresponding mean (solid
black), 10th and 90th percentiles (dashed black) are also reported. Red curve indicates the reference model
solution.

[Figure 4.17 panels: rows correspond to WOPR at wells P1, P2 and P3 and to WWPR at well P2; columns to Facies-EnKF and EnKF; axes: time [days] versus rate [$\mathrm{m^3/day}$].]

Figure 4.17. Time dependence of WOPR and WWPR values related to the production wells for the
collection of models updated through Facies-EnKF and EnKF (solid grey) during test scenario 1.
Corresponding mean (solid black), 10th and 90th percentiles (dashed black) are also reported. Red curve
indicates the reference model.

[Figure 4.18 panels: Facies-EnKF and EnKF; axes: WWPR [$\mathrm{m^3/day}$] versus relative frequency.]

Figure 4.18. Histograms of water production rate values predicted at well P2 at time 3600
days during test scenario 1. Vertical red lines indicate corresponding reference values.

[Figure 4.19 panels: Facies-EnKF and EnKF; axes: time [days] versus FOPT [$10^6\ \mathrm{m^3}$].]

Figure 4.19. Time dependence of FOPT for the collection of models updated through Facies-EnKF
and EnKF (solid grey) during test scenario 1. Corresponding mean (solid black), 10th
and 90th percentiles (dashed black) are also reported. Red curve indicates the reference model
solution.

[Figure 4.20 panels: Facies-EnKF and EnKF; axes: FOPT [$10^6\ \mathrm{m^3}$] versus relative frequency.]

Figure 4.20. Histograms of FOPT values predicted at time 3600 days during test scenario 1.
Vertical red lines indicate corresponding reference values.

As mentioned above, in the second scenario we explore the presence of two additional

wells (i.e., one injector, I4, and one producer, P4) that become operative after time 1200 days.

The locations of these new wells are depicted in Figure 4.21 and are selected upon relying on

the information provided by the mean Y field updated through Facies-EnKF, which displays

in these positions a high probability of occurrence of the high-permeability facies. In this
second forecast study, wells I4 and P4 work at a fixed water flow rate of 450 $\mathrm{m^3/day}$ and at
a constant bottom hole pressure of 30 bar, respectively.

Figure 4.21. Mean log-permeability distribution estimated through Facies-EnKF after the

latest assimilation step. Spatial location of additional wells I4 and P4 is also displayed.

Figures 4.22 - 4.23 compare the production curves predicted through the model
realizations obtained with the two assimilation approaches. These figures highlight the improved
prediction ability of the log-permeability fields updated through Facies-EnKF with respect to
those calibrated through the traditional EnKF approach. All
production curves estimated through Facies-EnKF are characterized by a much smaller
degree of uncertainty than those obtained using EnKF. They also provide a better match of
the reference production values at the additional wells I4 and P4, as shown by the
histograms displayed in Figure 4.24. The improved prediction ability of the model
realizations updated through Facies-EnKF results in a more precise estimation of the total oil
production curve, as shown in Figures 4.25 - 4.26.

[Figure 4.22 panels: rows correspond to wells I1, I2, I3 and I4; columns to Facies-EnKF and EnKF; axes: time [days] versus WBHP [bar].]

Figure 4.22. Time dependence of WBHP values related to the injection wells for the collection of models
updated employing Facies-EnKF and EnKF (solid grey) during test scenario 2. Corresponding mean (solid
black), 10th and 90th percentiles (dashed black) are also reported. Red curve indicates the reference model
solution.

[Figure 4.23 panels: rows correspond to WOPR at wells P1, P2, P3 and P4 and to WWPR at wells P2 and P4; columns to Facies-EnKF and EnKF; axes: time [days] versus rate [$\mathrm{m^3/day}$].]

Figure 4.23. Time dependence of WOPR and WWPR values related to the production wells for the
collection of models updated employing Facies-EnKF and EnKF (solid grey) during test scenario 2.
Corresponding mean (solid black), 10th and 90th percentiles (dashed black) are also reported. Red curve
indicates the reference model solution.

[Figure 4.24 panels: rows correspond to WBHP [bar] at well I4, WOPR [$\mathrm{m^3/day}$] at well P4 and WWPR [$\mathrm{m^3/day}$] at well P4; columns to Facies-EnKF and EnKF; vertical axes: relative frequency.]

Figure 4.24. Histograms of estimated production values at wells I4 and P4 at time 3600 days for test
scenario 2. Vertical red lines indicate corresponding reference values.

[Figure 4.25 panels: Facies-EnKF and EnKF; axes: time [days] versus FOPT [$10^6\ \mathrm{m^3}$].]

Figure 4.25. Time dependence of FOPT for the collection of models updated through Facies-EnKF
and EnKF (solid grey) during test scenario 2. Corresponding mean (solid black), 10th
and 90th percentiles (dashed black) are also reported. Red curve indicates the reference model
solution.

[Figure 4.26 panels: Facies-EnKF and EnKF; axes: FOPT [$10^6\ \mathrm{m^3}$] versus relative frequency.]

Figure 4.26. Histograms of FOPT values predicted at time 3600 days during test scenario 2.
Vertical red lines indicate corresponding reference values.

We conclude our analysis by comparing the computational costs of the two

approaches. Solving one assimilation step on a 2.80 GHz Intel i7-860 processor requires

3,400 s and 8,500 s for EnKF and Facies-EnKF, respectively. The proposed algorithm

requires more CPU time than the traditional EnKF because (a) the flow simulations are

iterated and must be solved twice within a single updating step and (b) the facies fields must

be re-generated at each assimilation time, as detailed in Section 4.2.
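The relative cost of the two schemes follows directly from the timings quoted above. The Python sketch below computes the cost ratio and, under the assumption (not stated in the text) that every one of the twenty assimilation steps takes the same CPU time, an indicative total run time.

```python
ENKF_SECONDS_PER_STEP = 3400.0          # from the text
FACIES_ENKF_SECONDS_PER_STEP = 8500.0   # from the text
N_STEPS = 20                             # number of assimilation times in the example


def cost_ratio():
    """CPU time of one Facies-EnKF step relative to one EnKF step."""
    return FACIES_ENKF_SECONDS_PER_STEP / ENKF_SECONDS_PER_STEP


def total_hours(seconds_per_step, n_steps=N_STEPS):
    """Indicative total CPU time in hours, assuming equal cost for every step."""
    return seconds_per_step * n_steps / 3600.0


print(cost_ratio())
print(total_hours(ENKF_SECONDS_PER_STEP), total_hours(FACIES_ENKF_SECONDS_PER_STEP))
```

With these figures one Facies-EnKF step costs 2.5 times one EnKF step.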


4.4 Conclusions

In this Chapter we present a novel data assimilation scheme that allows the sequential

assimilation of production data into a complex reservoir model for conditioning its geological

and petrophysical properties.

In the proposed algorithm, a Markov Mesh (MM) model is used to describe the spatial

distribution of the facies within a consistent Bayesian framework. This is then integrated into

a history matching procedure which is based on the EnKF scheme. We test the proposed

methodology by way of a synthetic example corresponding to a reservoir within which two

distinct facies are spatially distributed. We analyze the accuracy and computational efficiency
of our algorithm with respect to the standard EnKF in terms of both history matching quality
and ability to predict future production.

We show that the proposed inversion scheme is conducive to an updated collection of

facies and log-permeability fields which maintain the type of geological setting displayed

prior to updating (i.e., in the example we analyze the updated fields are still characterized by

a single high-permeability channel, whose location, which defines the internal architecture of

the system, is updated as data assimilation progresses in time). On the other hand, the

standard EnKF is not capable of preserving the correct geological scenario.

We test the prediction ability of the realizations obtained through our procedure by

means of two forecast scenarios, in which diverse flow configurations are considered after the

latest assimilation time. In our first scenario, the same flow setting imposed in the course of

the assimilation time is maintained also during the additional simulation period. A second

study is performed by considering the presence of two additional wells (i.e., one producer and

one injector) that become operative after 1200 days. Both approaches yield a good estimation
of the target production values in the first scenario, although the predictions provided by the standard
EnKF are characterized by the higher degree of uncertainty. The performance of the two
methodologies differs strongly in the second scenario, where our proposed
algorithm outperforms the standard EnKF by providing a superior match between the
reference and the predicted production curves.

Appendix A

We derive finite element equations which we employ for the numerical solution of the

moment equations (MEs) presented in Chapter 3.

Our MEs represent an extension of the work of Ye [2002] and Ye et al. [2004]. They

have been derived to allow embedding of MEs into the KF framework by considering the

general case when the initial head field is random and statistically correlated with the log-

conductivity field. In this Appendix we limit our discussion only to the equations that differ

from those of Ye et al. [2004] and are used for the numerical evaluation of the second-order

approximations of residual flux, cross-covariances and head covariances (see (3.24) - (3.29)).

An extensive and detailed derivation of the finite element equations for the solution of the

zero- and second-order mean heads, together with a description of the quotient-difference

algorithm employed for the computation of the inverse Laplace transform can be found in Ye

[2002].

A.1 Second-order residual flux

We discretize the flow domain into $N_e$ elements, each having constant log-conductivity,
and use bilinear Lagrange basis functions to interpolate Laplace-transformed
heads between grid nodes according to

$$h^{(n)}(\mathbf{x}, \lambda) \approx \hat{h}^{(n)}(\mathbf{x}, \lambda) = \sum_{i=1}^{N_g} h_i^{(n)}(\lambda)\, \xi_i(\mathbf{x}), \qquad n = 0, 2 \qquad (A.1)$$

where $\hat{h}^{(n)}(\mathbf{x}, \lambda)$ is the finite element approximation of $h^{(n)}(\mathbf{x}, \lambda)$, $N_g$ is the number of
grid nodes, $\xi_i(\mathbf{x})$ is a bilinear Lagrange basis function and $h_i^{(n)}$ is the n-th order
approximation of the mean transformed hydraulic head, $h^{(n)}(\mathbf{x}, \lambda)$, at node i, $\lambda$ being the
Laplace parameter.
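The interpolation in (A.1) can be illustrated on a single rectangular element. The Python sketch below evaluates the four bilinear Lagrange basis functions and interpolates nodal values inside the element; the counter-clockwise node ordering and the function names are choices made for the example and are not taken from the thesis.

```python
def bilinear_basis(x, y, x0, y0, dx, dy):
    """Values of the four bilinear Lagrange basis functions at (x, y).

    The element is the rectangle [x0, x0 + dx] x [y0, y0 + dy]; nodes are
    ordered counter-clockwise starting from the corner (x0, y0).
    """
    s = (x - x0) / dx  # local coordinate in [0, 1] along the first direction
    t = (y - y0) / dy  # local coordinate in [0, 1] along the second direction
    return [(1 - s) * (1 - t), s * (1 - t), s * t, (1 - s) * t]


def interpolate(nodal_values, x, y, x0, y0, dx, dy):
    """Interpolated value: sum over nodes of nodal value times basis function."""
    basis = bilinear_basis(x, y, x0, y0, dx, dy)
    return sum(h * xi for h, xi in zip(nodal_values, basis))


# Example on a 50 m x 50 m element with arbitrary nodal heads.
nodal_heads = [1.0, 2.0, 3.0, 4.0]
center_value = interpolate(nodal_heads, 25.0, 25.0, 0.0, 0.0, 50.0, 50.0)
```

The basis functions sum to one everywhere in the element, so the interpolated value at the element center is the average of the four nodal values.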


We interpolate the transformed zero-order Green’s function in a similar way as

$$G^{(0)}(\mathbf{x}, \mathbf{y}, \lambda) \approx \hat{G}^{(0)}(\mathbf{x}, \mathbf{y}, \lambda) = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} G_{i,j}^{(0)}(\lambda)\, \xi_i(\mathbf{x})\, \xi_j(\mathbf{y}) \qquad (A.2)$$

where $G_{i,j}^{(0)}$ is the zero-order transformed mean Green’s function at node i in the x-plane
due to a source of unit strength at node j in the y-plane. In addition, we interpolate

2

0K h x y by

22

0 0,

1

gN

k k

k

K h K h

x y x y (A.3)

where 2

0,kK h x is 2

0K h x y evaluated at node k in the y-plane.

Substituting (A.1) - (A.3) into (3.24) yields

2 0 0

,

1 1 1

, , dg g gN N N

G G Y i j i j k k

i j k

K K C G h

x y yr x x y x y x y y y

2 0

0, ,

1 1 1

dg g gN N N

k k S i j i j

k i j

K h S G

xx y y x y y (A.4)

Expressing the domain integrals in (A.4) as a summation of integrals over each grid element,

the residual flux at point ex inside element e can be written as

' '

2 0 ' 0 '' , ' ' '

,

' 1 1 1 1'

, de e e e

e

N M M Mee ee e e e e e e e

G G Y i j i j k k

e i j ke

K K C G h

x y yx xr x x y x y y y

' '2

0 '' ' ' '

0, ,

' 1 1 1 1'

de e e e

e

N M M Meee e e e e e

k k S i j i j

e k i je

K h S G

x x xx y y x y y (A.5)

where eM and

'eM respectively are the number of nodes of elements e and e’, , 'e e

YC is the

covariance between log-conductivity in elements e and e’, 0 'ee

ijG is the zero-order

transformed mean Green’s function at node i of element e due to a source of unit strength at

node j of element e’, and 0 'e

kh is the value of 0

,h y at node k of element e’.

Rearranging terms in (A.5) leads to the compact form


' '

2 0 ' 0 '' , ' ' '

,

1 ' 1 1 1

,e e e e

e

M N M Mee ee e e e e e e e

G i G Y i j k jk

i e j k

K K C G h

x x xr x x x y

' ' 20 '' ' ' '

0, ,

1 ' 1 1 1

e e e e

e

M N M Meee e e e e e

i S k i j jk

i e k j

S K h G

x x xx y x (A.6)

where ' 'e e

jk and ' 'e e

jk are defined as

' ' ' '

'

de e e e

jk j k

e

y yy y y (A.7)

' ' ' '

'

de e e e

jk j k

e

y y y (A.8)
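Element integrals of products of basis functions, and of products of their gradients, of the kind defined in (A.7) and (A.8) can be evaluated numerically on a rectangular element. The Python sketch below uses a 2 × 2 Gauss-Legendre rule, which is exact for bilinear basis functions on rectangles; the names `mass` and `stiff` are labels introduced here for the two matrices and are not the symbols used in the thesis.

```python
import math

# Two-point Gauss-Legendre abscissas on [-1, 1]; both weights are equal to one.
GAUSS_POINTS = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))


def shape(s, t):
    """Bilinear basis functions on the reference square [-1, 1] x [-1, 1]."""
    return [0.25 * (1 - s) * (1 - t), 0.25 * (1 + s) * (1 - t),
            0.25 * (1 + s) * (1 + t), 0.25 * (1 - s) * (1 + t)]


def shape_gradients(s, t, dx, dy):
    """Gradients with respect to physical coordinates on a dx-by-dy rectangle."""
    d_ds = [-0.25 * (1 - t), 0.25 * (1 - t), 0.25 * (1 + t), -0.25 * (1 + t)]
    d_dt = [-0.25 * (1 - s), -0.25 * (1 + s), 0.25 * (1 + s), 0.25 * (1 - s)]
    return [(a * 2.0 / dx, b * 2.0 / dy) for a, b in zip(d_ds, d_dt)]


def element_matrices(dx, dy):
    """Return (mass, stiff): integrals of basis products and of gradient products."""
    jacobian = dx * dy / 4.0  # area scaling from reference to physical element
    mass = [[0.0] * 4 for _ in range(4)]
    stiff = [[0.0] * 4 for _ in range(4)]
    for s in GAUSS_POINTS:
        for t in GAUSS_POINTS:
            n = shape(s, t)
            g = shape_gradients(s, t, dx, dy)
            for j in range(4):
                for k in range(4):
                    mass[j][k] += n[j] * n[k] * jacobian
                    stiff[j][k] += (g[j][0] * g[k][0] + g[j][1] * g[k][1]) * jacobian
    return mass, stiff


mass, stiff = element_matrices(50.0, 50.0)
```

Two quick sanity checks are that the entries of `mass` sum to the element area and that each row of `stiff` sums to zero, since the basis functions add up to one.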

Following a similar approach, the domain integral included in the term

2, dn c nR

xr x x x that appears on the right hand side of the system used to solve

for the second-order mean head (see (3.11) of Ye [2002]) is discretized into

2

1

, deN

e e

n n

e e

R

xr x x x (A.9)

Substituting (A.6) into (A.9) yields

' '

0 ' 0 '' , ' ' '

,

1 1 ' 1 1 1

de e e e e

e

N M N M Mee ee e e e e e e e

n G i G Y i j k jk n

e i e j ke

R K K C G h

x xx x

x x y x x

' ' 20 '' ' ' '

0, ,

1 1 ' 1 1 1

de e e e e

e

N M N M Meee e e e e e e

i S k i j jk n

e i e k je

S K h G

x xx x

x y x x x

(A.10)

that can be also written as

' '

0 ' 0 '' , ' ' '

,

1 1 ' 1 1 1

e e e e eN M N M Mee ee ee e e e e e

n G in G Y i j k jk

e i e j k

R K K C G h

x y

' ' 20 '' ' ' '

0, ,

1 1 ' 1 1 1

e e e e eN M N M Meeee e e e e e

in S k i j jk

e i e j k

S K h G

y x (A.11)


A.2 Second-order cross-covariance between head and conductivity

We denote as 2,

yju x the cross-covariance

2, ,Khu x y between hydraulic

conductivity at point x and transformed hydraulic head at node yj of the y-plane and

interpolate the zero-order mean Green’s function as

0 0 0

,

1

ˆ, , , ,

g

y

N

i j i

i

G G G

z y z y z (A.12)

Substituting (A.1), (A.3) and (A.12) into (3.25) yields

2 0 0

,

1 1

, , dg g

y y

N N

j G G Y i i i j i

i i

u K K C h G

z zx x z z x z z z

2 0

0, ,

1 1

dg g

y

N N

S i i i j i

i i

S K h G

z x z z z (A.13)

Expressing the domain integral in (A.13) as a summation of integrals over each grid

element allows writing the cross-covariance between hydraulic conductivity at element e and

transformed head at node yj as

' '

2 0 ' 0 '' ' ' '

,

' 1 1 1

,e e e

y y

N M Me ee e e e e e e

j G G Y k j i ik

e i k

u K K C G h

x x z

' ' 2

0 '' ' ' '

, 0,

' 1 1 1

e e e

y

N M Mee e e e e

S k j i ik

e i k

S G K h

z x (A.14)

A.3 Second-order head head covariance

We interpolate the second-order covariance, 2

0 0h h x z , between initial

hydraulic heads at node m associated with vector position x and vector position z as

2 2

0, 0 0, 0,

1

gN

m m i i

i

h h h h

z z (A.15)

and evaluate the covariance between initial head at node m and the transformed head at node

yj at vector position y, 2

0, ym jh h , upon substituting (A.1), (A.3), (A.12) and (A.15) into

(3.29)


2 20 0

0, 0, ,

1 1

dg g

y y

N N

m j i i m k j k

i k

h h h K h G

z zz z z z

2 0

0, 0, ,

1 1

dg g

y

N N

S m i i k j k

i k

S h h G

z z z z (A.16)

The domain integral in (A.16) is then decomposed into a sum of integrals over each grid

element as

2 20 0

0, 0, ,

1 1 1

e e e

y y

N M Me ee

m j m i k j ik

e i k

h h K h h G

z

2 0

0, 0, ,

1 1 1

e e e

y

N M Me ee

S m i k j ik

e i k

S h h G

z (A.17)

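Numerical solution of (3.26) - (3.28) is performed by approximating the second-order head
covariance $C_h^{(2)}(\mathbf{x}, \lambda, \mathbf{y}, s)$ through

$$C_h^{(2)}(\mathbf{x}, \lambda, \mathbf{y}, s) \approx \hat{C}_h^{(2)}(\mathbf{x}, \lambda, \mathbf{y}, s) = \sum_{m=1}^{N_g} C_{m, j_y}^{(2)}(\lambda, s)\, \xi_m(\mathbf{x}) \qquad (A.18)$$

where $\hat{C}_h^{(2)}(\mathbf{x}, \lambda, \mathbf{y}, s)$ is the finite element approximation of $C_h^{(2)}(\mathbf{x}, \lambda, \mathbf{y}, s)$ and $C_{m, j_y}^{(2)}(\lambda, s)$
is the covariance between transformed head at node m associated with vector position x and
head at node $j_y$ associated with vector position y.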

Galerkin orthogonalization of (3.26) yields

2 2 0ˆ, , , , , , dG h Kh nK C s u s h

x x x

x x y x y x x x

2ˆ, , , dS h nS C s

x x y x x

2

0 , dS nS h h s

x x y x x

2 2 0ˆ, , , , , , dG h Kh nK C s u s h

x x

x x y x y x n x x x

1, , gn N (A.19)

By virtue of (3.27) - (3.28), equation (A.19) becomes


2 2 0ˆ, , , d , , , dG h n Kh nK C s u s h

x x x xx x y x x x y x x x

2ˆ, , , dS h nS C s

x x y x x

2

0 , dS nS h h s

x x y x x

2 0, , , d

D

Kh nu s h

xx y x n x x x 1, , gn N (A.20)

We interpolate 2

0 ,h h s x y as

22

0 0,

1

,g

y

N

m j m

m

h h s h h s

x y x (A.21)

where 2

0, ym jh h s is the second-order approximation of the covariance between initial

head at node m associated with vector position x and head at node yj associated with vector

position y and is evaluated by taking the inverse Laplace transform of 2

0, ym jh h (A.17).

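Substituting (A.1), (A.14), (A.18) and (A.21) into (A.20) yields

$$
\begin{aligned}
&\sum_{m=1}^{N_g} C_{m,j_y}^{(2)}(\lambda,s) \int_{\Omega} K_G(\mathbf{x})\,\nabla_{\mathbf{x}} \xi_m(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \xi_n(\mathbf{x})\,d\mathbf{x}
 + \sum_{m=1}^{N_g} h_m^{(0)} \int_{\Omega} u_{j_y}^{(2)}(\mathbf{x},s)\,\nabla_{\mathbf{x}} \xi_m(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad + \lambda \sum_{m=1}^{N_g} C_{m,j_y}^{(2)}(\lambda,s) \int_{\Omega} S_S(\mathbf{x})\,\xi_m(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x}
 - \sum_{m=1}^{N_g} \langle h_{0,m}'\,h_{j_y}'(s) \rangle^{(2)} \int_{\Omega} S_S(\mathbf{x})\,\xi_m(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x} \\
&\quad = \sum_{m=1}^{N_g} h_m^{(0)} \int_{\Gamma_D} u_{j_y}^{(2)}(\mathbf{x},s)\,\nabla_{\mathbf{x}} \xi_m(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x},
\qquad n = 1,\ldots,N_g
\end{aligned}
\tag{A.22}
$$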

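Rearranging terms and defining

$$
A_{nm} = \int_{\Omega} K_G(\mathbf{x})\,\nabla_{\mathbf{x}} \xi_m(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \xi_n(\mathbf{x})\,d\mathbf{x} \tag{A.23}
$$

$$
D_{nm} = \int_{\Omega} S_S(\mathbf{x})\,\xi_m(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x} \tag{A.24}
$$

$$
F_{nm,j_y} = \int_{\Omega} u_{j_y}^{(2)}(\mathbf{x},s)\,\nabla_{\mathbf{x}} \xi_m(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \xi_n(\mathbf{x})\,d\mathbf{x}
= \sum_{e=1}^{N_e} \int_{\Omega_e} u_{j_y}^{(2),e}(\mathbf{x},s)\,\nabla_{\mathbf{x}} \xi_m(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \xi_n(\mathbf{x})\,d\mathbf{x} \tag{A.25}
$$

$$
T_n = \sum_{m=1}^{N_g} h_m^{(0)} \int_{\Gamma_D} u_{j_y}^{(2)}(\mathbf{x},s)\,\nabla_{\mathbf{x}} \xi_m(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x})\,\xi_n(\mathbf{x})\,d\mathbf{x} \tag{A.26}
$$

where $u_{j_y}^{(2),e}(\mathbf{x},s)$ is rendered by taking the inverse Laplace transform of $u_{j_y}^{(2),e}(\mathbf{x},\lambda)$ calculated through (A.14). Equation (A.22) can finally be written as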

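$$
\sum_{m=1}^{N_g} \left( A_{nm} + \lambda D_{nm} \right) C_{m,j_y}^{(2)}(\lambda,s)
= T_n - \sum_{m=1}^{N_g} F_{nm,j_y}\,h_m^{(0)} + \sum_{m=1}^{N_g} D_{nm}\,\langle h_{0,m}'\,h_{j_y}'(s) \rangle^{(2)},
\qquad n = 1,\ldots,N_g \tag{A.27}
$$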
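To make the structure of the algebraic system concrete, the following Python sketch assembles matrices of the type (A.23) and (A.24) and solves a system of the form (A + λD) c = T − F h + D c0 for one value of the Laplace parameter. It is a minimal illustration under strong assumptions that are not part of this work: a 1-D mesh with linear elements, element-wise constant K_G and S_S, dense storage, and hypothetical function names (`assemble_1d`, `solve_linear`, `covariance_column`).

```python
def assemble_1d(node_x, K_G, S_S):
    """Assemble A (stiffness) and D (storage) for linear elements on a 1-D mesh.

    node_x: sorted node coordinates; K_G, S_S: one value per element.
    """
    n = len(node_x)
    A = [[0.0] * n for _ in range(n)]
    D = [[0.0] * n for _ in range(n)]
    grad = [[1.0, -1.0], [-1.0, 1.0]]   # integral of d(xi_i)/dx * d(xi_k)/dx, times length
    mass = [[2.0, 1.0], [1.0, 2.0]]     # integral of xi_i * xi_k, times 6 / length
    for e in range(n - 1):
        length = node_x[e + 1] - node_x[e]
        for i in range(2):
            for k in range(2):
                A[e + i][e + k] += K_G[e] * grad[i][k] / length
                D[e + i][e + k] += S_S[e] * mass[i][k] * length / 6.0
    return A, D


def solve_linear(M, b):
    """Gaussian elimination with partial pivoting (no singularity check)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def covariance_column(A, D, F, T, h_mean, c0, lam):
    """Solve (A + lam*D) c = T - F h_mean + D c0 for the nodal covariances c."""
    n = len(T)
    lhs = [[A[i][m] + lam * D[i][m] for m in range(n)] for i in range(n)]
    rhs = [T[i]
           - sum(F[i][m] * h_mean[m] for m in range(n))
           + sum(D[i][m] * c0[m] for m in range(n))
           for i in range(n)]
    return solve_linear(lhs, rhs)


# Hypothetical three-node example with no boundary or cross-covariance terms
A, D = assemble_1d([0.0, 1.0, 2.0], K_G=[2.0, 2.0], S_S=[3.0, 3.0])
zeros = [[0.0] * 3 for _ in range(3)]
c = covariance_column(A, D, zeros, [0.0] * 3, [0.0] * 3, [1.0] * 3, lam=2.0)
print(c)
```

In this example the initial covariance is uniform and F and T vanish, so the solution reduces to c0/λ at every node, which is the Laplace transform of a constant. A practical implementation would use sparse storage and a dedicated solver, and would repeat the solve for each value of λ required by the numerical inversion.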


References

Aanonsen, S.I., G. Nævdal, D.S. Oliver, A.C. Reynolds, and B. Vallès (2009), Ensemble

Kalman filter in reservoir engineering – a review, SPE J., 14(3), 393-412,

doi:10.2118/117274-PA.

Ahmed, N., T. Natarajan, and K.R. Rao (1974), Discrete Cosine Transform, IEEE T. Comput.,

C-23(1), 90-93, doi:10.1109/T-C.1974.223784.

Alcolea, A., J. Carrera, and A. Medina (2006), Pilot points method incorporating prior

information for solving the groundwater flow inverse problem, Adv. Water Resour.,

29(11), 1678-1689, doi: 10.1016/j.advwatres.2005.12.009.

Anderson, J.L. (2007), An adaptive covariance inflation error correction algorithm for

ensemble filters, Tellus A, 59(2), 210-224, doi: 10.1111/j.1600-0870.2006.00216.x.

Ballio, F., and A. Guadagnini (2004), Convergence assessment of numerical Monte Carlo

simulations in groundwater hydrology, Water Resour. Res., 40(4), W04603,

doi:10.1029/2003WR002876.

Bianchi Janetti, E., M. Riva, S. Straface, and A. Guadagnini (2010), Stochastic

characterization of the Montalto Uffugo research site (Italy) by geostatistical inversion

of moment equations of groundwater flow, J. Hydrol., 381(1-2), 42-51, doi:

10.1016/j.jhydrol.2009.11.023.

Burgers, G., P.J. van Leeuwen, and G. Evensen (1998), Analysis Scheme in the Ensemble

Kalman Filter, Mon. Weather Rev., 126(6), 1719–1724,

doi:10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.

Chang, H., D. Zhang, and Z. Lu (2010), History matching of facies distribution with the EnKF

and level set parameterization, J. Comput. Phys., 229(20), 8011-8030, doi:

10.1016/j.jcp.2010.07.005.


Chen, Y., and D. Zhang (2006), Data assimilation for transient flow in geologic formations

via ensemble Kalman filter, Adv. Water Resour., 29(8), 1107-1122,

doi:10.1016/j.advwatres.2005.09.007.

Cohn, S.E. (1997), An Introduction to Estimation Theory, Journal of the Meteorological

Society of Japan, 75(1B), 257-288.

De Hoog, F. R., J. H. Knight, and A. N. Stokes (1982), An improved method for numerical

inversion of Laplace transforms, SIAM J. Sci. Stat. Comput., 3(3), 357–366, doi:

10.1137/0903022.

Deutsch, C.V., and A.G. Journel (1998), GSLIB, geostatistical software library and user’s

guide, 2nd ed., Oxford University Press, New York, ISBN-10:0195100158.

Dovera, L., and E. Della Rossa (2011), Multimodal ensemble Kalman filtering for Gaussian

mixture models, Computat. Geosci. 15(2), 307-323, doi: 10.1007/s10596-010-9205-3.

Evensen, G. (1994), Sequential data assimilation with a nonlinear quasi-geostrophic model

using Monte Carlo methods to forecast error statistics, J. Geophys. Res., 99(C5),

10143-10162, doi:10.1029/94JC00572.

Furrer, R., and T. Bengtsson (2007), Estimation of high-dimensional prior and posterior

covariance matrices in Kalman filter variants, J. Multivar. Anal., 98(2), 227–255,

doi:10.1016/j.jmva.2006.08.003.

Gelb, A. (1974), Applied optimal estimation, The MIT Press, Cambridge, Mass., ISBN-

10:0262570483.

Guadagnini, A., and S.P. Neuman (1999), Nonlocal and localized analyses of conditional

mean steady state flow in bounded, randomly nonuniform domains: 2. Computational

examples, Water Resour. Res., 35(10), 3019 – 3039, doi:10.1029/1999WR900159.


Hendricks Franssen, H.-J., and W. Kinzelbach (2008), Real-time groundwater flow modeling

with the Ensemble Kalman Filter: Joint estimation of states and parameters and the

filter inbreeding problem, Water Resour. Res., 44, W09408,

doi:10.1029/2007WR006505.

Hendricks Franssen, H.-J., H.P. Kaiser, U. Kuhlmann, G. Bauser, F. Stauffer, R. Muller, and

W. Kinzelbach (2011), Operational real-time modeling with ensemble Kalman filter of

variably saturated subsurface flow including stream-aquifer interaction and parameter

updating, Water Resour. Res., 47, W02532, doi:10.1029/2010WR009480.

Hernandez, A.F., S.P. Neuman, A. Guadagnini, and J. Carrera (2003), Conditioning mean

steady state flow on hydraulic head and conductivity through geostatistical inversion,

Stochastic Environ. Res. Risk Assess., 17(5), 329-338, doi:10.1007/s00477-003-0154-4.

Hernandez, A.F., S.P. Neuman, A. Guadagnini, and J. Carrera (2006), Inverse stochastic

moment analysis of steady state flow in randomly heterogeneous media, Water Resour.

Res., 42(5), W05425, doi:10.1029/2005WR004449.

Houtekamer, P.L., and H.L. Mitchell (1998), Data Assimilation Using an Ensemble Kalman

Filter Technique, Mon. Weather Rev., 126(3), 796-811,

doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

Jafarpour, B., and D.B. McLaughlin (2008), History matching with an ensemble Kalman

filter and discrete cosine parameterization, Computat. Geosci., 12(2), 227–244, doi:

10.1007/s10596-008-9080-3.

Jafarpour, B., and M. Khodabakhshi (2011), A Probability Conditioning Method (PCM) for

nonlinear flow data integration into multipoint statistical facies simulation, Math.

Geosci., 43(2), 133-164, doi: 10.1007/s11004-011-9316-y.


Jafarpour, B., and M. Tarrahi (2012), Assessing the performance of the ensemble Kalman

filter for subsurface flow data integration under variogram uncertainty, Water Resour.

Res., 47, W05537, doi:10.1029/2010WR009090.

Kalman, R.E. (1960), A New Approach to Linear Filtering and Prediction Problems, J. Basic

Eng., 82(D), 35-45, doi:10.1115/1.3662552.

Kolbjørnsen, O., M. Stien, H. Kjønsberg, B. Fjellvoll, and P. Abrahamsen (2013), Using

Multiple Grids in Markov Mesh Facies Modeling, Math. Geosci.,

doi: 10.1007/s11004-013-9499-5.

Le Loc’h, G., and A. Galli (1997), Truncated plurigaussian method: theoretical and practical

points of view. E.Y. Baafi and N.A. Schofield (eds), Geostatistics Wollongong ’96, 1,

211-222, Dordrecht, Kluwer Academic Press.

Liang, X., X. Zheng, S. Zhang, G. Wu, Y. Dai, and Y. Li (2012), Maximum Likelihood

estimation of inflation factors on error covariance matrices for ensemble Kalman filter

assimilation, Quart. J. Meteor. Soc., 138(662), 263-273, doi:10.1002/qj.912.

Liu, N., and D.S. Oliver (2005a), Ensemble Kalman filter for automatic history matching of

geologic facies, J. Petrol. Sci. Eng., 47(3-4), 147–161, doi:

10.1016/j.petrol.2005.03.006.

Liu, N., and D.S. Oliver (2005b), Critical Evaluation of the Ensemble Kalman Filter on

History Matching of Geologic Facies, SPE Reserv. Eval. Eng., 8(6), 470-477, doi:

10.2118/92867-PA.

Liu, Y., A.H. Weerts, M. Clark, H.-J. Hendricks Franssen, S. Kumar, H. Moradkhani, D.J.

Seo, D. Schwanenberg, P. Smith, A.I.J.M. van Dijk, N. van Velzen, M. He, H. Lee, S.J.

Noh, O. Rakovec, and P. Restrepo (2012), Advancing data assimilation in operational

hydrologic forecasting: progresses, challenges, and emerging opportunities, Hydrol.

Earth Syst. Sci., 16(10), 3863-3887, doi: 10.5194/hess-16-3863-2012.


McLaughlin, D.B. (2002), An integrated approach to hydrologic data assimilation:

interpolation, smoothing, and filtering, Adv. Water Resour., 25(8-12), 1275-1286,

doi:10.1016/S0309-1708(02)00055-6.

Moreno, D., and S.I. Aanonsen (2007), Stochastic facies modeling using the level set method,

Petroleum Geostatistics, 10–14 September 2007, Cascais, Portugal, A16, Extended

Abstracts Book, EAGE Publications BV, Utrecht, The Netherlands.

Naevdal, G., L.M. Johnsen, S.I. Aanonsen, and E.H. Vefring (2005), Reservoir monitoring

and continuous model updating using ensemble Kalman filter, SPE J., 10(1), 66-74,

doi:10.2118/84372-PA.

Oliver, D.S., and Y. Chen (2011), Recent progress on reservoir history matching: a review,

Comput. Geosci. 15(1), 185-221, doi:10.1007/s10596-010-9194-2.

Rao, K.R., and P. Yip (1990), Discrete Cosine Transform: Algorithms, Advantages,

Applications, Academic Press, Boston.

Riva, M., A. Guadagnini, S. P. Neuman, E. Bianchi Janetti, and B. Malama (2009), Inverse

analysis of stochastic moment equations for transient flow in randomly heterogeneous

media, Adv. Water Resour., 32(10), 1495-1507, doi:10.1016/j.advwatres.2009.07.003.

Riva, M., A. Guadagnini, F. De Gaspari, and A. Alcolea (2010), Exact sensitivity matrix and

influence of the number of pilot points in the geostatistical inversion of moment

equations of groundwater flow, Water Resour. Res., 46, W11513,

doi:10.1029/2009WR008476.

Riva, M., M. Panzeri, A. Guadagnini, and S.P. Neuman (2011), Role of model selection

criteria in geostatistical inverse estimation of statistical data- and model-parameters,

Water Resour. Res., 47, W07502, doi:10.1029/2011WR010480.


Schoeniger, A., W. Nowak, and H.-J. Hendricks Franssen (2012), Parameter estimation by

ensemble Kalman filters with transformed data: Approach and application to hydraulic

tomography, Water Resour. Res., 48, W04502, doi:10.1029/2011WR010462.

Stien, M., and O. Kolbjørnsen (2011), Facies modeling using a Markov Mesh model

specification, Math. Geosci., 43(6), 611-624, doi: 10.1007/s11004-011-9350-9.

Strebelle, S. (2002), Conditional simulation of complex geological structures using multiple-

point statistics, Math. Geol., 34(1), 1-21, doi: 10.1023/A:1014009426274.

Tarantola, A. (2005), Inverse Problem Theory, Society for Industrial and Applied

Mathematics, Philadelphia, ISBN-10: 0898715725.

Tartakovsky, D.M., and S.P. Neuman (1998), Transient flow in bounded randomly

heterogeneous domains: 1. Exact conditional moment equations and recursive

approximations, Water Resour. Res., 34(1), 1-12, doi:10.1029/97WR02118.

Ye, M. (2002), Parallel finite element Laplace transform algorithm for transient flow in

bounded randomly heterogeneous domains, Ph.D. dissertation, Univ. of Ariz., Tucson.

Ye, M., S.P. Neuman, A. Guadagnini, and D.M. Tartakovsky (2004), Nonlocal and localized

analyses of conditional mean transient flow in bounded, randomly heterogeneous

porous media, Water Resour. Res., 40(5), W05104, doi:10.1029/2003WR00209.

van Leeuwen, P.J. (1999), Comment on “Data Assimilation Using an Ensemble Kalman Filter

Technique”, Mon. Weather Rev., 127(6), 1374-1377,

doi:10.1175/1520-0493(1999)127<1374:CODAUA>2.0.CO;2.

Vrugt, J., C.G.H. Diks, H.V. Gupta, W. Bouten, and J.M. Verstraten (2005), Improved

treatment of uncertainty in hydrologic modeling: combining the strengths of global

optimization and data assimilation, Water Resour. Res., 41(1), W01017,

doi:10.1029/2004WR003059.


Wang, X., and C.H. Bishop (2003), A comparison of breeding and ensemble transform

Kalman filter ensemble forecast schemes, J. Atmos. Sci., 60(9), 1140-1158,

doi:10.1175/1520-0469(2003)060<1140:ACOBAE>2.0.CO;2.

Wang, X., T.A. Hamill, J.S. Whitaker, and C.H. Bishop (2007), A comparison of hybrid

ensemble transform Kalman Filter-optimum interpolation and ensemble square root

filter analysis scheme, Mon. Weather Rev., 135(3), 1055-1076,

doi:10.1175/MWR3307.1.

Wen, X.-H., and W.H. Chen (2007), Some practical issues on real-time reservoir updating

using ensemble Kalman filter, SPE J., 12(2), 156-166, doi:10.2118/111571-PA.

Woodbury, A.D., and T.J. Ulrych (2000), A full-Bayesian approach to the groundwater

inverse problem for steady state flow, Water Resour. Res., 36(8), 2081 - 2093,

doi:10.1029/2000WR900086.

Xu, T., J.J. Gómez-Hernández, H. Zhou, and L. Li (2013), The power of transient

piezometric head data in inverse modeling: an application of the localized normal-score

EnKF with covariance inflation in a heterogeneous bimodal hydraulic conductivity

field, Adv. Water Resour., 54, 100-118, doi:10.1016/j.advwatres.2013.01.006.

Zeng, L., H. Chang, and D. Zhang (2011), A Probabilistic Collocation-Based Kalman Filter

for History Matching, SPE J., 16(2), 294-306, doi:10.2118/140737-PA.

Zeng, L., L. Shi, D. Zhang, and L. Wu (2012), A sparse grid Bayesian method for

contaminant source identification, Adv. Water Resour., 37, 1-9,

doi:10.1016/j.advwatres.2011.09.011.

Zhang, D., Z. Lu, and Y. Chen (2007), Dynamic Reservoir Data Assimilation With an

Efficient, Dimension-Reduced Kalman Filter, SPE J., 12(1), 108-117,

doi:10.2118/95277-PA.


Acknowledgements

This work was supported in part by a grant provided by Eni S.p.a. - Exploration and

Production Division through the project “History Matching per la caratterizzazione delle

facies di reservoir mediante tecniche di inversione stocastica”. Schlumberger is

acknowledged for allowing the use of the software ECLIPSE for research purposes.


Extended abstract (translated from the Italian)

Accurate modeling of flow and transport in porous media is essential for addressing a wide range of engineering and environmental problems. Relevant applications include water supply for civil and industrial use, remediation of contaminated soils and aquifers, protection of pumping wells, improving the efficiency of hydrocarbon recovery from oil reservoirs to meet a growing demand for energy resources as natural availability declines, and quantification of the risk associated with underground storage of radioactive material.

Building a subsurface flow and transport model requires defining the spatial distribution of the parameters that appear in the governing equations, typically permeability and porosity. This distribution is generally highly uncertain. In this context it becomes necessary to adopt a probabilistic model in which the system parameters are treated as stochastic processes, possibly conditioned on the available measurements through inverse modeling or data assimilation techniques.

Among the many methodologies described in the literature, this work employs a technique known as the Ensemble Kalman Filter (EnKF). EnKF assimilates data of different types into dynamic models sequentially, as measurements are acquired. It can handle models of high spatial dimension with the nonlinear dynamics typical of subsurface flow and transport.

Despite its growing popularity, several drawbacks limit the range of applicability of EnKF. Traditionally, EnKF relies on a Monte Carlo (MC) approach to generate a set of realizations of the stochastic process under study. A critical factor is the number of MC realizations used to approximate the statistical moments of the variables of interest. While a large number of realizations is needed to obtain good estimates of the means and covariances of the analyzed quantities, the associated computational cost can prevent its use in applications of practical interest. Moreover, EnKF yields optimal results only if the system variables (i.e., model parameters and state variables) can be described by a multivariate Gaussian distribution, whereas the subsurface is generally complex and realistic models must account for the presence of several facies, i.e., distinct geological units each characterized by specific mineralogical and petrophysical properties. The spatial distribution of facies, typically described through indicator functions, considerably influences the dynamic behavior of the system. Because indicator functions are non-Gaussian, using EnKF to update the spatial distribution of facies is, in most cases, a source of error.

The main objectives of this thesis are: (a) to embed the stochastic moment equations of groundwater flow within EnKF procedures so as to avoid the use of MC techniques; and (b) to develop an algorithm capable of conditioning the spatial distribution of facies and of their petrophysical properties on production data.

The first part of the thesis proposes to circumvent MC simulations by directly solving the stochastic groundwater flow equations that govern the space-time evolution of the first two statistical moments (means and covariances) of hydraulic heads (h) and fluxes. The new methodology was tested on a synthetic groundwater flow problem in a heterogeneous confined aquifer subject to mixed boundary conditions and containing a pumping well. We analyzed the effect of (a) measurement errors in the log-conductivity (Y) data, (b) the time interval of the available head data, and (c) the statistical characteristics of the Y field on the quality of the calibrated Y and h fields and on the associated estimation uncertainties. We also compared the performance and accuracy of the results obtained with the new procedure against those of the traditional technique based on MC simulations. Using the moment equations within EnKF was shown to allow efficient, real-time estimation of model parameters and state variables while avoiding the problems typically associated with the MC-based approach. The results confirm that using only a few hundred MC realizations, as is often done in the literature, gives rise to filter inbreeding, which degrades the quality of the Y and h estimates and of their estimation uncertainties.

The second part of the work presents a new assimilation algorithm that updates the distribution of facies and petrophysical properties in a set of realizations of a subsurface system with complex geological architecture. The facies distribution is described using a Markov Mesh (MM) model coupled with a multi-grid technique, whereby geological patterns are first reproduced at a coarser scale and subsequently refined at higher resolution. This technique reproduces in detail complex geometries characterized by spatial correlations distributed over several scales. The assimilation algorithm rests on the integration of the MM model within the EnKF scheme. The methodology was tested on a synthetic reservoir model with two facies representing a meandering fluvial system. The results were also compared with those obtained with a standard EnKF approach. The proposed assimilation scheme is shown to yield a set of permeability and facies fields that preserve the geological architecture of the reference model, in contrast to what is obtained with traditional EnKF. The predictive capability of the calibrated reservoir models was tested through two cases in which, at the end of the assimilation period, two scenarios with different flow configurations were considered. In the first scenario the same configuration adopted during the assimilation period is maintained, whereas the second investigates the effects of two additional wells that become operational after data assimilation. In the first scenario, both the traditional approach and the one proposed in this work provide a good forecast of production data, although the estimation uncertainty obtained with the new methodology is smaller. The analysis of the second scenario instead revealed a marked difference between the performance of the two assimilation methods and confirmed the clear superiority of the proposed algorithm over standard EnKF in providing a forecast of production data consistent with the synthetic reference model.

First of all, I wish to thank Monica and Alberto for their constant help over these years.

Thanks also to Ernesto and Laura for their support and for the opportunity to work with them.

Finally, I thank all my friends and my family for the support I have always received.
