
SPE-170690-MS

Assisted History Matching Benchmarking: Design of Experiments-based Techniques

    Eric Bhark and Kaveh Dehghani, Chevron Energy Technology Company

    Copyright 2014, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Annual Technical Conference and Exhibition held in Amsterdam, The Netherlands, 27–29 October 2014.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

    Abstract

As the role of reservoir flow simulation increasingly impacts existing operations and field development decisions, it follows that rigor, fitness and consistency should be imposed on the calibration of reservoir flow models to dynamic data through history matching. Although a wealth of history matching techniques exist in the petroleum literature that propose novel algorithms or share case studies, seldom does the content guide the modeler in fit-for-purpose reservoir model calibration for an operating asset. To evaluate the applicability of these diverse techniques against standards required for reservoir management, an internal study was performed to benchmark four assisted history matching (AHM) techniques commonly promoted in the oil and gas industry. The techniques were vetted against a comprehensive suite of modeling requirements for multiple asset classes, integrating a variety of historical dynamic data types through the calibration of reservoir properties that control flow behavior from the field to inter-well scale. The methods benchmarked were: (1) Design of Experiments (DoE)-based, (2) Ensemble Kalman Filter and Ensemble Smoother, (3) Genetic Algorithm and (4) Generalized Travel Time Inversion. This manuscript focuses solely on the DoE-based technique.

In order to consistently benchmark the techniques, a set of standards was defined against which each was evaluated to determine its suitability for widespread history matching applications. The standards involve: the capacity to parameterize (and therefore calibrate) a diversity of reservoir flow model attributes, the capacity to integrate different types of dynamic data, the level of independence from the flow simulator and the capability to provide probabilistic outcomes for predictive uncertainty assessment. Of the four techniques, the DoE-based approach uniquely satisfied all requirements. Its history matching workflow has the flexibility to incorporate any form of reservoir model parameter and to assimilate a history matching error metric for any individual or group of historical data types; therefore, benchmarking established DoE-based techniques as unambiguously the most compliant with generic asset modeling requirements. The approach was also identified as the most straightforward, both theoretically and in practical computation, and therefore applicable to the broadest range of practitioners. Perhaps most importantly, the approach demonstrated the capacity for accurate quantification of uncertainty (or non-uniqueness) in reservoir quality resulting from an exhaustive, although approximate, exploration of model parameter space and the associated history matching error metric(s).

This manuscript compiles the results and insights gained from benchmarking of the DoE-based techniques through the proposal of a comprehensive assisted history matching workflow. The workflow is designed for generality while providing best practices that guide the modeler in fit-for-purpose application. Limitations of the workflow are also recognized. Key components include: selection and screening of calibration parameters based on statistical significance, development of surrogate models to characterize the relationship between parameters and the simulated historical data being integrated into the reservoir model, use of the surrogate models for exhaustive yet efficient exploration of parameter space to identify (non-unique) history matched models, and (deterministic or probabilistic) discrete reservoir model selection for use in forecast-based decision making. Each step of the AHM workflow is presented from a conceptual and applied perspective, and field applications are provided to demonstrate key concepts. Although the applications presented include two deepwater Gulf of Mexico assets, the workflow and insights provided are developed from benchmarking across a diverse suite of asset types.

Introduction

History matching involves the conditioning of numerical reservoir flow models to field performance through the quantitative integration of historical dynamic data. With the more recent standardization of reservoir flow simulation as an asset management tool, supporting a broad scope of activities including existing field operations, production forecasting, economic assessment of field development alternatives and reserves booking, the role and implications of history matching have become increasingly important. It is therefore beneficial to understand the attributes of the various classes of history matching techniques available to the industry and their applicability to various asset types and associated dynamic data. From this, rigor and consistent standards can be imposed on the calibration of reservoir flow models across assets, fundamentally delivering greater reliability in predictive performance to support asset management decisions.

Since the 1960s, there has been an increasing quantity of (assisted) history matching literature focusing on algorithmic development and application of diverse approaches to dynamic data integration. Although history matching, or parameter estimation, by definition poses a discrete inverse problem when using reservoir simulation, the approaches used to solve the problem can be broadly categorized as forward and inverse approaches. Forward approaches, which encompass the topic of this article, fundamentally explore the history matching solution space, defined by prior uncertainty assumptions for each reservoir parameter being calibrated, to locate those parameter combinations corresponding to the minimum history matching error(s). The resultant calibrated parameters are not solved for, but rather are selectively, and often iteratively, identified from a larger ensemble of trials that are rejected from the solution set based on logic specific to the algorithm. Therefore, the optimal solution(s) must exist within the prior uncertainty ranges and the majority of trials will be rejected. Accordingly, the focus on algorithmic development of a forward approach lies in the efficient location of the local or global minimum of history matching error with the fewest number of trials and, therefore, least computational expense. In the petroleum literature, the most commonly applied classes of (iterative) forward approaches to history matching are based on Design of Experiments (Peak et al., 2005; Cullick et al., 2006; Billiter et al., 2008; Schaaf et al., 2009), evolutionary algorithms (Schulze-Riegert et al., 2002; Cheng et al., 2008; Yin et al., 2011; Park et al., 2013) and swarm algorithms (Mohamed et al., 2010).

The variety of inverse methods for history matching is considerably greater. Inverse methods fundamentally use the deviation between simulated and measured historical data, relative to a single location in the reservoir's parameter space, to solve for the parameter set that most improves history match quality. Although seemingly less (computationally) wasteful than forward approaches, and potentially less constrained by prior assumptions for identification of the parameter solution(s), inverse methods are affected by a myriad of numerical challenges. These fall into the primary categories of solution non-uniqueness, resulting from a nontrivial model null space due to parameter correlation or insensitivity, and solution instability or ill-conditioning, which is resolved through the addition of (biasing) inversion constraints known as regularization. The classes of inverse approaches to history matching within the petroleum literature include sensitivity-based, gradient-based and ensemble methods. Oliver and Chen (2011) provide a comprehensive review of these methods, for which a rich variety of techniques have been developed to mitigate solution non-uniqueness and instability.

Despite the wealth of history matching resources in the literature, the content is infrequently useful for guiding the reservoir engineer in fit-for-purpose flow model calibration under practical, field-scale conditions. Applications rather are often based on (semi-)synthetic reservoir flow modeling scenarios that are of ideal or simplistic reservoir characterization, and what are deemed field applications integrate limited types of dynamic data or focus on calibration of only a single or few reservoir model attributes. On the other hand, publications that focus solely on a history matching field application may not provide insight as to how the technique can be generalized. While all such presentations are appropriate for the introduction of theoretical concepts, to share workflows and to progress industry acceptance of history matching techniques, they are of less use for guiding the reservoir engineer in development of a calibrated simulation model that can stand up to peer review as a decision making tool for an operating asset.

In order to understand the practical utility of the wealth of history matching techniques reported, an internal study was performed to benchmark four assisted history matching (AHM) techniques that have been applied in the oil and gas industry for asset management applications. The techniques were tested against a comprehensive suite of modeling requirements for various operations- and business-related applications. They are:

- Design of Experiments (DoE)-based methods
- Genetic Algorithm (GA)
- Ensemble Kalman Filter and Ensemble Kalman Smoother (EnKF/ES)
- Streamline-based Generalized Travel Time Inversion (GTTI)

While all techniques were found to have unique advantages and limitations, the DoE-based approach was consistently confirmed the most robust in terms of compatibility with a broad range of asset types, algorithmic simplicity, history match quality and uncertainty quantification. The foundation for this proposal lies in the definition of capabilities that we propose a technique should encompass if any generic history matching study is to be successfully completed. These involve the capability of a technique to:

1. Parameterize and calibrate any reservoir attribute
2. Integrate with any type of numerical simulation method and grid structure, for any run time duration
3. Quantify any type of reservoir and production response into one or more history matching metrics for their assimilation into the reservoir model
4. Output an ensemble of equivalently calibrated flow models suitable for probabilistic analysis

This article highlights the findings of the benchmarking study, focusing specifically on the forward approach of DoE-based AHM, and proposes that this class of techniques is capable of satisfying the above requirements for generalized yet effective application in most history matching frameworks. However, instances are identified in which other classes of techniques may complement the DoE approach in a structured workflow. The structured or hierarchical approach to history matching is currently recognized as a best practice. It involves the systematic reconciliation of the reservoir model, from the geologic grid down to the grid cell scale, with multi-resolution static and dynamic data appropriate to the spatiotemporal scale of the flow process being modeled (Landa and Horne 1997; Williams et al. 1998; Cheng et al., 2008; Yin et al. 2010; Bhark et al., 2012). Workflows typically begin with the characterization of geologic structure, compressibility, regional pore volume, regional permeability and fluid properties to match average reservoir pressure (energy), spatial pressure trends and volumetric field production. Based upon the flow process(es) modeled, fit-for-purpose calibration of smaller scale reservoir attributes is then performed, from the facies scale possibly down to inter-well behavior at grid cell resolution, to history match higher-resolution pressure transients and advective behavior. Therefore, the DoE-based AHM workflow is presented within a framework amenable to, although not bound to, a structured history matching approach.

In the remainder of this article, the DoE-based approach to history matching is first detailed and a workflow intended for general application is presented. The benchmarking objectives and methodology are then reviewed, integrating the workflow steps into a subset of key, sequentially applied history matching procedures, beginning with reservoir model parameterization and concluding with the (deterministic or probabilistic) selection of discrete history matched models that are statistically representative of one or more static or dynamic descriptors of reservoir quality. In each procedure, field applications are presented from the benchmarking study to demonstrate the primary concepts. Last, the benchmarking results are summarized and conclusions presented. Here the proposition is established that the DoE-based AHM technique is uniquely suited to satisfy most history matching requirements that may be encountered when performing an AHM study for an operating asset.

The Design of Experiments Technique for History Matching

In addition to its conceptual simplicity, the forward approach of DoE-based history matching is appealing because it at once mitigates the challenges of solution uniqueness and stability associated with the inverse problem of history matching. This is achieved by mapping out the objective function response surface(s), across the range of estimable reservoir parameter combinations, and then identifying the history matched models as those locations in parameter space that are at either the response local minima or global minimum. An objective function, which quantifies history matching error, may correspond to one or more unique dynamic data types (e.g., flowing bottom-hole pressure, well water cut, field gas production rate), and multiple objective functions may be simultaneously yet independently considered for the parameter selection process. As the name suggests, Design of Experiments techniques are used to achieve the mapping and exploration of each response surface with minimal computational cost; therefore, analytical or numerical proxy functions are typically used to increase this efficiency. Cost is associated with the run-time of numerical flow simulation to measure the impact of the calibration parameters on the objective function(s).
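The paper does not prescribe a specific functional form for these objective functions; one common choice, sketched below in Python, is a normalized sum of squared residuals kept separately per data type or well. The observations, noise levels and metric names are hypothetical and only illustrate the bookkeeping.

```python
import numpy as np

def hm_error(observed, simulated, sigma):
    """Normalized sum-of-squares misfit for one dynamic data type
    (e.g., a well's shut-in bottom-hole pressure series)."""
    r = (np.asarray(simulated) - np.asarray(observed)) / sigma
    return float(np.sum(r**2)) / len(r)

# Separate objective functions can be evaluated independently per data type
# and well, as described in the text.
obs_bhp = np.array([5150., 5080., 4990., 4920.])   # measured shut-in BHP, psi
sim_bhp = np.array([5100., 5055., 5010., 4950.])   # simulated BHP, psi
obs_wct = np.array([0.05, 0.12, 0.22])             # measured water cut
sim_wct = np.array([0.04, 0.15, 0.20])             # simulated water cut

errors = {
    "BHP_MER07": hm_error(obs_bhp, sim_bhp, sigma=50.0),   # assumed 50 psi noise
    "WCT_MER07": hm_error(obs_wct, sim_wct, sigma=0.03),   # assumed 3% noise
}
print(errors)
```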

From a different perspective, DoE-based history matching may also be used for discrete model selection, i.e., for the identification of flow models (or parameter combinations) that are statistically representative of one or more decision-making metrics, typically production forecast measures related to EUR or staged development production wedges. This requires that the history matching workflow include a (deterministic) forecasting component and acknowledges the fundamental use of reservoir characterization for recovery-related prediction. The outcome of this approach is an ensemble of history matched models that are ranked by one or more forecast metrics, providing a probabilistic understanding of reservoir/field quality with respect to those metrics. At the least, this approach provides a general understanding of how reservoir quality is related to low- to high-side recovery scenarios.
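A minimal sketch of this ranking step, assuming a synthetic ensemble of history matched models with one forecast metric (EUR) attached, is shown below; the values and the choice of P10/P50/P90 representatives are illustrative only.

```python
import numpy as np

# Hypothetical ensemble: one forecast metric (EUR, MMSTB) per history matched model.
rng = np.random.default_rng(0)
eur = rng.normal(loc=120.0, scale=15.0, size=200)

# Rank models and select discrete representatives near P10/P50/P90.
# (Oilfield convention: P10 = high side, P90 = low side.)
targets = {"P90": np.percentile(eur, 10),
           "P50": np.percentile(eur, 50),
           "P10": np.percentile(eur, 90)}
selected = {k: int(np.argmin(np.abs(eur - v))) for k, v in targets.items()}
for label, idx in selected.items():
    print(f"{label}: model {idx}, EUR = {eur[idx]:.1f} MMSTB")
```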

In this benchmarking study, an unambiguous yet generically applicable DoE-based workflow (Figure 1) is presented for both history matching and discrete model selection. The strengths of this workflow lie in its simplicity and, therefore, flexibility for application to any history matching problem. It is straightforward to individually compute and perform most of the sequential workflow steps using either custom or commercially available software. There are few automated components and, importantly, no theoretical constraints on parameter type or simulation requirements that must be satisfied as for other more sophisticated AHM techniques; therefore, any project-specific modeling complexities can be incorporated. Additionally, the simulator is treated as a black box. Parameterization of the estimable reservoir and well attributes requires access only to the simulator's input and output data structures, potentially including grid cell IDs and coordinates. Beyond this, access to source code would be required (e.g., for adjoint sensitivity calculation), which immediately renders an AHM technique specialized or non-generic. A potential downside of the workflow, and of the DoE-based technique in general, can (although not always) be the time-consuming manual analysis required for certain workflow components; however, this effort is ultimately beneficial to the engineer for improved understanding of input parameter-output data relationships. Increased user intervention is also beneficial for quality assurance and the identification of errors.

Following Figure 1, key steps in the DoE-based workflow are listed below; a computational sketch of Steps C through G follows the list.

A. Identification and characterization of potential reservoir model calibration parameters
B. Screening for final identification of reservoir model calibration parameters
C. Selection of a DoE parameter sampling strategy for response surface (or proxy model) construction
D. Application of parameter samples (or model realizations) in simulation of the historical period
E. Construction of proxy models for individual history matching error metrics using the results from Step D
F. Exhaustive Monte Carlo sampling of input parameter combinations for definition of proxy-based history matching errors
G. Rejection sampling of Monte Carlo samples based on history matching error tolerance(s) while simultaneously assessing stability and bounds of posterior parameter uncertainty distributions
H. Application of history matched model realizations in simulation, including a forecast component
I. Discrete model selection
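The following is a minimal, self-contained sketch of Steps C through G under simplifying assumptions: two scaled continuous parameters, a synthetic stand-in for the flow simulator, a quadratic polynomial proxy fit by least squares, Monte Carlo sampling of the proxy, and rejection against an error tolerance. All names, the design size and the tolerance are illustrative and not taken from the study; in practice each simulator call is a full reservoir simulation and the proxy is what makes Step F affordable.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Steps C/D: space-filling design over two scaled parameters in [-1, 1];
# each "run" stands in for one flow simulation of the historical period.
def fake_simulator(x):                 # stand-in for the reservoir simulator
    pv, kmult = x                      # e.g., aquifer PV multiplier, perm multiplier
    return (pv - 0.3)**2 + 2.0*(kmult + 0.2)**2 + 0.05*pv*kmult   # HM error metric

design = rng.uniform(-1, 1, size=(60, 2))            # 60 experiments
y = np.array([fake_simulator(x) for x in design])    # simulated HM errors

# --- Step E: quadratic proxy (response surface) fit by least squares.
def features(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])

coef, *_ = np.linalg.lstsq(features(design), y, rcond=None)

def proxy(X):
    return features(np.atleast_2d(X)) @ coef

# --- Step F: exhaustive Monte Carlo sampling of the proxy.
mc = rng.uniform(-1, 1, size=(200_000, 2))
err = proxy(mc)

# --- Step G: rejection sampling against an error tolerance; the survivors
# approximate the posterior (history matched) parameter distribution.
tol = np.percentile(err, 2)                           # keep best ~2% of samples
posterior = mc[err <= tol]
print(posterior.shape, posterior.mean(axis=0))        # inspect bounds and stability
```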

Benchmarking

This section presents the results of the DoE-based AHM benchmarking study. Each sub-section focuses on one of five primary elements of the AHM workflow: (1) Parameter selection and parameterization, (2) Parameter screening, (3) Response surface construction, (4) History matched model definition and (5) Discrete model selection. The individual workflow steps in Figure 1 are appropriately grouped into one of the elements, although each step is explained and insights related to practical application are provided.

Figure 1: DoE-based workflow for history matching and discrete model selection.


The concepts in each sub-section are also demonstrated with a field application. Although the benchmarking study and contributions to this paper are based on history matching studies performed for a variety of asset types, for intellectual property purposes the applications presented include only two currently operating deepwater Gulf of Mexico (GoM) reservoirs. Both reservoirs are similar with respect to structure, geology and fluid properties, and together encompass all general AHM components that require demonstration in the proposed DoE-based workflow. The reservoirs are briefly introduced in the following section, and more specific descriptions of these attributes and of dynamic data are provided when relevant in the subsequent sub-sections.

It should also be noted that a structured history matching workflow is not demonstrated in the field applications. Although such workflows were applied for different assets in the benchmarking study, it is important to understand that the DoE-based history matching techniques are independent of the different stages encountered within a structured workflow. That is, the steps in Figure 1 are applicable if calibrating (a) reservoir-wide, (b) flow unit or (c) well-level parameters, or if matching (a) average reservoir pressure and field rates, (b) flow unit pressures and saturations or (c) well pressures and phase cuts. In the field applications presented in this paper, only a single iteration of the DoE-based workflow in Figure 1 is performed. These reservoirs demonstrate field-wide pressure communication, and also vertical and horizontal continuity in flow units, to the extent that well-level pressure and saturation matches can be achieved simultaneously using a consistent set of reservoir calibration parameters.

    Reservoir and Data Description

The applications of AHM workflow components are demonstrated using either one of two similarly characterized deepwater GoM reservoir simulation models. As stated, the two modeling applications together encompass all demonstrative requirements.

Both fields comprise hydraulically isolated, stacked reservoirs composed of Miocene turbidite depositions in either three- or four-way structural traps. Figure 2 depicts initial fluid saturations in one of the fields, a three-way structural closure trapped against a salt dome with dips ranging from approximately 20° near the original oil-water contacts (OWCs) to 70° or more near the salt face. The different sands are classified as either amalgamated (massive) or bedded (sheeted) sandstone facies with good horizontal continuity. Faults provide secondary trapping mechanisms, although no faults have yet been identified as compartmentalizing; rather, they act as baffles. Average porosity among the sands is high, approximately 30%, as is absolute permeability, with a range of 500 to 1500 md among the sand layers. There is, however, measured degradation of porosity and permeability with depth due to compaction and diagenesis in the water legs.

Figure 2: Initial fluid saturations in one of the deepwater GoM reservoir models applied for the benchmarking study. This field comprises two hydraulically isolated, stacked Miocene sands within a three-way structural closure trapped against a salt dome, with dips ranging from approximately 20° near the original OWCs to 70° or more near the salt face.

Initial fluid types among the isolated, stacked reservoirs range from highly undersaturated oils, with bubble points greater than 15,000 psi below initial pressure, to thin saturated oil rims with large gas caps. All black oils are lighter (API 30) with initial GORs between 500 and 1000 SCF/STB.

Although all reservoirs have supporting aquifers to various degrees, all are also under waterflood during primary depletion to supplement depleting aquifer support, improve volumetric sweep and reduce gas evolution in those sand packages at or near saturation pressure. Aquifer characterization, in addition to reservoir characterization, is therefore an important component of history matching. Transient aquifer strength is largely characterized by calibration of aquifer model parameters to well pressure depletion and support trends, corresponding to pre- and post-waterflooding, respectively. The majority of production and injection wells are equipped with down-hole pressure gauges (DHPGs), which provide these trends at high resolution when a well is both operating and shut-in. At wells without DHPGs, reservoir pressure is inferred from wellhead measurement during extended shut-ins.

Other historical dynamic data available for reservoir model calibration are MDT/RFT pressure profiles collected during infill or development well drilling, and three-phase surface production well rates. In the case of commingled wells, none of which have vertical profile control (e.g., via active or intelligent downhole well control), zonal contributions are indirectly measured from geochemical fingerprinting of surface oil samples over time intervals on the order of months. Although PLT logs would provide equivalent information, these are not performed due to the high cost (of subsea well intervention).

Also important to the history matching workflow, particularly for the forecasting and discrete model selection components, is that the reservoirs are produced under either completion rate or drawdown management, whichever is the more limiting of the constraints. Rates are managed relative to a maximum completion flux beyond which completion degradation may occur, leading to PI decline and increased drawdown for a given rate. When drawdown later approaches a maximum operating threshold that is associated with additional PI degradation, well management becomes drawdown constrained and production rates begin to drop below their associated maximum completion flux.
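A hypothetical sketch of this two-constraint rate logic is given below; the productivity index, pressures and limits are made-up numbers used only to show how the more limiting constraint governs the operating rate.

```python
def target_rate(pi, p_res, p_wf_min_drawdown, q_max_flux):
    """Operating rate under completion-flux and drawdown management.

    pi                 productivity index, STB/d/psi
    p_res              current average reservoir pressure, psi
    p_wf_min_drawdown  lowest flowing BHP allowed by the max-drawdown limit, psi
    q_max_flux         rate at the maximum allowable completion flux, STB/d
    """
    q_drawdown_limit = pi * (p_res - p_wf_min_drawdown)
    return min(q_max_flux, q_drawdown_limit)   # the more limiting constraint wins

# Early life: completion-flux constrained; later, as PI and pressure decline,
# the drawdown limit takes over and rates fall below q_max_flux.
print(target_rate(pi=8.0, p_res=9000.0, p_wf_min_drawdown=7500.0, q_max_flux=10000.0))
print(target_rate(pi=4.0, p_res=8200.0, p_wf_min_drawdown=7500.0, q_max_flux=10000.0))
```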

Parameter Selection and Parameterization

For determination of reservoir calibration parameters, the question should first be asked: to what stage of the structured (or hierarchical) history matching workflow does the study belong: the reservoir, facies, flow unit or well level? Only then can the asset team propose the set of parameters to which the relevant historical data are sensitive. To reiterate from the Introduction section, there is abundant literature related to structured history matching that discusses reservoir attributes and strategies relevant to calibration of the transient reservoir energy distribution and advective flow behavior, from the facies to well level. Alternatively, it may be appropriate to simultaneously include various parameter types (e.g., rock/fluid properties, structure and heterogeneity) that affect multiple spatio-temporal scales because the historical data being matched are sensitive to all parameters of interest (i.e., the selected parameters are correlated relative to their impact on the historical data). It is in fact difficult, perhaps unachievable, to isolate reservoir model parameters that affect different data types independently. For example, structured workflows typically calibrate spatial and transient trends in average reservoir pressure using pore volume, fluid contacts and rock compressibility, effectively performing a material balance analysis, before attempting to match multiphase production rates. However, when next matching these rates, model parameters to which rates are primarily sensitive may not be appreciably adjusted without degrading the pressure match. Relative permeability parameters provide one such example. While they are typically pressure insensitive, they alone cannot likely be used to calibrate multiphase production rates at multiple wells unless several saturation (or relative permeability) regions are defined, which in turn may affect pressure through fluid redistribution.

To manage this challenge, this workflow proposes simply to include the main uncertain reservoir attributes, whether they are structural, rock, fluid or well productivity related, to which the historical data being matched are sensitive. Selection of the historical data to be integrated is, again, determined from proper identification of the structured AHM workflow stage to which the study belongs. Because a DoE-based workflow applies a forward approach to history matching, an implicit requirement is that one or more solutions lie within the parameter space explored, reemphasizing the importance of the parameter selection process. Although the selection of history matching parameters is not a task that can be standardized, as each field and reservoir is unique relative to recovery mechanisms and flow behavior, the DoE workflow does offer standard parameter screening techniques that can be used to select those to which the relevant historical data are statistically sensitive.

In addition to identification of model calibration parameters, the parameters should themselves be judiciously characterized, or parameterized, for the AHM process. The fundamental step of parameterization is to categorize the relevant reservoir and well attributes as either continuous or discrete variables. Both continuous (e.g., contact depths, relative permeability function parameters, region property multipliers) and discrete parameters (e.g., fault scenarios, or more generically Low - Mid - High scenarios of any attribute) have an important role in the structured history matching approach. For example, discrete parameters representing multiple structural geologic interpretations, often defined as multiple simulation grid geometries, are applied at the global scale. On the contrary, reservoir calibration at the facies or layer scale requires the parameterization of continuous, higher-resolution attributes that may be achieved using several approaches (e.g., pilot points [LaVenue et al., 1995; Doherty, 2003], Principal Component Analysis [Gavalas et al., 1976; Jafarpour and McLaughlin, 2009]).

This benchmarking study has found the DoE-based technique to be the sole approach to AHM that enables flexibility for all of the above considerations, indicating that the DoE-based technique can be applied in any stage of a structured history matching workflow to calibrate any parameter type. In fact, the study has addressed a common misconception that high resolution parameters, spatial parameters in particular, cannot be applied with this technique because the number of unknowns would require a prohibitively large Experimental Design. The workaround to this dilemma is to apply a parameterization technique that captures the most salient features that can be resolved by the data in a low dimensional space, removing parameter redundancy or autocorrelation by some implicit or explicit grouping. This allows calibration of the uncertain reservoir heterogeneity at scales finer than the parameterization. If after a DoE-based history matching study there is evidence that property adjustments at finer scales are required to refine history match quality, then refinement of the higher-resolution parameters alone can be achieved using a second AHM technique following a structured workflow. For example, streamline methods such as the Generalized Travel Time Inversion (Datta-Gupta and King, 2007) have been successfully applied in similar hierarchical AHM studies to refine inter-well grid cell permeability after calibration of regional geologic heterogeneity (i.e., reservoir-scale flow paths) (Bhark et al., 2012; Watanabe et al., 2013).
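As an illustration of the low-dimensional parameterization idea, the sketch below reduces a synthetic ensemble of high-resolution property maps to a handful of principal-component coefficients that could serve as continuous DoE parameters (Principal Component Analysis being one of the options cited above). The ensemble, grid dimensions and retained component count are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic prior ensemble: 100 realizations of a 50x40 property map,
# flattened to vectors (one column per realization, crudely correlated).
n_real, nx, ny = 100, 50, 40
ensemble = rng.normal(size=(nx*ny, n_real)).cumsum(axis=0)

mean_map = ensemble.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(ensemble - mean_map, full_matrices=False)

k = 5                                   # retain a handful of leading components
basis = U[:, :k] * s[:k]                # scaled principal-component basis vectors

def map_from_pcs(xi):
    """Reconstruct a grid-cell map from k PC coefficients (the DoE parameters)."""
    return (mean_map[:, 0] + basis @ xi).reshape(nx, ny)

# Five continuous DoE parameters now stand in for 2000 grid-cell unknowns.
print(map_from_pcs(np.array([0.5, -1.0, 0.2, 0.0, 0.8])).shape)
```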

Field Application

The reservoir models calibrated comprise sheeted sandstones with good hydraulic communication, both laterally and vertically per sand package, making isolation of historical pressure and saturation matches imprudent. Calibration of reservoir-scale to inter-well flow and transport behavior was therefore performed in a single (as opposed to structured) AHM study. Between the two simulation models, four categories of parameters were identified: reservoir energy, rock and fluid properties, static geologic properties and structural uncertainty. Rather than present all parameters, the selection of which is again an asset-specific task, the following exemplifies standardized types of parameters and different approaches to parameterization that may be generalized to any asset class using a DoE-based workflow.


The simplest type of parameter is a categorical parameter for which there are a finite number of discrete values. Discrete parameters are typically used when continuity between parameter values is difficult to achieve or non-physical, e.g., if different reservoir model grids are included for structural reservoir uncertainty. In this example, Figure 3A shows three geologically plausible fault scenarios, defined in different simulation grids, which are assigned the discrete parameter values of -1, 0 and 1 (with unequal probability) in an experimental design. Due to the commonality of this approach to parameterization, it is important to consider if an AHM technique (e.g., gradient-based) can calibrate discrete values when selecting an algorithm.

Continuous parameters cover the largest range of reservoir attributes and can be used to characterize any property that it is physically plausible to describe using a continuous probability distribution. Common examples are uncertain fluid contacts, rock compressibility (and dilation) or a regional property multiplier (e.g., porosity, transmissibility).

Figure 3: Examples of history matching parameter characterization including (A) discrete parameters capturing low-, mid- and high-side fault scenarios, (B) water-oil relative permeability functions using the six Corey function parameters with correlated continuous uncertainty distributions, and (C) high-resolution aquifer pore volume heterogeneity using the pilot point method.


What is less common, and which emphasizes the flexibility of DoE-based AHM approaches, is the ability to parameterize complex reservoir model attributes with a considerably less complex set of continuous parameters. Two such examples for these field applications involve the parameterization of relative permeability and high-resolution porosity heterogeneity. In the first example of relative permeability characterization, Figure 3B (left) shows the parameterization of oil-water curves using the six Corey function parameters. Each parameter is represented by an independent continuous distribution; however, it is imperative to correlate the parameters so that their combined effect represents a consistent influence on oil displacement in low- versus high-side cases across the range of permissible saturations. For the Corey parameters, this is achieved by enforcing perfect inverse correlation between the Corey exponents. For example, in a water displacing oil scenario, Figure 3B (center and right) shows that low-side Corey parameters (labeled the P10 case in the figure) will result in a higher mobility ratio across the range of saturations than the mid- or high-side parameters.
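A sketch of this correlated Corey parameterization is given below: a single variate ties the water and oil exponents together in perfect inverse correlation so one continuous DoE parameter shifts displacement efficiency consistently from low- to high-side. The endpoint values, residual saturations and exponent ranges are illustrative assumptions, not the field values.

```python
import numpy as np

def corey_waterflood(u, swc=0.2, sorw=0.25, krw_max=0.4, kro_max=0.9,
                     nw_range=(1.5, 4.0), no_range=(2.0, 5.0), n_sat=50):
    """Water-oil Corey curves from the six parameters.

    A single variate u in [0, 1] sets both exponents with perfect inverse
    correlation (nw high when no is low), so the combined effect on
    displacement is consistent across the saturation range.
    """
    nw = nw_range[0] + u * (nw_range[1] - nw_range[0])
    no = no_range[1] - u * (no_range[1] - no_range[0])
    sw = np.linspace(swc, 1.0 - sorw, n_sat)
    swn = (sw - swc) / (1.0 - swc - sorw)            # normalized water saturation
    krw = krw_max * swn**nw
    kro = kro_max * (1.0 - swn)**no
    return sw, krw, kro

for case, u in {"low": 0.1, "mid": 0.5, "high": 0.9}.items():
    sw, krw, kro = corey_waterflood(u)
    print(case, f"krw/kro at Sw=0.5: {np.interp(0.5, sw, krw)/np.interp(0.5, sw, kro):.2f}")
```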

In the second example of porosity or pore volume parameterization, the pilot point parameterization is used to smoothly characterize pore volume heterogeneity areally and with depth, in a geologically consistent manner, to assist in calibration of aquifer strength. Figure 3C (left) identifies the reservoir and aquifer regions, and also ten (labeled) pore volume pilot points that serve as continuous calibration parameters in the DoE workflow. When the pilot point values, which each constitute a DoE parameter, are updated for any single simulation run, a variant of Kriging is performed to interpolate the pore volume map within the aquifer using the pilot points as conditioning values. Note that the unlabeled pilot points that stretch out along the aquifer-reservoir contact are defined as constants for the interpolation to ensure continuity in the pore volume field across this boundary. Because the actual aquifer extent and downdip thicknesses are distant from well control and are highly uncertain, the parameterization provides the necessary degrees of freedom to calibrate variability in aquifer size in a geologically plausible manner (Figure 3C right) without adjustment to the grid. This flexibility was in fact requisite to achieve an accurate calibration without deferring to more computationally demanding use of multiple discrete grid geometries and corresponding pore volume maps.
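The text refers only to "a variant of Kriging"; the sketch below uses simple Kriging with a Gaussian covariance as one plausible variant, with ten adjustable pilot values (the DoE parameters) plus fixed points along the aquifer-reservoir contact as conditioning data. Coordinates, correlation length and pilot values are hypothetical; a geostatistics package would normally supply this interpolation.

```python
import numpy as np

def gauss_cov(d, corr_len=2000.0, sill=1.0):
    return sill * np.exp(-(d / corr_len) ** 2)

def krige_map(pts, vals, grid_xy, corr_len=2000.0):
    """Simple kriging of pilot-point values onto grid-cell centers."""
    d_pp = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    d_pg = np.linalg.norm(grid_xy[:, None, :] - pts[None, :, :], axis=2)
    C = gauss_cov(d_pp, corr_len) + 1e-8 * np.eye(len(pts))   # pilot-pilot covariance
    c = gauss_cov(d_pg, corr_len)                             # grid-pilot covariance
    w = np.linalg.solve(C, vals - vals.mean())
    return vals.mean() + c @ w

# Ten adjustable aquifer pilot points (DoE parameters) plus fixed points along
# the aquifer-reservoir contact to keep the field continuous across the boundary.
pilot_xy = np.array([[x, y] for x in (1000., 4000., 7000.)
                             for y in (1000., 5000., 9000.)] + [[4000., 12000.]])
pilot_pv = np.array([1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.1, 0.95, 1.05])   # PV multipliers
contact_xy = np.array([[x, 0.0] for x in np.linspace(0., 8000., 5)])
contact_pv = np.ones(len(contact_xy))                                       # held constant

pts = np.vstack([pilot_xy, contact_xy])
vals = np.concatenate([pilot_pv, contact_pv])
gx, gy = np.meshgrid(np.linspace(0, 8000, 40), np.linspace(0, 12000, 60))
pv_map = krige_map(pts, vals, np.column_stack([gx.ravel(), gy.ravel()])).reshape(gy.shape)
print(pv_map.shape, pv_map.min(), pv_map.max())
```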

In summary, these applications have demonstrated that DoE-based techniques are capable of simultaneously incorporating a variety of model parameters in the AHM process, both discrete and continuous, and from simple univariate parameters to those representing complex and correlated parameterization techniques. It was concluded from the benchmarking study that, relative to the structured AHM philosophy, DoE-based techniques are in fact able to incorporate any type of reservoir model parameter with the exception of properties defined at the individual grid cell level. However, calibration at this scale is likely unjustified in most cases given that pressure transients and fluid cuts measured at wells contain information related to spatial averages of reservoir heterogeneity, both physically (e.g., Oliver, 1992) and numerically (e.g., Vasco et al., 1997), thereby resulting in low parameter resolution. Therefore, the incorporation of heterogeneity parameterization techniques within DoE-based workflows is expected to be sufficient in all but the highest-resolution AHM applications.

Parameter Screening

Having selected an initial set of parameters, together with their uncertainty distributions, on which the historical calibration data are thought to depend, a parameter screening analysis is performed to finalize the calibration parameter set. Per the Design of Experiments philosophy, the screening of parameters is performed using a sparse or efficient parameter sampling methodology that at the same time enables the statistical assessment of key uncertainties and, potentially, their interactions. It is re-emphasized at this point in the AHM workflow that the key parameter uncertainties are related to their influence on history matching error metrics and not on metrics related to production forecasting, which become important during the later step of discrete model selection. In the benchmarking study, screening is based upon the quantitative analysis of parameter main effects and first-order interaction effects, defined below, and involves the common approach of one-variable-at-a-time (OVAT) sensitivity analysis (SA) as well as a more rigorous identification of statistical significance.

Sensitivity Analysis

In a DoE study, sensitivity analysis is typically synonymous with OVAT uncertainty analysis in which each input parameter is independently varied at its low and high values relative to a single reference set of parameters. For m parameters, the number of simulations required to complete the analysis is 2m + 1. This permits calculation of the parameter main effect, defined (for this application) as the magnitude of change in a response across the total range of parameter uncertainty, or relative to the minimum and maximum values of the parameter. Main effects and OVAT parameter sensitivity are typically viewed on a Tornado diagram, where the response change is shown for each parameter relative to its deviation from the reference case (i.e., the +1 run in the 2m + 1 total runs).
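A minimal sketch of the OVAT main-effect calculation from 2m + 1 runs is shown below; the stand-in response function and parameter names are illustrative only.

```python
import numpy as np

def response(x):                     # stand-in for one simulated HM error metric
    return 3.0*x[0] - 1.5*x[1] + 0.2*x[2] + 10.0

names = ["AQ_PVMULT", "OWC_DEPTH", "ROCK_COMPR"]
ref   = np.array([0.0, 0.0, 0.0])               # center-point (reference) case
lows  = np.array([-1.0, -1.0, -1.0])
highs = np.array([ 1.0,  1.0,  1.0])

base = response(ref)                            # the "+1" run of the 2m + 1 total
effects = {}
for i, name in enumerate(names):                # 2m additional runs
    lo, hi = ref.copy(), ref.copy()
    lo[i], hi[i] = lows[i], highs[i]
    effects[name] = response(hi) - response(lo)  # main effect across the full range

# Tornado ordering: largest absolute response change first.
for name in sorted(effects, key=lambda k: abs(effects[k]), reverse=True):
    print(f"{name:>11s}: {effects[name]:+.2f} (baseline {base:.2f})")
```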

Although it is possible to solely use Tornado diagrams to screen out parameters, leaving only the so-called heavy hitters in the calibration parameter set, the limitations of OVAT SA do not make this a best practice. Because an OVAT sensitivity metric does not provide any information about the parameter-response relationship for parameter values between the end-points and the center-point, the metric can be misleading as a sole screening tool. The parameter end-points are at the extreme limits of uncertainty space, indicating an extreme sensitivity metric, which (in the nonlinear case) is unlikely to be representative of the solution sensitivity, which should lie closer to the mode of the distribution if prior uncertainty is properly characterized. OVAT SA also calculates the impact of only a single parameter on a response; therefore, interactions and sensitivity statistics other than first-order cannot be computed. Most importantly, although the analysis indicates the impact of a parameter on a response, it does not indicate if the impact is significant.

Rather than using OVAT SA and the associated Tornado diagram(s) for parameter screening, these analysis methods were found during benchmarking to be of better use as practical tools for manual history matching and for quality assurance testing of workflows. The Tornado is useful for the broad understanding of the change in a modeled response due to a change in a parameter (which should not be confused with the definition of sensitivity as a differential, or the model response to a parameter perturbation). When manual history matching is required, possibly during the last component of the DoE-based AHM workflow, use of Tornado diagrams is critical to efficiently understand how a parameter adjustment will affect one or more history matching errors without performing simulation. More practically, OVAT SA is useful for initial quality control of the simulation workflows by allowing a sense check of parameter-response relationships, i.e., the sensitivities must adhere to mathematical and engineering understanding, otherwise indicating an error in computational workflow definition. This concept is demonstrated in the ensuing application. Finally, it is useful to note that all simulations performed during a SA can be re-used at the later stage of response surface construction for history matching by rejection sampling.

Statistical Significance of Effects

To supplement the limitations of OVAT analysis as a parameter screening tool, the statistical significance of each effect is used to more rigorously screen out parameters from an AHM workflow. The key role of this analysis is to identify if the magnitude of an effect is significant and statistically repeatable. Additionally, effects may now consider the interaction between parameters, which is a measure of the magnitude of a response change when two or more parameters are simultaneously perturbed at some combination of their extreme (low and/or high) values.

The statistical analysis begins with the definition of a null hypothesis for the main or interaction effect of each parameter: the individual parameter (main effect) or parameter combination (interaction effect) does not have a significant influence, at a defined significance level, on the response (which in this case is an individual history matching error metric). The next step is to calculate the mean effect over several experiments, where each sample effect is computed as the magnitude of the change in a response when the single parameter or parameter combination (being assessed for sensitivity) is at its low versus high uncertainty value. The experiments themselves are developed as part of a parameter design intended for screening, guidelines for the development of which are presented below. At this point it is important only to understand that the variability between effects is derived from its repeated calculation, each time at different random selections of the remaining parameters (not currently being tested for parameter significance) at their low/high values, thereby distributing the potential influence of all other uncontrolled effects on the current set of experiments. The final outcome is that each effect will have a sample mean and variance, preferably unbiased due to a sufficiently large sample size and random spreading of the uncontrolled interactions between parameters.

The statistical significance of the parameter effect can be determined using a dependent t-test for paired samples (Schmidt and Launsby, 2005), where the sample t-test statistic is defined as

$$t^* = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \qquad (1)$$

where x̄ is the mean of the sampled effects, μ0 is zero under the null hypothesis, and the denominator of Eq. (1) is the standard error of the effects for the n experiments in the design. Assuming a significance level of α, a critical t statistic is computed against which the significance of the effect is finally determined. If t* is beyond the critical value, then the null hypothesis can be rejected and the parameter effect identified as significant because the probability of achieving such a high t-test statistic by chance alone is small. A key insight to this approach is to recognize that the numerator of t* represents the magnitude of the effect, and the denominator a measure of the total variability between the effects; therefore, a significant and repeatable statistic value indicates that the effect is large relative to a smaller effect variability, the latter of which further indicates that the variability of response samples at the low and at the high parameter (combination) values is also small.
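As an illustration of Eq. (1), the sketch below computes t* for a synthetic sample of effects and compares it against the Student-t critical value; the effect values, n and α = 0.05 are assumptions for demonstration only.

```python
import numpy as np
from scipy import stats

# Synthetic sample of one parameter's effect, recomputed across n screening
# experiments with the other parameters randomly set to their low/high values.
effects = np.array([4.1, 3.6, 5.0, 2.8, 4.4, 3.9, 4.7, 3.2])
n = len(effects)

x_bar = effects.mean()
se = effects.std(ddof=1) / np.sqrt(n)        # standard error of the effects
t_star = (x_bar - 0.0) / se                  # Eq. (1) with mu_0 = 0

alpha = 0.05
t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)

print(f"t* = {t_star:.2f}, critical t = {t_crit:.2f}")
if abs(t_star) > t_crit:
    print("Reject the null hypothesis: the effect is statistically significant.")
else:
    print("Cannot reject the null hypothesis at this significance level.")
```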

For a given response, the statistical significance of each effect is typically reported on a standardized Pareto chart, which ranks each parameter (y-axis) by its t-test statistic (x-axis) in descending order. In the Pareto chart that will be discussed in the application below (Figure 6), the test statistics are shown next to each bar. Those with negative values indicate an inverse relationship between that parameter and the response (or history matching error metric). Also shown is the critical statistic value corresponding to the significance level α, or p-value threshold, assumed to be 0.05 for this application. The p-value is the probability of achieving a t-test statistic from the sampled effects at least as large as the critical value if the null hypothesis is true; therefore, the parameters with large t-test statistics that fall to the right of the critical t value would have a small p-value, less than 0.05, and the null hypothesis of no significant effect can be rejected. For those parameters with t-test statistics to the left of the critical t-value line, the null hypothesis cannot be rejected. This does not indicate that these parameters are insensitive, but rather that a statement cannot be made regarding the statistical significance of the effect.

Figure 4: Showing only a single structural layer of the reservoir, six material balance regions are identified, four in the oil column and two in the aquifer, each to which an individual pore volume multiplier is applied to redistribute initial energy for the history matching parameterization.

Figure 5: At the left, a Tornado chart displays OVAT SA results for a single static metric, the initial aquifer-to-oil-column pore volume ratio (AOCR), and also for a single history matching error metric related to observed shut-in pressures at well MER07 (HMERR:MER07_SWP). At the right, observed shut-in well pressure is compared against simulated pressure corresponding to the base case or center-point parameters labeled as Baseline Value on the Tornado charts.

Figure 6: (A) The standardized Pareto chart for the main and interaction effects, characterized by the multilinear regression model, for the history matching error metric related to shut-in pressure at well MER07. (B) A sub-section of the D-Optimal design table used to condition the multilinear regression model, showing only five parameters and their interactions, for 20 out of the 250 D-Optimal samples. In order to identify aliasing, each column of such a table would have to be compared with all other columns for uniqueness.

After constructing a Pareto chart for each history matching error metric, those parameters for which the null hypothesis cannot consistently be rejected may be safely removed from the AHM workflow. In fact, it is a good practice to purposefully include a dummy or completely insensitive parameter in the simulation workflow to ensure that the screening analysis performs as intended. For example, if the dummy variable is identified as significant then it is likely that the n samples in the screening design are too few.

Before moving to the application, a final discussion is required regarding selection of the parameter sampling design used to test for effect significance. In alignment with the aforementioned approach to hypothesis testing, the samples should follow a two-level design that at the least enables unique identification of effects for individual parameters. This requires that these so-called main effects are not aliased, or do not share identical parameter low/high values across all design experiments with parameter interactions, which would prevent the analyst from identifying if a certain response is caused by that single parameter (main effect) or combination of parameters (interaction). Benchmarking findings have also indicated that it is beneficial to identify the significance of first-order bivariate interactions because they indirectly indicate that two parameters may be associated with the same response, or may be correlated when that association is linear. These main and interaction effects are characterized by the multilinear polynomial

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 \qquad (2)$$

for two parameters x1 and x2, and for response variable y. It is also in general preferred for screening that the design is both vertically and horizontally balanced, meaning that each parameter is assigned its low and high values an equal number of times across all design runs and within design samples, respectively. Balancing promotes pairwise orthogonality (or linear independence) between parameter samples within the design, thereby promoting independent evaluation of effects (Schmidt and Launsby, 2005).

Although these properties are useful to understand at a fundamental level when analyzing the statistical significance of effects, several screening designs are well developed and have been commonly studied in the petroleum literature. The Plackett-Burman is probably the most commonly applied for screening as it is a two-level, orthogonal design of Resolution III, i.e., main effects are independent from each other but are aliased with two-parameter interactions. This design is therefore useful when it can be safely assumed that the influence of interactions on the modeled responses of interest is nominal. Another limitation of this deterministic design is that the number of experiments must be a multiple of four. The Folded Plackett-Burman design, which doubles the number of Plackett-Burman experiments for a given number of parameters, improves the design to Resolution IV. Main effects can be uniquely identified from two-parameter interactions, although interactions remain aliased. The more complete two-level, orthogonal designs are the (half) fractional factorial and exhaustive full factorial designs. These provide the highest resolution and can enable unique identification of single- and two-parameter effects; however, their use is typically impractical for field application due to the computational demand of 2^(m-1) and 2^m experiments (or reservoir simulation runs), respectively. If a fractional factorial design is selected, then it is important to compute and analyze the aliasing patterns to understand which combinations of parameters can and cannot be uniquely identified as (in)significant. For field application, a practical fit-for-purpose design is the two-level D-optimal design. The advantages are that any number of n experiments can be defined for customization to computational capacity, for any number of m parameters (at two or more levels, with the number of levels possibly differing per parameter). The design is not a deterministic sampling scheme but rather is solved for by minimizing the variance of the underlying regression model coefficients that capture the assumed relationship between the parameters and responses. Per the above recommendation, Eq. (2) defines the underlying regression model appropriate for construction of a two-level D-optimal screening design. The main and interaction effects can then be calculated directly from the experiments, where as many experiments should be performed as is feasible. Alternatively, if the coefficients of the regression model are used to calculate the effects (Yeten et al., 2005), a minimum of m(m + 1)/2 + 1 experiments is required to uniquely solve for the coefficients in Eq. (2).
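The sketch below counts the minimum run requirement for Eq. (2) extended to m parameters (intercept, m main effects and m(m-1)/2 two-way interactions, i.e., m(m + 1)/2 + 1 coefficients) and fits those coefficients by least squares from a two-level design. A random two-level design stands in for an actual D-optimal solver, and the synthetic responses are illustrative.

```python
import numpy as np
from itertools import combinations

def model_matrix(X):
    """Columns: intercept, main effects, all two-way interactions (Eq. 2 for m parameters)."""
    n, m = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(m)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(m), 2)]
    return np.column_stack(cols)

m = 18
n_min = m * (m + 1) // 2 + 1
print(f"minimum runs to solve Eq. (2) for m = {m}: {n_min}")   # 172 for m = 18

# Random two-level design standing in for a D-optimal design (n > n_min is
# recommended to reduce the variance of the coefficient estimates).
rng = np.random.default_rng(3)
X = rng.choice([-1.0, 1.0], size=(250, m))
true_beta = rng.normal(size=model_matrix(X).shape[1])
y = model_matrix(X) @ true_beta + rng.normal(scale=0.1, size=len(X))

beta_hat, *_ = np.linalg.lstsq(model_matrix(X), y, rcond=None)
print("fitted coefficients:", beta_hat.shape)
```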

Field Application

Following the above recommendations, the first component of parameter screening is OVAT SA to identify the general relationship between individual model responses and parameter changes, and also to QC the modeling workflow. Once verified, the parameters are then statistically assessed for their impact on each of the history matching error types, and those parameters to which all data are insensitive can be safely removed from the calibration workflow. For demonstrative purposes, in this section these concepts are exemplified using a reservoir model that is characterized with a simple parameterization of material balance and mobility properties. Figure 4 shows a structural layer of the reservoir and its segregation into six material balance regions, four in the oil column and two in the aquifer, each to which an individual pore volume multiplier is applied to redistribute initial energy. Formation compressibility and the initial oil-water contact are also parameters. Fluid mobility is characterized using the six Corey parameters for oil-water relative permeability, and an individual absolute horizontal permeability multiplier (assuming isotropy) is defined for each of the aforementioned material balance regions. Gas-oil relative permeability relationships are assumed constant.

For the OVAT SA, sensitivities are exemplified for a single static metric, the initial aquifer-to-oil-column pore volume ratio (AOCR), and also for a single history matching error metric related to observed shut-in pressures at a single well, MER07. The corresponding sensitivity Tornado diagrams are shown in Figure 5, computed from a total of 2m + 1 = 37 simulation runs for m = 18 parameters. For AOCR, all parameters related to both aquifer and oil pore volumes are appropriately sensitive, with the increments in AOCR properly characterized, e.g., a deeper OWC results in a smaller AOCR. Aquifer pore volume has the largest sensitivity because it is permitted the largest parameter range due to poor characterization of the aquifer extent away from well control. Conversely, all other parameters, which relate to mobility, have exactly zero sensitivity; any nonzero sensitivity here would otherwise have indicated an error in the modeling workflow. On the dynamic side, Figure 5 also shows the OVAT Tornado for a metric quantifying history matching error for shut-in bottomhole pressure at well MER07. The base case run, corresponding to the center line in the Tornado chart, is depicted by the orange line and underpredicts the observed pressures (black dots) for most of the history. It would therefore be expected (prior to additional simulation-based analysis) that an increase in the regional permeability and/or pore volume near this well would increase the simulated pressure and reduce the data misfit. The Tornado plot in fact shows this; an increase in the associated parameters (red bars) corresponds to a reduction in history matching error (x-axis). Also note that the sensitive material balance regions (i.e., MBOCREG1 and MBOCREG2) appropriately contain or are closest to well MER07 (in Figure 4). If or when manual parameter adjustments are required in the history matching workflow, the utility of the Tornado analysis beyond screening is evident.
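A minimal sketch of the OVAT sensitivity calculation behind such a Tornado diagram is given below; `simulate_error` is a hypothetical stand-in for a reservoir simulation run that returns one history matching error metric, and the 2m + 1 run count follows from one base run plus low/high runs per parameter.

```python
# Hedged sketch of an OVAT sensitivity (Tornado) calculation.
def ovat_tornado(base, low, high, simulate_error):
    """base/low/high: dicts of parameter values. Returns {param: (err_low, err_high)}."""
    base_err = simulate_error(base)                 # 1 base run
    bars = {}
    for p in base:                                  # 2 runs per parameter -> 2m + 1 total
        for level, store in ((low[p], 0), (high[p], 1)):
            case = dict(base)
            case[p] = level                         # perturb one variable at a time
            bars.setdefault(p, [base_err, base_err])[store] = simulate_error(case)
    # Sort by bar width (absolute swing) for the Tornado display.
    return dict(sorted(bars.items(), key=lambda kv: -abs(kv[1][1] - kv[1][0])))
```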

Moving to a more quantitative analysis, the second parameter screening step involves testing each parameter for statistical significance. The analysis is performed at once by construction of a standardized Pareto chart for each relevant history matching error metric. In this application, main effects and two-way parameter interactions are applied as the screening criteria. The analysis of higher-order interactions cannot in practice be considered without an analysis of aliasing (e.g., see Schmidt and Launsby, 2005) which, although straightforward, may be unwarranted for a screening study. While a full or half fractional factorial design would provide exhaustive and balanced parameter samples, minimizing the risk of aliasing for interaction effects and also providing accurate computation of the sample mean and variance for a given effect, the computational expense is clearly too great at 2^18 or 2^17 simulation runs, respectively. Alternatively, the Plackett-Burman and its folded variant, respectively requiring 20 and 40 runs (as the closest multiple of four above m = 18), alias two-way interactions and also may not provide robust sample statistics due to the few samples. More practical is use of a D-optimal design to conform to the application-specific constraints. In this case, computational capacity permits on the order of 250 simulations (in the time allotted for screening), which is used to define a two-level D-optimal design. Note that if the regression model in Eq. (2) were used to compute the effects, a minimum of 172 simulations would be required to solve for the coefficients. In this case, more than this minimal sample number would be recommended to reduce the error variance of the coefficient solution.

Returning now to the history matching error metric for shut-in pressure at well MER07, the standardized Pareto chart is shown in Figure 6A for the main and interaction effects characterized by the multilinear regression model. There are m(m-1)/2 = 153 parameter combinations for m = 18 parameters, so only the parameters with the largest and smallest t-statistics are shown together with the critical t value corresponding to the 0.05 significance level. Similar to results from the OVAT SA for this error metric, the significant parameters for which the null hypothesis of no effect can be rejected are predominantly related to pore volume and permeability in the material balance regions near to or enclosing the well. However, unlike the SA, this design includes interactions, which in fact comprise the complete set of significant parameters for this error metric. This indirectly indicates (although does not quantify) parameter correlation, which is the primary source of parameter non-uniqueness during history matching. Two simultaneously varied parameters have a greater influence on the well pressure than each parameter individually or, stated differently, an increase in one parameter and a decrease in the other will not impact the well pressure, indicating correlation. As a caveat, it is also possible when using a D-optimal design of undetermined resolution that these significant interactions are aliased with others. The design table for five parameters and their interactions is shown in Figure 5B for only 20 out of the 250 samples. In order to identify aliasing, each column of such a table would have to be compared with all other columns for uniqueness. Although achievable, this is in practice not required for screening because the objective is to remove insignificant parameters rather than to identify significant parameters (per the hypothesis test). Accordingly, once a Pareto chart has been defined for each history matching error metric, any parameter for which the null hypothesis consistently cannot be rejected, or can only seldom be rejected, can be removed from the history matching workflow. In this example, two parameters were removed for these reasons, the permeability multipliers for regions MBAQ5SW and MBOCSEH (Figure 4). These correspond to the smallest material balance region in the aquifer and in the oil column, respectively, and indicate that flow through these regions only nominally impacts the observed historical data.
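The standardized effects plotted in such a Pareto chart are simply the t-statistics of the regression coefficients. A hedged sketch, assuming a full-rank design matrix A (intercept, main-effect and interaction columns) and a history matching error vector y, is shown below; names and thresholds are illustrative.

```python
# Minimal sketch: standardized effects for a Pareto chart, i.e. t-statistics of
# the regression coefficients compared against the critical t value at alpha = 0.05.
import numpy as np
from scipy import stats

def standardized_effects(A, y, names, alpha=0.05):
    """A: (n, p) design matrix; y: (n,) error metric; names: labels per column of A."""
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - p)                     # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)                # coefficient covariance (A full rank)
    t_stat = beta / np.sqrt(np.diag(cov))                # standardized effects
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)        # two-sided critical value
    ranked = sorted(zip(names, np.abs(t_stat)), key=lambda kv: -kv[1])
    # Bars above t_crit reject the null hypothesis of no effect; the intercept entry is ignored.
    return ranked, t_crit
```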

Response Surface Construction
The computational expense of DoE-based AHM workflows is mitigated by use of proxy or response surfaces that analytically or numerically characterize the relationship between all history matching parameters and each history matching error metric. This enables exhaustive yet efficient parameter sampling, simulation by proxy and parameter sample rejection in the next step of history matched model selection. Because model selection is performed by proxy, response surface construction is, from an algorithmic as opposed to reservoir engineering standpoint, indisputably the most critical component of the AHM workflow. Model selection is based upon acceptable history matching errors, so inaccuracy of a history matching error proxy directly introduces an error into the selected model parameters, a problem that does not become apparent until late in the workflow when moving back from proxy-based modeling to simulation. Such immediate degradation of history match quality will ultimately result in erroneous probabilistic model identification, e.g., the P50 model identified is actually the P40.

It should be noted that challenges associated with the use of response surfaces result only from a limitation in computational capability. Current internal efforts are underway to perform simulations via commercial cloud computing with a 10,000 CPU capability, thereby making proxy usage obsolete for most simulation applications. For simulation models with small run times or that can be efficiently parallelized, response surfaces may be unnecessary if even limited distributed computing capabilities are available. However, in this benchmarking study, the focus is on AHM for generic reservoir simulation problems; therefore, the requisite use of response surfaces is assumed.

The classes of analytical and numerical proxy models most commonly evaluated for history matching and other optimization problems are polynomials, Kriging, splines and artificial neural networks. Analytical approaches can be of more practical use as they are easier to transfer across software platforms if needed, a simple but nontrivial point when executing the workflow. Various studies have concluded the use of proxy models for history matching to be applicable, while others have deemed it inappropriate (Yeten et al., 2005; Cullick et al., 2006; Billiter et al., 2008; Webb et al., 2008; Zubarev, 2009). Cases of inappropriateness inevitably result from either insufficient proxy sampling or misuse due to false confidence in a proxy's predictive capability based on insufficient validation. Such conflict demonstrates the importance of characterizing the pervasive nonlinearity in parameter-response relationships for AHM, as well as demonstrating how the degree of nonlinearity can vary between history matching problems. Advantageously, in the DoE-based approach each history matching error metric can be characterized by a different class of proxy, so multiple classes can be tested and the one with the best predictive performance for that response surface can be uniquely applied. In this study, predictive performance is measured through blind testing, where a percentage of the conditioning points are excluded from generation of the surface so that the simulated versus proxy response at those locations (in parameter space) can be statistically compared.

The question then arises, what is an appropriate measure of predictive performance? It is this question that is fundamental to success of the workflow, as it is the primary source of either failure or unnecessary prolonging of a DoE-based AHM study. To prevent this, the proxy error, quantified by either the regression error or the predictive error (such as blind testing error), should be used to determine what is deemed an acceptable history match, and two error comparisons are required. The first is a comparison of the proxy error with the historical data error or noise. Qualitatively speaking, the historical data error defines the lower bound of the requisite proxy accuracy; the proxy does not need to resolve history matching error that is within the range of data noise. Note that the error units (e.g., root mean squared error [RMSE] or an error confidence interval) must be consistent between the proxy and historical data. The second and more important comparison is required between the proxy error and the variability of the historical data misfit (i.e., the difference between the observed and simulated data and the variation of this difference throughout parameter space). The proxy error variability should be less than the data misfit variability for discernment of history match quality, or equivalently, the uncertainty or confidence interval of the proxy should not span the history matching error tolerance that is used to discern acceptable history match quality. In other words, at any given point in parameter space, the uncertainty range or confidence interval of the proxy estimate should fall entirely beneath the error tolerance if the model is to be retained, or entirely above the error tolerance if the model is to be rejected. When proxy prediction error spans this tolerance for a given parameter combination, the model may be erroneously either rejected or accepted. While this may be acceptable when proxy error is small, a large proxy error on the order of the data misfit variability results in a flawed analysis, rendering the remainder of the AHM workflow valueless. Although it is recognized that definition of the acceptable history matching error tolerance is subjective, a topic discussed in the following section, the above statements related to proxy accuracy hold true regardless of the tolerance magnitude.
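As a hedged illustration of this blind-testing comparison (not the paper's implementation), the sketch below fits two candidate proxies with scikit-learn, computes their blind-test RMSE, and checks it against assumed values for the historical data noise and the data misfit variability; all function and variable names are illustrative.

```python
# Hedged sketch of proxy blind testing: a fraction of the conditioning runs is
# withheld, and the blind-test RMSE of each candidate proxy is compared against
# (1) the historical data noise and (2) the variability of the data misfit.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor

def select_proxy(X, err, data_noise_rmse, misfit_variability, test_frac=0.2, seed=0):
    """X: (n, m) parameter samples; err: (n,) simulated history matching errors."""
    Xc, Xb, yc, yb = train_test_split(X, err, test_size=test_frac, random_state=seed)
    candidates = {
        "quadratic": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
        "kriging":   GaussianProcessRegressor(normalize_y=True),
    }
    results = {}
    for name, model in candidates.items():
        model.fit(Xc, yc)
        rmse = float(np.sqrt(np.mean((model.predict(Xb) - yb) ** 2)))
        # Accuracy finer than the data noise is not required; the proxy error must,
        # however, stay below the misfit variability or match quality cannot be discerned.
        results[name] = {"model": model, "blind_rmse": rmse,
                         "usable": rmse < misfit_variability}
    return results
```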

In order to increase proxy accuracy, independent of the proxy class, it is inevitable in practice that the number of conditioning points must be increased (Liu, 2005). An optimal sample size cannot be known a priori for a nonlinear system; however, use of an analytical proxy will require a minimum number of samples to uniquely solve for the model (or regression) coefficients for the given parameter set (e.g., a linear model requires n ≥ m + 1 for a unique solution, with degrees of freedom n - m - 1). In practice, the maximum number of parameter-response conditioning locations achievable, given computational constraints, should be used to construct a proxy. There is no downside to over-sampling, if it is even possible for the system, because the number of Monte Carlo samples in the next step of model selection will outnumber the conditioning samples by several orders of magnitude. If a large number of conditioning points cannot initially be achieved, then a proxy can be iteratively constructed until regression error and/or predictive performance is satisfactory (Jones, 1998; Queipo et al., 2000; Wang, 2003; Slotte et al., 2008; Castellini et al., 2010). In these approaches, an objective function that quantifies one or more measures of proxy quality, including predictive variance, low versus high history match magnitude, nonlinearity, predictive sensitivity to local parameter perturbations and distance from nearby conditioning locations, is used to drive the refinement iterations.

Selection of the parameter sample design is perhaps more straightforward. Space filling designs, particularly variants of Latin Hypercube Sampling (LHS), have been demonstrated to reproduce well the underlying uncertainty distribution in nonlinear systems (Yeten et al., 2005; Helton and Davis, 2002; Zubarev, 2009; Osterloh et al., 2013). LHS is a stratified random scheme in which any number of experiments with m parameters are sampled from equi-probable bins, thereby approximating uniform coverage with greater density in higher-probability parameter space. Parameter correlation can also be imposed. This strategy merges well with the objective of AHM response surface modeling, which is to provide sufficiently uniform coverage of parameter space, with as sparse a sampling resolution as is possible, for identification of local history matching error minima. For these reasons, this benchmarking study exclusively applies LHS for the construction of all error metric proxies. It is also good practice to include in the conditioning sample set the simulated responses acquired during the prior workflow step of parameter screening. Because the OVAT SA applies experimental designs that capture the low/high extremes for different parameter combinations, these responses are useful to accurately condition the proxy response at its borders in parameter space. Other extreme parameter combinations, based solely on asset-specific considerations, can and should be added to the conditioning set.
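A minimal LHS sketch using scipy.stats.qmc is given below; the parameter bounds are illustrative placeholders, and imposing correlation (e.g., between Corey exponents) or appending the screening-run extremes would be additional steps on top of this.

```python
# Minimal Latin Hypercube sampling sketch for proxy conditioning points.
import numpy as np
from scipy.stats import qmc

def lhs_conditioning_samples(bounds, n_samples, seed=0):
    """bounds: list of (low, high) per parameter; returns an (n_samples, m) array."""
    m = len(bounds)
    sampler = qmc.LatinHypercube(d=m, seed=seed)
    unit = sampler.random(n=n_samples)              # stratified samples in [0, 1)^m
    lows = np.array([b[0] for b in bounds])
    highs = np.array([b[1] for b in bounds])
    return qmc.scale(unit, lows, highs)             # map to the physical parameter ranges
```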

Last, before proceeding with the AHM workflow, it should be confirmed that the historical observation data are spanned by the simulated responses corresponding to the parameters used for proxy conditioning. This ensures that the parameter space sampled contains a history matching solution(s). If not, then the parameter selection process must be revisited. To reiterate, DoE-based workflows apply a forward approach to history matching as opposed to solving for the parameters in an inverse approach; therefore, the solution(s) must exist within the prior sampling space.
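A hedged sketch of this spanning check is shown below; it simply verifies that each observed data series lies within the min/max envelope of the simulated responses from the conditioning runs (array names are illustrative).

```python
# Hedged sketch of the spanning check for one historical data series.
# observed: (n_times,) array; simulated: (n_runs, n_times) array of ensemble responses.
import numpy as np

def data_spanned_by_ensemble(observed, simulated, coverage=1.0):
    lower = simulated.min(axis=0)
    upper = simulated.max(axis=0)
    inside = (observed >= lower) & (observed <= upper)
    # coverage < 1.0 tolerates a few noisy observations outside the envelope.
    return inside.mean() >= coverage
```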

Field Application Following from the application in the prior section, sixteen calibration parameters were identified from the screening analysis. To next develop a proxy model for each history matching error metric, 500 parameter combinations were sampled using a 25-bin LHS. The number of samples selected was related simply to the maximum affordable computational capacity, which is a function of run time, the number of processors available for distributed (serial or parallel) runs and the project schedule. The number of LHS bins was selected to ensure continuity of sampling for continuous parameters. Figure 7A shows scatterplots for six of the parameters exemplifying approximate uniform coverage of parameter space for both discrete and continuous parameters. Correlation is also honored when specified, in this case for the Corey exponents to ensure consistent (low-to-high side) behavior in mobility ratio across the range of saturations for any given set of oil-water relative permeability curves (Figure 3B). The blue points in the figures represent the blind testing points, which also approximate uniform coverage throughout each parameter range.

Before proceeding to proxy construction, it is also verified that each type of historical data series is spanned by the simulated data corresponding to the parameter space explored via the DoE. For example, Figure 7B plots two historical data series (i.e., water production rate at well MER03ST2 and oil production rate at well MER11) against only the first 20 simulations from the 500 LHS conditioning simulations. Had the ensemble not spanned the data, then additional reservoir attributes would have to be explored for inclusion in the history match and the preceding parameter sensitivity analysis repeated.

Having satisfied the prerequisite tests, three types of proxies were constructed for each history matching error metric: linear polynomial, quadratic polynomial with first-order interactions and Kriging. Figures 8A and 8B compare the simulated versus proxy-derived history matching error from the prior example (of oil production rate at well MER11 [Figure 7B]) for the quadratic and Kriging proxies, respectively. The grey points represent the samples used to condition each proxy and the blue data are the testing points left out of the regression for blind testing. This is evident in Figure 8B as Kriging is an exact interpolator. The question is, are these proxies acceptable?

Figure 7: (A) Scatterplots of 500 LHS samples for three different parameter combinations exemplifying approximate uniform coverage of parameter space for both discrete and continuous parameter types. Correlation is also honored when specified. The blue points are testing samples used to compare simulation versus proxy accuracy and are therefore not applied in proxy conditioning. (B) Plots of two historical data sets (well MER03ST2 water production rate and well MER11 oil production rate) versus 20 simulations from the conditioning set of 500 Latin Hypercube samples confirm that the parameter space explored by the DoE contains at least one history matching solution set.

Figure 8: Simulated versus proxy-derived history matching error from the prior example (Figure 7B) for (A) the quadratic and (B) Kriging proxies. The grey points represent the samples used to condition each proxy and the blue data are the testing points left out of the proxy regression. (C) shows the true response surface of the simulated history matching error for the oil production rate in Figure 7B as a function of two parameters whose interaction was found to be sensitive in the screening analysis.


The RMSEs of the quadratic and Kriging proxies are 22 and 82 STBO, respectively. Depending on the location in parameter space, each proxy may result in a prediction error less than or greater than its global RMSE. Proxy acceptability is first tested relative to the range of the oil production data error. Although the error or noise in oil production at an instant in time is small, on the order of barrels, the RMSE values represent the match error over the complete production history and are therefore considered to be within the range of data accuracy (e.g., the above-reported proxy RMSEs correspond to a daily measurement error of 1 STBO/day over the nine-year production history). There is therefore no justification to improve the proxies from the perspective of data reproduction. However, it should also be confirmed that the mean proxy errors are less than the range of true history matching error variability so that a tolerance can be accurately applied for the later step of model rejection. For example, using the 500 LHS samples, Figure 8C shows an interpolated (and assumed to be exact) response surface of the simulated history matching error for the MER11 oil production rate as a function of two parameters whose interaction was found to be sensitive in the screening analysis (a Corey oil relative permeability end-point [KROCW] and a regional permeability multiplier [KXOCREG1]). It is evident that MER11 oil production is relatively insensitive to KROCW alone, but that it can become sensitive when combined with the regional absolute permeability. The set of minima valleys maps out non-unique history matching solutions that differ from the peaks by an error magnitude on the order of 500 STBO. The proxy RMSEs are below this range, indicating that the proxies can be used for model filtering. However, it is recognized from comparison of Figures 8A and 8B with 8C that rejection will be inaccurate in certain regions of parameter space where the local proxy error is larger than the global RMSE. These errors can be mitigated at this stage only through additional parameter sampling, simulation and proxy re-conditioning. At the other extreme of minimal proxy error, during parameter rejection it is important not to define the error tolerance at a value lower than the RMSE.

History Matched Model Definition
A DoE-based AHM approach will result in an ensemble of history matched models that pass all history matching error filters, one filter per metric (or response surface). This approach is used for both deterministic and probabilistic history matching. In the deterministic case, the selected model may have the lowest history matching error of the ensemble, may target the most likely parameter values relative to their uncertainty distributions, or may otherwise embody fit-for-purpose reservoir qualities targeted by the asset team.

The following sub-sections explain and demonstrate a generic workflow for performing the filtering process. Notably, strengths and inadequacies of this approach that are inherent to any forward modeling-based AHM algorithm (e.g., genetic algorithms), and which typically evade discussion in presentations of AHM results, are emphasized.

Monte Carlo Sampling The first step of the forward approach to history matching is to exhaustively sample calibration parameter combinations in a Monte Carlo approach. For each location sampled in parameter space, all history matching error metrics are computed. As discussed, an analytical or numerical proxy model is used to map out the response surface of history matching errors to alleviate the computational burden of reservoir flow simulation. The proxy with the best predictive performance, per individual error metric, should be used. In this study, predictive performance is measured using blind testing, as described previously.

Typically, between 10^4 and 10^6 locations are sufficient for exhaustive sampling. It is necessary to perform a sufficient number of samples such that the posterior distribution of each parameter (i.e., after rejection) is stable, an objective that is straightforward to achieve in this workflow: simply and efficiently sample more parameter combinations using the proxy model(s). Accordingly, the Monte Carlo sampling must honor the prior uncertainty distribution of each parameter, as well as any associated correlation(s), so that a consistent comparison of each prior distribution can be made against each posterior distribution (corresponding to the history matched model ensemble). For this it is additionally important to consider whether discrete or continuous parameters were used to condition the proxy. Although the continuous nature of proxy models permits sampling from assumed continuous distributions, even when a proxy is constructed for a discrete parameter, in this case the discrete distributions should be honored during Monte Carlo sampling.
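The sketch below is a hedged illustration of this Monte Carlo step, assuming `priors` holds frozen scipy distributions per parameter and `proxies` holds fitted proxy models (e.g., from the earlier blind-testing sketch); parameter correlation and discrete distributions would need additional handling.

```python
# Hedged sketch: draw parameter combinations from the prior distributions and
# evaluate every history matching error metric by its proxy (no simulation).
import numpy as np

def monte_carlo_errors(priors, proxies, n_samples=500_000, seed=0):
    """priors: {name: frozen scipy distribution}; proxies: {metric: fitted proxy model}."""
    rng = np.random.default_rng(seed)
    names = list(priors)
    X = np.column_stack([priors[p].rvs(size=n_samples, random_state=rng) for p in names])
    errors = {metric: proxy.predict(X) for metric, proxy in proxies.items()}
    return names, X, errors
```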

Rejection Filtering Either during or following Monte Carlo sampling of the parameter combinations, a rejection threshold is applied per history matching error metric to screen out those combinations with poor history match quality. Two thresholds are applied per error metric: a maximum threshold above which the corresponding parameter set is screened out, and a lower threshold (defined from proxy error) below which no filtering can be performed.

Selection of the maximum rejection threshold can be subjective and is practically challenging. Techniques have been proposed to assist the engineer in this process, although these have been found of limited application during this benchmarking study. For example, the approach of systematic filtering applied in Billiter et al. (2008) identifies rejection thresholds by sequentially decreasing the maximum allowable history matching error until the cumulative probability of achieving one or more associated production forecast metrics becomes constant for the filtered ensemble. Although practical in the sense that the statistical representativeness of a model becomes independent of the number of samples passing all filters (i.e., statistically stable), this approach disconnects history match quality from model selection.

Recent approaches based on the Pareto front concept appear more promising for application in history matched model filtering, although limitations currently appear undocumented. For reasons to be described, this benchmarking study has assumed use of the Pareto front approach for all DoE-based AHM studies, the methodology of which is presented in the following sub-section. Ultimately, however, it is concluded that the engineer cannot avoid the manual and, to some extent subjective, definition of the maximum tolerable history matching error for one or more data types. To date, the authors have not identified a defensible (semi-)automated approach for model rejection.

To define the minimum history matching error that can be used as a rejection threshold, there are two possibilities. The first is simply to set the proxy error or confidence interval as the minimum threshold below which filtering can no longer be performed. That is, history match quality can only be as good as proxy accuracy in the regions of error minima. The second, and less common, approach to minimum error tolerance definition is by use of a statistical measure of the historical data error or noise. In this case the proxy error would be less than the data error, likely due to over-fitting of the regression model. Such a tolerance would ensure that history match quality is only as good as data accuracy. Again, for threshold definition, the data error, the data misfit error and the proxy error must all be quantified in consistent units and the threshold identified as the largest of the three errors. RMSE is used to quantify all metrics in this study.
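A minimal sketch of this two-threshold rejection filter, assuming per-metric dictionaries of proxy-derived errors and thresholds (names are illustrative), might look as follows:

```python
# Hedged sketch of rejection filtering: a model passes only if every error
# metric is at or below its maximum threshold, and no threshold is allowed to
# drop below its floor (the largest of proxy error, data error and misfit error).
import numpy as np

def apply_rejection_filters(errors, max_thresholds, min_thresholds):
    """errors/{max,min}_thresholds keyed by metric; returns a boolean mask of kept samples."""
    keep = None
    for metric, err in errors.items():
        floor = min_thresholds[metric]
        ceiling = max(max_thresholds[metric], floor)   # never filter below the floor
        passed = np.asarray(err) <= ceiling
        keep = passed if keep is None else (keep & passed)
    return keep
```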

The final step in filtering is to compare the posterior parameter distributions (from models passing all filters) with the prior distributions (derived from all Monte Carlo samples). It should be confirmed that each posterior distribution either reproduces its prior, albeit less densely sampled, or shows reduced uncertainty. Both cases are demonstrated in the application below. Only if a posterior distribution appears truncated at one of the tails should the prior assumptions be revisited and the workflow possibly recycled. Such truncation indicates an incorrect prior assumption and leaves the possibility that the true solution space remains unsampled, thereby biasing subsequent selection of discrete models.
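As a hedged heuristic for the tail-truncation check described above (not a prescription from the paper), the sketch below flags parameters whose accepted samples pile up against a prior bound:

```python
# Hedged heuristic: flag posterior distributions that appear truncated at a
# prior bound by comparing edge-bin occupancy before and after filtering.
import numpy as np

def check_posterior_tails(X, keep, names, edge_frac=0.05):
    """X: (n, m) Monte Carlo samples; keep: boolean mask of accepted models."""
    flags = {}
    for j, name in enumerate(names):
        prior, post = X[:, j], X[keep, j]
        lo, hi = prior.min(), prior.max()
        width = hi - lo
        # Fraction of accepted models in the outermost edge_frac of the prior range;
        # a strong pile-up there suggests the true solution may lie beyond the prior.
        low_pile = np.mean(post <= lo + edge_frac * width)
        high_pile = np.mean(post >= hi - edge_frac * width)
        flags[name] = {"low_tail_suspect": low_pile > 2 * edge_frac,
                       "high_tail_suspect": high_pile > 2 * edge_frac}
    return flags
```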

Pareto Front Pareto front analysis is known to be a powerful technique for multi-objective optimization, which defines the history matching problem when multiple error metrics are simultaneously considered as model selection constraints (Park et al., 2013). In this approach, the reservoir model parameters are mapped into the objective space, which in an AHM application is defined by the set of history matching error metrics per parameter combination (Figure 9). In objective space the models can then be ranked according to their level of dominance, and the Pareto optimal set of models identified as those corresponding to the non-dominated front, along which there are no combinations of error metrics of lower value. Success of the approach assumes that the objectives are conflicting. That is, along the Pareto optimal front, an improvement in one metric must result in degradation of the other metric, and vice versa, thus defining an optimal trade-off curve between all error metrics.
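A minimal sketch of non-dominated (Pareto-optimal) model identification in objective space is given below; `F` is an assumed array in which each row holds the history matching error metrics of one Monte Carlo sample and lower values are better.

```python
# Minimal sketch: identify the non-dominated (Pareto-optimal) rows of an
# objective matrix F, where F[i, k] is error metric k for sample i.
import numpy as np

def pareto_optimal_mask(F):
    n = F.shape[0]
    non_dominated = np.ones(n, dtype=bool)
    for i in range(n):
        if not non_dominated[i]:
            continue
        # Sample j dominates i if it is no worse in every metric and strictly better in at least one.
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        if np.any(dominates_i):
            non_dominated[i] = False
    return non_dominated
```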

From this perspective, it is unnecessary to define a maximum error threshold for model filtering because the selection of models along the Pareto optimal front objectively defines the set for which no lower history matching error is achievable. Although AHM algorithms, the class of evolutionary techniques in particular, follow this logic, practical considerations remain unaddressed. These are:

•  Do all models corresponding to the Pareto optimal front have an acceptable history match quality for all metrics?

•  Although sub-optimal, do models corresponding to the dominated fronts (Figure 9) also yield acceptable history match quality?

•  If the Pareto optimal front spans a large objective space or is nearly linear, at which points should the front be terminated for model selection?

•  How should models be selected in the case that a front does not exist because the different error metrics are non-conflicting relative to the calibration parameters?

•  How should models be selected in the case that a front does not exist because the parameter solution for one data type does not overlap with the solution set for another data type?

These challenges were encountered during this benchmarking study and are in part exemplified in the following application section. The outcome is that, to address them as part of a robust model selection workflow, the engineer should manually define the maximum error threshold per metric, simply because Pareto-based analysis does not provide any indication of history match quality. Although the Pareto front technique is shown to have use both for understanding the relationship between different error metrics and for identifying the history matched model ensemble, intervention by the engineer is rightfully always required. This is best achieved by visualizing the history match quality of all simulations used for OVAT SA, effects calculation and response surface construction, and identifying an acceptable error metric tolerance for each data type. Although a manual process, the understanding of parameter-response relationships gained during this exercise is valuable and not something that can be achieved using an automated history matching technique.

Field Application Continuing from the prior exercise of proxy model construction, 5 × 10^5 parameter combinations were Monte Carlo sampled using the (correlated) prior uncertainty distributions.

Figure 9: Schematic depicting the mapping of model responses, in this case history matching error metrics, from their location in parameter space (for parameters X1 and X2) to objective space (for objectives f1 and f2) for identification of (non-)dominated Pareto fronts (after Park et al., 2013).


As the samples were collected, each associated proxy-based history matching error was computed and the parameter combination was rejected if any of the errors exceeded the threshold for the corresponding metric. Table 1 lists these rejection thresholds per well and per observation data type. The entry NR indicates that the associated filter is not required (for reasons discussed below), and the entry FP indicates that the associated proxy was of insufficient quality to use for parameter rejection. Development of this table was achieved by a joint comparison of the Pareto front for each pair of history matching metrics with a visual comparison of the production data misfit (observed vs. simulated) based on the (500) simulation runs used for proxy conditioning. Construction and analysis of the Pareto fronts reveals (in theory) the models that yield the optimal tradeoff between all error metrics. From this perspective, the maximum error threshold per metric can be objectively reduced until one (non-dominated) or more (dominated) fronts are defined. For example, Figure 10 shows all Monte Carlo samples within a two-error objective space and the impact of applying a rejection threshold per error metric. Identification of the error tolerances below which a dominated front is acceptable is subjectively determined from visual inspection as described above. For example, using the simulations applied for proxy construction, Figure 11 demonstrates that it can be straightforward to identify the error tolerance that partitions out model realizations with unacceptable history match quality. It is also noted that subjective visual discernment is justified in history matching workflows because selection of the historical observation data themselves can be subjective, particularly in the case of high-temporal-resolution or noisy data.

What concurrent inspection of the Pareto fronts uniquely provides is an understanding of which historical data to include in the rejection process, hence the NR entries in Table 1. During benchmarking, it was found that a solution set based solely on Pareto optimality could not provide an ensemble of history matched models because a front did not actually exist across all objectives, or across all error metrics. This occurs because the solution space for one set of error metrics does not consistently map to the solution space of other error metrics. To demonstrate these concepts, the following figures depict the history matched parameter combinations within the entire objective space for different error metrics. Figure 12A shows the case where a well-defined Pareto front does not exist. The two error metrics are non-conflicting, or are correlated, indicating that the mechanisms controlling the associated flow behavior are consistent for each.

Table 1: A list of the maximum error thresholds applied during Monte Carlo rejection sampling, per well and per observation data type. The entry NR indicates that the associated filter is not required, and the entry FP indicates that the associated proxy was of insufficient quality to use for parameter rejection.

Figure 10: Monte Carlo parameter samples are shown at their location in objective space (horizontal and vertical axes) for two history matching error metrics. The parameter samples in blue correspond to models that pass one or more rejection thresholds.


Rejection of models using only one of these metric thresholds is required, and is one source of the NR entries in Table 1, although both thresholds in this case can be applied if desired. Importantly, had an automated Pareto-optimal algorithm been applied for model rejection without identification and mitigation of this correlation, an inappropriately small number of non-dominated models may have passed the rejection criteria. In contrast, Figure 12B shows the most commonly observed filtered-front behavior, in which the non-dominated front does not exist for these two metrics because the associated models have been rejected in another dimension (of objective space).

Figure 11: Identification of the history matching error tolerance for an individual historical data type (well water production rate) by sequential reduction of error distribution percentiles until match quality is visually acceptable.

Figure 12: Various examples of Pareto front behavior, or lack thereof, when multiple data thresholds are used to reject models from the complete set of Monte Carlo samples (grey). The parameter samples in blue correspond to models that pass all rejection thresholds.


At the other end of maximum rather than minimum error, the associated thresholds are also not identifiable (as they are in Figure 11) because the poorest solutions for these two error metrics have been rejected as a result of rejection in another dimension. This is the se