Machine Learning - VERDAZO… · 31/05/2018 · Different Types of Machine Learning Supervised...
Machine Learning
Practical Use in Upstream Oil & Gas
SPE Oil and Gas Analytics Breakfast Series
May 31, 2018
Outline
1) Machine Learning Hype
2) What is Machine Learning?
3) Feature Importance
4) Machine Learning Challenges
5) What We’ve Found to Work Well in Practice
6) Machine Learning Power
7) Case Study 1: Predicting Reservoir Rock Properties
8) Case Study 2: Drilling Location & Completion Optimization
9) Conclusions
1) Machine Learning Hype
Gartner Hype Curve for Analytics & BI
[Figure: hype curve of Expectations versus Time, with Visual Data Discovery and Predictive Analytics (Machine Learning) marked on the curve]
Top Two Red Flags Signaling Your Analytics Program will Fail
1) Executive team doesn’t have a clear vision for its advanced
analytics program
• CEO regularly mentions the company is using artificial intelligence or machine
learning, but never any specifics
2) No one has determined the value that initial use cases can deliver
within the first year
• Large-scale projects have long time-to-value, high chance of failure…and are
expensive!
(Source: McKinsey)
How I Got into Machine Learning
Source: xkcd
Algorithmic Trading System Development Process
Price Data, Volume Data, Technical Indicators
Machine Learning
Trading Signals (BUY or SELL)
Risk & Position Sizing Rules
Automated Trading System
Optimize Profitability
Source: The Wolf of Wall Street
Oil & Gas Development Planning Process
Reservoir Data, Completion Data, Drilling Data
Machine Learning
Performance Predictions
Development Possibilities, Costs, Commodity Prices
Development Plan
Optimize NPV
Source: The Wolf of Wall Street
2) What is Machine Learning?
What is Machine Learning?
Machine Learning (ML) is a field of computer science that uses
statistical techniques and algorithms to give computers the ability
to “learn”:
• with respect to some task
• to optimize one or more performance measures
• without being explicitly programmed
the magic
What Machine Learning Isn’t
• New
• It’s been around in the form of statistical models for centuries
• Smarter than us
• It can’t think laterally and has no understanding of causality
• Able to overcome the laws of physics, statistics, etc.
Different Types of Machine Learning
Supervised Learning (today’s focus): Training a model by example – predicting an outcome using data where examples of input-output pairs (“correct answers”) can be provided
Unsupervised Learning: Using a model to draw inferences from unlabeled data to describe structures or patterns
Reinforcement Learning: Given a specific environment/context and a feedback mechanism, a model automatically tries to determine optimal behavior
Supervised Learning
Classification: Predict which group each sample belongs to
• Based on its characteristics, is it an apple or an orange?
Regression (today’s focus): Predict some continuous value output
• What is the sugar content of an orange in grams given its size, color, weight, age, etc.?
Supervised Learning: Regression
y = f(a, b, c, …)
The goal of regression is to find the function ‘f’
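As a minimal sketch of what "finding f" can look like in the simplest possible case, the Python below fits a straight line to example input-output pairs by ordinary least squares. The single feature, the name `fit_line`, and the toy data are illustrative assumptions, not taken from the deck; real problems usually involve many features and nonlinear models.

```python
def fit_line(a_values, y_values):
    """Ordinary least-squares fit of y = slope * a + intercept (one feature)."""
    n = len(a_values)
    mean_a = sum(a_values) / n
    mean_y = sum(y_values) / n
    covariance = sum((a - mean_a) * (y - mean_y) for a, y in zip(a_values, y_values))
    variance = sum((a - mean_a) ** 2 for a in a_values)
    slope = covariance / variance
    intercept = mean_y - slope * mean_a
    return slope, intercept


# Hypothetical examples generated from y = 2a + 1.
a_values = [0, 1, 2, 3]
y_values = [1, 3, 5, 7]
slope, intercept = fit_line(a_values, y_values)
print(slope, intercept)
```

Here the recovered slope and intercept together play the role of the function f.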
Supervised Machine Learning Terminology
y = f(a, b, c, …)
1) Target (y): what we want to predict, with examples of the correct answers provided
2) Features (a, b, c, …): inputs used by the learning algorithm in the training process to form a
predictive model for the Target
3) Model (f): the algorithm(s) used to calculate a prediction of the Target
Supervised Machine Learning Terminology
4) Training: the process that uses Features and Targets to inform a predictive Model
5) Feature Importance: how much impact a Feature has on the predictive power of a
Model
6) Fitness Function: a measure of the predictive power of a Model (i.e., this is what we
want to optimize, like R² or Mean Absolute Error)
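The two fitness functions named above can be sketched in a few lines of Python. The function names and the toy values are illustrative assumptions; the formulas are the standard definitions of the coefficient of determination and mean absolute error.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets, compared against a perfect model and a model that
# always predicts the mean of the targets.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
always_mean = [2.5, 2.5, 2.5, 2.5]

print(r_squared(y_true, perfect), mean_absolute_error(y_true, perfect))
print(r_squared(y_true, always_mean), mean_absolute_error(y_true, always_mean))
```

A model that always predicts the mean scores an R² of zero, which is why R² is often read as "improvement over guessing the average".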
3) Feature Importance: The Real Prize?
The Importance of Feature Importance
Helping us understand what matters most
from a feature perspective can direct:
• our time and attention
• data acquisition and quality efforts
• which features we may want to remove
from our dataset
The Many Faces of Feature Importance
• Linear
• Rank
• Information Theoretical
• Statistical
• Impurity
• Permutation
• Dropout
• Additive
• Recursive
• Bayesian
• Grouped
• Domain (Physics)
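To make one of these flavours concrete, here is a minimal sketch of permutation importance: shuffle one feature column, re-score the model, and treat the increase in error as that feature's importance. The `model` function below is a hypothetical stand-in for an already-trained model, and the data is synthetic; none of it comes from the deck.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(model, rows, targets, feature_index, rng):
    """Increase in MAE after shuffling one feature column across samples."""
    baseline = mean_absolute_error(targets, [model(row) for row in rows])
    shuffled_column = [row[feature_index] for row in rows]
    rng.shuffle(shuffled_column)
    permuted_rows = []
    for row, value in zip(rows, shuffled_column):
        new_row = list(row)
        new_row[feature_index] = value
        permuted_rows.append(new_row)
    permuted = mean_absolute_error(targets, [model(row) for row in permuted_rows])
    return permuted - baseline


# Hypothetical, already-trained model: uses feature 0 and ignores feature 1.
def model(row):
    return 3.0 * row[0]


rng = random.Random(0)
rows = [[float(i), rng.random()] for i in range(50)]
targets = [3.0 * row[0] for row in rows]

importance_used = permutation_importance(model, rows, targets, 0, rng)
importance_ignored = permutation_importance(model, rows, targets, 1, rng)
print(importance_used, importance_ignored)
```

Shuffling the feature the model relies on degrades its predictions, while shuffling the ignored feature changes nothing. With correlated features the picture is murkier, which is the challenge the following slides take up.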
Features: Timing and Influence
• Which features are we able to influence rather than simply measure?
• Which features are knowable at each stage of the process?
• The features we know earliest are the ones that can guide our decisions the
soonest
Feature Interdependence (Correlation) – A Challenge
1) It’s hard to understand and separate the effects
of two or more features that are correlated
2) It’s harder to intelligently select which features to
include in a predictive model
Some are obvious:
• Total proppant and total fluid volume
Some aren’t:
• Total proppant and location within field
Handling Correlated Features: Stage 1 of 4
Stage 1: Denial
• Let’s pretend we can just evaluate
the impact of all these different
features independently
• That would be so much easier…
Handling Correlated Features: Stage 2 of 4
Stage 2: Acceptance
• Accepting that some features are correlated,
and that this must be part of our overall
understanding, is valuable in itself
• Handling correlated features is difficult, but
necessary
Handling Correlated Features: Stage 3 of 4
Stage 3: Analysis
• Measure feature correlations in different
ways
• Linear
• Rank
• Mutual Information
• Look for instances of unexpected
correlation (or lack of correlation)
• Identify feature groups where information
is unique and where it’s redundant
Read more: Multivariate Analysis Using Advanced Probabilistic Techniques for Completion Optimization
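The three correlation measures listed in Stage 3 can disagree with each other, and that disagreement is informative. The sketch below implements each in plain Python and applies them to two synthetic relationships. The histogram-based mutual information estimate with 4 equal-width bins is a simplifying assumption chosen for brevity, not the method used in the referenced paper.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Linear correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mutual_information(xs, ys, bins=4):
    """Histogram estimate of mutual information, in nats."""
    def binned(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = binned(xs), binned(ys)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


xs = list(range(-10, 11))
cubic = [x ** 3 for x in xs]     # monotonic but nonlinear
squared = [x ** 2 for x in xs]   # strongly dependent, but not monotonic

for name, ys in (("cubic", cubic), ("squared", squared)):
    print(name, pearson(xs, ys), spearman(xs, ys), mutual_information(xs, ys))
```

In this toy data the cubic relationship has perfect rank correlation but imperfect linear correlation, while the squared relationship shows essentially zero linear and rank correlation yet clearly positive mutual information.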
Handling Correlated Features: Stage 4 of 4
Stage 4: Understanding
• There is no “cure for” or “solution to” feature correlation
• Understanding which features are strongly correlated informs feature selection and helps
us intelligently reduce dimensionality
• We want maximum relevance with minimum redundancy
• Understand how features are correlated (linearly, ordinally, mutual information, etc.)
• Noting moderate correlations may uncover hidden insights
• e.g., so far we’ve only drilled longer laterals in areas of poorer reservoir quality, so we should be
cautious in making generalizations
Feature Grouping
• Feature Grouping can help us understand
which broader factors matter most (e.g.,
geology, pressure, lateral length, completion
design parameters)
• e.g., how important are all the completion
parameters in aggregate versus all the
geological parameters in aggregate?
[Figure: feature groups labelled Geological, Geophysical, Completion Design, and Proximal Production]
4) Machine Learning Challenges
Challenge: Data
• Data quantity
• Data quality
• Missing data
• Data normalization/calibration
• Outlier treatment
• Data matching (when integrating datasets)
• Representativeness of the data
Challenge: Communication, Transparency, Domain Expertise
Read the blog: Machine Learning: Is it really a Black Box?
Communication
• Terminology
• Turning results into recommendations
Transparency (i.e. black box syndrome)
• Explaining the result
• It’s easy to get an answer, but tough to back it up
Domain Expertise
• Understand data choices
• Evaluate results
• Have a clear goal
• Choose appropriate predictive performance measures
Challenge: Underfitting/Overfitting
• Missing Relevant Relations. Sign of underfitting: poor model fit during training and poor model fit on new, unseen data
• Good Generalization. Sign of a good fit: good model fit during training and good model fit on new, unseen data
• Fitting the Noise. Sign of overfitting: excellent model fit during training and poor model fit on new, unseen data
Source: https://pythonmachinelearning.pro
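One way to see these signatures is to compare training error against error on unseen data while varying model flexibility. The sketch below uses k-nearest-neighbour regression as a stand-in model, where k = 1 memorizes the training set and k equal to the dataset size predicts the same average everywhere. The sine-wave ground truth, the noise level, and the values of k are all illustrative assumptions.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(train_x, train_y, eval_x, eval_y, k):
    """MAE of a k-NN model built on the training data, scored on eval data."""
    predictions = [knn_predict(train_x, train_y, x, k) for x in eval_x]
    return mean_absolute_error(eval_y, predictions)


rng = random.Random(42)


def noisy_sample(x):
    # Hypothetical ground truth: a sine wave plus Gaussian measurement noise.
    return math.sin(x) + rng.gauss(0.0, 0.3)


train_x = [i * 0.1 for i in range(60)]
train_y = [noisy_sample(x) for x in train_x]
test_x = [i * 0.1 + 0.05 for i in range(60)]   # new, unseen locations
test_y = [noisy_sample(x) for x in test_x]

for k in (1, 7, 60):
    train_error = evaluate(train_x, train_y, train_x, train_y, k)
    test_error = evaluate(train_x, train_y, test_x, test_y, k)
    print(f"k={k:2d}  train MAE={train_error:.3f}  test MAE={test_error:.3f}")
```

With k = 1 the training error is exactly zero because every training point is its own nearest neighbour, yet the error on unseen data is not: the overfitting signature. With k = 60 every prediction is the overall average, so the fit is poor on both sets. Intermediate values of k typically land between these extremes.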
Wait… What is the Goal?
Are we trying to get the best fit to the data?
or
Are we trying to get the best predictive capability?
Read the blog: Machine Learning: Finding the signal or fitting the noise?
Validation: Why Do We Do It?
Three main purposes:
1) To estimate the predictive capability (error) our model will have in
the future
2) To reduce the chance our model will “overfit” our available data
3) To get an indication of the predictive limits of our dataset
Validation Example: K-Fold Cross-Validation
K-fold cross-validation is one technique used to estimate the predictive capability of a machine learning model on data it has not yet seen
[Figure: the full dataset split into 5 folds; each fold in turn serves as the out-of-sample test set while the remaining folds form the in-sample training set]
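The fold bookkeeping behind k-fold cross-validation can be sketched as follows. The function name `k_fold_splits`, the shuffle seed, and the dataset size of 10 are illustrative assumptions; in practice a model would be trained on each training set and scored on the matching held-out fold, with the scores averaged.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) once for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_indices in enumerate(folds):
        train_indices = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_indices, test_indices


splits = list(k_fold_splits(n_samples=10, k=5))
for train_indices, test_indices in splits:
    print(sorted(test_indices), "held out;", len(train_indices), "used for training")
```

Every sample is held out exactly once, so every prediction that contributes to the error estimate is made on data the model did not see during training.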
Validation Provides Predictive Capability Estimates
• Bad (overfitting likely): test set validation
• Better (cross validation): 5-fold cross-validation
• Best (cross validation + test set): 5-fold cross-validation plus a separate test set
[Figure: the full dataset divided into training and test portions under each of the three schemes]
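A minimal sketch of the "best" scheme: set aside a final test set first, then build cross-validation folds only from what remains. The 20% test fraction, the seed, and the function name are assumptions made for illustration; the deck does not specify them.

```python
import random


def split_with_final_test_set(n_samples, k, test_fraction=0.2, seed=0):
    """Reserve a final test set, then build k cross-validation folds from the rest."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    final_test = indices[:n_test]      # touched only once, after model selection
    development = indices[n_test:]     # used for training and validation
    folds = [development[i::k] for i in range(k)]
    cv_splits = []
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        cv_splits.append((training, validation))
    return cv_splits, final_test


cv_splits, final_test = split_with_final_test_set(n_samples=100, k=5)
print(len(final_test), "samples reserved for the final test")
for training, validation in cv_splits:
    print(len(training), "training /", len(validation), "validation")
```

Because the final test indices never appear in any fold, they stay untouched by the iterative training, validation, and model selection loop.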
Post-Validation Out-of-Sample Testing: Why Do We Do It?
• It is entirely separate from the iterative training, validation and model
selection process
• It provides one last “sanity check” to make sure the training, validation and
model selection process was sound
• It is as close to a live, real-world test of a model as we can get
• Unfortunately, the only way to truly test a model’s future predictive power
is to actually use it to make predictions about the future, wait for the
results, and then measure how well it did
Noise and Bias in the Data
Noise: Unexplained variability within a data sample
• Measurement precision
• Data processing
Bias: Systematic difference between measurement and true value
• Selection bias, analytical bias, survivorship bias, observer bias, …
• Perverse incentives, laziness, career risk, honest mistakes, …
Illustration 1: A Simple Equation
Equation: Y = 20A + B^3 + 10e^C
A is a random integer between 0 and 10 from a uniform distribution
B is a random integer between -10 and 10 from a uniform distribution
C is a random real number between -5 and 5 from a normal distribution
We are predicting target Y, given features A, B and C
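The synthetic dataset described above could be generated along these lines. This sketch reads the slide's equation as Y = 20A + B^3 + 10e^C (the exponents appear to have been flattened in extraction). The sample count, the seed, the inclusive ranges, and the parameters of the normal distribution for C are assumptions, since the slide does not state them.

```python
import math
import random


def target(a, b, c):
    """Y = 20A + B^3 + 10e^C, as read from the slide."""
    return 20 * a + b ** 3 + 10 * math.exp(c)


def draw_sample(rng):
    a = rng.randint(0, 10)      # uniform integer in [0, 10]
    b = rng.randint(-10, 10)    # uniform integer in [-10, 10]
    # Assumption: normal with mean 0 and standard deviation 5/3, redrawn until
    # the value falls inside [-5, 5]; the slide gives only the range.
    c = rng.gauss(0.0, 5.0 / 3.0)
    while not -5.0 <= c <= 5.0:
        c = rng.gauss(0.0, 5.0 / 3.0)
    return a, b, c


rng = random.Random(2018)
features = [draw_sample(rng) for _ in range(1000)]
targets = [target(a, b, c) for a, b, c in features]
print(features[0], targets[0])
```

Because the true relationship is known exactly, any shortfall in a model's predictive power on this data can be attributed to the model or to whatever is later done to the inputs.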
Noise and Bias Added to the Data
Model 1: Trained with original A, B, C values
Model 2: A, B, C values with 20% noise
Model 3: A, B, C values with 10% noise, plus 10% bias
(half the values for each of A, B, C were biased upward and half were biased downward)
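The slide does not say exactly how noise and bias were applied, so the sketch below is one plausible reading rather than the authors' procedure. "N% noise" is taken to mean scaling each value by a random factor within ±N%, and "10% bias" to mean scaling a random half of the values up by 10% and the other half down by 10%. Both function names are hypothetical.

```python
import random


def add_noise(values, fraction, rng):
    """Scale each value by a random factor within +/- fraction (assumed reading)."""
    return [v * (1.0 + rng.uniform(-fraction, fraction)) for v in values]


def add_bias(values, fraction, rng):
    """Bias a random half of the values upward and the other half downward."""
    indices = list(range(len(values)))
    rng.shuffle(indices)
    upward = set(indices[: len(indices) // 2])
    return [
        v * (1.0 + fraction) if i in upward else v * (1.0 - fraction)
        for i, v in enumerate(values)
    ]


rng = random.Random(7)
original = [10.0] * 8                                   # Model 1 inputs
model_2_inputs = add_noise(original, 0.20, rng)         # 20% noise
model_3_inputs = add_bias(add_noise(original, 0.10, rng), 0.10, rng)  # 10% noise + 10% bias
print(model_2_inputs)
print(model_3_inputs)
```

Noise of this kind scatters values around the truth, while bias shifts them systematically, which is why the two are treated separately in the comparison.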
Noise and Bias in Data Reduce Model Predictive Power
[Figure: three panels titled Original; 20% noise; 10% noise and 10% bias]
Almost all oil & gas data is both noisy and biased to some degree
Missing Features Reduce Model Predictive Power
[Figure: three panels titled Original; No C values provided; No B values provided]
This illustrates the inherent predictive limits of a dataset with important information
missing (e.g., an attempt at completion optimization with no geological data)
5) Our Approach: General Principles We’ve Found to Work Well in Practice
Where We Spend Our Time
• Machine Learning: 10%
• Data Preparation: 20%
• Analysis & Building Understanding: 30%
• Discussing: 40%
Visualization supports all stages of our process
Never Talk About the In-Sample Training Fit
• Out-of-sample predictive power is the goal, not in-sample training
fit perfection
• Modern ML algorithms can achieve a nearly perfect in-sample
training fit to virtually any dataset (overfitting)
• Presenting an unrealistic R² value can inflate expectations of a
predictive model to unachievable levels
Make Sure We Have Enough Data
Three equally unsatisfactory answers to “How much data is enough?”
1. There is no right answer – it is unknowable
2. More is always better – we can never have enough
3. It depends…
Factors that can affect sample size requirements
• Complexity of the problem (e.g., nonlinearities, dependencies, empirical understanding)
• Number of features
• Range and distribution of feature and target data
• Data quality, cleanliness, representativeness
• Complexity of the machine learning algorithm(s) used
Make Sure We Have Enough Data
Our rule of thumb
If a dataset has fewer than 200 samples, there’s a good chance it isn’t well-suited for
machine learning…
…however, useful insights can still be found in smaller datasets (e.g., feature
importance)
Understand the Predictive Limits of Our Dataset
A “good” R² value (model fitness) could be R² = 0.10 for one dataset
and R² = 0.70 for another
• How predictable is our target?
• How relevant are our available features to our target?
• How many samples do we have?
• How good is the quality of our data?
• How good are existing predictive models?
• How good do predictions need to be for the model to be useful to us?
Use a Library of Algorithms
No single ML algorithm is best for all cases
[Figure: algorithm library showing Linear Regression, Neural Networks, Random Forests, Gradient Boosted Trees, Genetic Programming, Support Vector Machines, K-Nearest Neighbours, Smart Scaling, Feature Engineering, Encoders, Bayesian, PCA/ICA]
Learning algorithms in combination with preprocessing algorithms generally perform better
Aside: Neural Networks
• We include some neural networks in our library of
algorithms
• There are many different classes of neural networks
with essentially infinite configurations
• Neural networks perform best with very large datasets
• On smaller datasets, including most oil & gas
datasets, we’ve found neural networks are usually
outperformed and out-generalized by other methods
(at least within a reasonable amount of time)
Source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence
Wisdom of the Crowd
• Ensembles of models almost always outperform single-algorithm models
• Ensembles protect against any individual model’s weaknesses or biases
• An iterative evolutionary approach to creating, optimizing, and validating these ensembles has consistently yielded our best results
[Figure: Input Data feeds Model 1 through Model 5; an Ensemble Model combines their outputs into Predictions]
Source: https://www.analyticsvidhya.com/blog/2015/08/optimal-weights-ensemble-learner-neural-network/
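The simplest possible ensemble is an unweighted average of several models' predictions, sketched below. The three models are hypothetical stand-ins with hand-picked errors around an assumed true relationship y = 2x. This is not the iterative evolutionary approach described above, which the deck does not detail.

```python
def ensemble_predict(models, features):
    """Unweighted average of the individual model predictions."""
    predictions = [model(features) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical stand-ins for trained models, each with its own systematic error.
def model_1(x):
    return 2.0 * x + 1.5    # reads high


def model_2(x):
    return 2.0 * x - 1.0    # reads low


def model_3(x):
    return 2.0 * x - 0.5    # reads slightly low


models = [model_1, model_2, model_3]
x = 4.0
true_value = 2.0 * x
for model in models:
    print(model.__name__, "error:", abs(model(x) - true_value))
print("ensemble error:", abs(ensemble_predict(models, x) - true_value))
```

The errors here were chosen to cancel exactly, which real models will not do. The point is only that independent errors tend to offset one another when predictions are combined.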
Use Supporting Visualizations
Focus on Time to Value
Insights from machine learning can and should be realized and applied to
decisions in weeks or months, not years
• Time has a significant opportunity cost if ML benefits can’t be realized quickly
• Time has a significant real cost
• worker-hours
• salaries
• software fees
• Well-defined, shorter-term projects have a much better cost-benefit profile than trying to
“boil the ocean” with machine learning
Domain Expertise is Critical
Domain expertise lets us make sure the results make sense
• Reduces time spent chasing spurious relationships
• Enables a quicker understanding of “why”
The best domain experts are the technical teams working on the problems every day
Source: xkcd
Domain Expertise is Critical
The laws of physics still matter
• Healthy skepticism toward any data-driven approach is good, especially when data is noisy,
sparse, biased, incomplete, etc.
But, sometimes there are genuine unexpected findings in the data that
warrant challenging the status quo…isn’t this part of our goal?
6) Machine Learning Power
Not Everything is Linear
Sigmoid Function
Source: xkcd
Not Everything is Continuous
Source: Wikipedia
Source: National Academy of Science
Source: SPE-185077-MS (Verdazo Analytics)
Not Everything is a Number
57
Source: http://survivestatistics.com/variables/
![Page 58: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/58.jpg)
Not Everything is Simple
58
Boss: “Look at these 80
features and tell me how
everything relates to
everything else and what
matters most”
Tyler: “Sure, give me a few
months”
ML Algorithms: “Sure, give
me a few minutes”
![Page 59: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/59.jpg)
Illustration 2: Latitude and Longitude
59
Goal: Predict Latitude and Longitude from a UWI
07 – 31 – 054 – 24 W 5
(LSD – Section – Township – Range – Meridian)
Boss: “please write me an Excel macro that returns
latitude and longitude, given any UWI in Western
Canada, by the end of the day”
Tyler: “yikes!”
Source: Alberta Environment and Parks
![Page 60: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/60.jpg)
Why This is Difficult
60
• Curvature and ellipticity of the Earth affect this
translation differently in different places
• Discontinuous thresholds where some numbers
get reset (LSDs, Sections, Ranges) and some
don’t (Townships), and these thresholds vary
• It’s relatively easy for a small, focused area,
but broad generalization is difficult
Source: An engineering textbook from 1897
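Before any model can use the location string, it has to be split into the five numeric features named on the previous slide. A minimal sketch of that step, assuming the input has the `LSD-Section-Township-Range W Meridian` layout shown in the example (a full UWI carries additional fields that this ignores). The names `DlsLocation` and `parse_dls` are hypothetical, not from the talk.

```python
import re
from typing import NamedTuple


class DlsLocation(NamedTuple):
    lsd: int
    section: int
    township: int
    range_: int
    meridian: int


# Matches the layout shown on the slide once spaces are removed, e.g. "07-31-054-24W5".
_DLS_PATTERN = re.compile(r"^(\d{1,2})-(\d{1,2})-(\d{1,3})-(\d{1,2})W(\d)$")


def parse_dls(text: str) -> DlsLocation:
    """Split an LSD-Section-Township-Range-Meridian string into integers."""
    # Normalise en dashes to hyphens and drop all whitespace.
    cleaned = re.sub(r"\s+", "", text.replace("\u2013", "-")).upper()
    match = _DLS_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognised location string: {text!r}")
    lsd, section, township, range_, meridian = (int(g) for g in match.groups())
    # LSDs run 1-16 within a section; sections run 1-36 within a township.
    if not 1 <= lsd <= 16:
        raise ValueError("LSD must be between 1 and 16")
    if not 1 <= section <= 36:
        raise ValueError("section must be between 1 and 36")
    return DlsLocation(lsd, section, township, range_, meridian)


location = parse_dls("07 – 31 – 054 – 24 W 5")
```

The LSD and Section bounds are exactly the "reset" thresholds from the bullets: those counters wrap around, while Township keeps counting.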
![Page 61: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/61.jpg)
Why This is Difficult
61
![Page 62: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/62.jpg)
Model Predicted vs. Target, Out-of-Sample
62
Panels: Latitude, Longitude
![Page 63: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/63.jpg)
Why This is Not Difficult for Machine Learning
63
• We know there is some connection between the features (LSD,
Section, Township, Range, Meridian) and the result
(Latitude/Longitude)
• Whenever there truly are informational relationships between the
features and the target, even if they’re very complex, modern machine
learning algorithms will almost certainly find them
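As a toy illustration of that point (not the model used in the talk), the sketch below fits a 1-nearest-neighbour regressor to a synthetic sawtooth target that climbs and then resets, loosely mimicking counters such as LSD or Section. The data, the function names and the choice of 1-NN are all assumptions made for the example.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Return the target of the training point closest to x (1-NN regression)."""
    best_index = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best_index]


def sawtooth(x):
    """Synthetic target: climbs with x, then resets to zero every 600 units."""
    return x % 600


# Integer units keep the arithmetic exact.
train_x = list(range(0, 3600, 10))            # training grid
train_y = [sawtooth(x) for x in train_x]
test_x = [x + 3 for x in range(0, 3500, 70)]  # out-of-sample points, off the grid

errors = [
    abs(nearest_neighbour_predict(train_x, train_y, x) - sawtooth(x))
    for x in test_x
]
max_error = max(errors)
```

No one told the learner where the resets are; it recovers them from examples because the training data covers them densely. With sparse or noisy data that would no longer hold.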
![Page 64: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/64.jpg)
7) Case Study 1: Predicting Reservoir Rock Properties
64
![Page 65: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/65.jpg)
Case Study 1: Predicting Rock Properties
Business Case
Populate a detailed reservoir model with reliable (core) rock property
values for development planning purposes
• Doing this with coring and lab analysis is cost-prohibitive
• Many existing wells have no core data, but they do have log data
• Predictions from best existing model (using wireline log data and traditional
approaches) are not as accurate as we would like
Can we use machine learning to develop a better predictive model?
65
![Page 66: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/66.jpg)
Case Study 1: Data
1) Detailed core analysis from dozens of wells → source of target values
2) Open-hole log data from the same wells → source of feature values
3) Established depth matching between core and log data
66
Sample size: ~2500
Data acquisition costs for this case study: ~$50 million
![Page 67: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/67.jpg)
Modelling Process
Example: CRISP-DM Model
67
1) Predict core porosity (Target)
2) Using open hole log data (Features)
3) Use learning algorithm(s) to Train a predictive
Model using the Target and Features
4) Evaluate the predictive power of the Model using
a Fitness Function (e.g., MSE, R²)
5) Iterate
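The two fitness functions named in step 4 can be sketched as follows. This assumes R² means the coefficient of determination (1 minus residual over total sum of squares); the slides may instead mean the square of the correlation coefficient R, which can differ. The porosity values are made up for illustration.

```python
def mean_squared_error(targets, predictions):
    """Average squared difference between target and predicted values."""
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n


def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Hypothetical core porosity (fraction) and model predictions.
core_porosity = [0.10, 0.12, 0.15, 0.20]
predicted = [0.11, 0.12, 0.14, 0.19]

mse = mean_squared_error(core_porosity, predicted)
r2 = r_squared(core_porosity, predicted)
```

Both should be computed on data the model did not see during training; otherwise step 5 (Iterate) tends to reward overfitting.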
![Page 68: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/68.jpg)
Case Study 1: Feature Importance
68
[Figure: feature importance chart for features A through M (axis: Feature), with an annotation marking "Features we could probably exclude from our model"]
![Page 69: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/69.jpg)
Case Study 1: Top 4 Features
The top 4 features all have very weak linear correlation to porosity
69
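A weak linear correlation does not by itself mean a feature is uninformative. The sketch below uses a synthetic feature whose target is an exact (but non-linear) function of it, yet the Pearson correlation is essentially zero. The data is invented for the example and is unrelated to the case study.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Symmetric grid of feature values; the target depends on the feature exactly.
xs = [i / 10 for i in range(-20, 21)]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)  # near zero despite a perfect functional relationship
```

This is why a correlation screen alone can discard features that a non-linear model would rank highly.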
![Page 70: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/70.jpg)
Case Study 1: Porosity Distribution (Density Function)
70
![Page 71: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/71.jpg)
Case Study 1: Model Comparison
71
| Model | R | R² | Mean Squared Error (MSE) |
|---|---|---|---|
| Best Traditional Model | 0.48 | 0.23 | 17.4 × 10⁻⁵ |
| ML Logs-Only Model | 0.61 | 0.37 | 7.3 × 10⁻⁵ |
| ML Logs+Drilling Model | 0.68 | 0.46 | 6.2 × 10⁻⁵ |
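A quick arithmetic check on these figures, using only the values transcribed from the slide: it confirms that the R² column equals R squared to two decimals, and expresses each model's MSE as a fractional reduction relative to the best traditional model. The variable names are illustrative.

```python
# Values transcribed from the slide's model comparison table.
models = {
    "Best Traditional Model": {"r": 0.48, "r2": 0.23, "mse": 17.4e-5},
    "ML Logs-Only Model": {"r": 0.61, "r2": 0.37, "mse": 7.3e-5},
    "ML Logs+Drilling Model": {"r": 0.68, "r2": 0.46, "mse": 6.2e-5},
}

baseline_mse = models["Best Traditional Model"]["mse"]
r2_checks = {}
reductions = {}
for name, m in models.items():
    # Does the reported R2 column equal R squared, to two decimals?
    r2_checks[name] = round(m["r"] ** 2, 2) == m["r2"]
    # Fractional drop in MSE relative to the best traditional model.
    reductions[name] = 1 - m["mse"] / baseline_mse
    print(f"{name}: R2 consistent = {r2_checks[name]}, "
          f"MSE reduction = {reductions[name]:.0%}")
```

On these numbers the logs-only model cuts MSE by roughly 58% and the logs-plus-drilling model by roughly 64%.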
![Page 72: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/72.jpg)
A Similar Use Case: Generating Log Traces
72
1) Generate a predicted log trace for a
missing density or sonic log
• From other log traces
• From drilling data
• From both
2) Generate a synthetic log trace for
predicted core properties
Source: Wikipedia
![Page 73: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/73.jpg)
8) Case Study 2: Optimizing Drilling Locations and Completion Designs
73
![Page 74: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/74.jpg)
Case Study 2: Optimizing Drilling Locations and
Completion Designs
74
Business Case
Decide where to drill new wells in a light tight oil play and how to complete
them to maximize NPV
• Don’t drill where we’re unlikely to be successful
• Where we do drill, use the best completion design
• Our existing ability to predict productivity from new drills has been poor
Can we use machine learning to develop a better predictive model?
![Page 75: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/75.jpg)
Case Study 2: Data
Sample size: ~300 wells
We want to predict first-six-month cumulative oil production (Target) from the following Features:
• Geological data
• Seismic data
• Completion data
• Drilling data
• Proximal production data
With these predictions in hand, we can then layer in cost and commodity price
information to optimize expected NPV
75
![Page 76: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/76.jpg)
The Problem is Complex
76
• > 80 features
• Varying degrees of interdependence
• Most features could plausibly
impact well performance
• Some feature values only exist in
combination with each other –
difficult to distinguish individual
feature impacts
![Page 77: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/77.jpg)
Individual Feature Importance
77
• In this dataset, many features are
informative
• The top two individual features are
related to geology
• Different feature importance measures
yield different ranking orders
![Page 78: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/78.jpg)
Grouped Feature Importance
78
• Grouped feature importance allows
characterization of which categories of data
have the greatest impact
• In this dataset, reservoir geology appears to
have about 50% more influence than
completion design – but both are important
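One simple way to arrive at a grouped view is to sum per-feature importances within each data category. The slides do not say how the grouping was computed, so treat this as an assumption; the feature names and importance values below are hypothetical and chosen only so the ratio comes out near the "about 50% more" quoted above.

```python
from collections import defaultdict


def grouped_importance(importances, groups):
    """Sum per-feature importances within each named group."""
    totals = defaultdict(float)
    for feature, importance in importances.items():
        totals[groups[feature]] += importance
    return dict(totals)


# Hypothetical per-feature importances (they sum to 1.0).
importances = {
    "net_pay": 0.30,
    "porosity": 0.30,
    "proppant_tonnage": 0.25,
    "stage_count": 0.15,
}
# Hypothetical mapping of each feature to its data category.
groups = {
    "net_pay": "geology",
    "porosity": "geology",
    "proppant_tonnage": "completion",
    "stage_count": "completion",
}

by_group = grouped_importance(importances, groups)
ratio = by_group["geology"] / by_group["completion"]
```

Summation is only sensible for importance measures that are additive across features; for strongly interdependent features, permuting or dropping a whole group at once is a common alternative.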
![Page 79: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/79.jpg)
Model Results – Predicted vs. Target
79
Panels: Testing, Cross-Validation
![Page 80: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/80.jpg)
Target Sensitivity to Individual Features
80
How does changing the value of just one feature impact the prediction?
(holding everything else constant)
Panels: A, B
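The question on this slide amounts to a one-at-a-time sensitivity sweep: fix a baseline well, vary a single feature, and record the prediction at each value. A minimal sketch follows; `toy_model` is a made-up stand-in, and any trained model wrapped as a callable that takes a feature dictionary would slot in the same way.

```python
import math


def sensitivity_curve(predict, baseline, feature, values):
    """Predictions as one feature sweeps over `values`, others held at baseline."""
    curve = []
    for value in values:
        row = dict(baseline)  # copy so the baseline is never mutated
        row[feature] = value
        curve.append((value, predict(row)))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a trained model."""
    return 100.0 * math.sqrt(row["a"]) + 2.0 * row["b"]


baseline_well = {"a": 4.0, "b": 10.0}
curve_a = sensitivity_curve(toy_model, baseline_well, "a", [1.0, 4.0, 9.0])
```

Because everything else is held constant, the curve can be misleading when features only occur in combination; sweeping a feature outside the range seen alongside the baseline values asks the model to extrapolate.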
![Page 81: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/81.jpg)
Statistical Distribution of Error
81
How do the absolute error, relative (%) error and direction of error vary within the dataset?
What is the P10/P90 ratio of the error distribution?
Note: this is not the case study area; it is intended for illustrative purposes only.
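The quantities asked about above can be computed directly from targets and predictions. The sketch assumes the oil and gas exceedance convention, where P10 is the value exceeded 10% of the time (the 90th percentile) and P90 the value exceeded 90% of the time (the 10th percentile), and it applies the ratio to absolute errors. Both choices are assumptions, as are the function names.

```python
def percentile(values, q):
    """Linear-interpolation percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def error_summary(targets, predictions):
    """Absolute, relative and directional error statistics."""
    signed = [p - t for t, p in zip(targets, predictions)]
    absolute = [abs(e) for e in signed]
    relative_pct = [100 * e / t for e, t in zip(signed, targets)]
    # Exceedance convention (assumed): P10 is the high case, P90 the low case.
    p10 = percentile(absolute, 90)
    p90 = percentile(absolute, 10)
    return {
        "absolute": absolute,
        "relative_pct": relative_pct,
        "over_predicted": sum(e > 0 for e in signed),
        "under_predicted": sum(e < 0 for e in signed),
        "p10_p90_ratio": p10 / p90 if p90 > 0 else float("inf"),
    }


# Hypothetical targets and predictions, for illustration only.
targets = [100] * 11
predictions = [100 + k for k in range(1, 12)]
summary = error_summary(targets, predictions)
```

Relative error is undefined for a zero target, so wells with no production would need to be handled separately.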
![Page 82: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/82.jpg)
Spatial Distribution of Error
82
Note: this is not the case study area; it is intended for illustrative purposes only.
![Page 83: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/83.jpg)
So, What Do We Get? (Final Outputs)
83
1) One or more predictive models which can be used to predict performance of future
well locations and completion designs
2) Using the model, we can generate a range of predicted outcomes for each possible
drilling location (covering a variety of possible completion designs)
3) An understanding of what matters (feature importance characterization)
4) Statistical and spatial characterization of prediction error (confidence)
5) Ability to test hypotheses (e.g., “I think combining X and Y might deliver a better
production result…what would our model predict?”)
![Page 84: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/84.jpg)
Conclusions
84
![Page 85: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/85.jpg)
Conclusions
A good result relies on:
1) Domain Expertise: knowledge of the problem space, data and goals
2) Data Expertise: ability to explore, understand, select and condition data
3) Technical (Machine Learning) Expertise: ability to use tools and technology for
efficient, effective, reliable outcomes
4) Communication Expertise: ability to craft a compelling narrative from
modelling insights and make actionable recommendations
85
![Page 86: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/86.jpg)
Good News: Machine Learning Won’t Take Our Jobs
• Domain expertise is critical
• Business understanding is critical
• Communication is critical
• Oil & gas data “needs us” – it’s too
messy on its own
• Just like us, all ML models are wrong,
but sometimes they’re useful
86
Source: xkcd
![Page 87: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/87.jpg)
Thank You
Tyler Schlosser
Chief Data Scientist
Verdazo Analytics
403-708-2864
Check out our blog at verdazo.com
87
![Page 88: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/88.jpg)
Appendix
88
![Page 89: Machine Learning - VERDAZO€¦ · 31/05/2018 · Different Types of Machine Learning Supervised Learning Training a model by example –predicting an outcome using data where examples](https://reader035.vdocuments.us/reader035/viewer/2022070112/605419e41cdca375887b1815/html5/thumbnails/89.jpg)
Aside: Time Series Forecasting with ML
89
• Can be framed using an empirical model (Arps)
• Can be framed as a stochastic process (Markov)
• Can be framed as a supervised learning
problem where:
• Target is the value in the future (V_{t+1})
• Features are past and current target values along
with any other relevant data available at the
current time (t)
• Need to be extra careful to get good
generalization
• Is it likely to offer significant improvement?
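The supervised-learning framing can be sketched as a sliding window over the series: each row's features are the most recent values up to time t, and its target is the value at t+1. The function name, the lag count and the production figures are hypothetical; other data available at time t would be appended to each feature row.

```python
def make_supervised(series, n_lags):
    """Turn a series into (features, target) rows for one-step-ahead prediction."""
    rows = []
    for t in range(n_lags - 1, len(series) - 1):
        features = list(series[t - n_lags + 1 : t + 1])  # values up to and including t
        target = series[t + 1]                           # value at t+1
        rows.append((features, target))
    return rows


# Hypothetical monthly production values, for illustration only.
monthly_rate = [10, 9, 8, 7, 6]
rows = make_supervised(monthly_rate, n_lags=2)

# Chronological split: train on the earliest rows, test on the latest.
split = len(rows) - 1
train_rows, test_rows = rows[:split], rows[split:]
```

Splitting chronologically rather than at random is one of the precautions behind the generalization bullet: a random split lets the model see the future of the very series it is tested on.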