CS229 Final Writeup: cs229.stanford.edu/proj2016spr/report/028.pdf


CS229 Final Project

Shale Gas Production Decline Prediction Using Machine Learning Algorithms

Wentao Zhang [email protected] Shaochuan Xu [email protected]

In the petroleum industry, oil companies sometimes purchase producing oil and gas wells from others instead of drilling new ones. The shale gas production decline curve is critical for assessing how much more natural gas a specific well can produce in the future. This matters during acquisitions between oil companies, because a small underestimate or overestimate of future production may result in significantly undervaluing or overvaluing an oilfield. In this project, we first use Locally Weighted Linear Regression to predict future production based on existing decline curves. Then, we apply K-means to group the decline curves into two categories, high and low productivity. We also use Principal Component Analysis to calculate the eigenvectors of the covariance matrix, from which we predict future production both with K-means as a preprocessing step and without it. Finally, the three methods are compared in terms of accuracy, defined by a standardized error.

• Dataset
The data we use are monthly production rate curves of thousands of shale gas wells, shown in Figure 1. To handle curves of different lengths and the zero production rate data points in some curves, we modify the data slightly. We replace every zero data point within a curve with a very small number, 0.0001, and we make all curves the same length by appending zeros, so that the data can be loaded into MATLAB as a matrix.

Figure 1: 2152 decline curves used to learn to predict future production (production rate in mcf/day versus month)
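The preprocessing described in the Dataset section can be sketched as follows. This is a minimal Python illustration rather than the MATLAB code used in the project; the function name `build_curve_matrix` and the toy curves are assumptions made for the example. One consequence of the two-step scheme is that a zero reading inside a well's history (stored as 0.0001) stays distinguishable from the zero padding after the end of the curve.

```python
import numpy as np

def build_curve_matrix(curves, eps=1e-4):
    """Stack variable-length curves into one zero-padded matrix.

    Zero readings inside a curve become eps (0.0001 in the report);
    positions after the end of a curve are left as 0 padding.
    """
    lengths = np.array([len(c) for c in curves])
    matrix = np.zeros((len(curves), lengths.max()))
    for row, curve in enumerate(curves):
        values = np.asarray(curve, dtype=float)
        values = np.where(values == 0.0, eps, values)
        matrix[row, :len(values)] = values
    return matrix, lengths

# Toy data (hypothetical): one short well with a shut-in month, one longer well.
curves = [[900.0, 0.0, 500.0], [1200.0, 800.0, 600.0, 450.0, 300.0]]
matrix, lengths = build_curve_matrix(curves)
```

Returning the original lengths alongside the matrix keeps the information needed later to filter out wells whose history is too short.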

• Locally Weighted Linear Regression (LWLR)
Our goal is to predict the future gas production of a new well, given its historical production data and information from other wells with longer histories. Suppose that we randomly choose a decline curve r with n months in total. We want to use the first l months to predict the remaining (n-l) months of the curve. To find curves in the training set that are "similar" to r, we define the distance between two curves as the squared L-2 norm of their difference over the first l months. Before calculating the distance, we filter the training set by removing curves whose history is shorter than n. Then we pick the k wells from the filtered training set that are closest to r, give each of them a weight w_i, and predict r as:

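$$
f_{\text{predicted}} = \frac{\sum_{i \in \text{neighb}_k(f_{\text{past\_existing}})} w\!\left(d\!\left(f^{(i)}_{\text{past\_existing}}, f_{\text{measured}}\right)/h\right) \cdot f^{(i)}_{\text{future\_existing}}}{\sum_{i \in \text{neighb}_k(f_{\text{past\_existing}})} w\!\left(d\!\left(f^{(i)}_{\text{past\_existing}}, f_{\text{measured}}\right)/h\right)}
$$

where h is the longest distance.

Results: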

 

 

Figure 2: High-productivity, good fit (upper left); high-productivity, bad fit (upper right); low-productivity, good fit (lower left); low-productivity, bad fit (lower right).

We set the number of neighbors k to 3. Figure 2 shows four typical predicted curves. The results are generally consistent with the real values. The predicted curves are smoother than the real ones because each prediction is a weighted average of multiple training wells.
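The weighted-neighbor prediction in this section can be sketched in Python as below. Two details are not given in the text and are assumptions here: the weight function is taken to be w(t) = max(1 - t, 0), and h is taken as the largest distance over the whole filtered training set. The function name `lwlr_predict` and the toy curves are also hypothetical.

```python
import numpy as np

def lwlr_predict(known, n, training_curves, k=3):
    """Predict months l+1..n of a well from its first l months."""
    known = np.asarray(known, dtype=float)
    l = len(known)
    # Keep only training curves with at least n months of history.
    pool = [np.asarray(c, dtype=float) for c in training_curves if len(c) >= n]
    if not pool:
        raise ValueError("no training curve has at least n months")
    past = np.array([c[:l] for c in pool])
    future = np.array([c[l:n] for c in pool])
    # Squared L-2 distance over the known months.
    dist = np.sum((past - known) ** 2, axis=1)
    h = dist.max()  # assumption: longest distance in the filtered set
    nearest = np.argsort(dist)[:k]
    if h > 0:
        weights = np.maximum(1.0 - dist[nearest] / h, 0.0)  # assumed kernel
    else:
        weights = np.ones(len(nearest))
    if weights.sum() == 0:
        weights = np.ones(len(nearest))  # fall back to a plain average
    return weights @ future[nearest] / weights.sum()

# Toy example (hypothetical numbers): three low producers, one high producer.
training = [
    [100.0, 80.0, 60.0, 50.0, 40.0],
    [110.0, 90.0, 70.0, 55.0, 45.0],
    [1000.0, 800.0, 600.0, 500.0, 400.0],
    [105.0, 85.0, 65.0, 52.0, 42.0],
]
prediction = lwlr_predict([100.0, 80.0, 60.0], n=5, training_curves=training, k=3)
```

With k = 3 the high producer is excluded and the prediction is a weighted average of the three nearby low-productivity curves, which is why predicted curves come out smoother than any single real curve.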

Figure 3: Predicted curves with l (known months) increasing for a good fit (Well 1219, LWLR; production rate in mcf/day versus month, showing known, predicted, and test curves), and standardized error vs. l

 

 

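Figure 4: Predicted curves with l (known months) increasing for a bad fit (Well 1923, LWLR; production rate in mcf/day versus month, showing known, predicted, and test curves), and standardized error vs. l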

In Figure 3 and Figure 4, we lengthen the known curve step by step and plot the error against the number of known months. The error does not decrease when we know a longer part of the curve and predict a shorter part. The reason might be that the Standardized Error is defined as the average relative error over the predicted months. In the tail of the curve the absolute values are small, so the relative errors can easily become large. A better error measure is needed to tell whether the prediction really improves with a longer known curve.
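To make the tail effect concrete, here is a small Python sketch of the average relative error. The exact formula is not written out in the report, so the use of absolute values and the function name `standardized_error` are assumptions; the production numbers are invented for illustration.

```python
import numpy as np

def standardized_error(predicted, actual):
    """Average relative error over the predicted months (assumed form)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))

# The same absolute miss of 50 mcf/day, early and late in a decline curve.
early = standardized_error([4050.0], [4000.0])  # 0.0125
late = standardized_error([100.0], [50.0])      # 1.0
```

An identical absolute miss counts eighty times more in the tail than near the peak, so shortening the predicted segment to the tail alone can raise the average even when the fit is no worse.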

• Principal Component Analysis (PCA)
Since each well has a history of tens of months, we want to reduce the time dimension and keep the intrinsic components that reflect production decline. First, we filter the training set by removing the wells whose history is shorter than the total number of months n of the test curve. After normalizing the data, we eigendecompose the empirical covariance matrix and extract the first 5 eigenvectors as the principal components. Then, we fit the known part of the test curve with a linear combination of the 5 eigenvectors. The coefficients θ of the linear combination are calculated by linear regression,

$$
y_{\text{known}} = U_l \theta
$$

And we predict the future decline curve as

$$
y_{\text{estimate}} = U_h \theta
$$

where $y_{\text{known}} \in \mathbb{R}^{l}$ is the normalized known history of the test well and $U_l \in \mathbb{R}^{l \times 5}$ contains the first l dimensions (rows) of the 5 eigenvectors. Our estimate is therefore $y_{\text{estimate}}$.

Results:

 

Figure 5: High-productivity, good fit (upper left); high-productivity, bad fit (upper right); low-productivity, good fit (lower left); low-productivity, bad fit (lower right).

As can be seen from Figure 5, the prediction is either too smooth or too variable compared to the real data. This is because, at the fitting step, θ is either underfitted (high bias, giving curves that are too smooth) or overfitted (high variance, giving curves that fluctuate too much). Another problem with PCA is that all the training wells contribute to the estimate, which makes it imprecise for predicting very high or very low production.
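The PCA fitting and prediction steps of this section can be sketched in Python as follows. Several points are assumptions, since the report does not spell them out: the normalization is taken to be subtraction of the month-wise training mean, U_h is taken to be the remaining n-l rows of the eigenvector matrix, and θ is obtained by ordinary least squares. The name `pca_predict` and the synthetic curves are hypothetical.

```python
import numpy as np

def pca_predict(known, n, training_curves, n_components=5):
    """Fit the known months with leading eigenvectors, then extrapolate."""
    known = np.asarray(known, dtype=float)
    l = len(known)
    # Keep wells with at least n months and truncate them to n months.
    data = np.array([np.asarray(c, dtype=float)[:n]
                     for c in training_curves if len(c) >= n])
    mean = data.mean(axis=0)            # assumed normalization: centering
    cov = np.cov(data, rowvar=False)    # n x n empirical covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    U = eigvecs[:, ::-1][:, :n_components]       # leading eigenvectors
    U_l, U_h = U[:l], U[l:]             # assumption: U_h is the remaining rows
    theta, *_ = np.linalg.lstsq(U_l, known - mean[:l], rcond=None)
    return U_h @ theta + mean[l:]

# Synthetic one-parameter family of decline curves (illustration only).
base = np.exp(-0.3 * np.arange(8))
training = [a * base for a in (1.0, 2.0, 3.0, 4.0)]
target = 2.5 * base
prediction = pca_predict(target[:4], 8, training, n_components=1)
```

The example uses a single component because the synthetic curves differ only by a scale factor; with 5 components and only a few known months, the least-squares system has as many or more unknowns than equations, which is one way the overfitting seen in Figure 5 can arise.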

• PCA after K-means
If we assume that high-productivity wells are similar to each other and low-productivity wells are similar to each other, we can group all the decline curves into two categories. We modify the K-means method to handle the fact that different decline curves have different dimensions: we calculate the distance between a centroid and a curve using only the dimensions of the shorter one. Comparing the two panels of Figure 6, this modified K-means method is good enough to distinguish high-productivity wells from low-productivity wells.

Figure 6: Decline curves of the high-productivity wells (left) and the low-productivity wells (right)
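A sketch of a two-cluster K-means on curves of unequal length is given below in Python. Only the distance rule (compare over the shorter of the two lengths) comes from the text. The centroid update (month-wise mean over the member curves that have data for that month), the initialization (the curves with the lowest and highest mean rate), and all names are assumptions made for this illustration.

```python
import numpy as np

def overlap_distance(a, b):
    """Squared L-2 distance over the months both curves share."""
    m = min(len(a), len(b))
    return float(np.sum((a[:m] - b[:m]) ** 2))

def monthwise_mean(members):
    """Assumed centroid update: average each month over curves that reach it."""
    longest = max(len(c) for c in members)
    total, count = np.zeros(longest), np.zeros(longest)
    for c in members:
        total[:len(c)] += c
        count[:len(c)] += 1
    return total / count

def kmeans_two_groups(curves, n_iter=20):
    curves = [np.asarray(c, dtype=float) for c in curves]
    means = [c.mean() for c in curves]
    # Assumed initialization: lowest-mean and highest-mean curves.
    centroids = [curves[int(np.argmin(means))].copy(),
                 curves[int(np.argmax(means))].copy()]
    labels = None
    for _ in range(n_iter):
        new_labels = np.array([
            int(np.argmin([overlap_distance(c, mu) for mu in centroids]))
            for c in curves])
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(2):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centroids[j] = monthwise_mean(members)
    return labels, centroids

# Toy curves (hypothetical): three low producers and two high producers.
curves = [[100.0, 80.0, 60.0], [120.0, 90.0, 70.0, 50.0], [90.0, 70.0],
          [5000.0, 4000.0, 3000.0, 2500.0], [6000.0, 4500.0, 3500.0]]
labels, centroids = kmeans_two_groups(curves)
```

With this initialization, label 0 corresponds to the low-productivity group and label 1 to the high-productivity group.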


Then, we run PCA again after clustering the original decline curves by K-means.

 

Figure 7: High-productivity, good fit (upper left); high-productivity, bad fit (upper right); low-productivity, good fit (lower left); low-productivity, bad fit (lower right).

From Figure 7, we can see that although the underfitting/overfitting problem still exists, the results are better than those of the original PCA. This might be because clustering adds L-2 norm distance information to the PCA, which makes it an integrated method.

• Discussion

Figure 8: Errors of the three methods calculated from leave-one-out cross validation

We apply leave-one-out cross validation to all three methods, compare the predictions with real production data, and calculate the average relative errors as in Figure 3 and Figure 4. We also define a threshold value to avoid extremely large errors. We do this because a single extreme value can make the average of all relative errors huge, and these extreme values arise from the shut-down periods of the wells (when production is nearly zero). Figure 8 supports our intuition that LWLR is the best of the three methods, because no information is lost to dimension reduction. PCA has the largest relative error of the three because the higher-order principal components, which reflect the details of the decline curves, are not included. K-means clusters the wells into high- and low-productivity classes, which improves PCA by making that prior information available.
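The evaluation loop can be sketched in Python as follows. The report does not state the threshold value or how it is applied, so clipping each monthly relative error at a hypothetical `cap` is an assumption, as are the function names. The predictor `mean_of_others` is a deliberately simple placeholder, not one of the three methods compared in Figure 8.

```python
import numpy as np

def capped_relative_error(predicted, actual, cap=1.0):
    """Mean relative error with each month's error clipped at cap (assumed)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    rel = np.abs(predicted - actual) / np.abs(actual)
    return float(np.mean(np.minimum(rel, cap)))

def leave_one_out_error(curves, predict, l, cap=1.0):
    """Hold out each well in turn, predict it from the others, average errors."""
    curves = [np.asarray(c, dtype=float) for c in curves]
    errors = []
    for i, target in enumerate(curves):
        others = curves[:i] + curves[i + 1:]
        predicted = predict(target[:l], len(target), others)
        errors.append(capped_relative_error(predicted, target[l:], cap))
    return float(np.mean(errors))

def mean_of_others(known, n, training_curves):
    """Placeholder predictor: month-wise mean of all sufficiently long curves."""
    pool = np.array([c[:n] for c in training_curves if len(c) >= n])
    return pool[:, len(known):].mean(axis=0)

# Toy curves (hypothetical), each with two known and two predicted months.
curves = [[100.0, 80.0, 60.0, 50.0],
          [110.0, 88.0, 66.0, 55.0],
          [90.0, 72.0, 54.0, 45.0]]
loo_error = leave_one_out_error(curves, mean_of_others, l=2)
```

Any predictor with the same `(known, n, training_curves)` signature can be passed in place of `mean_of_others`, so the three methods can be scored by the same loop and the same cap.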