Bootstrap - Amine Ouazad · Bootstrap and the dangers of outliers • Exercise: – Calculate the mean and the s.e. of the mean of the sample -870,1,8,0.5,3,4 using the central limit

Bootstrap Econometrics A Ass. Prof. Amine Ouazad


Posted on 14-Sep-2020

Mice

• Diabetic mice receive a treatment. Their sugar level is measured after the treatment.

• The numbers are the following:
– 2.3, 4.1, 1.2, 2.6, 4.4, 1.9 in the control group.
– 2.1, 2.0, 1.9, 1.6, 2.2, 0.7 in the treatment group.

• Exercise:
– Estimate the effect of the treatment and the standard error of the estimated effect.
– Discuss the assumptions needed to estimate the standard error of the treatment effect.
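A minimal Python sketch of the classical answer (Python is used here only for illustration; the course itself uses Stata). It estimates the effect as the difference in group means and its standard error as sqrt(s_T²/n_T + s_C²/n_C). This formula assumes the mice are independent draws; turning it into a confidence interval further requires either normally distributed sugar levels (small samples) or the central limit theorem (large samples), which is exactly what the exercise asks you to question.

```python
import math
from statistics import mean, variance

# Sugar levels from the exercise (6 mice per group).
control = [2.3, 4.1, 1.2, 2.6, 4.4, 1.9]
treatment = [2.1, 2.0, 1.9, 1.6, 2.2, 0.7]

# Treatment effect: difference in group means.
effect = mean(treatment) - mean(control)

# Standard error of a difference of two independent sample means, using each
# group's unbiased sample variance (no equal-variance assumption).
se = math.sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))

print(f"effect = {effect:.3f}, s.e. = {se:.3f}")
```

With these numbers the estimated effect is -1.0 and its standard error is about 0.56.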

Outline  

1. Problemo
2. Mice: the Bootstrap principle
3. Implementation in Stata
4. Theory:
   1. Estimation of the C.D.F.
   2. Estimation of confidence intervals by bootstrap
   3. Improvement over the Central Limit Theorem
5. Tricky

Problemo

• We used two tricks to find confidence intervals:
– Either we used the Central Limit Theorem, when the number of observations is large.
– Or we assumed normally distributed residuals, when the number of observations is fixed (and small).

• What if neither of these holds?
– The number of observations is small, and the residuals are not normally distributed.

• Possibilities:
– Assume another distribution for the residuals. Theoretically possible, but super rare and nonstandard.
– Or use the bootstrap.

Mice

• Diabetic mice receive a treatment. Their sugar level is measured after the treatment.

• The numbers are the following:
– 2.3, 4.1, 4.1, 1.2, 2.6, 4.4, 1.9 in the control group.
– 2.1, 2.0, 1.2, 1.9, 1.6, 2.2, 0.7 in the treatment group.

• Exercise:
– Estimate the effect of the treatment and the standard error of the estimated effect, using the bootstrap.

(value, group; C = control, T = treatment)
2.3, C
4.1, C
4.1, C
1.2, C
2.6, C
2.2, T
0.7, T
2.1, T
2.0, T
1.2, T
1.9, T
1.6, T
4.4, C
1.9, C

Mice: The Bootstrap Principle

• Draw a sample of the same size, with replacement, from the set of observations of mice.

• Calculate an estimate b1 of the effect of the medication on mice, using that resample.

• Repeat this step k = 1, 2, …, K times. At each step, calculate bk.

• The 2.5th percentile of the bk's provides the lower bound of a 95% confidence interval for b.

• The 97.5th percentile of the bk's provides the upper bound of that 95% confidence interval.
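The bootstrap principle applied to the seven-plus-seven mice, as a minimal Python sketch. It follows the slide literally: (value, group) pairs are resampled from the pooled list, so group sizes vary across resamples, and the rare resamples with an empty group are skipped. The fixed seed, K = 5000 and the simple percentile rule are illustrative choices, not part of the original.

```python
import random

# Pooled (value, group) observations from the 7 + 7 mice.
control = [2.3, 4.1, 4.1, 1.2, 2.6, 4.4, 1.9]
treatment = [2.1, 2.0, 1.2, 1.9, 1.6, 2.2, 0.7]
pooled = [(v, "C") for v in control] + [(v, "T") for v in treatment]

def effect(sample):
    """Difference in means (treatment - control); None if a group is empty."""
    c = [v for v, g in sample if g == "C"]
    t = [v for v, g in sample if g == "T"]
    if not c or not t:
        return None
    return sum(t) / len(t) - sum(c) / len(c)

def percentile(sorted_vals, q):
    """Simple percentile: the value at rank round(q * (K - 1)) of a sorted list."""
    return sorted_vals[round(q * (len(sorted_vals) - 1))]

rng = random.Random(0)  # fixed seed so the example is reproducible
K = 5000
b = []
for _ in range(K):
    # Resample N pairs with replacement from the pooled observations.
    resample = rng.choices(pooled, k=len(pooled))
    bk = effect(resample)
    if bk is not None:  # skip the (rare) resamples with an empty group
        b.append(bk)

b.sort()
b_hat = effect(pooled)
ci = (percentile(b, 0.025), percentile(b, 0.975))
print(f"estimate = {b_hat:.3f}, 95% bootstrap CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Resampling separately within the control and treatment groups (keeping seven mice in each) is a common alternative that avoids empty groups altogether.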

Implementation in Stata

• Use the bootstrap command in Stata, e.g. bootstrap, reps(20): regress y x

• Issues:
– Only valid with i.i.d. residuals (more on this later).
– Some bootstrap replications may fail because assumption A2 is not satisfied in these resamples.

• Upside:
– Very versatile. Works with almost all estimation procedures in Stata.
– Improves confidence intervals over the normal approximation for i.i.d. residuals in small samples.
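Stata's bootstrap command handles the resampling for you; the Python sketch below only mimics what a command like `bootstrap, reps(20): regress y x` does when it resamples observations, and shows how a replication can fail. The data are hypothetical, and we assume here that A2 refers to the rank condition (x must vary in the sample): when a resample contains no observation with x = 1, the slope cannot be estimated and that replication is dropped.

```python
import random

def ols_slope(xs, ys):
    """OLS slope of y on x (with intercept); None if x has no variation (rank condition fails)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical small dataset: x is a dummy with only two 1s, so some
# resamples contain no x = 1 at all and the regression cannot be run.
x = [0, 0, 0, 0, 0, 0, 1, 1]
y = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 2.1, 1.9]

rng = random.Random(1)
reps = 20
pairs = list(zip(x, y))
slopes = []
failed = 0
for _ in range(reps):
    sample = rng.choices(pairs, k=len(pairs))  # resample (x, y) pairs with replacement
    xs, ys = zip(*sample)
    b = ols_slope(xs, ys)
    if b is None:
        failed += 1
    else:
        slopes.append(b)

m = sum(slopes) / len(slopes)
se = (sum((b - m) ** 2 for b in slopes) / (len(slopes) - 1)) ** 0.5
print(f"successful reps = {len(slopes)}, failed = {failed}, bootstrap s.e. = {se:.3f}")
```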

THEORY, PART 1: ESTIMATION OF THE EMPIRICAL CDF

Theory: Estimation of the C.D.F.

• Recap: the c.d.f. of a random variable X is the function F(x) = P(X ≤ x), the probability that X is lower than or equal to a given threshold x.

• We observe i.i.d. observations of X, {X1, X2, …, XN}.

• The true c.d.f. of X is F0(x).

Examples of c.d.f.s

• The c.d.f. of firms' earnings:

[Figure: c.d.f. of firms' earnings. Horizontal axis: earnings over book value of equity (from -2 to 2). Vertical axis: cumulative probability (from 0 to 1).]

Empirical c.d.f.

• Empirical c.d.f.: FN(x) = (1/N) Σi I(Xi ≤ x), the fraction of observations at or below x.

• By the law of large numbers, for each x the empirical c.d.f. FN(x) converges (almost surely) to the true c.d.f. F0(x).

• Σi I(Xi ≤ x) is a Binomial(N, F0(x)) count, so the variance of FN(x) is F0(x)(1 - F0(x))/N; by the central limit theorem, FN(x) is approximately normal around F0(x) in large samples.

• (Glivenko-Cantelli theorem: the empirical c.d.f. converges uniformly, almost surely, to the true c.d.f. F0, i.e. supx |FN(x) - F0(x)| → 0.)
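A short Python sketch of the empirical c.d.f. FN and a Monte Carlo check of the variance formula F0(x)(1 - F0(x))/N. The Uniform(0, 1) distribution (for which F0(x) = x), the point x = 0.3, N = 50 and the number of simulations are illustrative assumptions.

```python
import bisect
import random

def ecdf(sample):
    """Return the empirical c.d.f. F_N(x) = (1/N) * #{i : X_i <= x} of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    # bisect_right returns the number of sorted observations that are <= x.
    return lambda x: bisect.bisect_right(xs, x) / n

data = [8, 9, 10, 2, 1, 8, 9, 5]
F = ecdf(data)
print(F(0), F(5), F(8), F(10))  # 0.0 0.375 0.625 1.0

# Monte Carlo check of Var(F_N(x)) = F0(x)(1 - F0(x))/N, assuming X ~ Uniform(0, 1)
# so that F0(x) = x; here x = 0.3 and N = 50.
rng = random.Random(0)
N, x0, reps = 50, 0.3, 20000
vals = [ecdf([rng.random() for _ in range(N)])(x0) for _ in range(reps)]
m = sum(vals) / reps
var_mc = sum((v - m) ** 2 for v in vals) / (reps - 1)
var_theory = x0 * (1 - x0) / N
print(f"simulated variance = {var_mc:.5f}, formula = {var_theory:.5f}")
```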

The Empirical C.d.f.

• Use the observations of firms' earnings X1, X2, …, XN.

• Using these draws, construct the empirical c.d.f.

• Result:
– For each x, the empirical c.d.f. converges to the true c.d.f. of X as the number of draws becomes infinitely large.
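The convergence result can be seen numerically. This sketch draws samples of increasing size from a Uniform(0, 1) distribution (an assumption made so that F0(x) = x is known) and computes supx |FN(x) - F0(x)|, which the Glivenko-Cantelli theorem says goes to zero.

```python
import random

def ks_distance_uniform(sample):
    """sup over x of |F_N(x) - F0(x)| when F0(x) = x (the Uniform(0, 1) c.d.f.).

    F_N jumps from i/N to (i + 1)/N at the (i + 1)-th smallest observation, so the
    supremum is attained just before or at one of these jumps.
    """
    u = sorted(sample)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

rng = random.Random(42)
dists = {}
for n in (10, 100, 1000, 10000, 100000):
    dists[n] = ks_distance_uniform([rng.random() for _ in range(n)])
    print(f"N = {n:>6}: sup |F_N - F0| = {dists[n]:.4f}")
```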

THEORY, PART 2: USING THE EMPIRICAL CDF TO APPROXIMATE THE DISTRIBUTION OF A STATISTIC

Drawing from the sample

• Drawing a sample with replacement from the sample is identical to drawing i.i.d. from a random variable whose c.d.f. is the empirical c.d.f. FN.

• Indeed, each observation is picked with probability 1/N, so the probability of picking a number lower than or equal to x is exactly the fraction of observations at or below x, which is FN(x).
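To see that resampling with replacement is the same as drawing from FN, the sketch below draws values in two ways: by inverse-transform sampling from FN (take the smallest x with FN(x) ≥ u, for u uniform) and by picking observations uniformly at random. Both frequency columns should approach the mass FN puts on each value. The data and the number of draws are illustrative.

```python
import math
import random
from collections import Counter

data = [8, 9, 10, 2, 1, 8, 9, 5]
xs = sorted(data)
n = len(xs)

def draw_from_ecdf(rng):
    """Inverse-transform draw from F_N: the smallest x with F_N(x) >= u."""
    u = 1.0 - rng.random()            # u in (0, 1]
    return xs[math.ceil(n * u) - 1]   # the ceil(n u)-th smallest observation

rng = random.Random(0)
draws = 80000
via_ecdf = Counter(draw_from_ecdf(rng) for _ in range(draws))
via_resampling = Counter(rng.choice(data) for _ in range(draws))

for v in sorted(set(data)):
    target = data.count(v) / n        # mass F_N puts on v
    print(v, round(via_ecdf[v] / draws, 3), round(via_resampling[v] / draws, 3), target)
```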

Estimation of confidence intervals by bootstrap

• Example: 8, 9, 10, 2, 1, 8, 9, 5.
– Calculate the mean, and give an estimate of the variance of the empirical mean, either with the normal approximation or with the bootstrap.

• Mean of X.
– The empirical mean is m = (1/N) Σi Xi.
– The c.d.f. of the mean is either approximated using the normal distribution (asymptotic approximation),
– Or using the empirical c.d.f. (recompute the mean on samples drawn from FN, i.e. the bootstrap).
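For the example 8, 9, 10, 2, 1, 8, 9, 5, the sketch below computes the mean, the CLT standard error s/√N, and a bootstrap standard error and 95% percentile interval from K resamples. K = 20000, the seed and the percentile rule are illustrative choices.

```python
import math
import random

data = [8, 9, 10, 2, 1, 8, 9, 5]
n = len(data)
m = sum(data) / n                      # empirical mean

# Normal (CLT) approximation: s.e. = s / sqrt(n), with the unbiased sample variance.
s2 = sum((x - m) ** 2 for x in data) / (n - 1)
se_clt = math.sqrt(s2 / n)

# Bootstrap: recompute the mean on K resamples drawn with replacement (i.e. from F_N).
rng = random.Random(0)
K = 20000
boot = sorted(sum(rng.choices(data, k=n)) / n for _ in range(K))
mb = sum(boot) / K
se_boot = math.sqrt(sum((b - mb) ** 2 for b in boot) / (K - 1))

ci_normal = (m - 1.96 * se_clt, m + 1.96 * se_clt)
ci_boot = (boot[int(0.025 * K)], boot[int(0.975 * K)])
print(f"mean = {m}, s.e. (CLT) = {se_clt:.3f}, s.e. (bootstrap) = {se_boot:.3f}")
print("95% CI normal:", ci_normal, " bootstrap percentile:", ci_boot)
```

The mean is 6.5 and the CLT standard error is about 1.21. As K grows, the bootstrap standard error approaches √((1/N²) Σi (Xi - m)²) ≈ 1.13, slightly smaller because the variance of FN divides by N rather than N - 1.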

Tricky

• The bootstrap requires i.i.d. draws from the same random variable.
– If the observations are correlated (clustering or autocorrelation), the simple bootstrap is not valid.
– If the observations do not have the same distribution, the simple bootstrap is not valid.

Theory of the bootstrap

• Key questions:
– Is the bootstrap estimator of the distribution of a statistic a consistent estimator of that distribution?
– Is the bootstrap estimator better than the approximation provided by the Central Limit Theorem?

Theory of the bootstrap

• As before, denote by Fn the empirical c.d.f. of the observations, and by F0 the true c.d.f. of the observations.

• We are interested in the distribution of a statistic Tn(X1, X2, …, Xn) of the observations. This statistic is either: (i) an estimator, (ii) a test statistic, or (iii) a quantity of interest (a ratio, for instance).

• Denote by Gn(., F0) the c.d.f. of the statistic: Gn(t, F0) = P(Tn ≤ t) when the observations are drawn from F0.

Asymptotic approximations

• The usual technique used so far.
– For instance, we use the asymptotic normality of the OLS estimator to estimate confidence intervals.

• Principle: replace Gn(., F0) with its limit G∞(., F0), which typically depends on the underlying distribution of the observations only through a few parameters.
– For instance, the asymptotic distribution of the OLS estimator does not depend on the specific distribution of the residuals and the Xs, only on their variance-covariance matrix.

Bootstrap approximation

• The bootstrap approximation uses Gn(., Fn) as the approximation to Gn(., F0).

• In practice, Gn(., Fn) is computed by simulation:
– Draw a sample of the same size n from the sample, with replacement.
– Calculate the statistic on it, and repeat the procedure K times to get K values of the statistic.
– Calculate the empirical c.d.f. of these K values.
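These simulation steps can be written once for any statistic Tn. This Python sketch (the function names are hypothetical) returns the K simulated values of the statistic and their empirical c.d.f., a Monte Carlo approximation of Gn(., Fn). It is applied to an estimator (the median of the earlier example) and to a ratio computed on made-up paired data.

```python
import bisect
import random
from statistics import median

def bootstrap_values(data, statistic, K=2000, seed=0):
    """Sorted values of statistic(X*_1, ..., X*_n) over K resamples, where the X*_i
    are drawn i.i.d. from the empirical c.d.f. F_n of `data` (i.e. with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    return sorted(statistic(rng.choices(data, k=n)) for _ in range(K))

def cdf_of(values):
    """Empirical c.d.f. of the bootstrap values: a Monte Carlo approximation of G_n(., F_n)."""
    return lambda t: bisect.bisect_right(values, t) / len(values)

def ratio_of_means(sample):
    """A 'quantity of interest': ratio of the means of the two components of each pair."""
    return sum(a for a, _ in sample) / sum(b for _, b in sample)

K = 2000
# (i) An estimator: the sample median of the example data.
med_vals = bootstrap_values([8, 9, 10, 2, 1, 8, 9, 5], median, K)
G_med = cdf_of(med_vals)
print("G_n(8, F_n) for the median ~", G_med(8))

# (iii) A ratio, computed on hypothetical paired data.
pairs = [(2.0, 4.1), (3.5, 5.0), (1.2, 3.3), (4.4, 6.0), (2.9, 4.8)]
ratio_vals = bootstrap_values(pairs, ratio_of_means, K)
print("2.5% and 97.5% quantiles of the ratio:",
      ratio_vals[int(0.025 * K)], ratio_vals[int(0.975 * K)])
```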

Informal statement of the properties of bootstrap

• For formal proofs, see Horowitz (1999).

• The bootstrap is consistent for:
– OLS/IV/panel estimators.
– Maximum likelihood and GMM estimators.
– t statistics, F statistics.

• It fails to be a consistent estimate of the c.d.f. for:
– The distribution of the maximum/minimum of a sample.
– The sample mean of heavy-tailed distributions with infinite variance (such as the Cauchy distribution).
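Why the maximum is a failure case, in a small Python simulation: a resample contains the largest observation with probability 1 - (1 - 1/n)^n, which tends to 1 - 1/e ≈ 0.63. So the bootstrap distribution of the maximum puts about 63% of its mass on the observed maximum, whereas for continuous data (here, an assumed Uniform(0, 1) sample of size 50) the true distribution of the sample maximum has no such atom.

```python
import random

rng = random.Random(0)
n = 50
# One sample from Uniform(0, 1); the statistic of interest is the sample maximum.
sample = [rng.random() for _ in range(n)]
x_max = max(sample)

K = 20000
# Share of bootstrap resamples whose maximum equals the observed maximum.
hits = sum(max(rng.choices(sample, k=n)) == x_max for _ in range(K))
share_at_max = hits / K

print(f"bootstrap P*(max* = max) = {share_at_max:.3f}, "
      f"theory = {1 - (1 - 1 / n) ** n:.3f}")
```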

Bootstrap and the dangers of outliers

• Exercise:
– Calculate the mean and the s.e. of the mean of the sample -870, 1, 8, 0.5, 3, 4, using the central limit theorem approximation and the bootstrap approximation.
– What is the probability that the outlier -870 is drawn more than 3 times in all 10 replications of a bootstrap calculation?

• Moral:
– Bootstrap results depend on the specific draws. Be careful with outliers.
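A Python sketch for the outlier exercise. It computes the mean and CLT standard error, the probability that -870 is drawn more than three times in a single resample of size 6 (a Binomial(6, 1/6) count), and how much a bootstrap standard error based on only 10 replications varies from one run to another. The seed and the 1,000 repeated runs are illustrative.

```python
import math
import random

data = [-870, 1, 8, 0.5, 3, 4]
n = len(data)
m = sum(data) / n
se_clt = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1) / n)
print(f"mean = {m:.2f}, CLT s.e. = {se_clt:.2f}")

# The number of times -870 appears in one resample of size 6 is Binomial(6, 1/6);
# "more than 3 times" means 4, 5 or 6 times.
p_one = sum(math.comb(n, j) * (1 / n) ** j * (1 - 1 / n) ** (n - j) for j in range(4, n + 1))
print(f"P(outlier drawn > 3 times in one replication) = {p_one:.5f}")

def boot_se(rng, K=10):
    """Bootstrap s.e. of the mean from only K replications."""
    means = [sum(rng.choices(data, k=n)) / n for _ in range(K)]
    mb = sum(means) / K
    return math.sqrt(sum((b - mb) ** 2 for b in means) / (K - 1))

rng = random.Random(0)
runs = [boot_se(rng) for _ in range(1000)]
print(f"bootstrap s.e. with K = 10: min {min(runs):.1f}, max {max(runs):.1f}")
```

The mean is -142.25 and the CLT standard error is about 145.6. If the question is read as requiring the event in every one of 10 independent replications, its probability is p_one ** 10.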

BLOCK BOOTSTRAP

Block bootstrap

• If you believe the observations are correlated within firms, within an area, or within an industry, the simple bootstrap fails, since the observations are not i.i.d.

• Instead, draw blocks (e.g. firms) of observations rather than individual observations.

• The blocks are assumed to be independent of each other, while observations within a block may be correlated.

Block bootstrap

• Divide the dataset into blocks j = 1, 2, …, J, so that each block j has M observations, and observations in different blocks are not correlated.

• The blocks (x11, …, x1M), (x21, …, x2M), …, (xJ1, …, xJM) are i.i.d. draws.

• Draw a sample of J blocks with replacement. Estimate your effect b1 on it.

• Perform the previous step k = 1, 2, …, K times to get estimates b1, b2, …, bK.
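A minimal Python sketch of the block bootstrap for the standard error of a mean. The data are simulated and hypothetical: J = 30 blocks (say, industries) of M = 10 observations sharing a common block shock, so observations are correlated within blocks. Treating each observation as its own block gives the simple i.i.d. bootstrap for comparison.

```python
import math
import random

def block_bootstrap_se(blocks, statistic, K=2000, seed=0):
    """Block bootstrap: resample whole blocks with replacement, pool their
    observations, recompute the statistic, and return the s.d. of the K estimates."""
    rng = random.Random(seed)
    J = len(blocks)
    estimates = []
    for _ in range(K):
        drawn = rng.choices(blocks, k=J)                # J blocks, with replacement
        pooled = [x for block in drawn for x in block]  # each block stays intact
        estimates.append(statistic(pooled))
    mb = sum(estimates) / K
    return math.sqrt(sum((b - mb) ** 2 for b in estimates) / (K - 1))

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical data: J = 30 blocks of M = 10 observations with a common block shock.
rng = random.Random(1)
blocks = []
for j in range(30):
    shock = rng.gauss(0, 1)
    blocks.append([shock + rng.gauss(0, 1) for _ in range(10)])

flat = [x for b in blocks for x in b]
naive = block_bootstrap_se([[x] for x in flat], mean)  # blocks of size 1 = i.i.d. bootstrap
clustered = block_bootstrap_se(blocks, mean)
print(f"i.i.d. bootstrap s.e. = {naive:.3f}, block bootstrap s.e. = {clustered:.3f}")
```

With within-block correlation, the i.i.d. bootstrap standard error comes out markedly smaller than the block bootstrap one; the 2.5th and 97.5th percentiles of the K estimates give a confidence interval as in the mice example.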

Block bootstrap exercise

• Take the dataset of firms' earnings, and calculate the mean of firms' dividends and the standard error of that mean, assuming that dividends are correlated within industries.
– Either using the bootstrap command (with its cluster() option).
– Or by drawing blocks yourself.

THEORY, PART 3: DOES THE BOOTSTRAP IMPROVE OVER THE ASYMPTOTIC APPROXIMATION?

Exercise

• Generate a sample of Xs of size N with a Pareto distribution.
– The Pareto distribution is typical of income distributions.

• Calculate the mean of X, and the standard error of the mean of X, using the bootstrap and the central limit approximation.

• Generate many samples of the same size from the same Pareto distribution.
– Estimate the error of the bootstrap approximation, the error of the central limit theorem approximation, and the difference between the bootstrap approximation and the central limit theorem approximation.
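One possible way to run this exercise in Python. The shape parameter α = 3 (so that the mean α/(α - 1) and the variance are finite and known), N = 20, and the numbers of samples and replications are all assumptions; `random.paretovariate` draws from a Pareto distribution with scale 1. Here the error of each approximation is measured by comparing the average standard error with the true standard deviation of the mean, and by the coverage of nominal 95% intervals.

```python
import math
import random

ALPHA = 3.0   # assumed Pareto shape; the variance is finite only for ALPHA > 2
N = 20        # sample size
M = 300       # number of simulated samples
K = 300       # bootstrap replications per sample

TRUE_MEAN = ALPHA / (ALPHA - 1)
TRUE_SD_OF_MEAN = math.sqrt(ALPHA / ((ALPHA - 1) ** 2 * (ALPHA - 2)) / N)

def sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

rng = random.Random(0)
se_clt, se_boot, cover_clt, cover_boot = [], [], 0, 0
for _ in range(M):
    x = [rng.paretovariate(ALPHA) for _ in range(N)]   # Pareto with scale 1
    m = sum(x) / N
    s_clt = sd(x) / math.sqrt(N)
    boot = sorted(sum(rng.choices(x, k=N)) / N for _ in range(K))
    s_boot = sd(boot)
    se_clt.append(s_clt)
    se_boot.append(s_boot)
    # Does each nominal 95% interval contain the true mean?
    cover_clt += (m - 1.96 * s_clt <= TRUE_MEAN <= m + 1.96 * s_clt)
    cover_boot += (boot[int(0.025 * K)] <= TRUE_MEAN <= boot[int(0.975 * K)])

print(f"true s.d. of the mean  = {TRUE_SD_OF_MEAN:.3f}")
print(f"average CLT s.e.       = {sum(se_clt) / M:.3f}")
print(f"average bootstrap s.e. = {sum(se_boot) / M:.3f}")
print(f"mean |bootstrap - CLT| = {sum(abs(a - b) for a, b in zip(se_boot, se_clt)) / M:.3f}")
print(f"coverage: CLT {cover_clt / M:.2f}, bootstrap percentile {cover_boot / M:.2f}")
```

Trying values of ALPHA closer to 2 (heavier tails) is a natural extension of the exercise.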

CONCLUSION  

Practical advice

• The practical use of the bootstrap is super simple.
– The bootstrap was discovered before the theory of the bootstrap was written. It works so well that people started writing the theory to explain why.

• The theory is very hard, but you only need to remember:
– The bootstrap works for i.i.d. draws; for clustered samples, use the block bootstrap.
– The bootstrap fails for non-continuous distributions and for heavy-tailed distributions (beware of outliers).
– If you have written an econometric procedure for which the closed-form formula for the standard error of the coefficients is hard to derive, use the bootstrap.