Reproducible Research: Statistical Consulting Topic. Discussion based on article in AMSTAT News, January 2011: SCIENCE POLICY, Reproducible Research, Keith A. Baggerly and Donald A. Berry, MD Anderson Cancer Center


Page 1: Statistical Consulting Topic - University of Iowa (homepage.stat.uiowa.edu/~rdecook/stat6220/Class_notes/Reproducible_research.pdf)

Reproducible Research

Statistical Consulting Topic

Discussion based on article in AMSTAT News, January 2011: SCIENCE POLICY, Reproducible Research, Keith A. Baggerly and Donald A. Berry, MD Anderson Cancer Center

Page 2

• http://www.nature.com/nature/focus/reproducibility/

• Special issue.

Page 3

Reproducible Research (RR)

1. Can another lab collect new data and reproduce the findings that have been published?
   • Perhaps the most common way to think of RR.

2. Can another statistician use the SAME DATA (provided by the author) and find the same published results and conclusions?
   • Topic of this article.

Page 4

Why not reproducible?

• Ethical issue
  – Perhaps not presented honestly

• Vaccine/autism, cold fusion, high voltage wires/cancer

• Technical issue
  – Coding issue
  – Data issue

• Should be reproducible if you have these things…
  – Code
  – Data
  – Analysis
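With the code, data, and analysis all in hand, a same-data reproducibility check can be sketched roughly as below. This is only an illustration: the file name, the data values, the linear model, and the "reported" slope are all invented here and do not come from the article.

```r
# Hypothetical data file standing in for the data an author would share
data.file <- file.path(tempdir(), "shared_data.csv")
write.csv(data.frame(x = c(1, 2, 1, 3), y = c(2.1, 3.4, 1.9, 4.2)),
          data.file, row.names = FALSE)

# Data: a checksum lets both parties confirm they hold the same file
checksum <- unname(tools::md5sum(data.file))

# Code + analysis: rerun the fit from the shared file
dat <- read.csv(data.file)
slope <- unname(coef(lm(y ~ x, data = dat))["x"])

# Compare with the value reported in the (hypothetical) paper, up to rounding
reported.slope <- 1.127
reproduced <- isTRUE(all.equal(slope, reported.slope, tolerance = 1e-3))
```

Comparing with a tolerance rather than `==` allows for the rounding that published tables usually apply.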

Page 5

Data Explosion

• “As data sets become more voluminous and complex, RR is increasingly important because our intuition fails in high dimensions.”
  – e.g. Gene expression analysis

[Diagram: Raw Data → ‘normalization’ (pre-processing) → Analysis, annotated “Many choices”]

Page 6

Non-RR discussed in AMSTAT Article

• Earlier Published Report: Gene expression patterns could predict patient response to cancer chemotherapy.
  – Method description in paper was incomplete
  – Tried to re-construct the analysis
  – Uncovered basic errors (e.g. mis-labeling)
  – “Bioinformatic Forensics” was published
  – Big consequences (resignation, retraction)

Page 7

Responsible research

• Be well organized
• Double-check data & coding
• Include ‘spot-checks’ (do a few plots of data)
• Mistakes can happen to the best of us
  – Disconcerting to have others dissect your work
  – Done for accuracy (not accusation)
  – Done to evolve methods to the next step
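The double-checks and spot-checks listed above might look something like the following in R. This is a minimal sketch on made-up data: the sample sheet, the expression matrix, and every name in it are hypothetical. The label checks are aimed at the kind of mis-labeling error mentioned earlier.

```r
# Hypothetical sample sheet and expression matrix (all values invented)
set.seed(1)
samples <- data.frame(id = paste0("S", 1:6),
                      group = rep(c("control", "treated"), each = 3),
                      stringsAsFactors = FALSE)
expr <- matrix(rnorm(5 * 6), nrow = 5,
               dimnames = list(paste0("gene", 1:5), samples$id))

# Double-check: sample labels line up with data columns, none repeated, no NAs
stopifnot(identical(colnames(expr), samples$id))
stopifnot(!anyDuplicated(samples$id))
stopifnot(!anyNA(expr))

# Double-check: group sizes match the design (3 per group in this toy case)
group.sizes <- table(samples$group)

# Spot-check: one boxplot per array, shaded by group, to catch odd arrays
boxplot(expr,
        col = ifelse(samples$group == "treated", "grey60", "white"),
        main = "Spot-check: per-array distributions")
```

Putting the checks at the top of the analysis script means they rerun every time the data or code changes.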

Page 8

Journal Editors Taking Note

• Biostatistics journal
• Associate editor (AE) of reproducibility
  – Grading submitted papers
    • D: Data is available
    • C: Code is available
    • R: AE can re-run the code and reproduce the results without much difficulty

Peng, Roger D. (2009). Reproducible research and Biostatistics. Biostatistics, Vol. 10: 405-408.

Page 9

Journal Editors May Also Be Skeptical of Forensic Process

• JASA past editor (David Banks):
  “… my own sense is that very few applied papers are perfectly reproducible. Most do not come with code or data, and even if they did, I expect careful check would find discrepancies from the published paper. The reasons are innocent: code written by graduate students is continually tweaked and has sketchy documentation.”

  (but he acknowledges the need)

  “Clearly, the problem of RR is fundamental to the integrity of science and public trust… We should be optimistic about the future, and proud of the role that statisticians have played in creating it.”

Page 10

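Submission-Publication Timeline*

[Figure: timeline running from 2005 to 2009 (with July 2007 marked) showing the paper's milestones: Submission, Reviewer comments, Re-Submit, Accepted, Appears. Month labels on the timeline: Feb, Apr, May, Sept, Feb.]

*Genetics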

Page 11

Submission-Publication Timeline*

[Figure: the same timeline, 2005 to 2009 (with July 2007 marked), showing Submission, Reviewer comments, Re-Submit, Accepted, Appears (month labels: Feb, Apr, May, Sept, Feb), now overlaid with requests for data (month labels: Oct, Aug, Sept) sent to a co-author at the Weizmann Inst. of Science (Israel) and to Johns Hopkins. Annotation: 4+ years for accessing data.]

*Genetics: A Publication of The Genetics Society of America

Page 12

Code/Data for the Public

• Be prepared, you never know which article will be popular with requests
  – Some public databases for uploading
• Easy to follow organization is helpful for user (and yourself)
  • readme.txt
  • description_of_files.txt
• Commenting your code
• Transparency

Page 13

Example Code

Master file content.

File holding all functions called by master file.

DeCook, R., Nettleton, D., Foster, C., and Wurtele, E.S. (2006). Identifying differentially expressed genes in unreplicated multiple-treatment microarray timecourse experiments. Computational Statistics and Data Analysis, Vol. 50: 518-532.

readme.txt

We had requests for data and code from this paper from 2006 – 2010.
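The organization described above (a master file, a file holding the functions it calls, and a readme.txt) could be laid out along these lines. This is a sketch only: the file names other than readme.txt, the helper `center.rows`, and the toy data are hypothetical and are not the authors' actual files. The project is written to a temporary directory so the example runs on its own.

```r
# Hypothetical project layout (file names are illustrative):
#   readme.txt    what each file is and which one to run
#   functions.R   all functions called by the master file
#   master.R      runs the analysis from start to finish
proj <- file.path(tempdir(), "rr_demo")
dir.create(proj, showWarnings = FALSE)

writeLines(c(
  "# functions.R: helpers called by master.R",
  "center.rows <- function(x) x - rowMeans(x)  # subtract each row's mean"
), file.path(proj, "functions.R"))

writeLines(c(
  "functions.R  helper functions, sourced by master.R",
  "master.R     run this file to reproduce the results"
), file.path(proj, "readme.txt"))

# ---- what master.R itself would contain ----
source(file.path(proj, "functions.R"))   # load every helper in one place
set.seed(2006)                           # fixed seed: reruns give the same numbers
x <- matrix(rnorm(12), nrow = 3)         # stand-in for the real data
x.centered <- center.rows(x)

# Record R and package versions next to the results
writeLines(capture.output(sessionInfo()), file.path(proj, "session_info.txt"))
```

Keeping the functions in their own file lets a reader run the master file top to bottom without hunting for definitions.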

Page 14

Reading Other Authors’ Code

Design matrix for a 2-sample (t-test) permutation test

    ######## FDR by permutations
    choose(6,3)                                   # number of unique permutations for 6 arrays, 3 per group
    samp.1=expand.grid(0:1,0:1,0:1,0:1,0:1,0:1)   # 2^6=64 possible 6-element vectors
    samp.1=samp.1[apply(samp.1,1,sum)==3,]        # subset to 20 rows with exactly three 1's
    samp.1=as.matrix(samp.1)

    > samp.1
       Var1 Var2 Var3 Var4 Var5 Var6
    8     1    1    1    0    0    0
    12    1    1    0    1    0    0
    14    1    0    1    1    0    0
    ….
    53    0    0    1    0    1    1
    57    0    0    0    1    1    1
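To show how a matrix like `samp.1` can be used, here is a minimal sketch of a two-sample permutation t-test for a single gene. It is not the authors' code and does not reproduce their FDR procedure; the data vector `y`, the helper `t.stat`, and the assumption that the first three arrays form group 1 are all invented for illustration.

```r
# Hypothetical measurements for one gene on 6 arrays (first 3 = group 1)
y <- c(5.1, 4.8, 5.6, 3.9, 4.2, 4.0)

# Same construction as on the slide: all 0/1 labelings with exactly three 1's
samp.1 <- expand.grid(0:1, 0:1, 0:1, 0:1, 0:1, 0:1)
samp.1 <- as.matrix(samp.1[rowSums(samp.1) == 3, ])

# Pooled-variance t statistic for one labeling (1 = group 1, 0 = group 2)
t.stat <- function(lab, y) {
  unname(t.test(y[lab == 1], y[lab == 0], var.equal = TRUE)$statistic)
}

t.obs  <- t.stat(c(1, 1, 1, 0, 0, 0), y)    # statistic for the observed labeling
t.perm <- apply(samp.1, 1, t.stat, y = y)   # one statistic per row of samp.1

# Two-sided permutation p-value (small tolerance guards against rounding error)
p.perm <- mean(abs(t.perm) >= abs(t.obs) - 1e-8)
```

With only 20 labelings, the smallest two-sided p-value this design can produce is 2/20 = 0.1, because every labeling and its mirror image give the same absolute statistic. That limit is one reason such analyses pool information across many genes.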

Page 15

Reading Other Authors’ Code

Read journal articles → better journal article writer
Read others’ R code → better R coder
Read others’ SAS code → better SAS coder

Page 16

References

Peng, Roger D. (2009). Reproducible research and Biostatistics. Biostatistics, Vol. 10: 405-408.

Baggerly, K.A., Morris, J.S., Edmonson, S.R. and K.R. Coombes (2005). Signal to noise: Evaluating reported reproducibility of serum proteomic tests for ovarian cancer. Journal Natl Cancer Inst, Vol. 97: 307-309.

Baggerly, K.A. and D.A. Berry (2011). Science policy: Reproducible research. AMSTAT News, January(403): 16-17.

Banks, D. (2011). Reproducible research: A range of response. Statistics, Politics, and Policy, Vol. 2, Issue 1, Article 4.