Reproducible and Collaborative Research
Liz Bageant
erb32@cornell.edu
Cornell University
Outline
1. Scientific method and research failures
2. Defining reproducible research
3. Strategies for reproducibility
1. Scientific method and research failure
The scientific method
[Figure: schematic of the scientific method. Stages shown: Observation, Ask Question, Background Research, Form Hypothesis, Design Experiment/Study, Carry Out Experiment/Study, Data, Analysis, Results, Conclusions, Report Results. Schematic courtesy of Erika Mudrak.]
Continuum of research failure
[Figure: a continuum running from disorganization (failure of process) to egregious behavior (failure of integrity).]
• Deliberate manipulation of data to get results
– P-hacking
– "Fishing expeditions"
P-hacking/fishing expedition
[Figure: the scientific method schematic, annotated with "P-hacking/fishing expedition" and "P < 0.05".]
Continuum of research failure (continued)
• HARK-ing
P-hack your way to scientific glory
https://projects.fivethirtyeight.com/p-hacking/
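A small simulation can make the fishing expedition concrete. The sketch below is not from the slides: it assumes two groups of pure noise, a z-test with known variance, and hypothetical names (`p_value_two_groups`, `share_with_a_finding`). It compares how often a study "finds" something at P < 0.05 when it tests one pre-specified outcome versus the best of twenty.

```python
import math
import random

def p_value_two_groups(n, rng):
    """Two-sided z-test p-value for two groups of pure noise (true effect is zero)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2 / n)  # variance is known to be 1 in each group
    return math.erfc(abs(z) / math.sqrt(2))

def share_with_a_finding(n_outcomes, n_studies=1000, n=20, seed=1):
    """Share of simulated studies where at least one outcome reaches p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if any(p_value_two_groups(n, rng) < 0.05 for _ in range(n_outcomes)):
            hits += 1
    return hits / n_studies

one = share_with_a_finding(1)
twenty = share_with_a_finding(20)
print(f"one pre-specified outcome: {one:.2f}")
print(f"best of twenty outcomes:   {twenty:.2f}")
```

With independent outcomes, the chance of at least one false positive among twenty tests is 1 - 0.95^20, or about 0.64, so the second share should land far above the nominal 0.05.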
Hypothesizing After Results are Known (HARK-ing)
[Figure: the scientific method schematic, annotated with "HARK-ing".]
Is HARK-ing ever okay?
Research goals: exploratory vs. confirmatory
• Exploratory research = hypothesis generation
• Confirmatory research = hypothesis testing
Continuum of research failure (continued)
• "Garden of forking paths"
The garden of forking paths (Gelman and Loken, 2013)
[Figure: the Data, Analysis, Results, and Report Results stages of the schematic, surrounded by the questions an analyst faces along the way.]
• This observation seems funny. Should I throw it out?
• Impute missing data?
• Other studies control for X, so maybe I should add that in?
• Should I log transform this?
• Logit, probit or linear probability model?
• Can we really assume that X is exogenous? Everyone else does.
• I tried this thing but it wasn't significant, do I report it?
• This distribution looks funny. How can I fix it?
• Those results didn't make sense, should I report them anyway?
• To winsorize or not to winsorize...
• My interaction isn't significant... should I take it out?
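To see how quickly these questions multiply, the sketch below enumerates every combination of a few analysis decisions. The decision names and options are hypothetical, loosely based on the questions above; they are not a list from the slides.

```python
from itertools import product

# Hypothetical decision points; each key is one fork in the path.
decisions = {
    "drop_odd_observation": [False, True],
    "impute_missing": [False, True],
    "control_for_x": [False, True],
    "log_transform": [False, True],
    "model": ["logit", "probit", "lpm"],
    "winsorize": [False, True],
    "keep_interaction": [False, True],
}

# One dict per distinct analysis specification.
paths = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
print(len(paths))  # 2**6 * 3 = 192 specifications
```

Seven modest choices already allow 192 different analyses of the same data, which is why writing down the path actually taken matters.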
Continuum of research failure (continued)
• Coding errors
• Poor documentation
To avoid the perils of the garden, HARK-ing, P-hacking, and silly mistakes...
• Integrity! --> Be honest with yourself.
• Transparency! --> Be honest with your readers.
• Do you feel good enough about your decision-making processes to write them down for all to see?
Reproducible research!
2. Defining Reproducibility
Replicability vs reproducibility
• Replicability
– Essential to the scientific method
– Repeating a study from scratch using new data, analyst and code
– If a given relationship between X and Y is true, it should show up in multiple studies
[Figure: the scientific method schematic, annotated with "Replicability".]
Replicability vs reproducibility
• Reproducibility
– Getting the exact same result as an existing study using a new analyst, but the same data and code
– Recently tractable due to computing and software advances
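One way to check "exact same result" mechanically is to fingerprint the output of two runs. The sketch below is an illustration under assumptions: the toy bootstrap, the data values, and the names `run_analysis` and `fingerprint` are all hypothetical. The point is that any randomness must be seeded inside the code for the runs to match.

```python
import hashlib
import random

def run_analysis(data, seed=12345):
    """Toy bootstrap of the mean; the seed lives in the code, so reruns match."""
    rng = random.Random(seed)
    means = []
    for _ in range(200):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return f"{sum(means) / len(means):.6f}"

def fingerprint(text):
    """SHA-256 digest of the result, convenient for comparing runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]
first = fingerprint(run_analysis(data))
second = fingerprint(run_analysis(data))
print(first == second)  # same data + same code -> same result
```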
[Figure: the scientific method schematic, annotated with "Reproducibility".]
Reproducibility
• Facilitate transparency by communicating procedures easily
• Identify inadvertent errors
• Avoid embarrassment
• Facilitate collaboration
• Save time
• Greater potential for extension of work --> higher impact over time
Who are you accountable to?
• The public/the integrity of science
• Researchers in your field
• Reviewers
• Colleagues/Coauthors
• You in 6 months
• You next week
• You!
What are we aiming for?
• Sufficient documentation to bring an unfamiliar user up to speed
– Codebook
– Readme file
– Variable and value labels in analysis dataset
– Effective comments in code
• A single click executes your project from start to finish.
– Downloading
– Reformatting
– Cleaning and variable construction
– Analysis
– Output tables, graphs, figures
– Reproducible report
How do we get there?
• Separate the phases of data work
• Systematic file and naming structures
• Effective and organized scripting
• Reproducible reports
3. Strategies for Reproducibility
Separate phases of data work
1. Data conversion/cleaning/variable construction
2. Analysis
3. Report generation
Naming conventions
• Agree with your collaborators on naming conventions.
• Human readable
– Short, useful names
– Information on content
• Machine readable
– Avoid special characters, spaces, etc.
– Pick one style: CamelCase, ALLCAPS, lowercase, alloneword, underscore_between
– Consistent naming to facilitate searching
• Default ordering
– Date format YYYYMMDD
– Other numbers: add leading zeros
• Never call something "final". It probably isn't.
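The default-ordering advice is easy to demonstrate. The sketch below assumes one possible convention (date, zero-padded step number, lowercase topic with underscores); the helper `make_name` and the example topics are hypothetical.

```python
from datetime import date

def make_name(topic, step, when, ext):
    """Build a lowercase, underscore-separated name: YYYYMMDD_step_topic.ext."""
    safe_topic = "_".join(topic.lower().split())
    return f"{when:%Y%m%d}_{step:02d}_{safe_topic}.{ext}"

names = [
    make_name("household roster", 10, date(2020, 7, 5), "csv"),
    make_name("household roster", 2, date(2020, 7, 5), "csv"),
    make_name("plot survey", 1, date(2019, 12, 31), "csv"),
]
# A plain alphabetical sort now matches chronological and step order.
print(sorted(names))
```

Without the leading zero, step 10 would sort before step 2, and a date written as day-month-year would not sort chronologically at all.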
Systematic file structure
• Must be common to all users!
• Choose a file structure and stick to it.
• Make skeleton of folders when you start a project.
• /dta
– /original: Copy of read-only original files exactly as obtained.
– /stataraw: Data after conversion to format of choice
– /clean: Variable- or module-specific clean files
– /analysis: Dataset(s) you will use for analysis
• /documentation
– /metadata: Any/all codebooks or metadata related to data
– /reports: Collection of documents where the data was used, cited, described
• /do
– /cleaning: Cleaning, merging, reshaping, variable construction scripts
– /analysis: Analysis scripts
– master.do: Script that sets up relative file paths and calls all scripts
• /output: Subfolders depend on type of project
– /figures
– /tables
– /oldoutput: Keep for reference, if you choose.
• /writing
– /paper1, /paper2: Separate folders if multiple papers using the same data
– /notes: Optional as needed
– /olddrafts: Keep older versions of paper, but get them out of the way
• /temp: Get rid of clutter as you make it
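Making the skeleton when a project starts can itself be scripted. The sketch below uses the folder names from the slide; the function name `make_skeleton` is hypothetical, and the demo writes to a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# Folder names follow the slide's layout.
SKELETON = {
    "dta": ["original", "stataraw", "clean", "analysis"],
    "documentation": ["metadata", "reports"],
    "do": ["cleaning", "analysis"],
    "output": ["figures", "tables", "oldoutput"],
    "writing": ["paper1", "paper2", "notes", "olddrafts"],
    "temp": [],
}

def make_skeleton(root):
    """Create the folder skeleton under root; safe to re-run on an existing project."""
    root = Path(root)
    created = []
    for top, subfolders in SKELETON.items():
        for folder in [root / top] + [root / top / sub for sub in subfolders]:
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    # Empty placeholder for the master script named on the slide.
    (root / "do" / "master.do").touch(exist_ok=True)
    return created

with tempfile.TemporaryDirectory() as tmp:
    folders = make_skeleton(tmp)
    print(sorted(p.name for p in Path(tmp).iterdir()))
```

Because every collaborator runs the same script, the structure is common to all users by construction.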
Scripting tips
• Data + Script = Reproducible Output
• Master script: Runs other scripts in correct order
• Modular scripting vs. one big file
– Separate types of processes (cleaning, analysis)
– Avoid repeating blocks of code: Separate program for repeated processes
• Notes/comments.
– Consistent headers
– Useful comments, not expressions of feeling
• Clarity > efficiency? Consider your collaborators.
• Re-run script from the beginning regularly. It must run!
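The slides place the master script in a Stata do-file (master.do). As an illustration only, the same idea can be sketched in Python: one entry point sets the project root and runs the other scripts in a fixed order. The script paths in `PIPELINE` and the `ROOT` variable handed to each script are hypothetical.

```python
import runpy
from pathlib import Path

def run_all(project_root, scripts):
    """Run each script in order, passing the project root so paths stay relative."""
    root = Path(project_root).resolve()
    for relative_path in scripts:
        script = root / relative_path
        print(f"running {relative_path}")
        # Each script sees ROOT and builds its file paths from it.
        runpy.run_path(str(script), init_globals={"ROOT": root}, run_name="__main__")

# Hypothetical order; an exception in any step stops the whole run.
PIPELINE = [
    "do/cleaning/01_clean.py",
    "do/analysis/02_analysis.py",
]
```

Calling `run_all(".", PIPELINE)` from the project folder would then rebuild everything from the beginning, which is the regular re-run the last bullet asks for.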
Reproducible Reports
• Integrate code into the prose of your report
• Single file that executes all steps of data process and outputs a final paper
• Know exactly what data was used for analysis, what code made which figure, etc.
• Disadvantages: learning curve, initial investment.
• Alternative method: Copy and paste.
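At its smallest, integrating code into prose means the numbers in a sentence are computed rather than typed. The sketch below is a toy version under assumptions: the template wording, the data values, and the file path are hypothetical, and dedicated literate-programming tools do far more than this.

```python
from statistics import mean
from string import Template

REPORT = Template(
    "We analyzed $n observations from $source. "
    "The mean outcome was $avg."
)

def build_report(values, source):
    """Fill the prose from the data so numbers in the text cannot go stale."""
    return REPORT.substitute(n=len(values), source=source, avg=f"{mean(values):.2f}")

report = build_report([1.5, 2.5, 4.0], "dta/analysis/example.csv")
print(report)
```

If the data changes, re-running the script updates the sentence; with copy and paste, every number has to be found and corrected by hand.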
Avoid research failures by implementing reproducible research techniques to improve organization and transparency
1. Separate phases of research
2. Systematic file naming and structure
3. Effective and organized scripting
4. Reproducible reports
• Prioritize elements that are attainable for you.
Your future self thanks you!
Additional resources
• P-hack your way to scientific glory! https://projects.fivethirtyeight.com/p-hacking/
• Gelman and Loken (2013) Garden of Forking Paths. http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf