ismb/eccb 2013 keynote goble results may vary: what is reproducible? why do open science and who...
DESCRIPTION
Keynote given by Carole Goble on 23rd July 2013 at ISMB/ECCB 2013 http://www.iscb.org/ismbeccb2013 How could we evaluate research and researchers? Reproducibility underpins the scientific method: at least in principle if not practice. The willing exchange of results and the transparent conduct of research can only be expected up to a point in a competitive environment. Contributions to science are acknowledged, but not if the credit is for data curation or software. From a bioinformatics viewpoint, how far could our results be reproducible before the pain is just too high? Is open science a dangerous, utopian vision or a legitimate, feasible expectation? How do we move bioinformatics from one where results are post-hoc "made reproducible", to pre-hoc "born reproducible"? And why, in our computational information age, do we communicate results through fragmented, fixed documents rather than cohesive, versioned releases? I will explore these questions drawing on 20 years of experience in both the development of technical infrastructure for Life Science and the social infrastructure in which Life Science operates.
TRANSCRIPT
![Page 1: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/1.jpg)
results may vary
reproducibility, open science
and all that jazz
Professor Carole Goble
The University of Manchester, UK
@caroleannegoble
Keynote ISMB/ECCB 2013 Berlin, Germany, 23 July 2013
![Page 2: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/2.jpg)
“knowledge turning”
[Josh Sommer, Chordoma Foundation]
• life sciences
• systems biology
• translational medicine
• biodiversity
• chemistry
• heliophysics
• astronomy
• social science
• digital libraries
• language analysis
New Insight
Goble et al Communications in Computer and Information Science 348, 2013
![Page 3: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/3.jpg)
automate: workflows, pipeline & service
integrative frameworks
pool, share & collaborate
web systems
nanopub
semantics & ontologies
machine readable documentation
scientific software engineering
CSSE
![Page 4: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/4.jpg)
coordinated execution of services, codes, resources
transparent, step-wise methods
auto documentation, logging
reuse variants
![Page 5: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/5.jpg)
http://www.seek4science.org
store/organise/link data, models, sops, experiments, publications
explore/annotate data, models, sops
simulate models
yellow pages, find peers and experts
open and controlled pooling & credit
curation & data mgt support
catalogue and gateway to local and public resources
APIs
governance & policies
![Page 6: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/6.jpg)
• PALS
![Page 7: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/7.jpg)
reproducibility
a principle of the scientific method
separates scientists from other researchers and normal people
http://xkcd.com/242/
![Page 8: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/8.jpg)
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware
Morin et al Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160
Ince et al The case for open computer programs, Nature 482, 2012
![Page 9: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/9.jpg)
• Workshop Track (WK03) What Bioinformaticians need to know about digital publishing beyond the PDF
• Workshop Track (WK02): Bioinformatics Cores Workshop,
• ICSB Public Policy Statement on Access to Data
![Page 10: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/10.jpg)
“an experiment is reproducible until another laboratory tries to repeat it.”
Alexander Kohn
hope over experience
even computational ones
![Page 11: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/11.jpg)
hand-wringing, weeping, wailing, gnashing of teeth.
Nature checklist.
Science requirements for data and code availability.
attacks on authors, editors, reviewers, publishers, funders, and just about everyone.
http://www.nature.com/nature/focus/reproducibility/index.html
![Page 12: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/12.jpg)
47/53 “landmark” publications could not be replicated
[Begley, Ellis Nature, 483, 2012]
![Page 13: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/13.jpg)
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
Alsheikh-Ali et al Public Availability of Published Research Data in High-Impact Journals. PLoS ONE 6(9) 2011
59% of papers in the 50 highest-IF journals comply with (often weak) data sharing rules.
![Page 14: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/14.jpg)
Stodden V, Guo P, Ma Z (2013) Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals. PLoS ONE 8(6): e67111. doi:10.1371/journal.pone.0067111
Required as condition of publication
Required but may not affect decisions
Explicitly encouraged, may be reviewed and/or hosted
Implied
No mention
170 journals, 2011-2012
![Page 15: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/15.jpg)
replication gap
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
Out of 18 microarray papers, results from 10 could not be reproduced
More retractions: >15X increase in last decade
At current % > by 2045 as many papers published as retracted
![Page 16: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/16.jpg)
re-compute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
conceptual replication “show A is true by doing B rather than doing A again”
verify but not falsify
[Yong, Nature 485, 2012]
regenerate the figure
redo
[Lewis Carroll]
“When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.”
![Page 17: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/17.jpg)
repeat: same experiment, same lab
replicate: same experiment, different lab
reproduce: same experiment, different set up
reuse: different experiment, some of same
test
Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online
Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.
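The four terms on this slide can be read as a small decision rule. The sketch below is one hedged interpretation, not a definition taken from the talk or the cited papers: the `Rerun` class, its field names, and the precedence between the checks (for example, treating a different set up in a different lab as "reproduce") are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rerun:
    """Hypothetical description of how a second study relates to the original."""
    same_experiment: bool
    same_lab: bool
    same_setup: bool


def classify(rerun: Rerun) -> str:
    """Map a rerun onto the slide's vocabulary (assumed precedence of checks)."""
    if not rerun.same_experiment:
        return "reuse"       # different experiment, some of the same materials
    if not rerun.same_setup:
        return "reproduce"   # same experiment, different set up
    if not rerun.same_lab:
        return "replicate"   # same experiment, different lab
    return "repeat"          # same experiment, same lab


if __name__ == "__main__":
    print(classify(Rerun(same_experiment=True, same_lab=False, same_setup=True)))
```

Writing the rule down makes the ambiguity visible: the slide does not say which label wins when both the lab and the set up differ, so any tool that tags studies this way has to pick and document a precedence.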
![Page 18: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/18.jpg)
validation assurance meets the needs of a stakeholder
e.g. error measurement, documentation
verification complies with a regulation, requirement, specification, or imposed condition
e.g. a model
science review: articles, algorithms, methods
technical review: code, data, systems
V. Stodden, “Trust Your Science? Open Your Data and Code!” Amstat News, 1 July 2011
![Page 19: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/19.jpg)
Design
Execution
Result Analysis
Collection
Publish
Peer Review
Peer Reuse
defend repeat
review1/certify replicate
review2 compare reproduce
transfer reuse
* Adapted from Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010)
make&run&document report&review&support
Prediction
Sound
![Page 20: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/20.jpg)
Corbyn, Nature Oct 2012
fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper. Data/software versions. Workflows are maturing and becoming helpful”
disorganisation
Phil Bourne
Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE under review.
inherent
![Page 21: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/21.jpg)
rigour
reporting & experimental design
cherry picking data
misapplication use of black box software*
software misconfigurations, random seed reporting
non-independent bias, poor positive and negative controls
dodgy normalisation, arbitrary cut-offs, premature data triage
un-validated materials, improper statistical analysis, poor statistical power, stop when “get to the right answer”
*8% validation Joppa, et al, Troubling Trends in Scientific Software Use SCIENCE 340 May 2013
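Two items on this slide, software misconfigurations and random seed reporting, are cheap to address in code. The following is a minimal sketch, assuming a toy analysis that only draws random numbers; the function name `run_with_provenance` and the fields of the record are illustrative choices, not something prescribed in the talk.

```python
import json
import platform
import random
import sys


def run_with_provenance(seed: int, n: int = 5) -> dict:
    """Run a toy stochastic step and return results together with what is needed to rerun it."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the draw
    values = [rng.random() for _ in range(n)]
    return {
        "seed": seed,                                   # report the seed with the result
        "python_version": platform.python_version(),    # report the interpreter version
        "platform": sys.platform,                       # report the operating system family
        "values": values,
    }


if __name__ == "__main__":
    record = run_with_provenance(seed=2013)
    print(json.dumps(record, indent=2))
```

Storing the record alongside the figures means a reader can at least rerun the same draw; it does not by itself guard against the other items on the slide, such as arbitrary cut-offs or poor controls.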
![Page 22: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/22.jpg)
http://www.nature.com/authors/policies/checklist.pdf
![Page 23: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/23.jpg)
• anyone anything anytime
• publication access, data, models, source codes, resources, transparent methods, standards, formats, identifiers, apis, licenses, education, policies
• “accessible, intelligible, assessable, reusable”
http://royalsociety.org/policy/projects/science-public-enterprise/report/
![Page 24: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/24.jpg)
G8 open data charter
http://opensource.com/government/13/7/open-data-charter-g8
![Page 25: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/25.jpg)
republic of science*
regulation of science
institution cores
libraries
*Merton’s four norms of scientific behaviour (1942)
public services
![Page 26: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/26.jpg)
a meta-manifesto (I)
• all X should be available and assessable forever
• the copyright of X should be clear
• X should have citable, versioned identifiers
• researchers using X should visibly credit X’s creators
• credit should be assessable and count in all assessments
• X should be curated, available, linked to all necessary materials, and intelligible
What’s the real issue?
![Page 27: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/27.jpg)
we do pretty well
• major public data repositories
• multiple declarations for depositing data
• thriving open source community
• plethora of data standardisation efforts
• core facilities
• heroic data campaigns
• international and national bioinformatics coordination
• diy biology movement
• great stories: Shiga-Toxin strain of E. coli, Hamburg, May 2011, China BGI open data crowd sourcing effort.
• Oh, wait…University of Münster/University of Göttingen squabble http://www.nature.com/news/2011/110721/full/news.2011.430.html
![Page 28: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/28.jpg)
hard: patient data
(inter)national complications
bleeding heart paternalism
defensive research
informed consent
fortresses
Kotz, J. SciBX 5(25) 2012
[John Wilbanks]
http://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf
![Page 29: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/29.jpg)
massive centralisation – clouds, curated core facilities
long tail massive decentralisation –investigator held datasets
fragmentation & fragility
a data scarcity at point of delivery
RIP data
quality/trust/utility
Acta Crystallographica section B or C
data/code as first class citizen
![Page 30: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/30.jpg)
we are not bad people
we make progress
there was never a golden age
there never is
![Page 31: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/31.jpg)
a reproducibility paradox
big, fast, complicated, multi-step, multi-type, multi-field
expectations of reproducibility
diy publishing
greater access
![Page 32: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/32.jpg)
pretty stories shiny results feedback loop
novel, attention grabbing
neat, only positive
review: the direction of science, the next paper, how I would do it.
reject papers purely based on public data
obfuscate to avoid scrutiny
PLoS and F1000 counter
announce a result, convince us it's correct
![Page 33: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/33.jpg)
the scientific sweatshop
no resources, time, accountability
getting it published not getting it right
game changing benefit to justify disruption
![Page 34: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/34.jpg)
citation distortion
Greenberg How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal 2009, 339:b2680.
[Tim Clark]
Simkin, Roychowdhury Stochastic modeling of citation slips. Scientometrics 2005, 62(3):367-384.
Clark et al Micropublications 2013 arXiv:1305.3506
![Page 35: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/35.jpg)
independent replication studies
self-correcting science
• hostility
• hard
• resource intensive
• no funding, time, recognition, place to publish
• invisible to originators
“blue collar science”
John Quackenbush
![Page 36: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/36.jpg)
independent review
self-correcting science
• hostility
• hard
• resource intensive
• no funding, time, recognition, place to publish
• invisible to originators
“blue collar science”
John Quackenbush
![Page 37: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/37.jpg)
“the questions don’t change but the answers do”*
• two years time when the paper is written
• reviewers want additional work
• statistician wants more runs
• analysis may need to be repeated
• post-doc leaves, student arrives
• new data, revised data
• updated versions of algorithms/codes
quid pro quo citizenship
• trickle down theory: more open more use more credit*
others might
• meta-analysis
• novel discovery
• other methods
what is the point: “no one will want it”
* Dan Reed
![Page 38: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/38.jpg)
emerging reproducible system ecosystem
App Store needed!
Sweave
ReproZip
instrumented desktop tools
hosted services
packaging and archiving
repositories, catalogues
online sharing platforms
integrated authoring
integrative frameworks
XworX
![Page 39: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/39.jpg)
![Page 40: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/40.jpg)
![Page 41: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/41.jpg)
integrated database and journal
http://www.gigasciencejournal.com
copy editing computational workflows
from 10 scripts + 4 modules + >20 parameters to Galaxy workflows
galaxy.cbiit.cuhk.edu.hk
2-3 months
2-3 weeks
[Peter Li]
made reproducible
![Page 42: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/42.jpg)
supporting data reproducibility
Data sets
Analyses
Linked to
Linked to
DOI
DOI
Open-Paper
Open-Review
DOI:10.1186/2047-217X-1-18
>11000 accesses
Open-Code
8 reviewers tested data in ftp server & named reports published
DOI:10.5524/100044
Open-Pipelines
Open-Workflows
DOI:10.5524/100038
Open-Data
78GB CC0 data
Code in sourceforge under GPLv3: http://soapdenovo2.sourceforge.net/
>5000 downloads
Enabled code to be picked apart by bloggers in wiki http://homolog.us/wiki/index.php?title=SOAPdenovo2
[Scott Edmunds]
![Page 43: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/43.jpg)
Here is What I Want – The Paper As Experiment
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34[Phil Bourne]
![Page 44: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/44.jpg)
"A single pass approach to reducing sampling variation, removing errors, and scaling de novo assembly of shotgun sequences" http://arxiv.org/abs/1203.4802
http://ivory.idyll.org/blog/replication-i.html
born reproducible
http://ged.msu.edu/papers/2012-diginorm/
[C. Titus Brown]
![Page 45: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/45.jpg)
made reproducible
[Pettifer, Attwood]
http://getutopia.com
![Page 46: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/46.jpg)
![Page 47: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/47.jpg)
The Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
Authoring Tools
Lab Notebooks
Data Capture
Software Repositories
Analysis Tools
Visualization
Scholarly Communication
Commercial & Public Tools
Git-like Resources By Discipline
Data Journals
Discipline-Based Metadata Standards
Community Portals
Institutional Repositories
New Reward Systems
Commercial Repositories
Training
[Phil Bourne]
![Page 48: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/48.jpg)
the neylon equation
$\text{Process} = \dfrac{\text{Interest}}{\text{Friction}} \times \text{Number of people reached}$
Cameron Neylon, BOSC 2013, http://cameronneylon.net/
message #1: lower friction
born reproducible
![Page 49: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/49.jpg)
4+1 architecture of reproducibility
“development” view
“logical” view
“process” view
“physical” view
social scenarios
![Page 50: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/50.jpg)
rigour
reporting
reassembly
recognition
review
reuse
resources
responsibility
reskilling
“logical view”
![Page 51: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/51.jpg)
reporting
availability
documentation
![Page 52: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/52.jpg)
observations
• the strict letter of the law
• (methods) modeller/workflow makers vs (data) experimentalists
• young researchers, support from PIs
• buddy reproducibility testing, curation help
• just enough just in time
• staff leaving and project ends
• public scrutiny, competition
• decaying local systems
• long term safe haven commitment
• funder commitment from the start
![Page 53: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/53.jpg)
(Harris and Miller 2011) (Benkler 2011)
(Thomson, Perry, and Miller 2009) (Nowak 2006)
(Malone 2010) (Lusch, Vargo 2008)
(Wood and Gray 1991) (Roberts and Bradley 1991)
(Shrum and Chompalov 2007)
(Clutton-Brock 2009) (Tenopir et al 2011) (Borgman, 2012)
[Kristian Garza]
![Page 54: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/54.jpg)
scientific ego-system
trust, reciprocity, collaboration to compete
fame
competitive advantage
productivity
credit
adoption kudos
for love
blame, scooped, uncredited, misinterpretation, scrutiny, cost, loss, distraction, left behind, dependency
Fröhlich’s principles of scientific communication (1998)
Merton’s four norms of scientific behaviour (1942)
Malone, Laubacher & Dellarocas The Collective Intelligence Genome, Sloan Management Review,(2010)
![Page 55: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/55.jpg)
• local investment – protective
• collective purchasing– share
• sole provider– broadcast
trade
[Nielson] [Roffel]
local asset economies
economics of scarce prized commodities
(Harris and Miller 2011) (Lusch, Vargo 2008)
![Page 56: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/56.jpg)
• hugging
• flirting
• voyeurism
• inertia
• sharing creep
• credit drift
• local control
• code throwaway
Tenopir et al. Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6) 2011
asymmetrical reciprocity
Borgman The conundrum of sharing research data, JASIST 2012
family
friends
acquaintances
strangers
rivals
ex-friends
![Page 57: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/57.jpg)
10 JANUARY 2013 | VOL 493 | NATURE | 159
“all research products and all scholarly labour are equally valued except by promotion and review committees”
recognition
![Page 58: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/58.jpg)
message #2
citation is like ♥ not $
large data providers
infrastructure codes
“click and run”
instrument platforms
make credit count
Rung, Brazma Reuse of public genome-wide gene expression data, Nature Reviews Genetics 2012
Duck et al bioNerDS: exploring bioinformatics' database and software use through literature mining. BMC Bioinformatics. 2013
Piwowar et al Sharing Detailed Research Data Is Associated with Increased Citation Rate, PLoS ONE 2007
visible reciprocity contract
![Page 59: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/59.jpg)
Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, Workshop: Reproducible Research: Tools and Strategies for Scientific Computing
Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4)
![Page 60: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/60.jpg)
in perpetuity
“it's not ready yet”, “I need another publication”
shame
“it's too ugly”, “I didn’t work out the details”
effort
“we don’t have the skills/resources”, “the reviewers don’t need it”
loss
“the student left”, “we can’t find it”
insecurity
“you wouldn’t understand it”, “I made it so no one could understand it”.
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013 SIAM News
![Page 61: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/61.jpg)
the goldilocks paradox
“the description needed to make an experiment reproducible is too much for the author and too little for the reader”
just enough just in time
José Enrique Ruiz (IAA-CSIC)
Galaxy Luminosity Profiling
![Page 62: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/62.jpg)
:reducing the
friction of curation
1. Enrich Spreadsheet Template
2. Use in Excel or OpenOffice
3. Extract and Process
RDF Graph
http://www.rightfield.org.uk
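The three steps on this slide (enrich a template, fill it in as an ordinary spreadsheet, extract a graph) can be illustrated with a minimal sketch. This is not RightField itself or its API; it only assumes that a filled-in sheet has been exported as CSV and that one column is restricted to a controlled vocabulary. The vocabulary, the `example.org` URIs and the column names are all hypothetical.

```python
import csv
import io

# Hypothetical controlled vocabulary: allowed labels mapped to made-up term URIs.
ORGANISM_TERMS = {
    "Escherichia coli": "http://example.org/term/ecoli",
    "Bacillus subtilis": "http://example.org/term/bsubtilis",
}
BASE = "http://example.org/sample/"               # hypothetical sample namespace
HAS_ORGANISM = "http://example.org/prop/organism"  # hypothetical property


def rows_to_triples(csv_text):
    """Turn annotated spreadsheet rows into N-Triples-style lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["organism"].strip()
        # Reject free text: only vocabulary terms may appear in this column.
        if label not in ORGANISM_TERMS:
            raise ValueError(f"'{label}' is not in the controlled vocabulary")
        triples.append(
            f"<{BASE}{row['sample_id']}> <{HAS_ORGANISM}> <{ORGANISM_TERMS[label]}> ."
        )
    return triples


sheet = "sample_id,organism\ns1,Escherichia coli\ns2,Bacillus subtilis\n"
for line in rows_to_triples(sheet):
    print(line)
```

The point of the sketch is the division of labour: the scientist only ever sees a spreadsheet, while the term checking and graph extraction happen afterwards without extra curation effort.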
![Page 63: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/63.jpg)
anonymous reuse is hard
nearly always negotiated
![Page 64: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/64.jpg)
reskilling: software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
Zeeya Merali, Computational science: ...Error… why scientific programming does not compute. Nature 467, 775-777 (2010) | doi:10.1038/467775a
![Page 65: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/65.jpg)
http://sciencecodemanifesto.org/
http://matt.might.net/articles/crapl/
![Page 66: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/66.jpg)
better software
better research
C Titus Brown
Greg Wilson
data carpentry
![Page 67: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/67.jpg)
a word on reinventing
innovation is algorithms and methodology.
rediscovery of profile stochastic context-free grammars
(re)coding is reproducing.
reinvent what is innovative.
reuse what is utility.
Sean Eddy
author HMMER and Infernal software suites for sequence analysis
Goble, seven deadly sins of bioinformatics, 35.5K views
http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics
![Page 68: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/68.jpg)
message #3 placing value on reproducibility
take action
Execution
Organisation
Metrics
Culture
Process
[Daron Green]
![Page 69: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/69.jpg)
(re)assembly
Gather the bits together
Find and get the bits
Bits broken/changed/lost
Have other bits
Understand the bits and how to put together
Bits won’t work together
What bit is critical?
Can I use a different tool?
Can’t operate the tool
Whose job is this?
![Page 70: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/70.jpg)
specialist codes libraries, platforms, tools
service based
(cloud) hosted services
commodity platforms
data collections
catalogues
software repositories
my data
my process
my codes
integrative frameworks
gateways
![Page 71: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/71.jpg)
Methods (techniques, algorithms, spec of the steps)
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Materials (datasets, parameters, seeds)
Experiment
repeat (re-run)
replicate (regenerate)
reproduce (recreate)
reuse (repurpose/extend)
Setup
Actors
Results
Orig
Diff
snapshot spectrum
![Page 72: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/72.jpg)
interactive
local & 3rd party independent resources
shielded heterogeneous infrastructures
BioSTIF
materials
method
instruments and laboratory
use workflows
capture the steps
standardised pipelines
auto record of experiment and set-up
report & variant reuse
buffered infrastructure
![Page 73: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/73.jpg)
use provenance
the link between computation and results
static verifiable record
track changes
repair
partially repeat/reproduce
carry citation
calc data quality/trust
select data to keep/release
compare diffs/discrepancies
W3C PROV standard
PDIFF: comparing provenance traces to diagnose divergence across experimental results [Woodman et al, 2011]
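To make "compare diffs/discrepancies" concrete, here is a minimal sketch of recording a trace per run and finding the first step at which two runs diverge. It is not the PDIFF algorithm and does not use the W3C PROV data model; the record layout, the function names and the two-step analysis are assumptions made purely for illustration.

```python
import hashlib
import json


def fingerprint(value):
    """Return a short, stable hash of any JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_with_provenance(steps, data, params):
    """Run each (name, function) step in order and record a trace."""
    trace = []
    for name, func in steps:
        step_params = params.get(name, {})
        result = func(data, **step_params)
        # One record per step: what ran, with which settings, on what, giving what.
        trace.append({
            "step": name,
            "params": step_params,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, trace


def first_divergence(trace_a, trace_b):
    """Return the name of the first step whose record differs, or None."""
    for rec_a, rec_b in zip(trace_a, trace_b):
        if rec_a != rec_b:
            return rec_a["step"]
    return None


# Hypothetical two-step analysis: filter values, then average them.
def threshold(values, cutoff=0.0):
    return [v for v in values if v >= cutoff]


def mean(values):
    return sum(values) / len(values)


steps = [("threshold", threshold), ("mean", mean)]
raw = [0.2, 0.5, 0.9, 1.4]

_, original = run_with_provenance(steps, raw, {"threshold": {"cutoff": 0.4}})
_, rerun = run_with_provenance(steps, raw, {"threshold": {"cutoff": 0.6}})

print(first_divergence(original, original))  # None
print(first_divergence(original, rerun))     # threshold
```

Because each record carries input and output fingerprints, a changed parameter shows up at the step where it was applied rather than only in the final result.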
![Page 74: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/74.jpg)
“an experiment is as transparent as the visibility of its steps”
black boxes
closed codes & services, proprietary licences, magic cloud services, manual manipulations, poor provenance/version reporting, unknown peer review, mis-use, platform calculation dependencies
Joppa et al Science 340 May 2013; Morin et al Science 336 2012
![Page 75: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/75.jpg)
dependencies & change
degree of self-contained preservation
open world, distributed, alien hosted
data/software versions and accessibility hamper replication
spin-rate of versions
[Zhao et al. e-Science 2012]
“all you need to do is copy the box that the internet is in”
![Page 76: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/76.jpg)
portability / variability
sameness
availability
open
description
intelligibility
[Adapted Freire, 2013]
preservation & distribution
packaging
gather dependencies
capture steps
VM
Reproducibility framework
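"Gather dependencies" can start very small. The sketch below assumes a Python-based analysis and records the interpreter, platform and installed package versions as JSON so the description travels with the results. It is one possible approach, not a packaging framework, and the snapshot layout is an assumption.

```python
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Describe the software environment the current analysis is running in."""
    packages = sorted(
        (str(dist.metadata.get("Name", "unknown")), str(dist.version))
        for dist in metadata.distributions()
    )
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": [{"name": n, "version": v} for n, v in packages],
    }


snapshot = environment_snapshot()
# Store this next to the results; here it is only serialised to a string.
snapshot_json = json.dumps(snapshot, indent=2, sort_keys=True)
print(snapshot["python"], len(snapshot["packages"]), "packages recorded")
```

A snapshot like this is a description (white box), not an executable copy of the environment; a virtual machine image would be the black-box counterpart.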
![Page 77: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/77.jpg)
packaging bickering
byte execution
virtual machine
black box
repeat
description
archived record
white box
reproduce
data+compute co-location cloud
packaging
ELIXIR Embassy Cloud
“in-nerd-tia”
![Page 78: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/78.jpg)
big data big compute
community facilities
cloud host costs and confidence
data scales
dump and file
capability
![Page 79: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/79.jpg)
“the reproducible window”
all experiments become less reproducible over time
icanhascheezburger.com
how, why and what matters
benchmarks for codes
plan to preserve
repair on demand
description persists
use frameworks
partial replication
approximate reproduction
verification
results may vary
message #4:
Sandve, Nekrutenko, Taylor, Hovig Ten simple rules for reproducible in silico research, PLoS Comp Bio submitted
![Page 80: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/80.jpg)
message #5: puppies aren’t free
long term reliability of hosts
multiple stewardship fragmented
business models
reproducibility service industry
24% NAR services unmaintained after three years Schultheiss et al. (2010) PLoS Comp
![Page 81: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/81.jpg)
the meta-manifesto
• all X should be available and assessable forever
• the copyright of X should be clear
• X should have citable, versioned identifiers
• researchers using X should visibly credit X’s creators
• credit should be assessable and count in all assessments
• X should be curated, available, linked to all necessary materials, and intelligible
• making X reproducible/open should be from cradle to grave, continuous, routine, and easier
• tools/repositories should be made to help, be maintained and be incorporated into working practices
• researchers should be able to adapt their working practices, use resources, and be trained to reproduce
• cost and responsibility should be transparent, planned for, accounted and borne collectively
• we all should start small, be imperfect but take action. Today.
http://www.force11.org
![Page 82: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/82.jpg)
research is like software
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
• evolution of a body
• fork, pull, merge
• subpart different cycles, stewardship, authors
• refactored granularity
• software release practices for workflows, scripts, services, data and articles
• thread the salami across parts, repositories and journals
• chop up and micro-attribute Faculty1000
![Page 83: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/83.jpg)
http://www.researchobject.org/
bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms
http://www.w3.org/community/rosc/
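The idea of bundling and relating the resources of an investigation can be sketched as a manifest that lists each resource with its role and a checksum. This is an illustration only, not the Research Object model or any of its vocabularies; the manifest fields, file names and contents are hypothetical.

```python
import hashlib


def build_manifest(title, resources):
    """Describe a bundle: one entry per resource, with role and checksum."""
    entries = []
    for name, (role, content) in sorted(resources.items()):
        entries.append({
            "name": name,
            "role": role,
            "sha256": hashlib.sha256(content).hexdigest(),
        })
    return {"title": title, "aggregates": entries}


def verify(manifest, resources):
    """Return names whose current content is missing or no longer matches."""
    changed = []
    for entry in manifest["aggregates"]:
        current = resources.get(entry["name"])
        if current is None:
            changed.append(entry["name"])
            continue
        if hashlib.sha256(current[1]).hexdigest() != entry["sha256"]:
            changed.append(entry["name"])
    return changed


# Hypothetical contents of one small investigation, held in memory.
resources = {
    "data/profiles.csv": ("dataset", b"galaxy,luminosity\nNGC1,0.42\n"),
    "workflow/steps.txt": ("method", b"1. load\n2. fit\n3. plot\n"),
    "README.txt": ("description", b"How to rerun the analysis."),
}
manifest = build_manifest("Luminosity profiling, release 1", resources)

# Later, one file is edited; the manifest shows which one drifted.
resources["data/profiles.csv"] = ("dataset", b"galaxy,luminosity\nNGC1,0.43\n")
print(verify(manifest, resources))  # ['data/profiles.csv']
```

Keeping roles alongside checksums means a reader can tell not just that something changed, but whether it was the materials, the method or the description.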
![Page 84: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/84.jpg)
towards a release app store
• checklists for descriptive reproducibility
• packaging for multi-hosted research (executable) components
• exchange between tools and researchers
• framework for research release and threaded publishing using core standards
TT43 Lounge 81
![Page 85: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/85.jpg)
those messages again
• lower friction, born reproducible
• credit is like love
• take action, use (workflow) frameworks
• prepare for the reproducible window
• puppies aren’t free
![Page 86: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/86.jpg)
final message
The revolution is not an apple that falls when it is ripe. You have to make it drop.
![Page 87: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/87.jpg)
acknowledgements
• David De Roure
• Tim Clark
• Sean Bechhofer
• Robert Stevens
• Christine Borgman
• Victoria Stodden
• Marco Roos
• Jose Enrique Ruiz del Mazo
• Oscar Corcho
• Ian Cottam
• Steve Pettifer
• Magnus Rattray
• Chris Evelo
• Katy Wolstencroft
• Robin Williams
• Pinar Alper
• C. Titus Brown
• Greg Wilson
• Kristian Garza
• Wf4ever, SysMO, BioVel, UTOPIA and myGrid teams
• Juliana Freire
• Jill Mesirov
• Simon Cockell
• Paolo Missier
• Paul Watson
• Gerhard Klimeck
• Matthias Obst
• Jun Zhao
• Pinar Alper
• Daniel Garijo
• Yolanda Gil
• James Taylor
• Alex Pico
• Sean Eddy
• Cameron Neylon
• Barend Mons
• Kristina Hettne
• Stian Soiland-Reyes
• Rebecca Lawrence
![Page 88: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/88.jpg)
Mr Cottam
10th anniversary today!
![Page 89: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/89.jpg)
https://twitter.com/csmcr/status/361835508994813954
[Jenny Cham]
summary
![Page 90: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?](https://reader038.vdocuments.us/reader038/viewer/2022103015/54c67dc74a7959a4368b469a/html5/thumbnails/90.jpg)
Further Information
• myGrid
– http://www.mygrid.org.uk
• Taverna
– http://www.taverna.org.uk
• myExperiment
– http://www.myexperiment.org
• BioCatalogue
– http://www.biocatalogue.org
• SysMO-SEEK
– http://www.sysmo-db.org
• Rightfield
– http://www.rightfield.org.uk
• UTOPIA Documents
– http://www.getutopia.com
• Wf4ever
– http://www.wf4ever-project.org
• Software Sustainability Institute
– http://www.software.ac.uk
• BioVeL
– http://www.biovel.eu
• Force11
– http://www.force11.org
• http://reproducibleresearch.net
• http://reproduciblescience.org