TRANSCRIPT
RARE and FAIR Science: Reproducibility and Research Objects
Professor Carole Goble FREng FBCS
The University of Manchester, UK
The Software Sustainability Institute
carole.goble@manchester.ac.uk
Jisc Digital Festival, 9-10 March 2015, ICC Birmingham, UK
Knowledge Turning Flow
Barriers to Cure
» Access to scientific resources
» Coordination and Collaboration
» Flow of Information
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
[Josh Sommer]
[Pettifer, Attwood]
http://getutopia.com
Virtual Witnessing
Scientific publications
» announce a result
» convince readers the result is correct
“papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension”
Jill Mesirov, Broad Institute, 2010
Accessible Reproducible Research, Science 22 January 2010, Vol. 327 no. 5964, pp. 415-416, DOI: 10.1126/science.1179653
Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer
Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015
“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”
http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”
David Donoho, “Wavelab and Reproducible Research”, 1995
datasets, data collections, standard operating procedures, software, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware
Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160
Ince et al., The case for open computer programs, Nature 482, 2012
Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads:
7 studies listed necessary details
26: no access to primary data sets, broken links to home websites
31: no s/w version, parameters, exact version of genomic reference sequence
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
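To put the counts above in proportion, here is a minimal sketch that converts them to percentages of the 50-paper sample. The category labels are paraphrased from the slide, and the function name `as_percentages` is purely illustrative.

```python
# Counts as reported on the slide; the sample is 50 papers.
SAMPLE_SIZE = 50
counts = {
    "listed necessary details": 7,
    "no access to primary data sets": 26,
    "no software version or parameters": 31,
}


def as_percentages(counts, total):
    """Convert raw counts to percentages of the sample, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


percentages = as_percentages(counts, SAMPLE_SIZE)
for label, pct in percentages.items():
    print(f"{pct:5.1f}%  {label}")
```

On these figures, 14% of the sampled papers listed the necessary details, while 52% gave no access to primary data and 62% omitted software versions or parameters.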
Broken software, Broken science
» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retracted 3 Science papers and 2 papers in other journals
» One paper cited by 364
The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller, A Scientist’s Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006: vol. 314 no. 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8
Tools, Standards
Machine actionable
Formats, Reporting, Policies, Practices
Record and Automate Everything
recomputation.org
sciencecodemanifesto.org
republic of science
regulation of science
institution cores, libraries
Merton’s four norms of scientific behaviour (1942)
public services
Honest Error: Science is messy
Inherent
Reinhart/Rogoff, Austerity economics
Thomas Herndon
Nature, Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor reporting
» Unavailable resources: results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Nick D Kim, strange-matter.net
Norman Morrison
Do a Replication Study? No thanks! Not FAIR
Hard
Resource intensive
Unrecognised
Trolled
Just gathering the bits
Cross-Institutional e-Laboratory
Scattered parts Subject specific General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
http://myexperiment.org
Research Objects
Compound Investigations Research Products
Multi-various ProductsPlatformsResources
Units of exchange commons contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Relate (scattered) resources. Metadata Objects that carry Research Context
Research Objects
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K, 2013
means
ends
driver
Research Object packages codes, study, and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system
MRC-funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus, Pivot and Profile
Profile around methods, workflows, scripts, software, data, figures…
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure, and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reporting
• Design, protocols, samples, software, models…
• Just Enough Results Model
• Common and specific elements
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
regenerate figure
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness, tolerance
verification, compliance
validation, assurance
1. Science Changes. So does the Lab
BioSTIF
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227
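The split into materials, instruments and laboratory can be made concrete by recording each of them at run time. The following is a minimal sketch under stated assumptions: the function name `snapshot_setup`, the record layout, the sample parameters and the choice of `pip` as the package to look up are all hypothetical, not part of any Research Object tooling.

```python
import hashlib
import json
import platform
import random
from importlib import metadata


def snapshot_setup(parameters, seed, packages, input_bytes):
    """Record materials, instruments and laboratory details for one run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed; recorded rather than hidden
    return {
        # Materials: datasets (by checksum), parameters, algorithm seeds.
        "materials": {
            "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
            "parameters": parameters,
            "seed": seed,
        },
        # Instruments: the underlying libraries and their versions.
        "instruments": {"packages": versions},
        # Laboratory: interpreter and operating system.
        "laboratory": {
            "python": platform.python_version(),
            "implementation": platform.python_implementation(),
            "os": platform.platform(),
        },
    }


seed = 42
random.seed(seed)  # fix the seed before any stochastic step
record = snapshot_setup(
    parameters={"threshold": 0.05},   # hypothetical analysis parameter
    seed=seed,
    packages=["pip"],                 # hypothetical dependency list
    input_bytes=b"sample,value\na,1\n",
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Storing such a record next to the output data gives a later reader the software versions and parameters that the surveys cited earlier found missing from most papers.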
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency, dependencies
steps, provenance
portability
robustness
preservation
access, available
description, intelligible
standards, common APIs
licensing
standards, common metadata
change management, versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
Reproduce by Reading: Archived Record
Retain the Process/Code
The IT Crowd, Series 3, Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument
Retain the bits
The IT Crowd, Series 3, Episode 4
The Internet
service
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects: the secret is the manifest…
Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
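As an illustration of the manifest idea, the sketch below bundles a workflow definition, input and output data, a parameter config and a provenance log, then lists each with a role and a checksum. It is not the Research Object manifest format: the file names, the role labels, the `entries` key and the function `build_manifest` are all assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root, roles):
    """List each bundled file with its role and checksum.

    `roles` maps a relative path to a role label; both are illustrative.
    """
    entries = []
    for rel_path, role in sorted(roles.items()):
        entries.append({
            "path": rel_path,
            "role": role,
            "sha256": sha256_of(root / rel_path),
        })
    return {"entries": entries}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical bundle contents: name -> (role, file content).
    files = {
        "workflow.txt": ("workflow definition", "step1 -> step2\n"),
        "input.csv": ("input data", "a,1\n"),
        "output.csv": ("output data", "a,2\n"),
        "params.json": ("parameter config", '{"threshold": 0.05}\n'),
        "provenance.log": ("provenance log", "ran step1; ran step2\n"),
    }
    for name, (_, content) in files.items():
        (root / name).write_text(content)
    manifest = build_manifest(root, {n: r for n, (r, _) in files.items()})
    # The manifest travels with the bundle it describes.
    (root / "manifest.json").write_text(json.dumps(manifest, indent=2))

print(json.dumps(manifest, indent=2))
```

The checksums let a recipient confirm that the scattered parts they gathered are the ones the manifest describes, while the role labels carry the context that a bare directory listing would lose.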
Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born reproducible
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what?
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority
http://www.rse.ac.uk
BUT……
two years’ time, when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
[Pettifer Attwood]
httpgetutopiacom
Virtual WitnessingScientific publications
raquo announce a result
raquo convince readers the result is correct
ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo
Jill Mesirov Broad Institute 2010
Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653
Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer
Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015
ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo
httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106
ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo
David Donoho ldquoWavelab and Reproducible Researchrdquo 1995
datasetsdata collectionsstandard operating proceduressoftwarealgorithmsconfigurationstools and appscodesworkflows scriptscode librariesservicessystem software infrastructure compilers hardware
Morin et al Shining Light into Black BoxesScience 13 April 2012 336(6078) 159-160
Ince et al The case for open computer programs Nature 482 2012
Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads
7 studies listed necessary details
26 no access to primary data sets broken links to home websites
31 no sw version parameters exact version of genomic reference
sequence
Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)
Broken software Broken science
raquo Geoffrey Chang Scripps Institute
raquo Homemade data-analysis program inherited from another lab
raquo Flipped two columns of data inverting the electron-density map used to derive protein structure
raquo Retract 3 Science papers and 2 papers in other journals
raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practicesldquoAs a general rule
researchers do not
test or document their
programs rigorously
and they rarely
release their codes
making it almost
impossible to
reproduce and verify
published results
generated by
scientific softwarerdquo
2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8
Tools StandardsMachine actionableFormats Reporting Policies Practices
Record and AutomateEverything
recomputationorg
sciencecodemanifestoorg
republic of science
regulation of science
institution cores libraries
Mertonrsquos four norms of scientific behaviour (1942)
public services
Honest Error Science is messy
Inherent
ReinhartRogoff Austerity economicsThomas Herndon
Nature Oct rsquo12
Zoeuml Corbyn
Fraud
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Nick D Kim strange-matternet
Norman Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits
Cross-Institutional e-Laboratory
Scattered parts Subject specific General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
http://myexperiment.org
Research Objects
Compound Investigations Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context
Research Objects
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, -stewarded, -sited, -authored
• span research, researchers, platforms, time
• cite, resolve, steward
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013
means
ends
driver
Research Object packages codes, study, and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system
MRC-funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus, Pivot and Profile
Profile around methods, workflows, scripts, software, data, figures…
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reporting
• Design, protocols, samples, software, models…
• Just Enough Results Model
• Common and specific elements
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
regenerate figure
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
1. Science Changes. So does the Lab
BioSTIF
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed. Updated resources
Uncertainty
Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay: materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency: dependencies, steps, provenance
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
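One way to picture that link is a small wrapper that runs an analysis step and, at the same moment, writes down what was run, with which parameters, and a fingerprint of the result. This is only a minimal sketch: the function name `run_with_provenance` and the record fields are hypothetical, not part of any Research Object or Taverna tooling, and it assumes the step's result can be serialised as JSON.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def run_with_provenance(step, params):
    """Run step(**params) and return (result, provenance_record).

    Hypothetical record layout: it captures the doing (step, parameters,
    environment) so it can later be attached to the reporting.
    """
    started = datetime.now(timezone.utc).isoformat()
    result = step(**params)
    # Fingerprint the result so a later rerun can be compared against it.
    digest = hashlib.sha256(
        json.dumps(result, sort_keys=True).encode("utf-8")
    ).hexdigest()
    record = {
        "step": step.__name__,
        "params": params,
        "result_sha256": digest,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "started_utc": started,
    }
    return result, record
```

The record can be stored next to the figure or table it explains, so that a reader can see which run produced it.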
Reproduce by Reading: Archived Record. Retain the Process/Code
The IT Crowd, Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument. Retain the bits
The Internet
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects: the secret is the manifest…
Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
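The manifest idea can be sketched in a few lines: list every aggregated file, say what role it plays, and record a checksum so later decay is detectable. This is an illustration only. The role names, the `aggregates` key and the `manifest.json` file name are assumptions made here, and the layout is not the Research Object specification.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(root, roles):
    """Describe each file in `roles` (relative path -> role) under `root`.

    Hypothetical layout: one entry per aggregated resource, with its role
    (workflow definition, input data, parameter config, provenance log...)
    and a checksum of its current contents.
    """
    root = Path(root)
    entries = [
        {"path": rel, "role": role, "sha256": sha256_of(root / rel)}
        for rel, role in sorted(roles.items())
    ]
    return {"aggregates": entries}


def write_manifest(root, roles, name="manifest.json"):
    """Write the manifest next to the files it describes and return its path."""
    target = Path(root) / name
    target.write_text(json.dumps(build_manifest(root, roles), indent=2))
    return target
```

Recomputing the checksums later and comparing them with the stored manifest gives a cheap check that the bundled parts are still the ones the study used.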
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what?
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
BUT……
two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015
ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo
httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106
ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo
David Donoho ldquoWavelab and Reproducible Researchrdquo 1995
datasetsdata collectionsstandard operating proceduressoftwarealgorithmsconfigurationstools and appscodesworkflows scriptscode librariesservicessystem software infrastructure compilers hardware
Morin et al Shining Light into Black BoxesScience 13 April 2012 336(6078) 159-160
Ince et al The case for open computer programs Nature 482 2012
Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads
7 studies listed necessary details
26 no access to primary data sets broken links to home websites
31 no sw version parameters exact version of genomic reference
sequence
Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)
Broken software Broken science
raquo Geoffrey Chang Scripps Institute
raquo Homemade data-analysis program inherited from another lab
raquo Flipped two columns of data inverting the electron-density map used to derive protein structure
raquo Retract 3 Science papers and 2 papers in other journals
raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practicesldquoAs a general rule
researchers do not
test or document their
programs rigorously
and they rarely
release their codes
making it almost
impossible to
reproduce and verify
published results
generated by
scientific softwarerdquo
2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8
Tools StandardsMachine actionableFormats Reporting Policies Practices
Record and AutomateEverything
recomputationorg
sciencecodemanifestoorg
republic of science
regulation of science
institution cores libraries
Mertonrsquos four norms of scientific behaviour (1942)
public services
Honest Error Science is messy
Inherent
ReinhartRogoff Austerity economicsThomas Herndon
Nature Oct rsquo12
Zoeuml Corbyn
Fraud
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Nick D Kim strange-matternet
Norman Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits
Cross-Institutional e-Laboratory
Scattered parts Subject specific General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826
httpmyexperimentorg
Research Objects
Compound Investigations Research Products
Multi-various ProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context
Research Objects
bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested
bull multi ndashtyped stewarded sited authored
bull span research researchers platforms time
bull cite resolve steward
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system
MRC funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reportingbull Design protocols samples
software modelshellipbull Just Enough Results Modelbull Common and specific elements
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
RO as Instrument Materials Method
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
regenerate figure
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
1 Science Changes So does the Lab
BioSTIF
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
Uncertainty
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
stepsprovenance
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Provenance ndash the link between doing and reporting
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
The IT Crowd Series 3 Episode 4
The Internet
serviceScience as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Metadata Objectsthe secret is the manifesthellip
Workflow definition
Data (inputs outputs) Parameter configsProvenance log
Hettne et al Structuring research methods and data with the research object model genomics workflows as a case study 2014 httpwwwjbiomedsemcomcontentpdf2041-1480-5-41pdf
myRDM
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Depth and Coverage Profiles
NISO-JATS
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what?
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority.
http://www.rse.ac.uk
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms / codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester's Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
Ince et al, The case for open computer programs, Nature 482, 2012
Of 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads:
7 studies listed necessary details
26 no access to primary data sets, broken links to home websites
31 no sw version, parameters, exact version of genomic reference sequence
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
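The details the survey found missing (software version, parameters, exact reference version, access to input data) can be captured at run time rather than reconstructed later. The sketch below is one possible shape for such a record; the tool name, version string, parameter names and reference identifier are all placeholders invented for illustration and do not correspond to any real aligner's options.

```python
import json
import platform
import sys

# Details a reader needs in order to repeat the analysis.
REQUIRED = ("tool_version", "parameters", "reference", "inputs")


def make_run_record(tool, tool_version, parameters, reference, inputs):
    """Collect the run details in one JSON-serialisable record."""
    return {
        "tool": tool,
        "tool_version": tool_version,
        "parameters": dict(sorted(parameters.items())),
        "reference": reference,
        "inputs": sorted(inputs),
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


def missing_details(record):
    """Return the required fields that are absent or empty."""
    return [key for key in REQUIRED if not record.get(key)]


# All values below are hypothetical placeholders.
record = make_run_record(
    tool="example-aligner",
    tool_version="0.0.0-example",
    parameters={"max_mismatches": 2, "seed_length": 32},
    reference={"name": "example-reference", "version": "placeholder"},
    inputs=["reads/sample_A.fastq"],
)

print(json.dumps(record, indent=2))
print("missing:", missing_details(record))
```

A check like `missing_details` could run before submission, so an incomplete methods record is caught by the authors rather than by a reader two years on.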
Broken software Broken science
» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retract 3 Science papers and 2 papers in other journals
» One paper cited by 364
The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol 314, no 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
2000 scientists. JE Hannay et al, “How Do Scientists Develop and Use Scientific Software?” Proc ICSE Workshop Software Eng for Computational Science and Eng, 2009, pp 1–8
Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices
Record and Automate Everything
recomputation.org
sciencecodemanifesto.org
republic of science
regulation of science
institution cores libraries
Merton's four norms of scientific behaviour (1942)
public services
Honest Error Science is messy
Inherent
Reinhart/Rogoff, Austerity economics
Thomas Herndon
Nature, Oct '12
Zoë Corbyn
Fraud
“I can't immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources: results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
Nick D Kim, strange-matter.net
Norman Morrison
Do a Replication Study? No thanks! Not FAIR
Hard, Resource intensive
Unrecognised, Trolled
Just gathering the bits
Cross-Institutional e-Laboratory
Scattered parts Subject specific General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
http://myexperiment.org
Research Objects
Compound Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Relate (scattered) resources
Metadata Objects that carry Research Context
Research Objects
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system
MRC-funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus, Pivot and Profile
Profile around methods, workflows, scripts, software, data, figures…
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reporting
• Design, protocols, samples, software, models…
• Just Enough Results Model
• Common and specific elements
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
regenerate figure
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
1. Science Changes. So does the Lab
BioSTIF
“The questions don't change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
Broken software Broken science
raquo Geoffrey Chang Scripps Institute
raquo Homemade data-analysis program inherited from another lab
raquo Flipped two columns of data inverting the electron-density map used to derive protein structure
raquo Retract 3 Science papers and 2 papers in other journals
raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practicesldquoAs a general rule
researchers do not
test or document their
programs rigorously
and they rarely
release their codes
making it almost
impossible to
reproduce and verify
published results
generated by
scientific softwarerdquo
2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8
Tools StandardsMachine actionableFormats Reporting Policies Practices
Record and AutomateEverything
recomputationorg
sciencecodemanifestoorg
republic of science
regulation of science
institution cores libraries
Mertonrsquos four norms of scientific behaviour (1942)
public services
Honest Error Science is messy
Inherent
ReinhartRogoff Austerity economicsThomas Herndon
Nature Oct rsquo12
Zoeuml Corbyn
Fraud
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Nick D Kim strange-matternet
Norman Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits
Cross-Institutional e-Laboratory
Scattered parts Subject specific General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826
httpmyexperimentorg
Research Objects
Compound Investigations Research Products
Multi-various ProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context
Research Objects
bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested
bull multi ndashtyped stewarded sited authored
bull span research researchers platforms time
bull cite resolve steward
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system
MRC funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reportingbull Design protocols samples
software modelshellipbull Just Enough Results Modelbull Common and specific elements
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
regenerate figure
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
1. Science Changes. So does the Lab
BioSTIF
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps
provenance
portability
robustness
preservation
access
available
description
intelligible
standards
common APIs
licensing
standards
common metadata
change management
versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
Reproduce by Reading: Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument
Retain the bits
The IT Crowd Series 3 Episode 4
The Internet
service
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects: the secret is the manifest…
Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
Hettne et al, Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
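The manifest idea on this slide can be sketched as a small script that walks a bundle directory and lists what it aggregates. This is an illustration only, not the Research Object manifest format: the field names (`aggregates`, `role`, `sha256`) and the role labels are assumptions chosen for the example.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path, roles: dict[str, str]) -> dict:
    """Describe every file under `root` with a role, size and checksum.

    `roles` maps a relative path to a hypothetical label such as
    "workflow", "input", "output", "config" or "provenance".
    """
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        entries.append({
            "path": rel,
            "role": roles.get(rel, "unclassified"),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "aggregates": entries,
    }
```

Serialising the result with `json.dumps` gives a machine-actionable record that travels with the workflow, data, parameter configs and provenance log; the checksums let a later reader detect when a part has silently changed.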
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born reproducible
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what?
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
Software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8
Tools, Standards
Machine actionable
Formats, Reporting, Policies, Practices
Record and Automate Everything
recomputation.org
sciencecodemanifesto.org
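One hedged illustration of "record and automate": capture the parameters, the random seed and the software environment at the moment an analysis runs, rather than reconstructing them when the paper is written. The function name `capture_run_record` and the record layout are assumptions made for this sketch; only standard-library calls are used.

```python
import platform
from datetime import datetime, timezone
from importlib import metadata


def capture_run_record(parameters: dict, seed: int, packages: list[str]) -> dict:
    """Snapshot what a later reader needs in order to rerun this analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of silently skipping it.
            versions[name] = None
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "parameters": dict(parameters),
        "seed": seed,
        "packages": versions,
    }
```

Calling this at the top of every analysis script and writing the result next to the outputs costs a few lines and needs no extra effort from the researcher, which is the "stealthy" kind of instrumentation the talk argues for.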
republic of science
regulation of science
institution cores libraries
Merton’s four norms of scientific behaviour (1942)
public services
Honest Error Science is messy
Inherent
Reinhart/Rogoff Austerity economics
Thomas Herndon
Nature, Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne
Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources, results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Nick D Kim, strange-matter.net
Norman Morrison
Do a Replication Study? No thanks. Not FAIR
Hard, Resource intensive
Unrecognised, Trolled
Just gathering the bits
Cross-Institutional e-Laboratory
Scattered parts Subject specific General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
http://myexperiment.org
Research Objects
Compound Investigations Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Relate (scattered) resources
Metadata Objects that carry Research Context
Research Objects
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system
MRC funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reportingbull Design protocols samples
software modelshellipbull Just Enough Results Modelbull Common and specific elements
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
RO as Instrument Materials Method
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
regenerate figure
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
1 Science Changes So does the Lab
BioSTIF
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
Uncertainty
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, "Replicability is not Reproducibility: Nor is it Good Science", online
Peng, "Reproducible Research in Computational Science", Science, 2 Dec 2011, 1226-1227
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency, dependencies
steps, provenance
portability
robustness
preservation
access, available
description, intelligible
standards, common APIs
licensing
standards, common metadata
change management, versioning
packaging
Machine actionable
Provenance - the link between doing and reporting
Reproduce by Reading
Archived Record
Retain the Process, Code
The IT Crowd, Series 3, Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
The IT Crowd, Series 3, Episode 4
The Internet
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects: the secret is the manifest…
Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
Hettne et al., "Structuring research methods and data with the research object model: genomics workflows as a case study", 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
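A rough illustration of the manifest idea: one small metadata file that lists everything a Research Object aggregates (workflow definition, input and output data, parameter configs, provenance log) and records a checksum for each part. This is only a sketch under assumptions. The file paths, role names and the `build_manifest` helper are hypothetical, and the layout is a simplification, not the Research Object specification's own manifest format.

```python
import hashlib
import json


def describe(path, role, content):
    """Return one manifest entry: location, role in the study, and a checksum."""
    return {
        "path": path,
        "role": role,
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def build_manifest(ro_id, resources):
    """Aggregate (path, role, content-bytes) triples into one manifest dict."""
    return {
        "id": ro_id,
        "aggregates": [describe(p, r, c) for p, r, c in resources],
    }


# Hypothetical contents, mirroring the parts named on the slide.
resources = [
    ("workflow/analysis.t2flow", "workflow-definition", b"<workflow/>"),
    ("data/input.csv", "input-data", b"sample,value\na,1\n"),
    ("data/output.csv", "output-data", b"sample,score\na,0.5\n"),
    ("config/params.json", "parameter-config", b'{"seed": 42}'),
    ("provenance/run-1.log", "provenance-log", b"run 1 started\n"),
]

manifest = build_manifest("example-research-object", resources)
manifest_json = json.dumps(manifest, indent=2)
```

The checksums let a later reader detect when an aggregated part has changed or decayed, which is one way a manifest can carry research context rather than just a file listing.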
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al. 2013
Method Matters
Make reproducible -> Born
Be smart about reproducibility
Think Commons, not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, "Reproducible Research Standard", Intl J Comm Law & Policy 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms, codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester's Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
republic of science
regulation of science
institution cores libraries
Merton's four norms of scientific behaviour (1942)
public services
Honest Error: Science is messy
Inherent
Reinhart/Rogoff, Austerity economics, Thomas Herndon
Nature, Oct '12
Zoë Corbyn
Fraud
"I can't immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper."
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes "wrong"
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources, results: data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, "Why Most Published Research Findings Are False", August 2005
Joppa et al., "Troubling Trends in Scientific Software Use", Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
Nick D Kim, strange-matter.net
Norman Morrison
Do a Replication Study? No thanks. Not FAIR
Hard, Resource intensive, Unrecognised, Trolled
Just gathering the bits
Cross-Institutional e-Laboratory
Scattered parts: Subject specific, General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
http://myexperiment.org
Research Objects
Compound Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Relate (scattered) resources
Metadata Objects that carry Research Context
Research Objects
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Goble, De Roure, Bechhofer, "Accelerating Knowledge Turns", IC3K 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system
MRC-funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus, Pivot and Profile
Profile around methods, workflows, scripts, software, data, figures…
Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure, and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Nick D Kim strange-matternet
Norman Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits
Cross-Institutional e-Laboratory
Scattered parts Subject specific General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826
httpmyexperimentorg
Research Objects
Compound Investigations Research Products
Multi-various ProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context
Research Objects
bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested
bull multi ndashtyped stewarded sited authored
bull span research researchers platforms time
bull cite resolve steward
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system
MRC funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reportingbull Design protocols samples
software modelshellipbull Just Enough Results Modelbull Common and specific elements
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
RO as Instrument Materials Method
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
regenerate figure
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
1 Science Changes So does the Lab
BioSTIF
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
Uncertainty
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
stepsprovenance
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Provenance ndash the link between doing and reporting
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
The IT Crowd Series 3 Episode 4
The Internet
serviceScience as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Metadata Objectsthe secret is the manifesthellip
Workflow definition
Data (inputs outputs) Parameter configsProvenance log
Hettne et al Structuring research methods and data with the research object model genomics workflows as a case study 2014 httpwwwjbiomedsemcomcontentpdf2041-1480-5-41pdf
myRDM
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Depth and Coverage Profiles
NISO-JATS
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -gt Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing httparxivorgabs12100530Stodden Reproducible Research Standard Intl J Comm Law amp Policy 13 2009
RARE amp FAIR Knowledge Turns with Research Objects
httpmerchandisethedoctorwhositecouk
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researcher Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority
http://www.rse.ac.uk
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms, codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources/results: data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Nick D Kim, strange-matter.net
Norman Morrison
Do a Replication Study? No thanks! Not FAIR
Hard, Resource intensive
Unrecognised, Trolled
Just gathering the bits
Cross-Institutional e-Laboratory
Scattered parts: Subject specific, General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
http://myexperiment.org
Research Objects
Compound Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Relate (scattered) resources
Metadata Objects that carry Research Context
Research Objects
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system
MRC-funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus, Pivot and Profile
Profile around methods, workflows, scripts, software, data, figures…
Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
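One way to read “simply data + code” is that the figure is a function re-run whenever a lab contributes data. The sketch below is an assumption-laden illustration, not how F1000Research implements Living Figures: the class and field names are hypothetical, the numbers are invented, and a per-group mean stands in for the plotted figure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean


@dataclass
class Contribution:
    lab: str
    group: str           # e.g. a sub-strain label
    values: list
    timestamp: datetime  # data updates are time-stamped


@dataclass
class LivingFigure:
    contributions: list = field(default_factory=list)

    def add(self, lab, group, values, when=None):
        """Record a time-stamped data update from a lab."""
        when = when or datetime.now(timezone.utc)
        self.contributions.append(Contribution(lab, group, list(values), when))

    def render(self):
        """Recompute the figure's data (mean per group) from all contributions."""
        by_group = {}
        for c in self.contributions:
            by_group.setdefault(c.group, []).extend(c.values)
        return {g: mean(v) for g, v in sorted(by_group.items())}


fig = LivingFigure()
fig.add("lab A", "strain 1", [1.0, 2.0, 3.0])
fig.add("lab A", "strain 2", [4.0, 6.0])
before = fig.render()
fig.add("lab B", "strain 1", [6.0])  # another lab contributes its data
after = fig.render()
```

Because the rendered values are always derived from the stored contributions, nothing has to be edited by hand when new data arrives; the figure after lab B's update differs from the one before it only through the data.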
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, Not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
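The release paradigm in the list above (versions, forks, comparisons, identifiers) can be sketched as follows. This is a toy model under stated assumptions: the `Release` class, its content-derived identifier and the `diff` helper are hypothetical names for illustration, and the short hash is a stand-in for a real persistent identifier such as a DOI.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    version: str
    content: dict                 # part name -> checksum or version tag
    parent: Optional["Release"] = None

    @property
    def identifier(self):
        """Content-derived id: same content and same history give the same id."""
        payload = json.dumps(
            {
                "version": self.version,
                "content": self.content,
                "parent": self.parent.identifier if self.parent else None,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def lineage(self):
        """Versions from the first release up to this one."""
        chain, node = [], self
        while node is not None:
            chain.append(node.version)
            node = node.parent
        return chain[::-1]


def diff(old, new):
    """Compare two releases: which parts were added, removed or changed."""
    old_keys, new_keys = set(old.content), set(new.content)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": sorted(k for k in old_keys & new_keys
                          if old.content[k] != new.content[k]),
    }


v1 = Release("1.0", {"data.csv": "aaa", "model.xml": "bbb"})
v2 = Release("1.1", {"data.csv": "ccc", "model.xml": "bbb", "fit.py": "ddd"}, parent=v1)
fork = Release("1.0-fork", {"data.csv": "aaa", "model.xml": "eee"}, parent=v1)
```

Each release points at its parent, so a fork and a later version share history but receive different identifiers, which is what makes them separately citable.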
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reporting
• Design, protocols, samples, software, models…
• Just Enough Results Model
• Common and specific elements
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
regenerate figure
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness, tolerance
verification, compliance
validation, assurance
1. Science Changes. So does the Lab.
BioSTIF
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
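“Prepare to Repair” presupposes that decay can be detected. A minimal sketch, assuming the checksums of a study's parts were recorded at publication time: the function name `check_decay`, the status labels and the file names are hypothetical, and the example covers only local files, not remote services.

```python
import hashlib
import tempfile
from pathlib import Path


def check_decay(expected, root):
    """Compare recorded checksums with what is on disk now.

    expected maps a relative path to the sha256 recorded at publication time.
    Returns a dict of path -> "ok" | "missing" | "changed".
    """
    report = {}
    for rel_path, recorded in sorted(expected.items()):
        target = Path(root) / rel_path
        if not target.is_file():
            report[rel_path] = "missing"   # material became unavailable
        elif hashlib.sha256(target.read_bytes()).hexdigest() != recorded:
            report[rel_path] = "changed"   # resource was updated or corrupted
        else:
            report[rel_path] = "ok"
    return report


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.csv").write_bytes(b"a,b\n1,2\n")
    (Path(tmp) / "script.py").write_bytes(b"print('v2')\n")
    recorded = {
        "data.csv": hashlib.sha256(b"a,b\n1,2\n").hexdigest(),
        "script.py": hashlib.sha256(b"print('v1')\n").hexdigest(),
        "service.wsdl": hashlib.sha256(b"gone").hexdigest(),
    }
    decay_report = check_decay(recorded, tmp)
```

A report like this only says that something broke, not how to fix it; deciding whether to restore the exact form or an equivalent function remains a human judgement.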
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
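The four layers named on this slide (Methods, Materials, Instruments, Laboratory) can be written down as a machine-readable record. The sketch below is one possible shape under assumptions: the function name, the dictionary keys and the example values are hypothetical, and the library versions are passed in by the caller rather than discovered automatically.

```python
import hashlib
import platform


def describe_experiment(steps, inputs, parameters, seed, libraries):
    """Describe a computational experiment in the slide's four layers."""
    return {
        # Methods: techniques, algorithms, spec of the steps
        "methods": {"steps": list(steps)},
        # Materials: datasets, parameters, algorithm seeds
        "materials": {
            "datasets": {
                name: hashlib.sha256(data).hexdigest()
                for name, data in inputs.items()
            },
            "parameters": dict(parameters),
            "seed": seed,
        },
        # Instruments: codes, services, scripts, underlying libraries
        "instruments": {"libraries": dict(libraries)},
        # Laboratory: software and hardware infrastructure
        "laboratory": {
            "python": platform.python_version(),
            "implementation": platform.python_implementation(),
            "os": platform.system(),
            "machine": platform.machine(),
        },
    }


# Illustrative values only; none of these come from a real study.
record = describe_experiment(
    steps=["load samples", "normalise", "fit model"],
    inputs={"samples.csv": b"id,value\n1,0.5\n"},
    parameters={"threshold": 0.1},
    seed=42,
    libraries={"example-lib": "1.2.3"},
)
```

Keeping the layers separate makes it visible which part changed when a rerun gives a different answer: the materials, the instruments, or the laboratory underneath them.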
httpmyexperimentorg
Research Objects
Compound Investigations Research Products
Multi-various ProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Relate (scattered) resources Metadata Objects that carry Research Context
Research Objects
bull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nested
bull multi ndashtyped stewarded sited authored
bull span research researchers platforms time
bull cite resolve steward
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data (CKAN for the Farr Commons)
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
ClinicalCodesorg coded patient cohorts exchanged with NHS FARSITE system
MRC funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reportingbull Design protocols samples
software modelshellipbull Just Enough Results Modelbull Common and specific elements
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
RO as Instrument Materials Method
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: "do again", "return to original state"
regenerate figure
"show A is true by doing B"
verify but not falsify [Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
1. Science Changes. So does the Lab
BioSTIF
"The questions don't change but the answers do" Dan Reed
The lab is not fixed
Updated resources
Uncertainty
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
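The four-part breakdown above (Methods, Materials, Instruments, Laboratory) can be sketched as a small record type. This is an illustrative assumption, not a schema from the talk: the class name `ExperimentRecord`, its field layout, and the example file names are all hypothetical. Only the "Laboratory" part is captured automatically, using Python's standard `platform` module.

```python
import platform
from dataclasses import dataclass, field, asdict


@dataclass
class ExperimentRecord:
    """Hypothetical record mirroring the slide's four categories."""
    methods: list       # techniques, algorithms, spec of the steps
    materials: dict     # datasets, parameters, algorithm seeds
    instruments: dict   # codes, services, scripts, underlying libraries
    laboratory: dict = field(default_factory=dict)  # sw and hw infrastructure


def describe_laboratory():
    """Capture a few facts about the software and hardware setup."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
    }


# Example values are placeholders, not taken from any real experiment.
record = ExperimentRecord(
    methods=["normalise", "cluster"],
    materials={"dataset": "my-data.csv", "seed": 42},
    instruments={"script": "analysis.py"},
    laboratory=describe_laboratory(),
)
summary = asdict(record)
```

Recording the seed with the materials and the interpreter version with the laboratory is the kind of detail that otherwise goes missing once "technicians leave".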
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps
provenance
portability
robustness
preservation
access
available
description
intelligible
standards
common APIs
licensing
standards
common metadata
change management
versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
The IT Crowd Series 3 Episode 4
The Internet
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects: the secret is the manifest…
Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
Hettne et al, Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
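To make "the secret is the manifest" concrete, here is a rough sketch of a manifest that lists the parts named on the slide (workflow definition, input and output data, parameter config, provenance log) with a checksum for each. The field names (`aggregates`, `path`, `role`, `sha256`), the helper `build_manifest`, and the file paths are assumptions for illustration only; they are not the schema defined by the Research Object specifications.

```python
import hashlib
import json


def checksum(content: bytes) -> str:
    """SHA-256 hex digest, so a reader can check a resource is unchanged."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(resources):
    """Build a manifest dict from {path: (role, content_bytes)}."""
    return {
        "aggregates": [
            {"path": path, "role": role, "sha256": checksum(content)}
            for path, (role, content) in sorted(resources.items())
        ]
    }


# Placeholder contents standing in for real files.
resources = {
    "workflow/definition.txt": ("workflow definition", b"step1 -> step2"),
    "data/input.csv": ("input data", b"x,y\n1,2\n"),
    "data/output.csv": ("output data", b"sum\n3\n"),
    "config/params.json": ("parameter config", b'{"threshold": 0.5}'),
    "provenance/log.txt": ("provenance log", b"ran step1; ran step2"),
}
manifest = build_manifest(resources)
manifest_json = json.dumps(manifest, indent=2)
```

The manifest carries descriptions and checksums rather than the content itself, so it can describe resources that live elsewhere as well as ones bundled alongside it.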
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky
reduce the friction
instrumentationspan RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what?
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
BUT……
two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester's Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
Research Objects
Compound Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Relate (scattered) resources
Metadata Objects that carry Research Context
Research Objects
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
means
ends
driver
Research Object packages: codes, study and metadata, to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system
MRC-funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus Pivot and ProfileProfile around methods workflows scripts software data figureshellip
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reportingbull Design protocols samples
software modelshellipbull Just Enough Results Modelbull Common and specific elements
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
RO as Instrument Materials Method
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
regenerate figure
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
1. Science Changes. So does the Lab
BioSTIF
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
Zhao et al., Why workflows break: Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency dependencies
steps provenance
portability
robustness
preservation
access available
description intelligible
standards common APIs
licensing
standards common metadata
change management versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
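As a rough illustration of provenance linking what was done to what is reported, the sketch below wraps a single analysis step and records its parameters, the interpreter version, and checksums of its input and output. The function names, the record fields, and the toy `repeat_upper` step are assumptions made for this example. This is not the provenance model used by Taverna or by Research Objects.

```python
import hashlib
import platform


def sha256_hex(data: bytes) -> str:
    """Checksum used to identify the exact bytes that went in and came out."""
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(step, input_data: bytes, **params):
    """Run one step and return (output, provenance record)."""
    output = step(input_data, **params)
    record = {
        "step": step.__name__,
        "params": params,
        "python_version": platform.python_version(),
        "input_sha256": sha256_hex(input_data),
        "output_sha256": sha256_hex(output),
    }
    return output, record


def repeat_upper(data: bytes, times: int = 1) -> bytes:
    """Toy analysis step, purely illustrative."""
    return data.upper() * times


output, record = run_with_provenance(repeat_upper, b"acgt", times=2)
print(output)
print(record["step"], record["params"])
```

A reported result can then be checked by rerunning the step and comparing the output checksum with the recorded one.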
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
The Internet
service
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects: the secret is the manifest…
Workflow definition
Data (inputs, outputs), Parameter configs, Provenance log
Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
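The manifest idea can be sketched as a small index that lists every aggregated resource with its role and a checksum. The example below is a minimal illustration only. The field names (`aggregates`, `role`, `sha256`, `size_bytes`), the file paths, and the file contents are assumptions invented for the example, and the sketch does not follow the actual Research Object manifest specification.

```python
import hashlib
import json


def build_manifest(resources):
    """Build a manifest from {path: (role, content_bytes)}."""
    entries = []
    for path in sorted(resources):
        role, content = resources[path]
        entries.append({
            "path": path,
            "role": role,
            "sha256": hashlib.sha256(content).hexdigest(),
            "size_bytes": len(content),
        })
    return {"aggregates": entries}


# Hypothetical bundle contents mirroring the slide: workflow definition,
# input and output data, parameter config, provenance log.
bundle = {
    "workflow/analysis.wf": ("workflow definition", b"step1 -> step2"),
    "data/input.csv": ("input data", b"id,value\n1,0.5\n"),
    "data/output.csv": ("output data", b"id,score\n1,0.9\n"),
    "config/params.json": ("parameter config", b'{"threshold": 0.8}'),
    "provenance/run.log": ("provenance log", b"run started\nrun finished\n"),
}
manifest = build_manifest(bundle)
print(json.dumps(manifest, indent=2))
```

Keeping the manifest separate from the files it describes means the package can be inspected, and its contents verified, without running anything.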
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky
reduce the friction
instrumentation span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what?
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data (CKAN for the Farr Commons)
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
ClinicalCodes.org: coded patient cohorts exchanged with NHS FARSITE system
MRC funded multi-site collaboration to support safe use of patient and research data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
Focus, Pivot and Profile
Profile around methods, workflows, scripts, software, data, figures…
Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reportingbull Design protocols samples
software modelshellipbull Just Enough Results Modelbull Common and specific elements
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
RO as Instrument Materials Method
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
regenerate figure
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
1 Science Changes So does the Lab
BioSTIF
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
Uncertainty
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
stepsprovenance
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Provenance ndash the link between doing and reporting
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
The IT Crowd Series 3 Episode 4
The Internet
serviceScience as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Metadata Objectsthe secret is the manifesthellip
Workflow definition
Data (inputs outputs) Parameter configsProvenance log
Hettne et al Structuring research methods and data with the research object model genomics workflows as a case study 2014 httpwwwjbiomedsemcomcontentpdf2041-1480-5-41pdf
myRDM
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Depth and Coverage Profiles
NISO-JATS
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -gt Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing httparxivorgabs12100530Stodden Reproducible Research Standard Intl J Comm Law amp Policy 13 2009
RARE amp FAIR Knowledge Turns with Research Objects
httpmerchandisethedoctorwhositecouk
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researcher Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
Reality Check
Jorge Cham wwwphdcomicscom
Stealthy not Sneakyreduce the friction
instrumentationspan RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ne AuthorshipCiting what
Research Currencies
Training
56Of UK researchers develop their own research software or scripts
73 Of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority
httpwwwrseacuk
BUThelliphellip
two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
[Snoep 2015] https://doi.org/10.15490/seek.1.investigation.56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
Aggregated Commons Infrastructure
Consistent Comparative Reporting
• Design, protocols, samples, software, models…
• Just Enough Results Model
• Common and specific elements
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
regenerate figure
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
1. Science Changes. So does the Lab.
BioSTIF
“The questions don’t change but the answers do” (Dan Reed)
The lab is not fixed
Updated resources
Uncertainty
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps
provenance
portability
robustness
preservation
access
available
description
intelligible
standards
common APIs
licensing
standards
common metadata
change management
versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
The IT Crowd Series 3 Episode 4
The Internet
service
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Public data sets
My algorithm
RO Workflow as Instrument
BioSTIF
My data set
Public software
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
regenerate figure
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
1 Science Changes So does the Lab
BioSTIF
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
Uncertainty
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps
provenance
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
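To make the link between doing and reporting concrete, a provenance record can be captured at the moment a step runs rather than reconstructed afterwards. The following is a rough sketch under stated assumptions: the field names, the helper run_with_provenance and the toy step count_lines are all hypothetical, and the record is plain JSON rather than any standard provenance vocabulary.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from typing import Any, Callable, Dict


def run_with_provenance(step: Callable[..., Any], data: bytes,
                        params: Dict[str, Any]) -> Dict[str, Any]:
    """Run step(data, **params) and return the result with a provenance record."""
    started = datetime.now(timezone.utc).isoformat()
    result = step(data, **params)
    record = {
        "step": step.__name__,                             # what was run
        "input_sha256": hashlib.sha256(data).hexdigest(),  # on which input
        "params": params,                                  # with which settings
        "python_version": platform.python_version(),       # in which environment
        "started_utc": started,                            # and when
    }
    return {"result": result, "provenance": record}


def count_lines(data: bytes, skip_blank: bool = False) -> int:
    """Toy analysis step: count lines, optionally ignoring blank ones."""
    lines = data.decode("utf-8").splitlines()
    return sum(1 for line in lines if line.strip() or not skip_blank)


if __name__ == "__main__":
    out = run_with_provenance(count_lines, b"a\n\nb\n", {"skip_blank": True})
    print(json.dumps(out, indent=2))
```

Because the record is produced by the same call that produces the result, the report cannot silently drift away from what was actually executed.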
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd, Series 3, Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
The IT Crowd, Series 3, Episode 4
The Internet
service
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects: the secret is the manifest…
Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
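The idea that the manifest ties together the workflow definition, data, parameter configs and provenance log can be illustrated with a small sketch. It assumes the parts are files in one directory. The key names ("aggregates", "role" and so on) and the functions build_manifest and write_manifest are illustrative assumptions, not the serialization defined by the Research Object model.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path, roles: Dict[str, str]) -> Dict[str, List[dict]]:
    """List every file named in roles with its role, size and SHA-256.

    roles maps a path relative to root to a free-text role such as
    "workflow", "input", "output", "config" or "provenance".
    """
    entries = []
    for rel_path in sorted(roles):
        content = (root / rel_path).read_bytes()
        entries.append({
            "path": rel_path,
            "role": roles[rel_path],
            "bytes": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),
        })
    return {"aggregates": entries}


def write_manifest(root: Path, roles: Dict[str, str]) -> Path:
    """Write the manifest as JSON next to the files it describes."""
    target = root / "manifest.json"
    target.write_text(json.dumps(build_manifest(root, roles), indent=2))
    return target
```

Keeping the manifest separate from the content means the same files can be described at different depths for different audiences without being repackaged.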
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing, http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
http://merchandise.thedoctorwhosite.co.uk
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researcher Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
Reality Check
Jorge Cham, www.phdcomics.com
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Citing what?
Research Currencies
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority.
http://www.rse.ac.uk
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
Research Environment
publish article
Publication Environment
submit article and move on…
[Adapted Freire 2013]
transparency dependencies
steps provenance
portability
robustness
preservation
access available
description intelligible
standards common APIs
licensing
standards common
metadata
change management versioning
packaging
Machine actionable
Provenance – the link between doing and reporting
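One way to picture provenance as the link between doing and reporting is a small wrapper that records what went into a step and what came out. The sketch below is illustrative only: the function names, the record fields and the toy `to_upper` step are assumptions made for this example, not part of any tool named in the talk.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(step_name, func, input_bytes, params):
    """Run one analysis step and return (output, provenance record).

    The record layout is hypothetical: it simply keeps the step name,
    the parameters, checksums of input and output, and timestamps.
    """
    started = datetime.now(timezone.utc).isoformat()
    output_bytes = func(input_bytes, **params)
    ended = datetime.now(timezone.utc).isoformat()
    record = {
        "step": step_name,
        "parameters": params,
        "input_sha256": sha256_of(input_bytes),
        "output_sha256": sha256_of(output_bytes),
        "started": started,
        "ended": ended,
    }
    return output_bytes, record


def to_upper(data: bytes, repeat: int = 1) -> bytes:
    """Toy 'analysis' step used only for the demonstration."""
    return data.upper() * repeat


output, record = run_with_provenance("to_upper", to_upper, b"acgt", {"repeat": 2})
print(json.dumps(record, indent=2))
```

A record like this can sit next to the result, so that whoever reports the work later can check which inputs and parameters produced it.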
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
The IT Crowd Series 3 Episode 4
The Internet
service
Science as a Service
Integrative frameworks
Open Source
Workflows
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Metadata Objects
the secret is the manifest…
Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
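To make "the secret is the manifest" concrete, here is a minimal sketch of a manifest that lists the four kinds of content named above (workflow definition, data, parameter configs, provenance log) with a checksum for each. The file paths, role labels and JSON layout are assumptions for illustration; this is not the Research Object specification itself.

```python
import hashlib
import json


def build_manifest(resources):
    """Build a manifest dict from {path: (role, content_bytes)}.

    Each entry records the path, its role in the research object,
    a SHA-256 checksum and the size in bytes. The "aggregates" key
    and the entry fields are a hypothetical layout for this sketch.
    """
    entries = []
    for path, (role, content) in sorted(resources.items()):
        entries.append({
            "path": path,
            "role": role,
            "sha256": hashlib.sha256(content).hexdigest(),
            "bytes": len(content),
        })
    return {"aggregates": entries}


# Hypothetical contents of a small research object.
resources = {
    "workflow/analysis.wf": ("workflow definition", b"step1 -> step2"),
    "data/input.csv": ("data", b"id,value\n1,0.5\n"),
    "config/params.json": ("parameter config", b'{"threshold": 0.1}'),
    "provenance/run.log": ("provenance log", b"run started\nrun finished\n"),
}

manifest = build_manifest(resources)
print(json.dumps(manifest, indent=2))
```

Because the manifest is plain structured data, a machine can check that every listed part is present and unchanged before anyone tries to rerun the work.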
myRDM
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters
Make reproducible -> Born
Be smart about reproducibility
Think Commons not Repository
Best Practices for Scientific Computing httparxivorgabs12100530Stodden Reproducible Research Standard Intl J Comm Law amp Policy 13 2009
RARE amp FAIR Knowledge Turns with Research Objects
httpmerchandisethedoctorwhositecouk
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researcher Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
Reality Check
Jorge Cham wwwphdcomicscom
Stealthy not Sneakyreduce the friction
instrumentationspan RARE and FAIR
Optimising The Neylon Equation
Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ne AuthorshipCiting what
Research Currencies
Training
56Of UK researchers develop their own research software or scripts
73 Of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority
httpwwwrseacuk
BUThelliphellip
two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated
Inspired by Bob Harrison
bull Incremental shift for infrastructure providers
bull Moderate shift for policy makers and stewards
bull Paradigm shift for researchers and their institutions
The Challenge
All the members of the Wf4Ever team
Colleagues in Manchester's Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Depth and Coverage Profiles
NISO-JATS
Depth and Coverage Metadata Profiles
Zhao et al 2013
Method Matters